linear processor array: Topics by Science.gov

Sample records for linear processor array

Iterative color-multiplexed, electro-optical processor.

PubMed

Psaltis, D; Casasent, D; Carlotto, M

1979-11-01

A noncoherent optical vector-matrix multiplier using a linear LED source array and a linear P-I-N photodiode detector array has been combined with a 1-D adder in a feedback loop. The resultant iterative optical processor and its use in solving simultaneous linear equations are described. Operation on complex data is provided by a novel color-multiplexing system.
Lenslet array processors.

PubMed

Glaser, I

1982-04-01

By combining a lenslet array with masks it is possible to obtain a noncoherent optical processor capable of computing in parallel generalized 2-D discrete linear transformations. We present here an analysis of such lenslet array processors (LAP). The effect of several errors, including optical aberrations, diffraction, vignetting, and geometrical and mask errors, are calculated, and guidelines to optical design of LAP are derived. Using these results, both ultimate and practical performances of LAP are compared with those of competing techniques.
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.

PubMed

Glaser, I

1980-10-01

We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.
Ring-array processor distribution topology for optical interconnects

NASA Technical Reports Server (NTRS)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Sequence information signal processor

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1999-01-01

An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
Implementation of context independent code on a new array processor: The Super-65

NASA Technical Reports Server (NTRS)

Colbert, R. O.; Bowhill, S. A.

1981-01-01

The feasibility of rewriting standard uniprocessor programs into code which contains no context-dependent branches is explored. Context independent code (CIC) would contain no branches that might require different processing elements to branch different ways. In order to investigate the possibilities and restrictions of CIC, several programs were recoded into CIC and a four-element array processor was built. This processor (the Super-65) consisted of three 6502 microprocessors and the Apple II microcomputer. The results obtained were somewhat dependent upon the specific architecture of the Super-65 but within bounds, the throughput of the array processor was found to increase linearly with the number of processing elements (PEs). The slope of throughput versus PEs is highly dependent on the program and varied from 0.33 to 1.00 for the sample programs.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processors with scanning CCD photodetectors

NASA Astrophysics Data System (ADS)

Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.

1995-10-01

Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete—analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.
FFT Computation with Systolic Arrays, A New Architecture

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The use of the Cooley-Tukey algorithm for computing the l-d FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
General linear codes for fault-tolerant matrix operations on processor arrays

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Abraham, J. A.

1988-01-01

Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
Optical systolic solutions of linear algebraic equations

NASA Technical Reports Server (NTRS)

Neuman, C. P.; Casasent, D.

1984-01-01

The philosophy and data encoding possible in systolic array optical processor (SAOP) were reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as: matrix-decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Implementation and Assessment of Advanced Analog Vector-Matrix Processor

NASA Technical Reports Server (NTRS)

Gary, Charles K.; Bualat, Maria G.; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

This paper discusses the design and implementation of an analog optical vecto-rmatrix coprocessor with a throughput of 128 Mops for a personal computer. Vector matrix calculations are inherently parallel, providing a promising domain for the use of optical calculators. However, to date, digital optical systems have proven too cumbersome to replace electronics, and analog processors have not demonstrated sufficient accuracy in large scale systems. The goal of the work described in this paper is to demonstrate a viable optical coprocessor for linear operations. The analog optical processor presented has been integrated with a personal computer to provide full functionality and is the first demonstration of an optical linear algebra processor with a throughput greater than 100 Mops. The optical vector matrix processor consists of a laser diode source, an acoustooptical modulator array to input the vector information, a liquid crystal spatial light modulator to input the matrix information, an avalanche photodiode array to read out the result vector of the vector matrix multiplication, as well as transport optics and the electronics necessary to drive the optical modulators and interface to the computer. The intent of this research is to provide a low cost, highly energy efficient coprocessor for linear operations. Measurements of the analog accuracy of the processor performing 128 Mops are presented along with an assessment of the implications for future systems. A range of noise sources, including cross-talk, source amplitude fluctuations, shot noise at the detector, and non-linearities of the optoelectronic components are measured and compared to determine the most significant source of error. The possibilities for reducing these sources of error are discussed. Also, the total error is compared with that expected from a statistical analysis of the individual components and their relation to the vector-matrix operation. The sufficiency of the measured accuracy of the processor is compared with that required for a range of typical problems. Calculations resolving alloy concentrations from spectral plume data of rocket engines are implemented on the optical processor, demonstrating its sufficiency for this problem. We also show how this technology can be easily extended to a 100 x 100 10 MHz (200 Cops) processor.
Sequence information signal processor for local and global string comparisons

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1997-01-01

A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's compliment operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
PCI-based WILDFIRE reconfigurable computing engines

NASA Astrophysics Data System (ADS)

Fross, Bradley K.; Donaldson, Robert L.; Palmer, Douglas J.

1996-10-01

WILDFORCE is the first PCI-based custom reconfigurable computer that is based on the Splash 2 technology transferred from the National Security Agency and the Institute for Defense Analyses, Supercomputing Research Center (SRC). The WILDFORCE architecture has many of the features of the WILDFIRE computer, such as field- programmable gate array (FPGA) based processing elements, linear array and crossbar interconnection, and high- performance memory and I/O subsystems. New features introduced in the PCI-based WILDFIRE systems include memory/processor options that can be added to any processing element. These options include static and dynamic memory, digital signal processors (DSPs), FPGAs, and microprocessors. In addition to memory/processor options, many different application specific connectors can be used to extend the I/O capabilities of the system, including systolic I/O, camera input and video display output. This paper also discusses how this new PCI-based reconfigurable computing engine is used for rapid-prototyping, real-time video processing and other DSP applications.
An acceleration framework for synthetic aperture radar algorithms

NASA Astrophysics Data System (ADS)

Kim, Youngsoo; Gloster, Clay S.; Alexander, Winser E.

2017-04-01

Algorithms for radar signal processing, such as Synthetic Aperture Radar (SAR) are computationally intensive and require considerable execution time on a general purpose processor. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine in order to reduce execution time by an order of magnitude as compared to kernel execution on a general purpose processor. Specifically, Field Programmable Gate Arrays (FPGAs) can be used to accelerate these kernels using hardware-based custom logic implementations. In this paper, we demonstrate a framework for algorithm acceleration. We used SAR as a case study to illustrate the potential for algorithm acceleration offered by FPGAs. Initially, we profiled the SAR algorithm and implemented a homomorphic filter using a hardware implementation of the natural logarithm. Experimental results show a linear speedup by adding reasonably small processing elements in Field Programmable Gate Array (FPGA) as opposed to using a software implementation running on a typical general purpose processor.
Method and structure for skewed block-cyclic distribution of lower-dimensional data arrays in higher-dimensional processor grids

DOEpatents

Chatterjee, Siddhartha [Yorktown Heights, NY; Gunnels, John A [Brewster, NY

2011-11-08

A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.
Two-dimensional acousto-optic processor using circular antenna array with a Butler matrix

NASA Astrophysics Data System (ADS)

Lee, Jim P.

1992-09-01

A two-dimensional acousto-optic signal processor is shown to be useful for providing simultaneous spectrum analysis and direction finding of radar signals over an instantaneous field of view of 360 deg. A system analysis with emphasis on the direction-finding aspect of this new architecture is presented. The peak location of the optical pattern provides a direct measure of bearing, independent of signal frequency. In addition, the sidelobe levels of the pattern can be effectively reduced using amplitude weighting. Performance parameters, such as mainlobe beamwidth, peak-sidelobe level, and pointing error, are analyzed as a function of the Gaussian laser illumination profile and the number of channels. Finally, a comparison with a linear antenna array architecture is also discussed.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

NASA Astrophysics Data System (ADS)

Barr, David R. W.; Dudek, Piotr

2009-12-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
JPRS Report, Science & Technology, China, High-Performance Computer Systems

DTIC Science & Technology

1992-10-28

microprocessor array The microprocessor array in the AP85 system is com- posed of 16 completely identical array element micro - processors . Each array element...microprocessors and capable of host machine reading and writing. The memory capacity of the array element micro - processors as a whole can be expanded...transmission functions to carry out data transmission from array element micro - processor to array element microprocessor, from array element

The Use of a Microcomputer Based Array Processor for Real Time Laser Velocimeter Data Processing

NASA Technical Reports Server (NTRS)

Meyers, James F.

1990-01-01

The application of an array processor to laser velocimeter data processing is presented. The hardware is described along with the method of parallel programming required by the array processor. A portion of the data processing program is described in detail. The increase in computational speed of a microcomputer equipped with an array processor is illustrated by comparative testing with a minicomputer.
Scalable Engineering of Quantum Optical Information Processing Architectures (SEQUOIA)

DTIC Science & Technology

2016-12-13

arrays. Figure 4: An 8-channel fiber-coupled SNSPD array. 1.4 Post -fabrication-tunable linear optic fabrication We have analyzed the...performance of the programmable nanophotonic processor (PNP) that is dynamically tunable via post -fabrication active phase tuning to predict the scaling of...various device losses. PACS numbers: 42.50. Ex , 03.67.Dd, 03.67.Lx, 42.50.Dv I. INTRODUCTION Quantum key distribution (QKD) enables two distant authenticated
A new measuring machine in Paris

NASA Technical Reports Server (NTRS)

Guibert, J.; Charvin, P.

1984-01-01

A new photographic measuring machine is under construction at the Paris Observatory. The amount of transmitted light is measured by a linear array of 1024 photodiodes. Carriage control, data acquisition and on line processing are performed by microprocessors, a S.E.L. 32/27 computer, and an AP 120-B Array Processor. It is expected that a Schmidt telescope plate of size 360 mm square will be scanned in one hour with pixel size of ten microns.
DFT algorithms for bit-serial GaAs array processor architectures

NASA Technical Reports Server (NTRS)

Mcmillan, Gary B.

1988-01-01

Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
20-GFLOPS QR processor on a Xilinx Virtex-E FPGA

NASA Astrophysics Data System (ADS)

Walke, Richard L.; Smith, Robert W. M.; Lightbody, Gaye

2000-11-01

Adaptive beamforming can play an important role in sensor array systems in countering directional interference. In high-sample rate systems, such as radar and comms, the calculation of adaptive weights is a very computational task that requires highly parallel solutions. For systems where low power consumption and volume are important the only viable implementation is as an Application Specific Integrated Circuit (ASIC). However, the rapid advancement of Field Programmable Gate Array (FPGA) technology is enabling highly credible re-programmable solutions. In this paper we present the implementation of a scalable linear array processor for weight calculation using QR decomposition. We employ floating-point arithmetic with mantissa size optimized to the target application to minimize component size, and implement them as relationally placed macros (RPMs) on Xilinx Virtex FPGAs to achieve predictable dense layout and high-speed operation. We present results that show that 20GFLOPS of sustained computation on a single XCV3200E-8 Virtex-E FPGA is possible. We also describe the parameterized implementation of the floating-point operators and QR-processor, and the design methodology that enables us to rapidly generate complex FPGA implementations using the industry standard hardware description language VHDL.
Solution for the nonuniformity correction of infrared focal plane arrays.

PubMed

Zhou, Huixin; Liu, Shangqian; Lai, Rui; Wang, Dabao; Cheng, Yubao

2005-05-20

Based on the S-curve model of the detector response of infrared focal plan arrays (IRFPAs), an improved two-point correction algorithm is presented. The algorithm first transforms the nonlinear image data into linear data and then uses the normal two-point algorithm to correct the linear data. The algorithm can effectively overcome the influence of nonlinearity of the detector's response, and it enlarges the correction precision and the dynamic range of the response. A real-time imaging-signal-processing system for IRFPAs that is based on a digital signal processor and field-programmable gate arrays is also presented. The nonuniformity correction capability of the presented solution is validated by experimental imaging procedures of a 128 x 128 pixel IRFPA camera prototype.
High-performance ultra-low power VLSI analog processor for data compression

NASA Technical Reports Server (NTRS)

Tawel, Raoul (Inventor)

1996-01-01

An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
Lithium niobate guided-wave beam former for steering phased-array antennas.

PubMed

Armenise, M N; Passaro, V M; Noviello, G

1994-09-10

We present the theoretical investigation, design, and simulation of a novel guided-wave optical processor for L-band-transmission beam forming in a linear array of phased active antennas. The proposed configuration includes two contradirectional surface acoustic-wave transducers, and it is based on a Y-cut, X-propagating Ti:LiNbO(3) planar waveguide supporting the lowest-order modes of both polarizations (TE(0) and TM(0)) at the free-space wavelength λ = 0.85 µm. A detailed comparison between the processor we propose and other optical and electronic architectures reported in the literature is carried out, exhibiting a number of significant advantages in terms of weight, total chip size, and power consumption, when the number of antenna elements is greater than 50.
Array processor architecture connection network

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1982-01-01

A connection network is disclosed for use between a parallel array of processors and a parallel array of memory modules for establishing non-conflicting data communications paths between requested memory modules and requesting processors. The connection network includes a plurality of switching elements interposed between the processor array and the memory modules array in an Omega networking architecture. Each switching element includes a first and a second processor side port, a first and a second memory module side port, and control logic circuitry for providing data connections between the first and second processor ports and the first and second memory module ports. The control logic circuitry includes strobe logic for examining data arriving at the first and the second processor ports to indicate when the data arriving is requesting data from a requesting processor to a requested memory module. Further, connection circuitry is associated with the strobe logic for examining requesting data arriving at the first and the second processor ports for providing a data connection therefrom to the first and the second memory module ports in response thereto when the data connection so provided does not conflict with a pre-established data connection currently in use.
Potential of minicomputer/array-processor system for nonlinear finite-element analysis

NASA Technical Reports Server (NTRS)

Strohkorb, G. A.; Noor, A. K.

1983-01-01

The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.
Feasibility study, software design, layout and simulation of a two-dimensional Fast Fourier Transform machine for use in optical array interferometry

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The goal of this project was the feasibility study of a particular architecture of a digital signal processing machine operating in real time which could do in a pipeline fashion the computation of the fast Fourier transform (FFT) of a time-domain sampled complex digital data stream. The particular architecture makes use of simple identical processors (called inner product processors) in a linear organization called a systolic array. Through computer simulation the new architecture to compute the FFT with systolic arrays was proved to be viable, and computed the FFT correctly and with the predicted particulars of operation. Integrated circuits to compute the operations expected of the vital node of the systolic architecture were proven feasible, and even with a 2 micron VLSI technology can execute the required operations in the required time. Actual construction of the integrated circuits was successful in one variant (fixed point) and unsuccessful in the other (floating point).
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Range-azimuth decouple beamforming for frequency diverse array with Costas-sequence modulated frequency offsets

NASA Astrophysics Data System (ADS)

Wang, Zhe; Wang, Wen-Qin; Shao, Huaizong

2016-12-01

Different from the phased-array using the same carrier frequency for each transmit element, the frequency diverse array (FDA) uses a small frequency offset across the array elements to produce range-angle-dependent transmit beampattern. FDA radar provides new application capabilities and potentials due to its range-dependent transmit array beampattern, but the FDA using linearly increasing frequency offsets will produce a range and angle coupled transmit beampattern. In order to decouple the range-azimuth beampattern for FDA radar, this paper proposes a uniform linear array (ULA) FDA using Costas-sequence modulated frequency offsets to produce random-like energy distribution in the transmit beampattern and thumbtack transmit-receive beampattern. In doing so, the range and angle of targets can be unambiguously estimated through matched filtering and subspace decomposition algorithms in the receiver signal processor. Moreover, random-like energy distributed beampattern can also be utilized for low probability of intercept (LPI) radar applications. Numerical results show that the proposed scheme outperforms the standard FDA in focusing the transmit energy, especially in the range dimension.
Preliminary study on the potential usefulness of array processor techniques for structural synthesis

NASA Technical Reports Server (NTRS)

Feeser, L. J.

1980-01-01

The effects of the use of array processor techniques within the structural analyzer program, SPAR, are simulated in order to evaluate the potential analysis speedups which may result. In particular the connection of a Floating Point System AP120 processor to the PRIME computer is discussed. Measurements of execution, input/output, and data transfer times are given. Using these data estimates are made as to the relative speedups that can be executed in a more complete implementation on an array processor maxi-mini computer system.
Rectangular Array Of Digital Processors For Planning Paths

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

1993-01-01

Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Electrically reconfigurable logic array

NASA Technical Reports Server (NTRS)

Agarwal, R. K.

1982-01-01

To compose the complicated systems using algorithmically specialized logic circuits or processors, one solution is to perform relational computations such as union, division and intersection directly on hardware. These relations can be pipelined efficiently on a network of processors having an array configuration. These processors can be designed and implemented with a few simple cells. In order to determine the state-of-the-art in Electrically Reconfigurable Logic Array (ERLA), a survey of the available programmable logic array (PLA) and the logic circuit elements used in such arrays was conducted. Based on this survey some recommendations are made for ERLA devices.
A novel VLSI processor architecture for supercomputing arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

1993-01-01

Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
Multimode power processor

DOEpatents

O'Sullivan, G.A.; O'Sullivan, J.A.

1999-07-27

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.
Multimode power processor

DOEpatents

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
Reduction of solar vector magnetograph data using a microMSP array processor

NASA Technical Reports Server (NTRS)

Kineke, Jack

1990-01-01

The processing of raw data obtained by the solar vector magnetograph at NASA-Marshall requires extensive arithmetic operations on large arrays of real numbers. The objectives of this summer faculty fellowship study are to: (1) learn the programming language of the MicroMSP Array Processor and adapt some existing data reduction routines to exploit its capabilities; and (2) identify other applications and/or existing programs which lend themselves to array processor utilization which can be developed by undergraduate student programmers under the provisions of project JOVE.

Parallel processing in a host plus multiple array processor system for radar

NASA Technical Reports Server (NTRS)

Barkan, B. Z.

1983-01-01

Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data

NASA Technical Reports Server (NTRS)

Smith, B. W.; Siegel, H. J.; Swain, P. H.

1981-01-01

A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
Conditions for space invariance in optical data processors used with coherent or noncoherent light.

PubMed

Arsenault, H R

1972-10-01

The conditions for space invariance in coherent and noncoherent optical processors are considered. All linear optical processors are shown to belong to one of two types. The conditions for space invariance are more stringent for noncoherent processors than for coherent processors, so that a system that is linear in coherent light may be nonlinear in noncoherent light. However, any processor that is linear in noncoherent light is also linear in the coherent limit.
Implicit, nonswitching, vector-oriented algorithm for steady transonic flow

NASA Technical Reports Server (NTRS)

Lottati, I.

1983-01-01

A rapid computation of a sequence of transonic flow solutions has to be performed in many areas of aerodynamic technology. The employment of low-cost vector array processors makes the conduction of such calculations economically feasible. However, for a full utilization of the new hardware, the developed algorithms must take advantage of the special characteristics of the vector array processor. The present investigation has the objective to develop an efficient algorithm for solving transonic flow problems governed by mixed partial differential equations on an array processor.
The Solution of Linear Complementarity Problems on an Array Processor.

DTIC Science & Technology

1981-01-01

INITIALIZE T04E 4IASK COMON /1SCA/M1AA ITERAIIJ)NSp NIUvld ITEWAILUNSPNUJ4d RUMboaJNI Co6 C3MAON /ISCA/I GRIL )POINTSo Y LiRIUPOINTS CDMAION /SUaLMjAT...GRI1D# WIDTH GRIfl LOGICAL MASWI MASK MASK INTEGE" X GRIL )POINTSo Y GRIOPUINTS 14JTEGEM MAX ITERATIONS# NUMB ITERArIONS9 NIJMO ROPIS, NUMB COLS C LOCAL
Reconfigurable signal processor designs for advanced digital array radar systems

NASA Astrophysics Data System (ADS)

Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

2017-05-01

The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
A wideband software reconfigurable modem

NASA Astrophysics Data System (ADS)

Turner, J. H., Jr.; Vickers, H.

A wideband modem is described which provides signal processing capability for four Lx-band signals employing QPSK, MSK and PPM waveforms and employs a software reconfigurable architecture for maximum system flexibility and graceful degradation. The current processor uses a 2901 and two 8086 microprocessors per channel and performs acquisition, tracking, and data demodulation for JITDS, GPS, IFF and TACAN systems. The next generation processor will be implemented using a VHSIC chip set employing a programmable complex array vector processor module, a GP computer module, customized gate array modules, and a digital array correlator. This integrated processor has application to a wide number of diverse system waveforms, and will bring the benefits of VHSIC technology insertion into avionic antijam communications systems.
Developing infrared array controller with software real time operating system

NASA Astrophysics Data System (ADS)

Sako, Shigeyuki; Miyata, Takashi; Nakamura, Tomohiko; Motohara, Kentaro; Uchimoto, Yuka Katsuno; Onaka, Takashi; Kataza, Hirokazu

2008-07-01

Real-time capabilities are required for a controller of a large format array to reduce a dead-time attributed by readout and data transfer. The real-time processing has been achieved by dedicated processors including DSP, CPLD, and FPGA devices. However, the dedicated processors have problems with memory resources, inflexibility, and high cost. Meanwhile, a recent PC has sufficient resources of CPUs and memories to control the infrared array and to process a large amount of frame data in real-time. In this study, we have developed an infrared array controller with a software real-time operating system (RTOS) instead of the dedicated processors. A Linux PC equipped with a RTAI extension and a dual-core CPU is used as a main computer, and one of the CPU cores is allocated to the real-time processing. A digital I/O board with DMA functions is used for an I/O interface. The signal-processing cores are integrated in the OS kernel as a real-time driver module, which is composed of two virtual devices of the clock processor and the frame processor tasks. The array controller with the RTOS realizes complicated operations easily, flexibly, and at a low cost.
A digital retina-like low-level vision processor.

PubMed

Mertoguno, S; Bourbakis, N G

2003-01-01

This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
Interconnection arrangement of routers of processor boards in array of cabinets supporting secure physical partition

DOEpatents

Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

2007-07-17

A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure includes routers in service or compute processor boards distributed in an array of cabinets connected in series on each board and to respective routers in neighboring row cabinet boards with the routers in series connection coupled to routers in series connection in respective neighboring column cabinet boards. The array can include disconnect cabinets or respective routers in all boards in each cabinet connected in a toroid. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
Reconfiguration Schemes for Fault-Tolerant Processor Arrays

DTIC Science & Technology

1992-10-15

partially notion of linear schedule are easily related to similar ordered subset of a multidimensional integer lattice models and concepts used in [11-[131...and several other (called indec set). The points of this lattice correspond works. to (i.e.. are the indices of) computations, and the partial There are...These data dependencies are represented as vectors that of all computations of the algorithm is to be minimized. connect points of the lattice . If a
Parallel computing on Unix workstation arrays

NASA Astrophysics Data System (ADS)

Reale, F.; Bocchino, F.; Sciortino, S.

1994-12-01

We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
Optical systolic array processor using residue arithmetic

NASA Technical Reports Server (NTRS)

Jackson, J.; Casasent, D.

1983-01-01

The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
Digital Parallel Processor Array for Optimum Path Planning

NASA Technical Reports Server (NTRS)

Kremeny, Sabrina E. (Inventor); Fossum, Eric R. (Inventor); Nixon, Robert H. (Inventor)

1996-01-01

The invention computes the optimum path across a terrain or topology represented by an array of parallel processor cells interconnected between neighboring cells by links extending along different directions to the neighboring cells. Such an array is preferably implemented as a high-speed integrated circuit. The computation of the optimum path is accomplished by, in each cell, receiving stimulus signals from neighboring cells along corresponding directions, determining and storing the identity of a direction along which the first stimulus signal is received, broadcasting a subsequent stimulus signal to the neighboring cells after a predetermined delay time, whereby stimulus signals propagate throughout the array from a starting one of the cells. After propagation of the stimulus signal throughout the array, a master processor traces back from a selected destination cell to the starting cell along an optimum path of the cells in accordance with the identity of the directions stored in each of the cells.
Real time processor for array speckle interferometry

NASA Astrophysics Data System (ADS)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-02-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
Real time processor for array speckle interferometry

NASA Technical Reports Server (NTRS)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-01-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
Vestibular receptor cells and signal detection: bioaccelerometers and the hexagonal sampling of two-dimensional signals

NASA Technical Reports Server (NTRS)

Mugler, D. H.; Ross, M. D.

1990-01-01

The inner ear contains sensory organs which signal changes in head movement. The vestibular sacs, in particular, are sensitive to linear accelerations. Electron microscopic images have revealed the structure of tiny sensory hair bundles, whose mechanical deformation results in the initiation of neuronal activity and the transmission of electrical signals to the brain. The structure of the hair bundles is shown in this paper to be that of the most efficient two-dimensional phased-array signal processors.
A high-accuracy optical linear algebra processor for finite element applications

NASA Technical Reports Server (NTRS)

Casasent, D.; Taylor, B. K.

1984-01-01

Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
Adaptive linear predictor FIR filter based on the Cyclone V FPGA with HPS to reduce narrow band RFI in AERA radio detection of cosmic rays

DOE Office of Scientific and Technical Information (OSTI.GOV)

Szadkowski, Zbigniew

We present the new approach to a filtering of radio frequency interferences (RFI) in the Auger Engineering Radio Array (AERA) which study the electromagnetic part of the Extensive Air Showers. The radio stations can observe radio signals caused by coherent emissions due to geomagnetic radiation and charge excess processes. AERA observes frequency band from 30 to 80 MHz. This range is highly contaminated by human-made RFI. In order to improve the signal to noise ratio RFI filters are used in AERA to suppress this contamination. The first kind of filter used by AERA was the Median one, based on themore » Fast Fourier Transform (FFT) technique. The second one, which is currently in use, is the infinite impulse response (IIR) notch filter. The proposed new filter is a finite impulse response (FIR) filter based on a linear prediction (LP). A periodic contamination hidden in a registered signal (digitized in the ADC) can be extracted and next subtracted to make signal cleaner. The FIR filter requires a calculation of n=32, 64 or even 128 coefficients (dependent on a required speed or accuracy) by solving of n linear equations with coefficients built from the covariance Toeplitz matrix. This matrix can be solved by the Levinson recursion, which is much faster than the Gauss procedure. The filter has been already tested in the real AERA radio stations on Argentinean pampas with a very successful results. The linear equations were solved either in the virtual soft-core NIOSR processor (implemented in the FPGA chip as a net of logic elements) or in the external Voipac PXA270M ARM processor. The NIOS processor is relatively slow (50 MHz internal clock), calculations performed in an external processor consume a significant amount of time for data exchange between the FPGA and the processor. Test showed a very good efficiency of the RFI suppression for stationary (long-term) contaminations. However, we observed a short-time contaminations, which could not be suppressed either by the IIR-notch filter or by the FIR filter based on the linear predictions. For the LP FIR filter the refreshment time of the filter coefficients was to long and filter did not keep up with the changes of a contamination structure, mainly due to a long calculation time in a slow processors. We propose to use the Cyclone V SE chip with embedded micro-controller operating with 925 MHz internal clock to significantly reduce a refreshment time of the FIR coefficients. The lab results are promising. (authors)« less
System and method for cognitive processing for data fusion

NASA Technical Reports Server (NTRS)

Duong, Tuan A. (Inventor); Duong, Vu A. (Inventor)

2012-01-01

A system and method for cognitive processing of sensor data. A processor array receiving analog sensor data and having programmable interconnects, multiplication weights, and filters provides for adaptive learning in real-time. A static random access memory contains the programmable data for the processor array and the stored data is modified to provide for adaptive learning.

Simulation of continuously logical base cells (CL BC) with advanced functions for analog-to-digital converters and image processors

NASA Astrophysics Data System (ADS)

Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.

2017-10-01

The paper considers results of design and modeling of continuously logical base cells (CL BC) based on current mirrors (CM) with functions of preliminary analogue and subsequent analogue-digital processing for creating sensor multichannel analog-to-digital converters (SMC ADCs) and image processors (IP). For such with vector or matrix parallel inputs-outputs IP and SMC ADCs it is needed active basic photosensitive cells with an extended electronic circuit, which are considered in paper. Such basic cells and ADCs based on them have a number of advantages: high speed and reliability, simplicity, small power consumption, high integration level for linear and matrix structures. We show design of the CL BC and ADC of photocurrents and their various possible implementations and its simulations. We consider CL BC for methods of selection and rank preprocessing and linear array of ADCs with conversion to binary codes and Gray codes. In contrast to our previous works here we will dwell more on analogue preprocessing schemes for signals of neighboring cells. Let us show how the introduction of simple nodes based on current mirrors extends the range of functions performed by the image processor. Each channel of the structure consists of several digital-analog cells (DC) on 15-35 CMOS. The amount of DC does not exceed the number of digits of the formed code, and for an iteration type, only one cell of DC, complemented by the device of selection and holding (SHD), is required. One channel of ADC with iteration is based on one DC-(G) and SHD, and it has only 35 CMOS transistors. In such ADCs easily parallel code can be realized and also serial-parallel output code. The circuits and simulation results of their design with OrCAD are shown. The supply voltage of the DC is 1.8÷3.3V, the range of an input photocurrent is 0.1÷24μA, the transformation time is 20÷30nS at 6-8 bit binary or Gray codes. The general power consumption of the ADC with iteration is only 50÷100μW, if the maximum input current is 4μA. Such simple structure of linear array of ADCs with low power consumption and supply voltage 3.3V, and at the same time with good dynamic characteristics (frequency of digitization even for 1.5μm CMOS-technologies is 40÷50 MHz, and can be increased up to 10 times) and accuracy characteristics are show. The SMC ADCs based on CL BC and CM opens new prospects for realization of linear and matrix IP and photo-electronic structures with matrix operands, which are necessary for neural networks, digital optoelectronic processors, neural-fuzzy controllers.
A high performance linear equation solver on the VPP500 parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi

1994-12-31

This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
Design and Evaluation of Fault-Tolerant VLSI/WSI Processor Arrays.

DTIC Science & Technology

1987-12-31

studies reported in this paper. In Section .3, the reliabuility characteristics of single-level FTPA’s are discusseri. Four different type of FTPA’s are...for processor arrays are proposed and studied . Stu- dies on algorithmic and software aspects relevant to systems are reported in items 4, 5, 8, 12 and...O’Keefe M., and Fortes, J. A. B., "A Comparative Study of Two Systematic Design Methodologies for Systolic Arrays," (Long Version) International Workshop on
Detection Performance of Horizontal Linear Hydrophone Arrays in Shallow Water.

DTIC Science & Technology

1980-12-15

random phase G gain G angle interval covariance matrix h processor vector H matrix matched filter; generalized beamformer I unity matrix 4 SACLANTCEN SR...omnidirectional sensor is h*Ph P G = - h [Eq. 47] G = h* Q h P s The following two sections evaluate a few examples of application of the OLP. Following the...At broadside the signal covariance matrix reduces to a dyadic: P 󈧬 s s*;therefore, the gain (e.g. Eq. 37) becomes tr(H* P H) Pn * -1 Q -1 Pn G ~OQp
Dynamic Programming and Transitive Closure on Linear Pipelines.

DTIC Science & Technology

1984-05-01

four partitions. 2.0 - 1.9 1.0t N. N 3N N -8 4 24 Figure41 An ideal solution to small problem sizes is to design an algorithm on an array where the...12 References [1] A.V. Aho, J. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms, Addison-Wesley, (1974). - " [2] R. Aubusson...K.E. Batcher, " Design of a Massively Parallel Processor," IEEE-TC, Vol. C-9, No. 9, (September, 1980), pp. 83-840. [4] K.Q. Brown, "Dynamic Programming
Radiation Hardened Electronics for Extreme Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Watson, Michael D.

2007-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project consists of a series of tasks designed to develop and mature a broad spectrum of radiation hardened and low temperature electronics technologies. Three approaches are being taken to address radiation hardening: improved material hardness, design techniques to improve radiation tolerance, and software methods to improve radiation tolerance. Within these approaches various technology products are being addressed including Field Programmable Gate Arrays (FPGA), Field Programmable Analog Arrays (FPAA), MEMS Serial Processors, Reconfigurable Processors, and Parallel Processors. In addition to radiation hardening, low temperature extremes are addressed with a focus on material and design approaches.
A unified approach to VLSI layout automation and algorithm mapping on processor arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.

1993-01-01

Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

NASA Technical Reports Server (NTRS)

Downie, John D.; Goodman, Joseph W.

1989-01-01

The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
Realtime photoacoustic microscopy in vivo with a 30-MHz ultrasound array transducer.

PubMed

Zemp, Roger J; Song, Liang; Bitton, Rachel; Shung, K Kirk; Wang, Lihong V

2008-05-26

We present a novel high-frequency photoacoustic microscopy system capable of imaging the microvasculature of living subjects in realtime to depths of a few mm. The system consists of a high-repetition-rate Q-switched pump laser, a tunable dye laser, a 30-MHz linear ultrasound array transducer, a multichannel high-frequency data acquisition system, and a shared-RAM multi-core-processor computer. Data acquisition, beamforming, scan conversion, and display are implemented in realtime at 50 frames per second. Clearly resolvable images of 6-microm-diameter carbon fibers are experimentally demonstrated at 80 microm separation distances. Realtime imaging performance is demonstrated on phantoms and in vivo with absorbing structures identified to depths of 2.5-3 mm. This work represents the first high-frequency realtime photoacoustic imaging system to our knowledge.
Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests

NASA Technical Reports Server (NTRS)

Casasent, D.; Jackson, J.

1986-01-01

A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.
Toshiba TDF-500 High Resolution Viewing And Analysis System

NASA Astrophysics Data System (ADS)

Roberts, Barry; Kakegawa, M.; Nishikawa, M.; Oikawa, D.

1988-06-01

A high resolution, operator interactive, medical viewing and analysis system has been developed by Toshiba and Bio-Imaging Research. This system provides many advanced features including high resolution displays, a very large image memory and advanced image processing capability. In particular, the system provides CRT frame buffers capable of update in one frame period, an array processor capable of image processing at operator interactive speeds, and a memory system capable of updating multiple frame buffers at frame rates whilst supporting multiple array processors. The display system provides 1024 x 1536 display resolution at 40Hz frame and 80Hz field rates. In particular, the ability to provide whole or partial update of the screen at the scanning rate is a key feature. This allows multiple viewports or windows in the display buffer with both fixed and cine capability. To support image processing features such as windowing, pan, zoom, minification, filtering, ROI analysis, multiplanar and 3D reconstruction, a high performance CPU is integrated into the system. This CPU is an array processor capable of up to 400 million instructions per second. To support the multiple viewer and array processors' instantaneous high memory bandwidth requirement, an ultra fast memory system is used. This memory system has a bandwidth capability of 400MB/sec and a total capacity of 256MB. This bandwidth is more than adequate to support several high resolution CRT's and also the fast processing unit. This fully integrated approach allows effective real time image processing. The integrated design of viewing system, memory system and array processor are key to the imaging system. It is the intention to describe the architecture of the image system in this paper.
A Navier-Strokes Chimera Code on the Connection Machine CM-5: Design and Performance

NASA Technical Reports Server (NTRS)

Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

1994-01-01

We have implemented a three-dimensional compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the 'chimera' approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. A parallel machine like the CM-5 is well-suited for finite-difference methods on structured grids. The regular pattern of connections of a structured mesh maps well onto the architecture of the machine. So the first design choice, finite differences on a structured mesh, is natural. We use centered differences in space, with added artificial dissipation terms. When numerically solving the Navier-Stokes equations, there are liable to be some mesh cells near a solid body that are small in at least one direction. This mesh cell geometry can impose a very severe CFL (Courant-Friedrichs-Lewy) condition on the time step for explicit time-stepping methods. Thus, though explicit time-stepping is well-suited to the architecture of the machine, we have adopted implicit time-stepping. We have further taken the approximate factorization approach. This creates the need to solve large banded linear systems and creates the first possible barrier to an efficient algorithm. To overcome this first possible barrier we have considered two options. The first is just to solve the banded linear systems with data spread over the whole machine, using whatever fast method is available. This option is adequate for solving scalar tridiagonal systems, but for scalar pentadiagonal or block tridiagonal systems it is somewhat slower than desired. The second option is to 'transpose' the flow and geometry variables as part of the time-stepping process: Start with x-lines of data in-processor. Form explicit terms in x, then transpose so y-lines of data are in-processor. Form explicit terms in y, then transpose so z-lines are in processor. Form explicit terms in z, then solve linear systems in the z-direction. Transpose to the y-direction, then solve linear systems in the y-direction. Finally transpose to the x direction and solve linear systems in the x-direction. This strategy avoids inter-processor communication when differencing and solving linear systems, but requires a large amount of communication when doing the transposes. The transpose method is more efficient than the non-transpose strategy when dealing with scalar pentadiagonal or block tridiagonal systems. For handling geometrically complex problems the chimera strategy was adopted. For multiple zone cases we compute on each zone sequentially (using the whole parallel machine), then send the chimera interpolation data to a distributed data structure (array) laid out over the whole machine. This information transfer implies an irregular communication pattern, and is the second possible barrier to an efficient algorithm. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran. We make use of the Connection Machine Scientific Software Library (CMSSL) for the linear solver and array transpose operations.
Allocating application to group of consecutive processors in fault-tolerant deadlock-free routing path defined by routers obeying same rules for path selection

DOEpatents

Leung, Vitus J [Albuquerque, NM; Phillips, Cynthia A [Albuquerque, NM; Bender, Michael A [East Northport, NY; Bunde, David P [Urbana, IL

2009-07-21

In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.
Frequency-multiplexed and pipelined iterative optical systolic array processors

NASA Technical Reports Server (NTRS)

Casasent, D.; Jackson, J.; Neuman, C.

1983-01-01

Optical matrix processors using acoustooptic transducers are described, with emphasis on new systolic array architectures using frequency multiplexing in addition to space and time multiplexing. A Kalman filtering application is considered in a case study from which the operations required on such a system can be defined. This also serves as a new and powerful application for iterative optical processors. The importance of pipelining the data flow and the ordering of the operations performed in a specific application of such a system are also noted. Several examples of how to effectively achieve this are included. A new technique for handling bipolar data on such architectures is also described.
Single-Scale Retinex Using Digital Signal Processors

NASA Technical Reports Server (NTRS)

Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn

2005-01-01

The Retinex is an image enhancement algorithm that improves the brightness, contrast and sharpness of an image. It performs a non-linear spatial/spectral transform that provides simultaneous dynamic range compression and color constancy. It has been used for a wide variety of applications ranging from aviation safety to general purpose photography. Many potential applications require the use of Retinex processing at video frame rates. This is difficult to achieve with general purpose processors because the algorithm contains a large number of complex computations and data transfers. In addition, many of these applications also constrain the potential architectures to embedded processors to save power, weight and cost. Thus we have focused on digital signal processors (DSPs) and field programmable gate arrays (FPGAs) as potential solutions for real-time Retinex processing. In previous efforts we attained a 21 (full) frame per second (fps) processing rate for the single-scale monochromatic Retinex with a TMS320C6711 DSP operating at 150 MHz. This was achieved after several significant code improvements and optimizations. Since then we have migrated our design to the slightly more powerful TMS320C6713 DSP and the fixed point TMS320DM642 DSP. In this paper we briefly discuss the Retinex algorithm, the performance of the algorithm executing on the TMS320C6713 and the TMS320DM642, and compare the results with the TMS320C6711.
Multichannel signal enhancement

DOEpatents

Lewis, Paul S.

1990-01-01

A mixed adaptive filter is formulated for the signal processing problem where desired a priori signal information is not available. The formulation generates a least squares problem which enables the filter output to be calculated directly from an input data matrix. In one embodiment, a folded processor array enables bidirectional data flow to solve the recursive problem by back substitution without global communications. In another embodiment, a balanced processor array solves the recursive problem by forward elimination through the array. In a particular application to magnetoencephalography, the mixed adaptive filter enables an evoked response to an auditory stimulus to be identified from only a single trial.
Prototype Focal-Plane-Array Optoelectronic Image Processor

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Shaw, Timothy; Yu, Jeffrey

1995-01-01

Prototype very-large-scale integrated (VLSI) planar array of optoelectronic processing elements combines speed of optical input and output with flexibility of reconfiguration (programmability) of electronic processing medium. Basic concept of processor described in "Optical-Input, Optical-Output Morphological Processor" (NPO-18174). Performs binary operations on binary (black and white) images. Each processing element corresponds to one picture element of image and located at that picture element. Includes input-plane photodetector in form of parasitic phototransistor part of processing circuit. Output of each processing circuit used to modulate one picture element in output-plane liquid-crystal display device. Intended to implement morphological processing algorithms that transform image into set of features suitable for high-level processing; e.g., recognition.
Technology Developments in Radiation-Hardened Electronics for Space Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Howell, Joe T.

2008-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project consists of a series of tasks designed to develop and mature a broad spectrum of radiation hardened and low temperature electronics technologies. Three approaches are being taken to address radiation hardening: improved material hardness, design techniques to improve radiation tolerance, and software methods to improve radiation tolerance. Within these approaches various technology products are being addressed including Field Programmable Gate Arrays (FPGA), Field Programmable Analog Arrays (FPAA), MEMS, Serial Processors, Reconfigurable Processors, and Parallel Processors. In addition to radiation hardening, low temperature extremes are addressed with a focus on material and design approaches. System level applications for the RHESE technology products are discussed.
Realtime photoacoustic microscopy in vivo with a 30-MHz ultrasound array transducer

PubMed Central

Zemp, Roger J.; Song, Liang; Bitton, Rachel; Shung, K. Kirk; Wang, Lihong V.

2009-01-01

We present a novel high-frequency photoacoustic microscopy system capable of imaging the microvasculature of living subjects in realtime to depths of a few mm. The system consists of a high-repetition-rate Q-switched pump laser, a tunable dye laser, a 30-MHz linear ultrasound array transducer, a multichannel high-frequency data acquisition system, and a shared-RAM multi-core-processor computer. Data acquisition, beamforming, scan conversion, and display are implemented in realtime at 50 frames per second. Clearly resolvable images of 6-µm-diameter carbon fibers are experimentally demonstrated at 80 µm separation distances. Realtime imaging performance is demonstrated on phantoms and in vivo with absorbing structures identified to depths of 2.5–3 mm. This work represents the first high-frequency realtime photoacoustic imaging system to our knowledge. PMID:18545502
The MasPar MP-1 As a Computer Arithmetic Laboratory

PubMed Central

Anuta, Michael A.; Lozier, Daniel W.; Turner, Peter R.

1996-01-01

This paper is a blueprint for the use of a massively parallel SIMD computer architecture for the simulation of various forms of computer arithmetic. The particular system used is a DEC/MasPar MP-1 with 4096 processors in a square array. This architecture has many advantages for such simulations due largely to the simplicity of the individual processors. Arithmetic operations can be spread across the processor array to simulate a hardware chip. Alternatively they may be performed on individual processors to allow simulation of a massively parallel implementation of the arithmetic. Compromises between these extremes permit speed-area tradeoffs to be examined. The paper includes a description of the architecture and its features. It then summarizes some of the arithmetic systems which have been, or are to be, implemented. The implementation of the level-index and symmetric level-index, LI and SLI, systems is described in some detail. An extensive bibliography is included. PMID:27805123

Microlens array processor with programmable weight mask and direct optical input

NASA Astrophysics Data System (ADS)

Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen

1999-03-01

We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich- like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 X 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
TMVOC-MP: a parallel numerical simulator for Three-PhaseNon-isothermal Flows of Multicomponent Hydrocarbon Mixtures inporous/fractured media

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten

2008-02-15

TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted sourcemore » remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O{sub 2}, N{sub 2}, CO{sub 2}, CH{sub 4}, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N{sub 2} and O{sub 2}). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids. The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL phases. Any combination of the three phases may present, and phases may appear and disappear in the course of a simulation. In addition, VOCs may be adsorbed by the porous medium, and may biodegrade according to a simple half-life model. Detailed discussion of physical processes, assumptions, and fluid properties used in TMVOC-MP can be found in the TMVOC user's guide (Pruess and Battistelli, 2002). TMVOC-MP was developed based on the parallel framework of the TOUGH2-MP code (Zhang et al. 2001, Wu et al. 2002). It uses the MPI (Message Passing Forum, 1994) for parallel implementation. A domain decomposition approach is adopted for the parallelization. The code partitions a simulation domain, defined by an unstructured grid, using partitioning algorithm from the METIS software package (Karypsis and Kumar, 1998). In parallel simulation, each processor is in charge of one part of the simulation domain for assembling mass and energy balance equations, solving linear equation systems, updating thermophysical properties, and performing other local computations. The local linear-equation systems are solved in parallel by multiple processors with the Aztec linear solver package (Tuminaro et al., 1999). Although each processor solves the linearized equations of subdomains independently, the entire linear equation system is solved together by all processors collaboratively via communication between neighboring processors during each iteration. Detailed discussion of the prototype of the data-exchange scheme can be found in Elmroth et al. (2001). In addition, FORTRAN 90 features are introduced to TMVOC-MP, such as dynamic memory allocation, array operation, matrix manipulation, and replacing 'common blocks' (used in the original TMVOC) with modules. All new subroutines are written in FORTRAN 90. Program units imported from the original TMVOC remain in standard FORTRAN 77. This report provides a quick starting guide for using the TMVOC-MP program. We suppose that the users have basic knowledge of using the original TMVOC code. The users can find the detailed technical description of the physical processes modeled, and the mathematical and numerical methods in the user's guide for TMVOC (Pruess and Battistelli, 2002).« less
CORDIC-based digital signal processing (DSP) element for adaptive signal processing

NASA Astrophysics Data System (ADS)

Bolstad, Gregory D.; Neeld, Kenneth B.

1995-04-01

The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC based application specific DSP element that, when connected in a linear array, can perform extremely high throughput (100s of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach, or more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient, but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible, but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art, and more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to physically implement. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
Experience in highly parallel processing using DAP

NASA Technical Reports Server (NTRS)

Parkinson, D.

1987-01-01

Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

NASA Astrophysics Data System (ADS)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
The Microcode for the Control Processor of the ARO (Array Oriented Processor) Array Processor.

DTIC Science & Technology

1983-08-01

oiNi .TADDR=DBASE+MODE" 4CONT ŕWAfT’ FOR MEM, MORE", MOV) ,DRO BSX "S IGN EXT, MORE" SADD D FLDSEI,(6,3),IMN TADT)R=5+ 1 JMP I NDE-’XEI) "JU> IP ’ T1...JDTV1: YIP DIVI; TDIV2: Y,’ IP DIV2; JASHII: JMP ASHI; 4 JASH2: JMP AS112; JXOR1: YIP XDRI; JXOR2: YIP XOR2; JSOB: JMP SOB; JBPL: JMP BPL; JBMI: YIP BMI;0...JBHI: JMP BHill JBLOS: J! IP BLOS; JBVC: YIP BVC; JBWS: JMP BVS; JBCC: JMP BCC; JBCS: YIP BCS; JEMT: YIP EMT; JTRAP: YIP TRAPQ; JCLR6: YIP CLR6; JCOII
Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1994-01-01

In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
Development of a Receiver Processor For UAV Video Signal Acquisition and Tracking Using Digital Phased Array Antenna

DTIC Science & Technology

2010-09-01

53 Figure 26. Image of the phased array antenna...................................................................54...69 Figure 38. Computation of correction angle from array factor and sum/difference beams...71 Figure 39. Front panel of the tracking algorithm
A cost-effective methodology for the design of massively-parallel VLSI functional units

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Sriram, G.; Desouza, J.

1993-01-01

In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
Method and apparatus for ultra-high-sensitivity, incremental and absolute optical encoding

NASA Technical Reports Server (NTRS)

Leviton, Douglas B. (Inventor)

1999-01-01

An absolute optical linear or rotary encoder which encodes the motion of an object (3) with increased resolution and encoding range and decreased sensitivity to damage to the scale includes a scale (5), which moves with the object and is illuminated by a light source (11). The scale carries a pattern (9) which is imaged by a microscope optical system (13) on a CCD array (17) in a camera head (15). The pattern includes both fiducial markings (31) which are identical for each period of the pattern and code areas (33) which include binary codings of numbers identifying the individual periods of the pattern. The image of the pattern formed on the CCD array is analyzed by an image processor (23) to locate the fiducial marking, decode the information encoded in the code area, and thereby determine the position of the object.
Acoustically based fetal heart rate monitor

NASA Technical Reports Server (NTRS)

Baker, Donald A.; Zuckerwar, Allan J.

1991-01-01

The acoustically based fetal heart rate monitor permits an expectant mother to perform the fetal Non-Stress Test in her home. The potential market would include the one million U.S. pregnancies per year requiring this type of prenatal surveillance. The monitor uses polyvinylidene fluoride (PVF2) piezoelectric polymer film for the acoustic sensors, which are mounted in a seven-element array on a cummerbund. Evaluation of the sensor ouput signals utilizes a digital signal processor, which performs a linear prediction routine in real time. Clinical tests reveal that the acoustically based monitor provides Non-Stress Test records which are comparable to those obtained with a commercial ultrasonic transducer.
On Mapping Homogeneous Graphs on a Linear Array-Processor Model.

DTIC Science & Technology

1983-10-01

D2 ..,Dk) is a family of ordered sets of computation vertices and DIUDzu ..UDk=VG. 2. For any D in D, if v, and v are in D then w X wxw 3. Let TD ...denote the indexing function associated with the ordered set D. For any pair of DP and Dq in D, if v. and vy are in D and Dq respectively then TD (Dp) < TD ...the indices assigned to the diagonals in D range from I to 1DI and if D is a diagonal in D then TD (Dp)=,,, that is, the index of D. in the ordering is
Bio-inspired optical rotation sensor

NASA Astrophysics Data System (ADS)

O'Carroll, David C.; Shoemaker, Patrick A.; Brinkworth, Russell S. A.

2007-01-01

Traditional approaches to calculating self-motion from visual information in artificial devices have generally relied on object identification and/or correlation of image sections between successive frames. Such calculations are computationally expensive and real-time digital implementation requires powerful processors. In contrast flies arrive at essentially the same outcome, the estimation of self-motion, in a much smaller package using vastly less power. Despite the potential advantages and a few notable successes, few neuromorphic analog VLSI devices based on biological vision have been employed in practical applications to date. This paper describes a hardware implementation in aVLSI of our recently developed adaptive model for motion detection. The chip integrates motion over a linear array of local motion processors to give a single voltage output. Although the device lacks on-chip photodetectors, it includes bias circuits to use currents from external photodiodes, and we have integrated it with a ring-array of 40 photodiodes to form a visual rotation sensor. The ring configuration reduces pattern noise and combined with the pixel-wise adaptive characteristic of the underlying circuitry, permits a robust output that is proportional to image rotational velocity over a large range of speeds, and is largely independent of either mean luminance or the spatial structure of the image viewed. In principle, such devices could be used as an element of a velocity-based servo to replace or augment inertial guidance systems in applications such as mUAVs.
Optoelectronic switch matrix as a look-up table for residue arithmetic.

PubMed

Macdonald, R I

1987-10-01

The use of optoelectronic matrix switches to perform look-up table functions in residue arithmetic processors is proposed. In this application, switchable detector arrays give the advantage of a greatly reduced requirement for optical sources by comparison with previous optoelectronic residue processors.
Resource and Performance Evaluations of Fixed Point QRD-RLS Systolic Array through FPGA Implementation

NASA Astrophysics Data System (ADS)

Yokoyama, Yoshiaki; Kim, Minseok; Arai, Hiroyuki

At present, when using space-time processing techniques with multiple antennas for mobile radio communication, real-time weight adaptation is necessary. Due to the progress of integrated circuit technology, dedicated processor implementation with ASIC or FPGA can be employed to implement various wireless applications. This paper presents a resource and performance evaluation of the QRD-RLS systolic array processor based on fixed-point CORDIC algorithm with FPGA. In this paper, to save hardware resources, we propose the shared architecture of a complex CORDIC processor. The required precision of internal calculation, the circuit area for the number of antenna elements and wordlength, and the processing speed will be evaluated. The resource estimation provides a possible processor configuration with a current FPGA on the market. Computer simulations assuming a fading channel will show a fast convergence property with a finite number of training symbols. The proposed architecture has also been implemented and its operation was verified by beamforming evaluation through a radio propagation experiment.
Microcomputer array processor system. [design for electronic warfare

NASA Technical Reports Server (NTRS)

Slezak, K. D.

1980-01-01

The microcomputer array system is discussed with specific attention given to its electronic warware applications. Several aspects of the system architecture are described as well as some of its distinctive characteristics.
Integrated Reconfigurable Aperture, Digital Beam Forming, and Software GPS Receiver for UAV Navigation

DTIC Science & Technology

2007-12-11

Implemented both carrier and code phase tracking loop for performance evaluation of a minimum power beam forming algorithm and null steering algorithm...4 Antennal Antenna2 Antenna K RF RF RF ct, Ct~2 ChKx1 X2 ....... Xk A W ~ ~ =Z, x W ,=1 Fig. 5. Schematics of a K-element antenna array spatial...adaptive processor Antennal Antenna K A N-i V/ ( Vil= .i= VK Fig. 6. Schematics of a K-element antenna array space-time adaptive processor Two additional
Electronic neural network for solving traveling salesman and similar global optimization problems

NASA Technical Reports Server (NTRS)

Thakoor, Anilkumar P. (Inventor); Moopenn, Alexander W. (Inventor); Duong, Tuan A. (Inventor); Eberhardt, Silvio P. (Inventor)

1993-01-01

This invention is a novel high-speed neural network based processor for solving the 'traveling salesman' and other global optimization problems. It comprises a novel hybrid architecture employing a binary synaptic array whose embodiment incorporates the fixed rules of the problem, such as the number of cities to be visited. The array is prompted by analog voltages representing variables such as distances. The processor incorporates two interconnected feedback networks, each of which solves part of the problem independently and simultaneously, yet which exchange information dynamically.
Systolic Processor Array For Recognition Of Spectra

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Peterson, John C.

1995-01-01

Spectral signatures of materials detected and identified quickly. Spectral Analysis Systolic Processor Array (SPA2) relatively inexpensive and satisfies need to analyze large, complex volume of multispectral data generated by imaging spectrometers to extract desired information: computational performance needed to do this in real time exceeds that of current supercomputers. Locates highly similar segments or contiguous subsegments in two different spectra at time. Compares sampled spectra from instruments with data base of spectral signatures of known materials. Computes and reports scores that express degrees of similarity between sampled and data-base spectra.
A Versatile Multichannel Digital Signal Processing Module for Microcalorimeter Arrays

NASA Astrophysics Data System (ADS)

Tan, H.; Collins, J. W.; Walby, M.; Hennig, W.; Warburton, W. K.; Grudberg, P.

2012-06-01

Different techniques have been developed for reading out microcalorimeter sensor arrays: individual outputs for small arrays, and time-division or frequency-division or code-division multiplexing for large arrays. Typically, raw waveform data are first read out from the arrays using one of these techniques and then stored on computer hard drives for offline optimum filtering, leading not only to requirements for large storage space but also limitations on achievable count rate. Thus, a read-out module that is capable of processing microcalorimeter signals in real time will be highly desirable. We have developed multichannel digital signal processing electronics that are capable of on-board, real time processing of microcalorimeter sensor signals from multiplexed or individual pixel arrays. It is a 3U PXI module consisting of a standardized core processor board and a set of daughter boards. Each daughter board is designed to interface a specific type of microcalorimeter array to the core processor. The combination of the standardized core plus this set of easily designed and modified daughter boards results in a versatile data acquisition module that not only can easily expand to future detector systems, but is also low cost. In this paper, we first present the core processor/daughter board architecture, and then report the performance of an 8-channel daughter board, which digitizes individual pixel outputs at 1 MSPS with 16-bit precision. We will also introduce a time-division multiplexing type daughter board, which takes in time-division multiplexing signals through fiber-optic cables and then processes the digital signals to generate energy spectra in real time.

Optical laboratory solution and error model simulation of a linear time-varying finite element equation

NASA Technical Reports Server (NTRS)

Taylor, B. K.; Casasent, D. P.

1989-01-01

The use of simplified error models to accurately simulate and evaluate the performance of an optical linear-algebra processor is described. The optical architecture used to perform banded matrix-vector products is reviewed, along with a linear dynamic finite-element case study. The laboratory hardware and ac-modulation technique used are presented. The individual processor error-source models and their simulator implementation are detailed. Several significant simplifications are introduced to ease the computational requirements and complexity of the simulations. The error models are verified with a laboratory implementation of the processor, and are used to evaluate its potential performance.
Optical backplane interconnect switch for data processors and computers

NASA Technical Reports Server (NTRS)

Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

1989-01-01

An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
Noise limitations in optical linear algebra processors.

PubMed

Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

1990-05-10

A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.
Low-Latency Embedded Vision Processor (LLEVS)

DTIC Science & Technology

2016-03-01

26 3.2.3 Task 3 Projected Performance Analysis of FPGA- based Vision Processor ........... 31 3.2.3.1 Algorithms Latency Analysis ...Programmable Gate Array Custom Hardware for Real- Time Multiresolution Analysis . ............................................... 35...conduct data analysis for performance projections. The data acquired through measurements , simulation and estimation provide the requisite platform for
Architecture and data processing alternatives for Tse computer. Volume 1: Tse logic design concepts and the development of image processing machine architectures

NASA Technical Reports Server (NTRS)

Rickard, D. A.; Bodenheimer, R. E.

1976-01-01

Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
Design of a MIMD neural network processor

NASA Astrophysics Data System (ADS)

Saeks, Richard E.; Priddy, Kevin L.; Pap, Robert M.; Stowell, S.

1994-03-01

The Accurate Automation Corporation (AAC) neural network processor (NNP) module is a fully programmable multiple instruction multiple data (MIMD) parallel processor optimized for the implementation of neural networks. The AAC NNP design fully exploits the intrinsic sparseness of neural network topologies. Moreover, by using a MIMD parallel processing architecture one can update multiple neurons in parallel with efficiency approaching 100 percent as the size of the network increases. Each AAC NNP module has 8 K neurons and 32 K interconnections and is capable of 140,000,000 connections per second with an eight processor array capable of over one billion connections per second.
Optical linear algebra processors: noise and error-source modeling.

PubMed

Casasent, D; Ghosh, A

1985-06-01

The modeling of system and component noise and error sources in optical linear algebra processors (OLAP's) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Optical linear algebra processors - Noise and error-source modeling

NASA Technical Reports Server (NTRS)

Casasent, D.; Ghosh, A.

1985-01-01

The modeling of system and component noise and error sources in optical linear algebra processors (OLAPs) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Parallel processor for real-time structural control

NASA Astrophysics Data System (ADS)

Tise, Bert L.

1993-07-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
Spaceborne Processor Array

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

2008-01-01

A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
Design and implementation of highly parallel pipelined VLSI systems

NASA Astrophysics Data System (ADS)

Delange, Alphonsus Anthonius Jozef

A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
Image processing using Gallium Arsenide (GaAs) technology

NASA Technical Reports Server (NTRS)

Miller, Warner H.

1989-01-01

The need to increase the information return from space-borne imaging systems has increased in the past decade. The use of multi-spectral data has resulted in the need for finer spatial resolution and greater spectral coverage. Onboard signal processing will be necessary in order to utilize the available Tracking and Data Relay Satellite System (TDRSS) communication channel at high efficiency. A generally recognized approach to the increased efficiency of channel usage is through data compression techniques. The compression technique implemented is a differential pulse code modulation (DPCM) scheme with a non-uniform quantizer. The need to advance the state-of-the-art of onboard processing was recognized and a GaAs integrated circuit technology was chosen. An Adaptive Programmable Processor (APP) chip set was developed which is based on an 8-bit slice general processor. The reason for choosing the compression technique for the Multi-spectral Linear Array (MLA) instrument is described. Also a description is given of the GaAs integrated circuit chip set which will demonstrate that data compression can be performed onboard in real time at data rate in the order of 500 Mb/s.
Single-Event Transient Testing of Low Dropout PNP Series Linear Voltage Regulators

NASA Technical Reports Server (NTRS)

Adell, Philippe; Allen, Gregory

2013-01-01

As demand for high-speed, on-board, digital-processing integrated circuits on spacecraft increases (field-programmable gate arrays and digital signal processors in particular), the need for the next generation point-of-load (POL) regulator becomes a prominent design issue. Shrinking process nodes have resulted in core rails dropping to values close to 1.0 V, drastically reducing margin to standard switching converters or regulators that power digital ICs. The goal of this task is to perform SET characterization of several commercial POL converters, and provide a discussion of the impact of these results to state-of-the-art digital processing IC through laser and heavy ion testing
Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Case for a field-programmable gate array multicore hybrid machine for an image-processing application

NASA Astrophysics Data System (ADS)

Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos

2011-01-01

General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.
An optical/digital processor - Hardware and applications

NASA Technical Reports Server (NTRS)

Casasent, D.; Sterling, W. M.

1975-01-01

A real-time two-dimensional hybrid processor consisting of a coherent optical system, an optical/digital interface, and a PDP-11/15 control minicomputer is described. The input electrical-to-optical transducer is an electron-beam addressed potassium dideuterium phosphate (KD2PO4) light valve. The requirements and hardware for the output optical-to-digital interface, which is constructed from modular computer building blocks, are presented. Initial experimental results demonstrating the operation of this hybrid processor in phased-array radar data processing, synthetic-aperture image correlation, and text correlation are included. The applications chosen emphasize the role of the interface in the analysis of data from an optical processor and possible extensions to the digital feedback control of an optical processor.
Periodic Application of Concurrent Error Detection in Processor Array Architectures. PhD. Thesis -

NASA Technical Reports Server (NTRS)

Chen, Paul Peichuan

1993-01-01

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
Fast and robust control of nanopositioning systems: Performance limits enabled by field programmable analog arrays.

PubMed

Baranwal, Mayank; Gorugantu, Ram S; Salapaka, Srinivasa M

2015-08-01

This paper aims at control design and its implementation for robust high-bandwidth precision (nanoscale) positioning systems. Even though modern model-based control theoretic designs for robust broadband high-resolution positioning have enabled orders of magnitude improvement in performance over existing model independent designs, their scope is severely limited by the inefficacies of digital implementation of the control designs. High-order control laws that result from model-based designs typically have to be approximated with reduced-order systems to facilitate digital implementation. Digital systems, even those that have very high sampling frequencies, provide low effective control bandwidth when implementing high-order systems. In this context, field programmable analog arrays (FPAAs) provide a good alternative to the use of digital-logic based processors since they enable very high implementation speeds, moreover with cheaper resources. The superior flexibility of digital systems in terms of the implementable mathematical and logical functions does not give significant edge over FPAAs when implementing linear dynamic control laws. In this paper, we pose the control design objectives for positioning systems in different configurations as optimal control problems and demonstrate significant improvements in performance when the resulting control laws are applied using FPAAs as opposed to their digital counterparts. An improvement of over 200% in positioning bandwidth is achieved over an earlier digital signal processor (DSP) based implementation for the same system and same control design, even when for the DSP-based system, the sampling frequency is about 100 times the desired positioning bandwidth.
Optical signal processing of spatially distributed sensor data in smart structures

NASA Technical Reports Server (NTRS)

Bennett, K. D.; Claus, R. O.; Murphy, K. A.; Goette, A. M.

1989-01-01

Smart structures which contain dense two- or three-dimensional arrays of attached or embedded sensor elements inherently require signal multiplexing and processing capabilities to permit good spatial data resolution as well as the adequately short calculation times demanded by real time active feedback actuator drive circuitry. This paper reports the implementation of an in-line optical signal processor and its application in a structural sensing system which incorporates multiple discrete optical fiber sensor elements. The signal processor consists of an array of optical fiber couplers having tailored s-parameters and arranged to allow gray code amplitude scaling of sensor inputs. The use of this signal processor in systems designed to indicate the location of distributed strain and damage in composite materials, as well as to quantitatively characterize that damage, is described. Extension of similar signal processing methods to more complicated smart materials and structures applications are discussed.
Image-Based Focusing

NASA Astrophysics Data System (ADS)

Selker, Ted

1983-05-01

Lens focusing using a hardware model of a retina (Reticon RL256 light sensitive array) with a low cost processor (8085 with 512 bytes of ROM and 512 bytes of RAM) was built. This system was developed and tested on a variety of visual stimuli to demonstrate that: a)an algorithm which moves a lens to maximize the sum of the difference of light level on adjacent light sensors will converge to best focus in all but contrived situations. This is a simpler algorithm than any previously suggested; b) it is feasible to use unmodified video sensor arrays with in-expensive processors to aid video camera use. In the future, software could be developed to extend the processor's usefulness, possibly to track an actor by panning and zooming to give a earners operator increased ease of framing; c) lateral inhibition is an adequate basis for determining best focus. This supports a simple anatomically motivated model of how our brain focuses our eyes.

Linear Approximation SAR Azimuth Processing Study

NASA Technical Reports Server (NTRS)

Lindquist, R. B.; Masnaghetti, R. K.; Belland, E.; Hance, H. V.; Weis, W. G.

1979-01-01

A segmented linear approximation of the quadratic phase function that is used to focus the synthetic antenna of a SAR was studied. Ideal focusing, using a quadratic varying phase focusing function during the time radar target histories are gathered, requires a large number of complex multiplications. These can be largely eliminated by using linear approximation techniques. The result is a reduced processor size and chip count relative to ideally focussed processing and a correspondingly increased feasibility for spaceworthy implementation. A preliminary design and sizing for a spaceworthy linear approximation SAR azimuth processor meeting requirements similar to those of the SEASAT-A SAR was developed. The study resulted in a design with approximately 1500 IC's, 1.2 cubic feet of volume, and 350 watts of power for a single look, 4000 range cell azimuth processor with 25 meters resolution.
Parallel processor for real-time structural control

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tise, B.L.

1992-01-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less
Solving very large, sparse linear systems on mesh-connected parallel computers

NASA Technical Reports Server (NTRS)

Opsahl, Torstein; Reif, John

1987-01-01

The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
NbN A/D Conversion of IR Focal Plane Sensor Signal at 10 K

NASA Technical Reports Server (NTRS)

Eaton, L.; Durand, D.; Sandell, R.; Spargo, J.; Krabach, T.

1994-01-01

We are implementing a 12 bit SFQ counting ADC with parallel-to-serial readout using our established 10 K NbN capability. This circuit provides a key element of the analog signal processor (ASP) used in large infrared focal plane arrays. The circuit processes the signal data stream from a Si:As BIB detector array. A 10 mega samples per second (MSPS) pixel data stream flows from the chip at a 120 megabit bit rate in a format that is compatible with other superconductive time dependent processor (TDP) circuits being developed. We will discuss our planned ASP demonstration, the circuit design, and test results.
A Real-Time Capable Software-Defined Receiver Using GPU for Adaptive Anti-Jam GPS Sensors

PubMed Central

Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S.; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun

2011-01-01

Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities. PMID:22164116
A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.

PubMed

Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun

2011-01-01

Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities.
Multiple Embedded Processors for Fault-Tolerant Computing

NASA Technical Reports Server (NTRS)

Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

2005-01-01

A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Fault detection and bypass in a sequence information signal processor

NASA Technical Reports Server (NTRS)

Peterson, John C. (Inventor); Chow, Edward T. (Inventor)

1992-01-01

The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.
Effect of processor temperature on film dosimetry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srivastava, Shiv P.; Das, Indra J., E-mail: idas@iupui.edu

2012-07-01

Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d{sub max.}, 10 Multiplication-Sign 10 cm{sup 2}, 100 cm) to a given dose. Anmore » automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4-40.6 Degree-Sign C (85-105 Degree-Sign F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.« less
FPGA-based multiprocessor system for injection molding control.

PubMed

Muñoz-Barron, Benigno; Morales-Velazquez, Luis; Romero-Troncoso, Rene J; Rodriguez-Donate, Carlos; Trejo-Hernandez, Miguel; Benitez-Rangel, Juan P; Osornio-Rios, Roque A

2012-10-18

The plastic industry is a very important manufacturing sector and injection molding is a widely used forming method in that industry. The contribution of this work is the development of a strategy to retrofit control of an injection molding machine based on an embedded system microprocessors sensor network on a field programmable gate array (FPGA) device. Six types of embedded processors are included in the system: a smart-sensor processor, a micro fuzzy logic controller, a programmable logic controller, a system manager, an IO processor and a communication processor. Temperature, pressure and position are controlled by the proposed system and experimentation results show its feasibility and robustness. As validation of the present work, a particular sample was successfully injected.
Development of a 64 channel ultrasonic high frequency linear array imaging system.

PubMed

Hu, ChangHong; Zhang, Lequan; Cannata, Jonathan M; Yen, Jesse; Shung, K Kirk

2011-12-01

In order to improve the lateral resolution and extend the field of view of a previously reported 48 element 30 MHz ultrasound linear array and 16-channel digital imaging system, the development of a 256 element 30 MHz linear array and an ultrasound imaging system with increased channel count has been undertaken. This paper reports the design and testing of a 64 channel digital imaging system which consists of an analog front-end pulser/receiver, 64 channels of Time-Gain Compensation (TGC), 64 channels of high-speed digitizer as well as a beamformer. A Personal Computer (PC) is used as the user interface to display real-time images. This system is designed as a platform for the purpose of testing the performance of high frequency linear arrays that have been developed in house. Therefore conventional approaches were taken it its implementation. Flexibility and ease of use are of primary concern whereas consideration of cost-effectiveness and novelty in design are only secondary. Even so, there are many issues at higher frequencies but do not exist at lower frequencies need to be solved. The system provides 64 channels of excitation pulsers while receiving simultaneously at a 20-120 MHz sampling rate to 12-bits. The digitized data from all channels are first fed through Field Programmable Gate Arrays (FPGAs), and then stored in memories. These raw data are accessed by the beamforming processor to re-build the image or to be downloaded to the PC for further processing. The beamformer that applies delays to the echoes of each channel is implemented with the strategy that combines coarse (8.3 ns) and fine delays (2 ns). The coarse delays are integer multiples of the sampling clock rate and are achieved by controlling the write enable pin of the First-In-First-Out (FIFO) memory to obtain valid beamforming data. The fine delays are accomplished with interpolation filters. This system is capable of achieving a maximum frame rate of 50 frames per second. Wire phantom images acquired with this system show a spatial resolution of 146 μm (lateral) and 54 μm (axial). Images with excised rabbit and pig eyeball as well as mouse embryo were also acquired to demonstrate its imaging capability. Copyright © 2011 Elsevier B.V. All rights reserved.
Development of a 64 channel ultrasonic high frequency linear array imaging system

PubMed Central

Hu, ChangHong; Zhang, Lequan; Cannata, Jonathan M.; Yen, Jesse; Shung, K. Kirk

2011-01-01

In order to improve the lateral resolution and extend the field of view of a previously reported 48 element 30 MHz ultrasound linear array and 16-channel digital imaging system, the development of a 256 element 30 MHz linear array and an ultrasound imaging system with increased channel count has been undertaken. This paper reports the design and testing of a 64 channel digital imaging system which consists of an analog front-end pulser/receiver, 64 channels of Time-Gain Compensation (TGC), 64 channels of high-speed digitizer as well as a beamformer. A Personal Computer (PC) is used as the user interface to display real-time images. This system is designed as a platform for the purpose of testing the performance of high frequency linear arrays that have been developed in house. Therefore conventional approaches were taken it its implementation. Flexibility and ease of use are of primary concern whereas consideration of cost-effectiveness and novelty in design are only secondary. Even so, there are many issues at higher frequencies but do not exist at lower frequencies need to be solved. The system provides 64 channels of excitation pulsers while receiving simultaneously at a 20 MHz–120 MHz sampling rate to 12-bits. The digitized data from all channels are first fed through Field Programmable Gate Arrays (FPGAs), and then stored in memories. These raw data are accessed by the beamforming processor to re-build the image or to be downloaded to the PC for further processing. The beamformer that applies delays to the echoes of each channel is implemented with the strategy that combines coarse (8.3ns) and fine delays (2 ns). The coarse delays are integer multiples of the sampling clock rate and are achieved by controlling the write enable pin of the First-In-First-Out (FIFO) memory to obtain valid beamforming data. The fine delays are accomplished with interpolation filters. This system is capable of achieving a maximum frame rate of 50 frames per second. Wire phantom images acquired with this system show a spatial resolution of 146 μm (lateral) and 54 μm (axial). Images with excised rabbit and pig eyeball as well as mouse embryo were also acquired to demonstrate its imaging capability. PMID:21684568
Benchmarking GNU Radio Kernels and Multi-Processor Scheduling

DTIC Science & Technology

2013-01-14

AMD E350 APU , comparable to Atom • ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110 The general testing procedure consists of • Build...Intel Atom, and the AMD E350 APU . 3.2 Multi-Processor Scheduling Figure 1: GFLOPs per second through an FFT array on an Intel i7. Example output from
Design and implementation of a high performance network security processor

NASA Astrophysics Data System (ADS)

Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

2010-03-01

The last few years have seen many significant progresses in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogenous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
Finite elements and the method of conjugate gradients on a concurrent processor

NASA Technical Reports Server (NTRS)

Lyzenga, G. A.; Raefsky, A.; Hager, G. H.

1985-01-01

An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90 percent for sufficiently large problems.
Finite elements and the method of conjugate gradients on a concurrent processor

NASA Technical Reports Server (NTRS)

Lyzenga, G. A.; Raefsky, A.; Hager, B. H.

1984-01-01

An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90% for sufficiently large problems.
BLAS- BASIC LINEAR ALGEBRA SUBPROGRAMS

NASA Technical Reports Server (NTRS)

Krogh, F. T.

1994-01-01

The Basic Linear Algebra Subprogram (BLAS) library is a collection of FORTRAN callable routines for employing standard techniques in performing the basic operations of numerical linear algebra. The BLAS library was developed to provide a portable and efficient source of basic operations for designers of programs involving linear algebraic computations. The subprograms available in the library cover the operations of dot product, multiplication of a scalar and a vector, vector plus a scalar times a vector, Givens transformation, modified Givens transformation, copy, swap, Euclidean norm, sum of magnitudes, and location of the largest magnitude element. Since these subprograms are to be used in an ANSI FORTRAN context, the cases of single precision, double precision, and complex data are provided for. All of the subprograms have been thoroughly tested and produce consistent results even when transported from machine to machine. BLAS contains Assembler versions and FORTRAN test code for any of the following compilers: Lahey F77L, Microsoft FORTRAN, or IBM Professional FORTRAN. It requires the Microsoft Macro Assembler and a math co-processor. The PC implementation allows individual arrays of over 64K. The BLAS library was developed in 1979. The PC version was made available in 1986 and updated in 1988.
New Modular Ultrasonic Signal Processing Building Blocks for Real-Time Data Acquisition and Post Processing

NASA Astrophysics Data System (ADS)

Weber, Walter H.; Mair, H. Douglas; Jansen, Dion

2003-03-01

A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
Operational compatibility of 30-centimeter-diameter ion thruster with integrally regulated solar array power source

NASA Technical Reports Server (NTRS)

Gooder, S. T.

1977-01-01

System tests were performed in which Integrally Regulated Solar Arrays (IRSA's) were used to directly power the beam and accelerator loads of a 30-cm-diameter, electron bombardment, mercury ion thruster. The remaining thruster loads were supplied from conventional power-processing circuits. This combination of IRSA's and conventional circuits formed a hybrid power processor. Thruster performance was evaluated at 3/4- and 1-A beam currents with both the IRSA-hybrid and conventional power processors and was found to be identical for both systems. Power processing is significantly more efficient with the hybrid system. System dynamics and IRSA response to thruster arcs are also examined.
Runtime support and compilation methods for user-specified data distributions

NASA Technical Reports Server (NTRS)

Ponnusamy, Ravi; Saltz, Joel; Choudhury, Alok; Hwang, Yuan-Shin; Fox, Geoffrey

1993-01-01

This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.

Arranging computer architectures to create higher-performance controllers

NASA Technical Reports Server (NTRS)

Jacklin, Stephen A.

1988-01-01

Techniques for integrating microprocessors, array processors, and other intelligent devices in control systems are reviewed, with an emphasis on the (re)arrangement of components to form distributed or parallel processing systems. Consideration is given to the selection of the host microprocessor, increasing the power and/or memory capacity of the host, multitasking software for the host, array processors to reduce computation time, the allocation of real-time and non-real-time events to different computer subsystems, intelligent devices to share the computational burden for real-time events, and intelligent interfaces to increase communication speeds. The case of a helicopter vibration-suppression and stabilization controller is analyzed as an example, and significant improvements in computation and throughput rates are demonstrated.
A programmable systolic array correlator as a trigger processor for electron pairs in rich (ring image Cherenkov) counters

NASA Astrophysics Data System (ADS)

Männer, R.

1989-12-01

This paper describes a systolic array processor for a ring image Cherenkov counter which is capable of identifying pairs of electron circles with a known radius and a certain minimum distance within 15 μs. The processor is a very flexible and fast device. It consists of 128 x 128 processing elements (PEs), where one PE is assigned to each pixel of the image. All PEs run synchronously at 40 MHz. The identification of electron circles is done by correlating the detector image with the proper circle circumference. Circle centers are found by peak detection in the correlation result. A second correlation with a circle disc allows circles of closed electron pairs to be rejected. The trigger decision is generated if a pseudo adder detects at least two remaining circles. The device is controlled by a freely programmable sequencer. A VLSI chip containing 8 x 8 PEs is being developed using a VENUS design system and will be produced in 2μ CMOS technology.
Unstructured Adaptive Grid Computations on an Array of SMPs

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

1996-01-01

Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We have presented such a dynamic load balancing framework called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENCEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPS' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer

1997-01-01

A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
Matrix preconditioning: a robust operation for optical linear algebra processors.

PubMed

Ghosh, A; Paparao, P

1987-07-15

Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
Algorithms for Automatic Alignment of Arrays

NASA Technical Reports Server (NTRS)

Chatterjee, Siddhartha; Gilbert, John R.; Oliker, Leonid; Schreiber, Robert; Sheffler, Thomas J.

1996-01-01

Aggregate data objects (such as arrays) are distributed across the processor memories when compiling a data-parallel language for a distributed-memory machine. The mapping determines the amount of communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: an alignment that maps all the objects to an abstract template, followed by a distribution that maps the template to the processors. This paper describes algorithms for solving the various facets of the alignment problem: axis and stride alignment, static and mobile offset alignment, and replication labeling. We show that optimal axis and stride alignment is NP-complete for general program graphs, and give a heuristic method that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. We also show how local graph contractions can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. We show how to model the static offset alignment problem using linear programming, and we show that loop-dependent mobile offset alignment is sometimes necessary for optimum performance. We describe an algorithm with for determining mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself or can be used to improve performance. We describe an algorithm based on network flow that replicates objects so as to minimize the total amount of broadcast communication in replication.
FPGA-Based Multiprocessor System for Injection Molding Control

PubMed Central

Muñoz-Barron, Benigno; Morales-Velazquez, Luis; Romero-Troncoso, Rene J.; Rodriguez-Donate, Carlos; Trejo-Hernandez, Miguel; Benitez-Rangel, Juan P.; Osornio-Rios, Roque A.

2012-01-01

The plastic industry is a very important manufacturing sector and injection molding is a widely used forming method in that industry. The contribution of this work is the development of a strategy to retrofit control of an injection molding machine based on an embedded system microprocessors sensor network on a field programmable gate array (FPGA) device. Six types of embedded processors are included in the system: a smart-sensor processor, a micro fuzzy logic controller, a programmable logic controller, a system manager, an IO processor and a communication processor. Temperature, pressure and position are controlled by the proposed system and experimentation results show its feasibility and robustness. As validation of the present work, a particular sample was successfully injected. PMID:23202036
Study of a programmable high speed processor for use on-board satellites

NASA Astrophysics Data System (ADS)

Degavre, J. Cl.; Okkes, R.; Gaillat, G.

The availability of VLSI programmable devices will significantly enhance satellite on-board data processing capabilities. A case study is presented which indicates that computation-intensive processing applications requiring the execution of 100 megainstructions/sec are within the CD power constraints of satellites. It is noted that the current progress in semicustom design technique development and in achievable gate array densities, together with the recent announcement of improved monochip processors, are encouraging the development of an on-board programmable processor architecture able to associate the devices that will appear in communication and military markets.
Digital system for structural dynamics simulation

NASA Technical Reports Server (NTRS)

Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.

1982-01-01

State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
Limit characteristics of digital optoelectronic processor

NASA Astrophysics Data System (ADS)

Kolobrodov, V. G.; Tymchik, G. S.; Kolobrodov, M. S.

2018-01-01

In this article, the limiting characteristics of a digital optoelectronic processor are explored. The limits are defined by diffraction effects and a matrix structure of the devices for input and output of optical signals. The purpose of a present research is to optimize the parameters of the processor's components. The developed physical and mathematical model of DOEP allowed to establish the limit characteristics of the processor, restricted by diffraction effects and an array structure of the equipment for input and output of optical signals, as well as to optimize the parameters of the processor's components. The diameter of the entrance pupil of the Fourier lens is determined by the size of SLM and the pixel size of the modulator. To determine the spectral resolution, it is offered to use a concept of an optimum phase when the resolved diffraction maxima coincide with the pixel centers of the radiation detector.
Implementation of a Configurable Fault Tolerant Processor (CFTP) Using Internal Triple Modular Redundancy (TMR)

DTIC Science & Technology

2005-12-01

Upsets in SRAM FPGAs,” Military and Aerospace Applications of Programmable Logic Devices, September 2002. 8. Wakerly , John F,. “Microcomputer...change. The goal of the Configurable Fault Tolerant Processor (CFTP) Project is to explore, develop and demonstrate the applicability of using off-the...develop and demonstrate the applicability of using commercial-of-the-shelf (COTS) Field Programmable Gate Arrays (FPGA) in the design of
Dynamic load balance scheme for the DSMC algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Jin; Geng, Xiangren; Jiang, Dingwu

The direct simulation Monte Carlo (DSMC) algorithm, devised by Bird, has been used over a wide range of various rarified flow problems in the past 40 years. While the DSMC is suitable for the parallel implementation on powerful multi-processor architecture, it also introduces a large load imbalance across the processor array, even for small examples. The load imposed on a processor by a DSMC calculation is determined to a large extent by the total of simulator particles upon it. Since most flows are impulsively started with initial distribution of particles which is surely quite different from the steady state, themore » total of simulator particles will change dramatically. The load balance based upon an initial distribution of particles will break down as the steady state of flow is reached. The load imbalance and huge computational cost of DSMC has limited its application to rarefied or simple transitional flows. In this paper, by taking advantage of METIS, a software for partitioning unstructured graphs, and taking the total of simulator particles in each cell as a weight information, the repartitioning based upon the principle that each processor handles approximately the equal total of simulator particles has been achieved. The computation must pause several times to renew the total of simulator particles in each processor and repartition the whole domain again. Thus the load balance across the processors array holds in the duration of computation. The parallel efficiency can be improved effectively. The benchmark solution of a cylinder submerged in hypersonic flow has been simulated numerically. Besides, hypersonic flow past around a complex wing-body configuration has also been simulated. The results have displayed that, for both of cases, the computational time can be reduced by about 50%.« less
Analysis and simulation tools for solar array power systems

NASA Astrophysics Data System (ADS)

Pongratananukul, Nattorn

This dissertation presents simulation tools developed specifically for the design of solar array power systems. Contributions are made in several aspects of the system design phases, including solar source modeling, system simulation, and controller verification. A tool to automate the study of solar array configurations using general purpose circuit simulators has been developed based on the modeling of individual solar cells. Hierarchical structure of solar cell elements, including semiconductor properties, allows simulation of electrical properties as well as the evaluation of the impact of environmental conditions. A second developed tool provides a co-simulation platform with the capability to verify the performance of an actual digital controller implemented in programmable hardware such as a DSP processor, while the entire solar array including the DC-DC power converter is modeled in software algorithms running on a computer. This "virtual plant" allows developing and debugging code for the digital controller, and also to improve the control algorithm. One important task in solar arrays is to track the maximum power point on the array in order to maximize the power that can be delivered. Digital controllers implemented with programmable processors are particularly attractive for this task because sophisticated tracking algorithms can be implemented and revised when needed to optimize their performance. The proposed co-simulation tools are thus very valuable in developing and optimizing the control algorithm, before the system is built. Examples that demonstrate the effectiveness of the proposed methodologies are presented. The proposed simulation tools are also valuable in the design of multi-channel arrays. In the specific system that we have designed and tested, the control algorithm is implemented on a single digital signal processor. In each of the channels the maximum power point is tracked individually. In the prototype we built, off-the-shelf commercial DC-DC converters were utilized. At the end, the overall performance of the entire system was evaluated using solar array simulators capable of simulating various I-V characteristics, and also by using an electronic load. Experimental results are presented.
Onboard Experiment Data Support Facility

NASA Technical Reports Server (NTRS)

1976-01-01

An onboard array structure has been devised for end to end processing of data from multiple spaceborne sensors. The array constitutes sets of programmable pipeline processors whose elements perform each assigned function in 0.25 microseconds. This space shuttle computer system can handle data rates from a few bits to over 100 megabits per second.
Improved Magnetic STAR Methods for Real-Time, Point-by-Point Localization of Unexploded Ordnance and Buried Mines

DTIC Science & Technology

2008-09-01

of magnetic UXO. The prototype STAR Sensor comprises: a) A cubic array of eight fluxgate magnetometers . b) A 24-channel data acquisition/signal...array (shaded boxes) of eight low noise Triaxial Fluxgate Magnetometers (TFM) develops 24 channels of vector B- field data. Processor hardware
Optimal expression evaluation for data parallel architectures

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Schreiber, Robert

1990-01-01

A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
Optical computing using optical flip-flops in Fourier processors: use in matrix multiplication and discrete linear transforms.

PubMed

Ando, S; Sekine, S; Mita, M; Katsuo, S

1989-12-15

An architecture and the algorithms for matrix multiplication using optical flip-flops (OFFs) in optical processors are proposed based on residue arithmetic. The proposed system is capable of processing all elements of matrices in parallel utilizing the information retrieving ability of optical Fourier processors. The employment of OFFs enables bidirectional data flow leading to a simpler architecture and the burden of residue-to-decimal (or residue-to-binary) conversion to operation time can be largely reduced by processing all elements in parallel. The calculated characteristics of operation time suggest a promising use of the system in a real time 2-D linear transform.
A fast adaptive convex hull algorithm on two-dimensional processor arrays with a reconfigurable BUS system

NASA Technical Reports Server (NTRS)

Olariu, S.; Schwing, J.; Zhang, J.

1991-01-01

A bus system that can change dynamically to suit computational needs is referred to as reconfigurable. We present a fast adaptive convex hull algorithm on a two-dimensional processor array with a reconfigurable bus system (2-D PARBS, for short). Specifically, we show that computing the convex hull of a planar set of n points taken O(log n/log m) time on a 2-D PARBS of size mn x n with 3 less than or equal to m less than or equal to n. Our result implies that the convex hull of n points in the plane can be computed in O(1) time in a 2-D PARBS of size n(exp 1.5) x n.
A site oriented supercomputer for theoretical physics: The Fermilab Advanced Computer Program Multi Array Processor System (ACMAPS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nash, T.; Atac, R.; Cook, A.

1989-03-06

The ACPMAPS multipocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL Chip set. Themore » system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.« less
System balance analysis for vector computers

NASA Technical Reports Server (NTRS)

Knight, J. C.; Poole, W. G., Jr.; Voight, R. G.

1975-01-01

The availability of vector processors capable of sustaining computing rates of 10 to the 8th power arithmetic results pers second raised the question of whether peripheral storage devices representing current technology can keep such processors supplied with data. By examining the solution of a large banded linear system on these computers, it was found that even under ideal conditions, the processors will frequently be waiting for problem data.

A Coherent VLSI Environment

DTIC Science & Technology

1987-03-31

processors . The symmetry-breaking algorithms give efficient ways to convert probabilistic algorithms to deterministic algorithms. Some of the...techniques have been applied to construct several efficient linear- processor algorithms for graph problems, including an O(lg* n)-time algorithm for (A + 1...On n-node graphs, the algorithm works in O(log 2 n) time using only n processors , in contrast to the previous best algorithm which used about n3
Processing techniques for software based SAR processors

NASA Technical Reports Server (NTRS)

Leung, K.; Wu, C.

1983-01-01

Software SAR processing techniques defined to treat Shuttle Imaging Radar-B (SIR-B) data are reviewed. The algorithms are devised for the data processing procedure selection, SAR correlation function implementation, multiple array processors utilization, cornerturning, variable reference length azimuth processing, and range migration handling. The Interim Digital Processor (IDP) originally implemented for handling Seasat SAR data has been adapted for the SIR-B, and offers a resolution of 100 km using a processing procedure based on the Fast Fourier Transformation fast correlation approach. Peculiarities of the Seasat SAR data processing requirements are reviewed, along with modifications introduced for the SIR-B. An Advanced Digital SAR Processor (ADSP) is under development for use with the SIR-B in the 1986 time frame as an upgrade for the IDP, which will be in service in 1984-5.
An analysis of scatter decomposition

NASA Technical Reports Server (NTRS)

Nicol, David M.; Saltz, Joel H.

1990-01-01

A formal analysis of a powerful mapping technique known as scatter decomposition is presented. Scatter decomposition divides an irregular computational domain into a large number of equal sized pieces, and distributes them modularly among processors. A probabilistic model of workload in one dimension is used to formally explain why, and when scatter decomposition works. The first result is that if correlation in workload is a convex function of distance, then scattering a more finely decomposed domain yields a lower average processor workload variance. The second result shows that if the workload process is stationary Gaussian and the correlation function decreases linearly in distance until becoming zero and then remains zero, scattering a more finely decomposed domain yields a lower expected maximum processor workload. Finally it is shown that if the correlation function decreases linearly across the entire domain, then among all mappings that assign an equal number of domain pieces to each processor, scatter decomposition minimizes the average processor workload variance. The dependence of these results on the assumption of decreasing correlation is illustrated with situations where a coarser granularity actually achieves better load balance.
A Radiation Dosimeter Concept for the Lunar Surface Environment

NASA Technical Reports Server (NTRS)

Adams, James H.; Christl, Mark J.; Watts, John; Kuznetsov, Eugeny N.; Parnell, Thomas A.; Pendleton, Geoff N.

2007-01-01

A novel silicon detector configuration for radiation dose measurements in an environment where solar energetic particles are of most concern is described. The dosimeter would also measure the dose from galactic cosmic rays. In the lunar environment a large range in particle flux and ionization density must be measured and converted to dose equivalent. This could be accomplished with a thick (e.g. 2mm) silicon detector segmented into cubic volume elements "voxels" followed by a second, thin monolithic silicon detector. The electronics needed to implement this detector concept include analog signal processors (ASIC) and a field programmable gate array (FPGA) for data accumulation and conversion to linear energy transfer (LET) spectra and to dose-equivalent (Sievert). Currently available commercial ASIC's and FPGA's are suitable for implementing the analog and digital systems.
Block Copolymers as Templates for Arrays of Carbon Nanotubes

NASA Technical Reports Server (NTRS)

Bronikowski, Michael; Hunt, Brian

2003-01-01

A method of manufacturing regular arrays of precisely sized, shaped, positioned, and oriented carbon nanotubes has been proposed. Arrays of carbon nanotubes could prove useful in such diverse applications as communications (especially for filtering of signals), biotechnology (for sequencing of DNA and separation of chemicals), and micro- and nanoelectronics (as field emitters and as signal transducers and processors). The method is expected to be suitable for implementation in standard semiconductor-device fabrication facilities.
Analog hardware for learning neural networks

NASA Technical Reports Server (NTRS)

Eberhardt, Silvio P. (Inventor)

1991-01-01

This is a recurrent or feedforward analog neural network processor having a multi-level neuron array and a synaptic matrix for storing weighted analog values of synaptic connection strengths which is characterized by temporarily changing one connection strength at a time to determine its effect on system output relative to the desired target. That connection strength is then adjusted based on the effect, whereby the processor is taught the correct response to training examples connection by connection.
The Event Based Language and Its Multiple Processor Implementations.

DTIC Science & Technology

1980-01-01

10 6.1 "Recursive" Linear Fibonacci ................................................ 105 6.2 The Readers Writers Problem...kinds. Examples of such systems are: C.mmp [Wu-72], Pluribus [He-73], Data Flow [ De -75], the boolean n-cube parallel machine [Su-77], and the MuNet [Wa...concurrency within programs; therefore, we hate concentrated on two types of systems which seem suitable: a processor network, and a data flow processor [ De -77
Implementation of digital equality comparator circuit on memristive memory crossbar array using material implication logic

NASA Astrophysics Data System (ADS)

Haron, Adib; Mahdzair, Fazren; Luqman, Anas; Osman, Nazmie; Junid, Syed Abdul Mutalib Al

2018-03-01

One of the most significant constraints of Von Neumann architecture is the limited bandwidth between memory and processor. The cost to move data back and forth between memory and processor is considerably higher than the computation in the processor itself. This architecture significantly impacts the Big Data and data-intensive application such as DNA analysis comparison which spend most of the processing time to move data. Recently, the in-memory processing concept was proposed, which is based on the capability to perform the logic operation on the physical memory structure using a crossbar topology and non-volatile resistive-switching memristor technology. This paper proposes a scheme to map digital equality comparator circuit on memristive memory crossbar array. The 2-bit, 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit of equality comparator circuit are mapped on memristive memory crossbar array by using material implication logic in a sequential and parallel method. The simulation results show that, for the 64-bit word size, the parallel mapping exhibits 2.8× better performance in total execution time than sequential mapping but has a trade-off in terms of energy consumption and area utilization. Meanwhile, the total crossbar area can be reduced by 1.2× for sequential mapping and 1.5× for parallel mapping both by using the overlapping technique.
FPGA Coprocessor for Accelerated Classification of Images

NASA Technical Reports Server (NTRS)

Pingree, Paula J.; Scharenbroich, Lucas J.; Werne, Thomas A.

2008-01-01

An effort related to that described in the preceding article focuses on developing a spaceborne processing platform for fast and accurate onboard classification of image data, a critical part of modern satellite image processing. The approach again has been to exploit the versatility of recently developed hybrid Virtex-4FX field-programmable gate array (FPGA) to run diverse science applications on embedded processors while taking advantage of the reconfigurable hardware resources of the FPGAs. In this case, the FPGA serves as a coprocessor that implements legacy C-language support-vector-machine (SVM) image-classification algorithms to detect and identify natural phenomena such as flooding, volcanic eruptions, and sea-ice break-up. The FPGA provides hardware acceleration for increased onboard processing capability than previously demonstrated in software. The original C-language program demonstrated on an imaging instrument aboard the Earth Observing-1 (EO-1) satellite implements a linear-kernel SVM algorithm for classifying parts of the images as snow, water, ice, land, or cloud or unclassified. Current onboard processors, such as on EO-1, have limited computing power, extremely limited active storage capability and are no longer considered state-of-the-art. Using commercially available software that translates C-language programs into hardware description language (HDL) files, the legacy C-language program, and two newly formulated programs for a more capable expanded-linear-kernel and a more accurate polynomial-kernel SVM algorithm, have been implemented in the Virtex-4FX FPGA. In tests, the FPGA implementations have exhibited significant speedups over conventional software implementations running on general-purpose hardware.
Massively parallel processor computer

NASA Technical Reports Server (NTRS)

Fung, L. W. (Inventor)

1983-01-01

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Asynchronous parallel status comparator

DOEpatents

Arnold, Jeffrey W.; Hart, Mark M.

1992-01-01

Apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals corresponds to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition.
Asynchronous parallel status comparator

DOEpatents

Arnold, J.W.; Hart, M.M.

1992-12-15

Disclosed is an apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals corresponds to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition. 4 figs.
Soft-core processor study for node-based architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James

2008-09-01

Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hardcore processor built into the FPGA or as a soft-core processor builtmore » out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.« less
An architecture for real-time vision processing

NASA Technical Reports Server (NTRS)

Chien, Chiun-Hong

1994-01-01

To study the feasibility of developing an architecture for real time vision processing, a task queue server and parallel algorithms for two vision operations were designed and implemented on an i860-based Mercury Computing System 860VS array processor. The proposed architecture treats each vision function as a task or set of tasks which may be recursively divided into subtasks and processed by multiple processors coordinated by a task queue server accessible by all processors. Each idle processor subsequently fetches a task and associated data from the task queue server for processing and posts the result to shared memory for later use. Load balancing can be carried out within the processing system without the requirement for a centralized controller. The author concludes that real time vision processing cannot be achieved without both sequential and parallel vision algorithms and a good parallel vision architecture.
A microcomputer based frequency-domain processor for laser Doppler anemometry

NASA Technical Reports Server (NTRS)

Horne, W. Clifton; Adair, Desmond

1988-01-01

A prototype multi-channel laser Doppler anemometry (LDA) processor was assembled using a wideband transient recorder and a microcomputer with an array processor for fast Fourier transform (FFT) computations. The prototype instrument was used to acquire, process, and record signals from a three-component wind tunnel LDA system subject to various conditions of noise and flow turbulence. The recorded data was used to evaluate the effectiveness of burst acceptance criteria, processing algorithms, and selection of processing parameters such as record length. The recorded signals were also used to obtain comparative estimates of signal-to-noise ratio between time-domain and frequency-domain signal detection schemes. These comparisons show that the FFT processing scheme allows accurate processing of signals for which the signal-to-noise ratio is 10 to 15 dB less than is practical using counter processors.
VLSI 'smart' I/O module development

NASA Astrophysics Data System (ADS)

Kirk, Dan

The developmental history, design, and operation of the MIL-STD-1553A/B discrete and serial module (DSM) for the U.S. Navy AN/AYK-14(V) avionics computer are described and illustrated with diagrams. The ongoing preplanned product improvement for the AN/AYK-14(V) includes five dual-redundant MIL-STD-1553 channels based on DSMs. The DSM is a front-end processor for transferring data to and from a common memory, sharing memory with a host processor to provide improved 'smart' input/output performance. Each DSM comprises three hardware sections: three VLSI-6000 semicustomized CMOS arrays, memory units to support the arrays, and buffers and resynchronization circuits. The DSM hardware module design, VLSI-6000 design tools, controlware and test software, and checkout procedures (using a hardware simulator) are characterized in detail.
Optical linear algebra processors - Architectures and algorithms

NASA Technical Reports Server (NTRS)

Casasent, David

1986-01-01

Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, F.; Morel, M.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
C-MOS array design techniques: SUMC multiprocessor system study

NASA Technical Reports Server (NTRS)

Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

1972-01-01

The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

PubMed

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

Powerful conveyer belt real-time online detection system based on x-ray

NASA Astrophysics Data System (ADS)

Rong, Feng; Miao, Chang-yun; Meng, Wei

2009-07-01

The powerful conveyer belt is widely used in the mine, dock, and so on. After used for a long time, internal steel rope of the conveyor belt may fracture, rust, joints moving, and so on .This would bring potential safety problems. A kind of detection system based on x-ray is designed in this paper. Linear array detector (LDA) is used. LDA cost is low, response fast; technology mature .Output charge of LDA is transformed into differential voltage signal by amplifier. This kind of signal have great ability of anti-noise, is suitable for long-distance transmission. The processor is FPGA. A IP core control 4-channel A/D convertor, achieve parallel output data collection. Soft-core processor MicroBlaze which process tcp/ip protocol is embedded in FPGA. Sampling data are transferred to a computer via Ethernet. In order to improve the image quality, algorithm of getting rid of noise from the measurement result and taking gain normalization for pixel value is studied and designed. Experiments show that this system work well, can real-time online detect conveyor belt of width of 2.0m and speed of 5 m/s, does not affect the production. Image is clear, visual and can easily judge the situation of conveyor belt.
Implementing Access to Data Distributed on Many Processors

NASA Technical Reports Server (NTRS)

James, Mark

2006-01-01

A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets with the low-level implementation details being beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow for the arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy is flexible, so its behavior can change depending on the needs of the application. At the same time, high productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.
Development of software for the MSFC solar vector magnetograph

NASA Technical Reports Server (NTRS)

Kineke, Jack

1996-01-01

The Marshall Space Flight Center Solar Vector Magnetograph is a special purpose telescope used to measure the vector magnetic field in active areas on the surface of the sun. This instrument measures the linear and circular polarization intensities (the Stokes vectors Q, U and V) produced by the Zeeman effect on a specific spectral line due to the solar magnetic field from which the longitudinal and transverse components of the magnetic field may be determined. Beginning in 1990 as a Summer Faculty Fellow in project JOVE and continuing under NASA Grant NAG8-1042, the author has been developing computer software to perform these computations, first using a DEC MicroVAX system equipped with a high speed array processor, and more recently using a DEC AXP/OSF system. This summer's work is a continuation of this development.
The application of charge-coupled device processors in automatic-control systems

NASA Technical Reports Server (NTRS)

Mcvey, E. S.; Parrish, E. A., Jr.

1977-01-01

The application of charge-coupled device (CCD) processors to automatic-control systems is suggested. CCD processors are a new form of semiconductor component with the unique ability to process sampled signals on an analog basis. Specific implementations of controllers are suggested for linear time-invariant, time-varying, and nonlinear systems. Typical processing time should be only a few microseconds. This form of technology may become competitive with microprocessors and minicomputers in addition to supplementing them.
Upset Characterization of the PowerPC405 Hard-core Processor Embedded in Virtex-II Pro Field Programmable Gate Arrays

NASA Technical Reports Server (NTRS)

Swift, Gary M.; Allen, Gregory S.; Farmanesh, Farhad; George, Jeffrey; Petrick, David J.; Chayab, Fayez

2006-01-01

Shown in this presentation are recent results for the upset susceptibility of the various types of memory elements in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively through appropriate design triplication and configuration scrubbing, these upsets of processor elements can dominate the system error rate. Data from irradiations with both protons and heavy ions are given and compared using available models.
Integrated High-Speed Torque Control System for a Robotic Joint

NASA Technical Reports Server (NTRS)

Davis, Donald R. (Inventor); Radford, Nicolaus A. (Inventor); Permenter, Frank Noble (Inventor); Valvo, Michael C. (Inventor); Askew, R. Scott (Inventor)

2013-01-01

A control system for achieving high-speed torque for a joint of a robot includes a printed circuit board assembly (PCBA) having a collocated joint processor and high-speed communication bus. The PCBA may also include a power inverter module (PIM) and local sensor conditioning electronics (SCE) for processing sensor data from one or more motor position sensors. Torque control of a motor of the joint is provided via the PCBA as a high-speed torque loop. Each joint processor may be embedded within or collocated with the robotic joint being controlled. Collocation of the joint processor, PIM, and high-speed bus may increase noise immunity of the control system, and the localized processing of sensor data from the joint motor at the joint level may minimize bus cabling to and from each control node. The joint processor may include a field programmable gate array (FPGA).
CPU architecture for a fast and energy-saving calculation of convolution neural networks

NASA Astrophysics Data System (ADS)

Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

2017-06-01

One of the most difficult problem in the use of artificial neural networks is the computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, for the conventional user only remains the state of the art method, which is the use of a graphic processing unit (GPU) as a computational basis. Although these processors are well suited for large matrix computations, they need massive energy. Therefore a new processor on the basis of a field programmable gate array (FPGA) has been developed and is optimized for the application of deep learning. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper to an organic farming application). The power consumption is only a fraction of a GPU application and should therefore be well suited for energy-saving applications.
Computations on the massively parallel processor at the Goddard Space Flight Center

NASA Technical Reports Server (NTRS)

Strong, James P.

1991-01-01

Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

PubMed

Cheung, Kit; Schultz, Simon R; Luk, Wayne

2015-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

PubMed Central

Cheung, Kit; Schultz, Simon R.; Luk, Wayne

2016-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
Optical microwave filter based on spectral slicing by use of arrayed waveguide gratings.

PubMed

Pastor, Daniel; Ortega, Beatriz; Capmany, José; Sales, Salvador; Martinez, Alfonso; Muñoz, Pascual

2003-10-01

We have experimentally demonstrated a new optical signal processor based on the use of arrayed waveguide gratings. The structure exploits the concept of spectral slicing combined with the use of an optical dispersive medium. The approach presents increased flexibility from previous slicing-based structures in terms of tunability, reconfiguration, and apodization of the samples or coefficients of the transversal optical filter.
Radiation-Hardened Wafer Scale Integration

DTIC Science & Technology

1989-10-25

unlimited. LEXINGTON MASSACHUSETTS EXECUTIVE SUMMARY A focal plane processor (FPP) for a large array of LWIR photodetectors on a space platform must...It seems certain that large. scanning LWIR arrays will once again be of interest in the future, though their specifications will differ from those... nonuniformity and defects in the ZMR material, but films of good quality produced by this technique are now available commercially from Kopin Corporation. Such
Block iterative restoration of astronomical images with the massively parallel processor

NASA Technical Reports Server (NTRS)

Heap, Sara R.; Lindler, Don J.

1987-01-01

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

NASA Astrophysics Data System (ADS)

Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

2004-10-01

Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
Landsat real-time processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davis, E.L.

A novel method for performing real-time acquisition and processing Landsat/EROS data covers all aspects including radiometric and geometric corrections of multispectral scanner or return-beam vidicon inputs, image enhancement, statistical analysis, feature extraction, and classification. Radiometric transformations include bias/gain adjustment, noise suppression, calibration, scan angle compensation, and illumination compensation, including topography and atmospheric effects. Correction or compensation for geometric distortion includes sensor-related distortions, such as centering, skew, size, scan nonlinearity, radial symmetry, and tangential symmetry. Also included are object image-related distortions such as aspect angle (altitude), scale distortion (altitude), terrain relief, and earth curvature. Ephemeral corrections are also applied to compensatemore » for satellite forward movement, earth rotation, altitude variations, satellite vibration, and mirror scan velocity. Image enhancement includes high-pass, low-pass, and Laplacian mask filtering and data restoration for intermittent losses. Resource classification is provided by statistical analysis including histograms, correlational analysis, matrix manipulations, and determination of spectral responses. Feature extraction includes spatial frequency analysis, which is used in parallel discriminant functions in each array processor for rapid determination. The technique uses integrated parallel array processors that decimate the tasks concurrently under supervision of a control processor. The operator-machine interface is optimized for programming ease and graphics image windowing.« less
Parallel/distributed direct method for solving linear systems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate paralleled algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmical changes.
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Optimal processor assignment for pipeline computations

NASA Technical Reports Server (NTRS)

Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath

1991-01-01

The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered.
Optical Associative Processors For Visual Perception"

NASA Astrophysics Data System (ADS)

Casasent, David; Telfer, Brian

1988-05-01

We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.
SPAR reference manual

NASA Technical Reports Server (NTRS)

Whetstone, W. D.

1976-01-01

The functions and operating rules of the SPAR system, which is a group of computer programs used primarily to perform stress, buckling, and vibrational analyses of linear finite element systems, were given. The following subject areas were discussed: basic information, structure definition, format system matrix processors, utility programs, static solutions, stresses, sparse matrix eigensolver, dynamic response, graphics, and substructure processors.

Noise Analysis of Spatial Phase coding in analog Acoustooptic Processors

NASA Technical Reports Server (NTRS)

Gary, Charles K.; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

Optical beams can carry information in their amplitude and phase; however, optical analog numerical calculators such as an optical matrix processor use incoherent light to achieve linear operation. Thus, the phase information is lost and only the magnitude can be used. This limits such processors to the representation of positive real numbers. Many systems have been devised to overcome this deficit through the use of digital number representations, but they all operate at a greatly reduced efficiency in contrast to analog systems. The most widely accepted method to achieve sign coding in analog optical systems has been the use of an offset for the zero level. Unfortunately, this results in increased noise sensitivity for small numbers. In this paper, we examine the use of spatially coherent sign coding in acoustooptical processors, a method first developed for digital calculations by D. V. Tigin. This coding technique uses spatial coherence for the representation of signed numbers, while temporal incoherence allows for linear analog processing of the optical information. We show how spatial phase coding reduces noise sensitivity for signed analog calculations.
Fault-Tolerant, Radiation-Hard DSP

NASA Technical Reports Server (NTRS)

Czajkowski, David

2011-01-01

Commercial digital signal processors (DSPs) for use in high-speed satellite computers are challenged by the damaging effects of space radiation, mainly single event upsets (SEUs) and single event functional interrupts (SEFIs). Innovations have been developed for mitigating the effects of SEUs and SEFIs, enabling the use of very-highspeed commercial DSPs with improved SEU tolerances. Time-triple modular redundancy (TTMR) is a method of applying traditional triple modular redundancy on a single processor, exploiting the VLIW (very long instruction word) class of parallel processors. TTMR improves SEU rates substantially. SEFIs are solved by a SEFI-hardened core circuit, external to the microprocessor. It monitors the health of the processor, and if a SEFI occurs, forces the processor to return to performance through a series of escalating events. TTMR and hardened-core solutions were developed for both DSPs and reconfigurable field-programmable gate arrays (FPGAs). This includes advancement of TTMR algorithms for DSPs and reconfigurable FPGAs, plus a rad-hard, hardened-core integrated circuit that services both the DSP and FPGA. Additionally, a combined DSP and FPGA board architecture was fully developed into a rad-hard engineering product. This technology enables use of commercial off-the-shelf (COTS) DSPs in computers for satellite and other space applications, allowing rapid deployment at a much lower cost. Traditional rad-hard space computers are very expensive and typically have long lead times. These computers are either based on traditional rad-hard processors, which have extremely low computational performance, or triple modular redundant (TMR) FPGA arrays, which suffer from power and complexity issues. Even more frustrating is that the TMR arrays of FPGAs require a fixed, external rad-hard voting element, thereby causing them to lose much of their reconfiguration capability and in some cases significant speed reduction. The benefits of COTS high-performance signal processing include significant increase in onboard science data processing, enabling orders of magnitude reduction in required communication bandwidth for science data return, orders of magnitude improvement in onboard mission planning and critical decision making, and the ability to rapidly respond to changing mission environments, thus enabling opportunistic science and orders of magnitude reduction in the cost of mission operations through reduction of required staff. Additional benefits of COTS-based, high-performance signal processing include the ability to leverage considerable commercial and academic investments in advanced computing tools, techniques, and infra structure, and the familiarity of the science and IT community with these computing environments.
Numerical aerodynamic simulation facility preliminary study, volume 2 and appendices

NASA Technical Reports Server (NTRS)

1977-01-01

Data to support results obtained in technology assessment studies are presented. Objectives, starting points, and future study tasks are outlined. Key design issues discussed in appendices include: data allocation, transposition network design, fault tolerance and trustworthiness, logic design, processing element of existing components, number of processors, the host system, alternate data base memory designs, number representation, fast div 521 instruction, architectures, and lockstep array versus synchronizable array machine comparison.
Efficient Feature Extraction and Likelihood Fusion for Vehicle Tracking in Low Frame Rate Airborne Video

DTIC Science & Technology

2010-07-01

imagery, persistent sensor array I. Introduction New device fabrication technologies and heterogeneous embedded processors have led to the emergence of a...geometric occlusions between target and sensor , motion blur, urban scene complexity, and high data volumes. In practical terms the targets are small...distributed airborne narrow-field-of-view video sensor networks. Airborne camera arrays combined with com- putational photography techniques enable the
TRIGA: Telecommunications Protocol Processing Subsystem Using Reconfigurable Interoperable Gate Arrays

NASA Technical Reports Server (NTRS)

Pang, Jackson; Pingree, Paula J.; Torgerson, J. Leigh

2006-01-01

We present the Telecommunications protocol processing subsystem using Reconfigurable Interoperable Gate Arrays (TRIGA), a novel approach that unifies fault tolerance, error correction coding and interplanetary communication protocol off-loading to implement CCSDS File Delivery Protocol and Datalink layers. The new reconfigurable architecture offers more than one order of magnitude throughput increase while reducing footprint requirements in memory, command and data handling processor utilization, communication system interconnects and power consumption.
Realization of preconditioned Lanczos and conjugate gradient algorithms on optical linear algebra processors.

PubMed

Ghosh, A

1988-08-01

Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
FPGA wavelet processor design using language for instruction-set architectures (LISA)

NASA Astrophysics Data System (ADS)

Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

2007-04-01

The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
Face classification using electronic synapses

NASA Astrophysics Data System (ADS)

Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H.-S. Philip; Qian, He

2017-05-01

Conventional hardware platforms consume huge amount of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow to complete cognitive tasks more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results consolidate the feasibility of analogue synaptic array and pave the way toward building an energy efficient and large-scale neuromorphic system.
TOGA - A GNSS Reflections Instrument for Remote Sensing Using Beamforming

NASA Technical Reports Server (NTRS)

Esterhuizen, S.; Meehan, T. K.; Robison, D.

2009-01-01

Remotely sensing the Earth's surface using GNSS signals as bi-static radar sources is one of the most challenging applications for radiometric instrument design. As part of NASA's Instrument Incubator Program, our group at JPL has built a prototype instrument, TOGA (Time-shifted, Orthometric, GNSS Array), to address a variety of GNSS science needs. Observing GNSS reflections is major focus of the design/development effort. The TOGA design features a steerable beam antenna array which can form a high-gain antenna pattern in multiple directions simultaneously. Multiple FPGAs provide flexible digital signal processing logic to process both GPS and Galileo reflections. A Linux OS based science processor serves as experiment scheduler and data post-processor. This paper outlines the TOGA design approach as well as preliminary results of reflection data collected from test flights over the Pacific ocean. This reflections data demonstrates observation of the GPS L1/L2C/L5 signals.
Development of a General-Purpose Analysis System Based on a Programmable Fluid Processor Final Report CRADA No. TC-2027-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

McConaghy, C. F.; Gascoyne, P. R.

The purpose ofthis project was to develop a general-purpose analysis system based on a programmable fluid processor (PFP). The PFP is an array of electrodes surrounded by fluid reservoirs and injectors. Injected droplets of various reagents are manjpulated and combined on the array by Dielectrophoretic (DEP) forces. The goal was to create a small handheld device that could accomplish the tasks currently undertaken by much larger, time consuming, manual manipulation in the lab. The entire effo1t was funded by DARPA under the Bio-Flips program. MD Anderson Cancer Center was the PI for the DARPA effort. The Bio-Flips program was amore » 3- year program that ran from September 2000 to September 2003. The CRADA was somewhat behind the Bi-Flips program running from June 2001 to June 2004 with a no cost extension to September 2004.« less
Face classification using electronic synapses.

PubMed

Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H-S Philip; Qian, He

2017-05-12

Conventional hardware platforms consume huge amount of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow to complete cognitive tasks more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results consolidate the feasibility of analogue synaptic array and pave the way toward building an energy efficient and large-scale neuromorphic system.
Methodology for fast detection of false sharing in threaded scientific codes

DOEpatents

Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

2014-11-25

A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Microcalorimeters with Germanium Thermistors for High Resolution Soft and Hard X-ray Astronomy

NASA Technical Reports Server (NTRS)

Silver, E.

2003-01-01

This is a progress report for the first year of a three year Space Research and Technology (SR&T) grant to continue the advancement of neutron transmutation doped (NTD-based) microcalorimeters. We have re-prioritized certain aspects of the statement of work and chose to emphasize issues of array development in the first year rather than wait until year two. Consequently, some of the projects scheduled for the first year were delayed to the second year. Here we report on our progress to: a) Build and test a 1 x 4 element array and to investigate electrical and thermal cross-talk; b) Build a multiplexed 4 channel analog pulse processor; c) Build a digital pulse processor that can accommodate 4 channels with independent triggers; d) Develop a proportional thermal baseline restoration system compatible with the constant voltage mode of microcalorimeter operation.
High-speed, automatic controller design considerations for integrating array processor, multi-microprocessor, and host computer system architectures

NASA Technical Reports Server (NTRS)

Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.

1985-01-01

Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.
Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

De Supinski, B.; Caliga, D.

2017-09-28

The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.
Demonstration of universal parametric entangling gates on a multi-qubit lattice

PubMed Central

Reagor, Matthew; Osborn, Christopher B.; Tezak, Nikolas; Staley, Alexa; Prawiroatmodjo, Guenevere; Scheer, Michael; Alidoust, Nasser; Sete, Eyob A.; Didier, Nicolas; da Silva, Marcus P.; Acala, Ezer; Angeles, Joel; Bestwick, Andrew; Block, Maxwell; Bloom, Benjamin; Bradley, Adam; Bui, Catvu; Caldwell, Shane; Capelluto, Lauren; Chilcott, Rick; Cordova, Jeff; Crossman, Genya; Curtis, Michael; Deshpande, Saniya; El Bouayadi, Tristan; Girshovich, Daniel; Hong, Sabrina; Hudson, Alex; Karalekas, Peter; Kuang, Kat; Lenihan, Michael; Manenti, Riccardo; Manning, Thomas; Marshall, Jayss; Mohan, Yuvraj; O’Brien, William; Otterbach, Johannes; Papageorge, Alexander; Paquette, Jean-Philip; Pelstring, Michael; Polloreno, Anthony; Rawat, Vijay; Ryan, Colm A.; Renzas, Russ; Rubin, Nick; Russel, Damon; Rust, Michael; Scarabelli, Diego; Selvanayagam, Michael; Sinclair, Rodney; Smith, Robert; Suska, Mark; To, Ting-Wai; Vahidpour, Mehrnoosh; Vodrahalli, Nagesh; Whyland, Tyler; Yadav, Kamal; Zeng, William; Rigetti, Chad T.

2018-01-01

We show that parametric coupling techniques can be used to generate selective entangling interactions for multi-qubit processors. By inducing coherent population exchange between adjacent qubits under frequency modulation, we implement a universal gate set for a linear array of four superconducting qubits. An average process fidelity of ℱ = 93% is estimated for three two-qubit gates via quantum process tomography. We establish the suitability of these techniques for computation by preparing a four-qubit maximally entangled state and comparing the estimated state fidelity with the expected performance of the individual entangling gates. In addition, we prepare an eight-qubit register in all possible bitstring permutations and monitor the fidelity of a two-qubit gate across one pair of these qubits. Across all these permutations, an average fidelity of ℱ = 91.6 ± 2.6% is observed. These results thus offer a path to a scalable architecture with high selectivity and low cross-talk. PMID:29423443
Hierarchical Address Event Routing for Reconfigurable Large-Scale Neuromorphic Systems.

PubMed

Park, Jongkil; Yu, Theodore; Joshi, Siddharth; Maier, Christoph; Cauwenberghs, Gert

2017-10-01

We present a hierarchical address-event routing (HiAER) architecture for scalable communication of neural and synaptic spike events between neuromorphic processors, implemented with five Xilinx Spartan-6 field-programmable gate arrays and four custom analog neuromophic integrated circuits serving 262k neurons and 262M synapses. The architecture extends the single-bus address-event representation protocol to a hierarchy of multiple nested buses, routing events across increasing scales of spatial distance. The HiAER protocol provides individually programmable axonal delay in addition to strength for each synapse, lending itself toward biologically plausible neural network architectures, and scales across a range of hierarchies suitable for multichip and multiboard systems in reconfigurable large-scale neuromorphic systems. We show approximately linear scaling of net global synaptic event throughput with number of routing nodes in the network, at 3.6×10 7 synaptic events per second per 16k-neuron node in the hierarchy.
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware

PubMed Central

Zheng, Da; Burns, Randal; Szalay, Alexander S.

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware.

PubMed

Zheng, Da; Burns, Randal; Szalay, Alexander S

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads.
A two-qubit photonic quantum processor and its application to solving systems of linear equations

PubMed Central

Barz, Stefanie; Kassal, Ivan; Ringbauer, Martin; Lipp, Yannick Ole; Dakić, Borivoje; Aspuru-Guzik, Alán; Walther, Philip

2014-01-01

Large-scale quantum computers will require the ability to apply long sequences of entangling gates to many qubits. In a photonic architecture, where single-qubit gates can be performed easily and precisely, the application of consecutive two-qubit entangling gates has been a significant obstacle. Here, we demonstrate a two-qubit photonic quantum processor that implements two consecutive CNOT gates on the same pair of polarisation-encoded qubits. To demonstrate the flexibility of our system, we implement various instances of the quantum algorithm for solving of systems of linear equations. PMID:25135432

Negative base encoding in optical linear algebra processors

NASA Technical Reports Server (NTRS)

Perlee, C.; Casasent, D.

1986-01-01

In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
Parallel network simulations with NEURON.

PubMed

Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

2006-10-01

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
Computational efficiency of parallel combinatorial OR-tree searches

NASA Technical Reports Server (NTRS)

Li, Guo-Jie; Wah, Benjamin W.

1990-01-01

The performance of parallel combinatorial OR-tree searches is analytically evaluated. This performance depends on the complexity of the problem to be solved, the error allowance function, the dominance relation, and the search strategies. The exact performance may be difficult to predict due to the nondeterminism and anomalies of parallelism. The authors derive the performance bounds of parallel OR-tree searches with respect to the best-first, depth-first, and breadth-first strategies, and verify these bounds by simulation. They show that a near-linear speedup can be achieved with respect to a large number of processors for parallel OR-tree searches. Using the bounds developed, the authors derive sufficient conditions for assuring that parallelism will not degrade performance and necessary conditions for allowing parallelism to have a speedup greater than the ratio of the numbers of processors. These bounds and conditions provide the theoretical foundation for determining the number of processors required to assure a near-linear speedup.
Parallel Network Simulations with NEURON

PubMed Central

Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

2009-01-01

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
A High-Throughput Processor for Flight Control Research Using Small UAVs

NASA Technical Reports Server (NTRS)

Klenke, Robert H.; Sleeman, W. C., IV; Motter, Mark A.

2006-01-01

There are numerous autopilot systems that are commercially available for small (<100 lbs) UAVs. However, they all share several key disadvantages for conducting aerodynamic research, chief amongst which is the fact that most utilize older, slower, 8- or 16-bit microcontroller technologies. This paper describes the development and testing of a flight control system (FCS) for small UAV s based on a modern, high throughput, embedded processor. In addition, this FCS platform contains user-configurable hardware resources in the form of a Field Programmable Gate Array (FPGA) that can be used to implement custom, application-specific hardware. This hardware can be used to off-load routine tasks such as sensor data collection, from the FCS processor thereby further increasing the computational throughput of the system.
Phase space simulation of collisionless stellar systems on the massively parallel processor

NASA Technical Reports Server (NTRS)

White, Richard L.

1987-01-01

A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.
Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Gibson, Garth Alan

1990-01-01

During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
Environmentally adaptive processing for shallow ocean applications: A sequential Bayesian approach.

PubMed

Candy, J V

2015-09-01

The shallow ocean is a changing environment primarily due to temperature variations in its upper layers directly affecting sound propagation throughout. The need to develop processors capable of tracking these changes implies a stochastic as well as an environmentally adaptive design. Bayesian techniques have evolved to enable a class of processors capable of performing in such an uncertain, nonstationary (varying statistics), non-Gaussian, variable shallow ocean environment. A solution to this problem is addressed by developing a sequential Bayesian processor capable of providing a joint solution to the modal function tracking and environmental adaptivity problem. Here, the focus is on the development of both a particle filter and an unscented Kalman filter capable of providing reasonable performance for this problem. These processors are applied to hydrophone measurements obtained from a vertical array. The adaptivity problem is attacked by allowing the modal coefficients and/or wavenumbers to be jointly estimated from the noisy measurement data along with tracking of the modal functions while simultaneously enhancing the noisy pressure-field measurements.
Software-Reconfigurable Processors for Spacecraft

NASA Technical Reports Server (NTRS)

Farrington, Allen; Gray, Andrew; Bell, Bryan; Stanton, Valerie; Chong, Yong; Peters, Kenneth; Lee, Clement; Srinivasan, Jeffrey

2005-01-01

A report presents an overview of an architecture for a software-reconfigurable network data processor for a spacecraft engaged in scientific exploration. When executed on suitable electronic hardware, the software performs the functions of a physical layer (in effect, acts as a software radio in that it performs modulation, demodulation, pulse-shaping, error correction, coding, and decoding), a data-link layer, a network layer, a transport layer, and application-layer processing of scientific data. The software-reconfigurable network processor is undergoing development to enable rapid prototyping and rapid implementation of communication, navigation, and scientific signal-processing functions; to provide a long-lived communication infrastructure; and to provide greatly improved scientific-instrumentation and scientific-data-processing functions by enabling science-driven in-flight reconfiguration of computing resources devoted to these functions. This development is an extension of terrestrial radio and network developments (e.g., in the cellular-telephone industry) implemented in software running on such hardware as field-programmable gate arrays, digital signal processors, traditional digital circuits, and mixed-signal application-specific integrated circuits (ASICs).
Testability Design Rating System: Testability Handbook. Volume 1

DTIC Science & Technology

1992-02-01

4-10 4.7.5 Summary of False BIT Alarms (FBA) ............................. 4-10 4.7.6 Smart BIT Technique...Circuit Board PGA Pin Grid Array PLA Programmable Logic Array PLD Programmable Logic Device PN Pseudo-Random Number PREDICT Probabilistic Estimation of...11 4.7.6 Smart BIT ( reference: RADC-TR-85-198). " Smart " BIT is a term given to BIT circuitry in a system LRU which includes dedicated processor/memory
Parallel discrete event simulation: A shared memory approach

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1987-01-01

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

NASA Technical Reports Server (NTRS)

Downie, John D.

1990-01-01

A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle will take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.
Wake Vortex Avoidance System and Method

NASA Technical Reports Server (NTRS)

Shams, Qamar A. (Inventor); Zuckerwar, Allan J. (Inventor); Knight, Howard K. (Inventor)

2017-01-01

A wake vortex avoidance system includes a microphone array configured to detect low frequency sounds. A signal processor determines a geometric mean coherence based on the detected low frequency sounds. A display displays wake vortices based on the determined geometric mean coherence.
SPAR thermal analysis processors reference manual, system level 16. Volume 1: Program executive. Volume 2: Theory. Volume 3: Demonstration problems. Volume 4: Experimental thermal element capability. Volume 5: Programmer reference

NASA Technical Reports Server (NTRS)

Marlowe, M. B.; Moore, R. A.; Whetstone, W. D.

1979-01-01

User instructions are given for performing linear and nonlinear steady state and transient thermal analyses with SPAR thermal analysis processors TGEO, SSTA, and TRTA. It is assumed that the user is familiar with basic SPAR operations and basic heat transfer theory.
Dedicated hardware processor and corresponding system-on-chip design for real-time laser speckle imaging.

PubMed

Jiang, Chao; Zhang, Hongyan; Wang, Jia; Wang, Yaru; He, Heng; Liu, Rui; Zhou, Fangyuan; Deng, Jialiang; Li, Pengcheng; Luo, Qingming

2011-11-01

Laser speckle imaging (LSI) is a noninvasive and full-field optical imaging technique which produces two-dimensional blood flow maps of tissues from the raw laser speckle images captured by a CCD camera without scanning. We present a hardware-friendly algorithm for the real-time processing of laser speckle imaging. The algorithm is developed and optimized specifically for LSI processing in the field programmable gate array (FPGA). Based on this algorithm, we designed a dedicated hardware processor for real-time LSI in FPGA. The pipeline processing scheme and parallel computing architecture are introduced into the design of this LSI hardware processor. When the LSI hardware processor is implemented in the FPGA running at the maximum frequency of 130 MHz, up to 85 raw images with the resolution of 640×480 pixels can be processed per second. Meanwhile, we also present a system on chip (SOC) solution for LSI processing by integrating the CCD controller, memory controller, LSI hardware processor, and LCD display controller into a single FPGA chip. This SOC solution also can be used to produce an application specific integrated circuit for LSI processing.
Replication of Space-Shuttle Computers in FPGAs and ASICs

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.

2008-01-01

A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Dynamically programmable cache

NASA Astrophysics Data System (ADS)

Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas

1998-10-01

Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).
Multiple-access phased array antenna simulator for a digital beam-forming system investigation

NASA Technical Reports Server (NTRS)

Kerczewski, Robert J.; Yu, John; Walton, Joanne C.; Perl, Thomas D.; Andro, Monty; Alexovich, Robert E.

1992-01-01

Future versions of data relay satellite systems are currently being planned by NASA. Being given consideration for implementation are on-board digital beamforming techniques which will allow multiple users to simultaneously access a single S-band phased array antenna system. To investigate the potential performance of such a system, a laboratory simulator has been developed at NASA's Lewis Research Center. This paper describes the system simulator, and in particular, the requirements, design and performance of a key subsystem, the phased array antenna simulator, which provides realistic inputs to the digital processor including multiple signals, noise, and nonlinearities.
Multiple-access phased array antenna simulator for a digital beam forming system investigation

NASA Technical Reports Server (NTRS)

Kerczewski, Robert J.; Yu, John; Walton, Joanne C.; Perl, Thomas D.; Andro, Monty; Alexovich, Robert E.

1992-01-01

Future versions of data relay satellite systems are currently being planned by NASA. Being given consideration for implementation are on-board digital beamforming techniques which will allow multiple users to simultaneously access a single S-band phased array antenna system. To investigate the potential performance of such a system, a laboratory simulator has been developed at NASA's Lewis Research Center. This paper describes the system simulator, and in particular, the requirements, design, and performance of a key subsystem, the phased array antenna simulator, which provides realistic inputs to the digital processor including multiple signals, noise, and nonlinearities.
The design and implementation of cost-effective algorithms for direct solution of banded linear systems on the vector processor system 32 supercomputer

NASA Technical Reports Server (NTRS)

Samba, A. S.

1985-01-01

The problem of solving banded linear systems by direct (non-iterative) techniques on the Vector Processor System (VPS) 32 supercomputer is considered. Two efficient direct methods for solving banded linear systems on the VPS 32 are described. The vector cyclic reduction (VCR) algorithm is discussed in detail. The performance of the VCR on a three parameter model problem is also illustrated. The VCR is an adaptation of the conventional point cyclic reduction algorithm. The second direct method is the Customized Reduction of Augmented Triangles' (CRAT). CRAT has the dominant characteristics of an efficient VPS 32 algorithm. CRAT is tailored to the pipeline architecture of the VPS 32 and as a consequence the algorithm is implicitly vectorizable.

Digital Beamforming Scatterometer

NASA Technical Reports Server (NTRS)

Rincon, Rafael F.; Vega, Manuel; Kman, Luko; Buenfil, Manuel; Geist, Alessandro; Hillard, Larry; Racette, Paul

2009-01-01

This paper discusses scatterometer measurements collected with multi-mode Digital Beamforming Synthetic Aperture Radar (DBSAR) during the SMAP-VEX 2008 campaign. The 2008 SMAP Validation Experiment was conducted to address a number of specific questions related to the soil moisture retrieval algorithms. SMAP-VEX 2008 consisted on a series of aircraft-based.flights conducted on the Eastern Shore of Maryland and Delaware in the fall of 2008. Several other instruments participated in the campaign including the Passive Active L-Band System (PALS), the Marshall Airborne Polarimetric Imaging Radiometer (MAPIR), and the Global Positioning System Reflectometer (GPSR). This campaign was the first SMAP Validation Experiment. DBSAR is a multimode radar system developed at NASA/Goddard Space Flight Center that combines state-of-the-art radar technologies, on-board processing, and advances in signal processing techniques in order to enable new remote sensing capabilities applicable to Earth science and planetary applications [l]. The instrument can be configured to operate in scatterometer, Synthetic Aperture Radar (SAR), or altimeter mode. The system builds upon the L-band Imaging Scatterometer (LIS) developed as part of the RadSTAR program. The radar is a phased array system designed to fly on the NASA P3 aircraft. The instrument consists of a programmable waveform generator, eight transmit/receive (T/R) channels, a microstrip antenna, and a reconfigurable data acquisition and processor system. Each transmit channel incorporates a digital attenuator, and digital phase shifter that enables amplitude and phase modulation on transmit. The attenuators, phase shifters, and calibration switches are digitally controlled by the radar control card (RCC) on a pulse by pulse basis. The antenna is a corporate fed microstrip patch-array centered at 1.26 GHz with a 20 MHz bandwidth. Although only one feed is used with the present configuration, a provision was made for separate corporate feeds for vertical and horizontal polarization. System upgrades to dual polarization are currently under way. The DBSAR processor is a reconfigurable data acquisition and processor system capable of real-time, high-speed data processing. DBSAR uses an FPGA-based architecture to implement digitally down-conversion, in-phase and quadrature (I/Q) demodulation, and subsequent radar specific algorithms. The core of the processor board consists of an analog-to-digital (AID) section, three Altera Stratix field programmable gate arrays (FPGAs), an ARM microcontroller, several memory devices, and an Ethernet interface. The processor also interfaces with a navigation board consisting of a GPS and a MEMS gyro. The processor has been configured to operate in scatterometer, Synthetic Aperture Radar (SAR), and altimeter modes. All the modes are based on digital beamforming which is a digital process that generates the far-field beam patterns at various scan angles from voltages sampled in the antenna array. This technique allows steering the received beam and controlling its beam-width and side-lobe. Several beamforming techniques can be implemented each characterized by unique strengths and weaknesses, and each applicable to different measurement scenarios. In Scatterometer mode, the radar is capable to.generate a wide beam or scan a narrow beam on transmit, and to steer the received beam on processing while controlling its beamwidth and side-lobe level. Table I lists some important radar characteristics
Parallel asynchronous systems and image processing algorithms

NASA Technical Reports Server (NTRS)

Coon, D. D.; Perera, A. G. U.

1989-01-01

A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to evey pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems.

PubMed

Downie, J D; Goodman, J W

1989-10-15

A ground-based adaptive optics imaging telescope system attempts to improve image quality by measuring and correcting for atmospherically induced wavefront aberrations. The necessary control computations during each cycle will take a finite amount of time, which adds to the residual error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper investigates this possibility by studying the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for adaptive optics use.
Embedded System Implementation on FPGA System With μCLinux OS

NASA Astrophysics Data System (ADS)

Fairuz Muhd Amin, Ahmad; Aris, Ishak; Syamsul Azmir Raja Abdullah, Raja; Kalos Zakiah Sahbudin, Ratna

2011-02-01

Embedded systems are taking on more complicated tasks as the processors involved become more powerful. The embedded systems have been widely used in many areas such as in industries, automotives, medical imaging, communications, speech recognition and computer vision. The complexity requirements in hardware and software nowadays need a flexibility system for further enhancement in any design without adding new hardware. Therefore, any changes in the design system will affect the processor that need to be changed. To overcome this problem, a System On Programmable Chip (SOPC) has been designed and developed using Field Programmable Gate Array (FPGA). A softcore processor, NIOS II 32-bit RISC, which is the microprocessor core was utilized in FPGA system together with the embedded operating system(OS), μClinux. In this paper, an example of web server is explained and demonstrated
Endobronchial ultrasound elastography: a new method in endobronchial ultrasound-guided transbronchial needle aspiration.

PubMed

Jiang, Jun-Hong; Turner, J Francis; Huang, Jian-An

2015-12-01

TBNA through the flexible bronchoscope is a 37-year-old technology that utilizes a TBNA needle to puncture the bronchial wall and obtain specimens of peribronchial and mediastinal lesions through the flexible bronchoscope for the diagnosis of benign and malignant diseases in the mediastinum and lung. Since 2002, the Olympus Company developed the first generation ultrasound equipment for use in the airway, initially utilizing an ultrasound probe introduced through the working channel followed by incoroporation of a fixed linear ultrasound array at the distal tip of the bronchoscope. This new bronchoscope equipped with a convex type ultrasound probe on the tip was subsequently introduced into clinical practice. The convex probe (CP)-EBUS allows real-time endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) of mediastinal and hilar lymph nodes. EBUS-TBNA is a minimally invasive procedure performed under local anesthesia that has been shown to have a high sensitivity and diagnostic yield for lymph node staging of lung cancer. In 10 years of EBUS development, the Olympus Company developed the second generation EBUS bronchoscope (BF-UC260FW) with the ultrasound image processor (EU-M1), and in 2013 introduced a new ultrasound image processor (EU-M2) into clinical practice. FUJI company has also developed a curvilinear array endobronchial ultrasound bronchoscope (EB-530 US) that makes it easier for the operator to master the operation of the ultrasonic bronchoscope. Also, the new thin convex probe endobronchial ultrasound bronchoscope (TCP-EBUS) is able to visualize one to three bifurcations distal to the current CP-EBUS. The emergence of EBUS-TBNA has also been accompanied by innovation in EBUS instruments. EBUS elastography is, then, a new technique for describing the compliance of structures during EBUS, which may be of use in the determination of metastasis to the mediastinal and hilar lymph nodes. This article describes these new EBUS techniques and reviews the relevant literature.
Multispectral linear array visible and shortwave infrared sensors

NASA Astrophysics Data System (ADS)

Tower, J. R.; Warren, F. B.; Pellon, L. E.; Strong, R.; Elabd, H.; Cope, A. D.; Hoffmann, D. M.; Kramer, W. M.; Longsderff, R. W.

1984-08-01

All-solid state pushbroom sensors for multispectral linear array (MLA) instruments to replace mechanical scanners used on LANDSAT satellites are introduced. A buttable, four-spectral-band, linear-format charge coupled device (CCD) and a buttable, two-spectral-band, linear-format, shortwave infrared CCD are described. These silicon integrated circuits may be butted end to end to provide multispectral focal planes with thousands of contiguous, in-line photosites. The visible CCD integrated circuit is organized as four linear arrays of 1024 pixels each. Each array views the scene in a different spectral window, resulting in a four-band sensor. The shortwave infrared (SWIR) sensor is organized as 2 linear arrays of 512 detectors each. Each linear array is optimized for performance at a different wavelength in the SWIR band.
When emotionality trumps reason: a study of individual processing style and juror bias.

PubMed

Gunnell, Justin J; Ceci, Stephen J

2010-01-01

"Cognitive Experiential Self Theory" (CEST) postulates that information-processing proceeds through two pathways, a rational one and an experiential one. The former is characterized by an emphasis on analysis, fact, and logical argument, whereas the latter is characterized by emotional and personal experience. We examined whether individuals influenced by the experiential system (E-processors) are more susceptible to extralegal biases (e.g. defendant attractiveness) than those influenced by the rational system (R-processors). Participants reviewed a criminal trial transcript and defendant profile and determined verdict, sentencing, and extralegal susceptibility. Although E-processors and R-processors convicted attractive defendants at similar rates, E-processors were more likely to convict less attractive defendants. Whereas R-processors did not sentence attractive and less attractive defendants differently, E-processors gave more lenient sentences to attractive defendants and harsher sentences to less attractive defendants. E-processors were also more likely to report that extralegal factors would change their verdicts. Further, the degree to which emotionality trumped rationality within an individual, as measured by a novel scoring method, linearly correlated with harsher sentences and extralegal influence. In sum, the results support an "unattractive harshness" effect during guilt determination, an attraction leniency effect during sentencing and increased susceptibility to extralegal factors within E-processors. Copyright © 2010 John Wiley & Sons, Ltd. Copyright © 2010 John Wiley & Sons, Ltd.
Status report of the end-to-end ASKAP software system: towards early science operations

NASA Astrophysics Data System (ADS)

Guzman, Juan Carlos; Chapman, Jessica; Marquarding, Malte; Whiting, Matthew

2016-08-01

The Australian SKA Pathfinder (ASKAP) is a novel centimetre radio synthesis telescope currently in the commissioning phase and located in the midwest region of Western Australia. It comprises of 36 x 12 m diameter reflector antennas each equipped with state-of-the-art and award winning Phased Array Feeds (PAF) technology. The PAFs provide a wide, 30 square degree field-of-view by forming up to 36 separate dual-polarisation beams at once. This results in a high data rate: 70 TB of correlated visibilities in an 8-hour observation, requiring custom-written, high-performance software running in dedicated High Performance Computing (HPC) facilities. The first six antennas equipped with first-generation PAF technology (Mark I), named the Boolardy Engineering Test Array (BETA) have been in use since 2014 as a platform to test PAF calibration and imaging techniques, and along the way it has been producing some great science results. Commissioning of the ASKAP Array Release 1, that is the first six antennas with second-generation PAFs (Mark II) is currently under way. An integral part of the instrument is the Central Processor platform hosted at the Pawsey Supercomputing Centre in Perth, which executes custom-written software pipelines, designed specifically to meet the ASKAP imaging requirements of wide field of view and high dynamic range. There are three key hardware components of the Central Processor: The ingest nodes (16 x node cluster), the fast temporary storage (1 PB Lustre file system) and the processing supercomputer (200 TFlop system). This High-Performance Computing (HPC) platform is managed and supported by the Pawsey support team. Due to the limited amount of data generated by BETA and the first ASKAP Array Release, the Central Processor platform has been running in a more "traditional" or user-interactive mode. But this is about to change: integration and verification of the online ingest pipeline starts in early 2016, which is required to support the full 300 MHz bandwidth for Array Release 1; followed by the deployment of the real-time data processing components. In addition to the Central Processor, the first production release of the CSIRO ASKAP Science Data Archive (CASDA) has also been deployed in one of the Pawsey Supercomputing Centre facilities and it is integrated to the end-to-end ASKAP data flow system. This paper describes the current status of the "end-to-end" data flow software system from preparing observations to data acquisition, processing and archiving; and the challenges of integrating an HPC facility as a key part of the instrument. It also shares some lessons learned since the start of integration activities and the challenges ahead in preparation for the start of the Early Science program.
High-Performance, Radiation-Hardened Electronics for Space Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Watson, Michael D.; Frazier, Donald O.; Adams, James H.; Johnson, Michael A.; Kolawa, Elizabeth A.

2007-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project endeavors to advance the current state-of-the-art in high-performance, radiation-hardened electronics and processors, ensuring successful performance of space systems required to operate within extreme radiation and temperature environments. Because RHESE is a project within the Exploration Technology Development Program (ETDP), RHESE's primary customers will be the human and robotic missions being developed by NASA's Exploration Systems Mission Directorate (ESMD) in partial fulfillment of the Vision for Space Exploration. Benefits are also anticipated for NASA's science missions to planetary and deep-space destinations. As a technology development effort, RHESE provides a broad-scoped, full spectrum of approaches to environmentally harden space electronics, including new materials, advanced design processes, reconfigurable hardware techniques, and software modeling of the radiation environment. The RHESE sub-project tasks are: SelfReconfigurable Electronics for Extreme Environments, Radiation Effects Predictive Modeling, Radiation Hardened Memory, Single Event Effects (SEE) Immune Reconfigurable Field Programmable Gate Array (FPGA) (SIRF), Radiation Hardening by Software, Radiation Hardened High Performance Processors (HPP), Reconfigurable Computing, Low Temperature Tolerant MEMS by Design, and Silicon-Germanium (SiGe) Integrated Electronics for Extreme Environments. These nine sub-project tasks are managed by technical leads as located across five different NASA field centers, including Ames Research Center, Goddard Space Flight Center, the Jet Propulsion Laboratory, Langley Research Center, and Marshall Space Flight Center. The overall RHESE integrated project management responsibility resides with NASA's Marshall Space Flight Center (MSFC). Initial technology development emphasis within RHESE focuses on the hardening of Field Programmable Gate Arrays (FPGA)s and Field Programmable Analog Arrays (FPAA)s for use in reconfigurable architectures. As these component/chip level technologies mature, the RHESE project emphasis shifts to focus on efforts encompassing total processor hardening techniques and board-level electronic reconfiguration techniques featuring spare and interface modularity. This phased approach to distributing emphasis between technology developments provides hardened FPGA/FPAAs for early mission infusion, then migrates to hardened, board-level, high speed processors with associated memory elements and high density storage for the longer duration missions encountered for Lunar Outpost and Mars Exploration occurring later in the Constellation schedule.
Atmospheric Modeling And Sensor Simulation (AMASS) study

NASA Technical Reports Server (NTRS)

Parker, K. G.

1985-01-01

A 4800 band synchronous communications link was established between the Perkin-Elmer (P-E) 3250 Atmospheric Modeling and Sensor Simulation (AMASS) system and the Cyber 205 located at the Goddard Space Flight Center. An extension study of off-the-shelf array processors offering standard interface to the Perkin-Elmer was conducted to determine which would meet computational requirements of the division. A Floating Point Systems AP-120B was borrowed from another Marshall Space Flight Center laboratory for evaluation. It was determined that available array processors did not offer significantly more capabilities than the borrowed unit, although at least three other vendors indicated that standard Perkin-Elmer interfaces would be marketed in the future. Therefore, the recommendation was made to continue to utilize the 120B ad to keep monitoring the AP market. Hardware necessary to support requirements of the ASD as well as to enhance system performance was specified and procured. Filters were implemented on the Harris/McIDAS system including two-dimensional lowpass, gradient, Laplacian, and bicubic interpolation routines.
A new multifunction acousto-optic signal processor

NASA Technical Reports Server (NTRS)

Berg, N. J.; Casseday, M. W.; Filipov, A. N.; Pellegrino, J. M.

1984-01-01

An acousto-optic architecture for simultaneously obtaining time integration correlation and high-speed power spectrum analysis was constructed using commercially available TeO2 modulators and photodiode detector-arrays. The correlator section of the processor uses coherent interferometry to attain maximum bandwidth and dynamic range while achieving a time-bandwidth product of 1 million. Two correllator outputs are achieved in this system configuration. One is optically filtered and magnified 2 : 1 to decrease the spatial frequency to a level where a 25-MHz bandwidth may be sampled by a 62-mm array with elements on 25-micro centers. The other output is magnified by a factor of 10 such that the center 4 microseconds of information is available for estimation of time-difference-of-arrival to within 10 ns. The Bragg cell spectrum-analyzer section, which also has two outputs, resolves a 25-MHz instantaneous bandwidth to 25 kHz and can determine discrete-frequency reception time to within 15 microseconds. A microprocessor combines spectrum analysis information with that obtained from the correlator.
A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

NASA Astrophysics Data System (ADS)

Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

2010-07-01

This paper proposes a kind of pipelined electric circuit architecture implemented in FPGA, a very large scale integrated circuit (VLSI), which efficiently deals with the real time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this image system. Each processor undertakes own systematic task, coordinating its work with each other's. The system on programmable chip (SOPC) in FPGA works steadily under the global clock frequency of 96Mhz. Adequate time allowance makes FPGA perform NUC image pre-processing algorithm with ease, which has offered favorable guarantee for the work of post image processing in DSP. And at the meantime, this paper presents a hardware (HW) and software (SW) co-design in FPGA. Thus, this systematic architecture yields an image processing system with multiprocessor, and a smart solution to the satisfaction with the performance of the system.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
An optimization of the FPGA/NIOS adaptive FIR filter using linear prediction to reduce narrow band RFI for the next generation ground-based ultra-high energy cosmic-ray experiment

NASA Astrophysics Data System (ADS)

Szadkowski, Zbigniew; Fraenkel, E. D.; Glas, Dariusz; Legumina, Remigiusz

2013-12-01

The electromagnetic part of an extensive air shower developing in the atmosphere provides significant information complementary to that obtained by water Cherenkov detectors which are predominantly sensitive to the muonic content of an air shower at ground. The emissions can be observed in the frequency band between 10 and 100 MHz. However, this frequency range is significantly contaminated by narrow-band RFI and other human-made distortions. The Auger Engineering Radio Array currently suppresses the RFI by multiple time-to-frequency domain conversions using an FFT procedure as well as by a set of manually chosen IIR notch filters in the time-domain. An alternative approach developed in this paper is an adaptive FIR filter based on linear prediction (LP). The coefficients for the linear predictor are dynamically refreshed and calculated in the virtual NIOS processor. The radio detector is an autonomous system installed on the Argentinean pampas and supplied from a solar panel. Powerful calculation capacity inside the FPGA is a factor. Power consumption versus the degree of effectiveness of the calculation inside the FPGA is a figure of merit to be minimized. Results show that the RFI contamination can be significantly suppressed by the LP FIR filter for 64 or less stages.
Microlaser-based compact optical neuro-processors (Invited Paper)

NASA Astrophysics Data System (ADS)

Paek, Eung Gi; Chan, Winston K.; Zah, Chung-En; Cheung, Kwok-wai; Curtis, L.; Chang-Hasnain, Constance J.

1992-10-01

This paper reviews the recent progress in the development of holographic neural networks using surface-emitting laser diode arrays (SELDAs). Since the previous work on ultrafast holographic memory readout system and a robust incoherent correlator, progress has been made in several areas: the use of an array of monolithic `neurons' to reconstruct holographic memories; two-dimensional (2-D) wavelength-division multiplexing (WDM) for image transmission through a single-mode fiber; and finally, an associative memory using time- division multiplexing (TDM). Experimental demonstrations on these are presented.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

PubMed

Zierke, Stephanie; Bakos, Jason D

2010-04-12

Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Development Of A Three-Dimensional Circuit Integration Technology And Computer Architecture

NASA Astrophysics Data System (ADS)

Etchells, R. D.; Grinberg, J.; Nudd, G. R.

1981-12-01

This paper is the first of a series 1,2,3 describing a range of efforts at Hughes Research Laboratories, which are collectively referred to as "Three-Dimensional Microelectronics." The technology being developed is a combination of a unique circuit fabrication/packaging technology and a novel processing architecture. The packaging technology greatly reduces the parasitic impedances associated with signal-routing in complex VLSI structures, while simultaneously allowing circuit densities orders of magnitude higher than the current state-of-the-art. When combined with the 3-D processor architecture, the resulting machine exhibits a one- to two-order of magnitude simultaneous improvement over current state-of-the-art machines in the three areas of processing speed, power consumption, and physical volume. The 3-D architecture is essentially that commonly referred to as a "cellular array", with the ultimate implementation having as many as 512 x 512 processors working in parallel. The three-dimensional nature of the assembled machine arises from the fact that the chips containing the active circuitry of the processor are stacked on top of each other. In this structure, electrical signals are passed vertically through the chips via thermomigrated aluminum feedthroughs. Signals are passed between adjacent chips by micro-interconnects. This discussion presents a broad view of the total effort, as well as a more detailed treatment of the fabrication and packaging technologies themselves. The results of performance simulations of the completed 3-D processor executing a variety of algorithms are also presented. Of particular pertinence to the interests of the focal-plane array community is the simulation of the UNICORNS nonuniformity correction algorithms as executed by the 3-D architecture.
An FPGA computing demo core for space charge simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Jinyuan; Huang, Yifei; /Fermilab

2009-01-01

In accelerator physics, space charge simulation requires large amount of computing power. In a particle system, each calculation requires time/resource consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computedmore » using a memory look-up table addressed with nine to ten most significant non-zero bits. At 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.« less
Sound-field reproduction systems using fixed-directivity loudspeakers.

PubMed

Poletti, M; Fazi, F M; Nelson, P A

2010-06-01

Sound reproduction systems using open arrays of loudspeakers in rooms suffer from degradations due to room reflections. These reflections can be reduced using pre-compensation of the loudspeaker signals, but this requires calibration of the array in the room, and is processor-intensive. This paper examines 3D sound reproduction systems using spherical arrays of fixed-directivity loudspeakers which reduce the sound field radiated outside the array. A generalized form of the simple source formulation and a mode-matching solution are derived for the required loudspeaker weights. The exterior field is derived and expressions for the exterior power and direct to reverberant ratio are derived. The theoretical results and simulations confirm that minimum interference occurs for loudspeakers which have hyper-cardioid polar responses.
Video rate morphological processor based on a redundant number representation

NASA Astrophysics Data System (ADS)

Kuczborski, Wojciech; Attikiouzel, Yianni; Crebbin, Gregory A.

1992-03-01

This paper presents a video rate morphological processor for automated visual inspection of printed circuit boards, integrated circuit masks, and other complex objects. Inspection algorithms are based on gray-scale mathematical morphology. Hardware complexity of the known methods of real-time implementation of gray-scale morphology--the umbra transform and the threshold decomposition--has prompted us to propose a novel technique which applied an arithmetic system without carrying propagation. After considering several arithmetic systems, a redundant number representation has been selected for implementation. Two options are analyzed here. The first is a pure signed digit number representation (SDNR) with the base of 4. The second option is a combination of the base-2 SDNR (to represent gray levels of images) and the conventional twos complement code (to represent gray levels of structuring elements). Operation principle of the morphological processor is based on the concept of the digit level systolic array. Individual processing units and small memory elements create a pipeline. The memory elements store current image windows (kernels). All operation primitives of processing units apply a unified direction of digit processing: most significant digit first (MSDF). The implementation technology is based on the field programmable gate arrays by Xilinx. This paper justified the rationality of a new approach to logic design, which is the decomposition of Boolean functions instead of Boolean minimization.

A programmable computational image sensor for high-speed vision

NASA Astrophysics Data System (ADS)

Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian

2013-08-01

In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array and a RISC core. The pixel-parallel PE is responsible for transferring, storing and processing image raw data in a SIMD fashion with its own programming language. The RPs are one dimensional array of simplified RISC cores, it can carry out complex arithmetic and logic operations. The PE array and RP array can finish great amount of computation with few instruction cycles and therefore satisfy the low- and middle-level high-speed image processing requirement. The RISC core controls the whole system operation and finishes some high-level image processing algorithms. We utilize a simplified AHB bus as the system bus to connect our major components. Programming language and corresponding tool chain for this computational image sensor are also developed.
Algorithms for solving large sparse systems of simultaneous linear equations on vector processors

NASA Technical Reports Server (NTRS)

David, R. E.

1984-01-01

Very efficient algorithms for solving large sparse systems of simultaneous linear equations have been developed for serial processing computers. These involve a reordering of matrix rows and columns in order to obtain a near triangular pattern of nonzero elements. Then an LU factorization is developed to represent the matrix inverse in terms of a sequence of elementary Gaussian eliminations, or pivots. In this paper it is shown how these algorithms are adapted for efficient implementation on vector processors. Results obtained on the CYBER 200 Model 205 are presented for a series of large test problems which show the comparative advantages of the triangularization and vector processing algorithms.
The SPAR thermal analyzer: Present and future

NASA Astrophysics Data System (ADS)

Marlowe, M. B.; Whetstone, W. D.; Robinson, J. C.

The SPAR thermal analyzer, a system of finite-element processors for performing steady-state and transient thermal analyses, is described. The processors communicate with each other through the SPAR random access data base. As each processor is executed, all pertinent source data is extracted from the data base and results are stored in the data base. Steady state temperature distributions are determined by a direct solution method for linear problems and a modified Newton-Raphson method for nonlinear problems. An explicit and several implicit methods are available for the solution of transient heat transfer problems. Finite element plotting capability is available for model checkout and verification.
The SPAR thermal analyzer: Present and future

NASA Technical Reports Server (NTRS)

Marlowe, M. B.; Whetstone, W. D.; Robinson, J. C.

1982-01-01

The SPAR thermal analyzer, a system of finite-element processors for performing steady-state and transient thermal analyses, is described. The processors communicate with each other through the SPAR random access data base. As each processor is executed, all pertinent source data is extracted from the data base and results are stored in the data base. Steady state temperature distributions are determined by a direct solution method for linear problems and a modified Newton-Raphson method for nonlinear problems. An explicit and several implicit methods are available for the solution of transient heat transfer problems. Finite element plotting capability is available for model checkout and verification.
Parallel discrete event simulation using shared memory

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1988-01-01

With traditional event-list techniques, evaluating a detailed discrete-event simulation-model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the numbers of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm, to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential-simulation of most queueing network models.
Processing of thermionic power on an electrically propelled spacecraft

NASA Technical Reports Server (NTRS)

Macie, T. W.

1973-01-01

A study to define the power processing equipment required between a thermionic reactor and an array of mercury-ion thrusters for a nuclear electric propulsion system is reported. Observations and recommendations that resulted from this study were: (1) the preferred thermionic-fuel-element source voltages are 23 V or higher; (2) transistor characteristics exert a strong effect on power processor mass; (3) the power processor mass could be considerably reduced should the magnetic materials that exhibit low losses at high frequencies, that have a high Curie point, and that can operate at 15 to 20 kG become avaliable; (4) electrical component packaging on the radiator could reduce the area that is sensitive to meteoroid penetration, thereby reducing the meteoroid shielding mass requirement; (5) an experimental model of the power processor design should be built and tested to verify the efficiencies, masses, and all the automatic operational aspects of the design.
Fuzzy logic particle tracking velocimetry

NASA Technical Reports Server (NTRS)

Wernet, Mark P.

1993-01-01

Fuzzy logic has proven to be a simple and robust method for process control. Instead of requiring a complex model of the system, a user defined rule base is used to control the process. In this paper the principles of fuzzy logic control are applied to Particle Tracking Velocimetry (PTV). Two frames of digitally recorded, single exposure particle imagery are used as input. The fuzzy processor uses the local particle displacement information to determine the correct particle tracks. Fuzzy PTV is an improvement over traditional PTV techniques which typically require a sequence (greater than 2) of image frames for accurately tracking particles. The fuzzy processor executes in software on a PC without the use of specialized array or fuzzy logic processors. A pair of sample input images with roughly 300 particle images each, results in more than 200 velocity vectors in under 8 seconds of processing time.
Reconfigurable L-Band Radar

NASA Technical Reports Server (NTRS)

Rincon, Rafael F.

2008-01-01

The reconfigurable L-Band radar is an ongoing development at NASA/GSFC that exploits the capability inherently in phased array radar systems with a state-of-the-art data acquisition and real-time processor in order to enable multi-mode measurement techniques in a single radar architecture. The development leverages on the L-Band Imaging Scatterometer, a radar system designed for the development and testing of new radar techniques; and the custom-built DBSAR processor, a highly reconfigurable, high speed data acquisition and processing system. The radar modes currently implemented include scatterometer, synthetic aperture radar, and altimetry; and plans to add new modes such as radiometry and bi-static GNSS signals are being formulated. This development is aimed at enhancing the radar remote sensing capabilities for airborne and spaceborne applications in support of Earth Science and planetary exploration This paper describes the design of the radar and processor systems, explains the operational modes, and discusses preliminary measurements and future plans.
Comparing an FPGA to a Cell for an Image Processing Application

NASA Astrophysics Data System (ADS)

Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.

2010-12-01

Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
An optical processor for object recognition and tracking

NASA Technical Reports Server (NTRS)

Sloan, J.; Udomkesmalee, S.

1987-01-01

The design and development of a miniaturized optical processor that performs real time image correlation are described. The optical correlator utilizes the Vander Lugt matched spatial filter technique. The correlation output, a focused beam of light, is imaged onto a CMOS photodetector array. In addition to performing target recognition, the device also tracks the target. The hardware, composed of optical and electro-optical components, occupies only 590 cu cm of volume. A complete correlator system would also include an input imaging lens. This optical processing system is compact, rugged, requires only 3.5 watts of operating power, and weighs less than 3 kg. It represents a major achievement in miniaturizing optical processors. When considered as a special-purpose processing unit, it is an attractive alternative to conventional digital image recognition processing. It is conceivable that the combined technology of both optical and ditital processing could result in a very advanced robot vision system.
Resolution of Port/Starboard Ambiguity Using a Linear Array of Triplets and a Twin-Line Planar Array

DTIC Science & Technology

2016-06-01

STARBOARD AMBIGUITY USING A LINEAR ARRAY OF TRIPLETS AND A TWIN- LINE PLANAR ARRAY by Stilson Veras Cardoso June 2016 Thesis Advisor...OF TRIPLETS AND A TWIN-LINE PLANAR ARRAY 5. FUNDING NUMBERS 6. AUTHOR(S) Stilson Veras Cardoso 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES...A LINEAR ARRAY OF TRIPLETS AND A TWIN-LINE PLANAR ARRAY Stilson Veras Cardoso Civilian, Brazilian Navy B.S., University of Brasília, 1993
DOE Office of Scientific and Technical Information (OSTI.GOV)

Learn, Mark Walter

Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not availablemore » to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA other than TMR. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.« less
Development of a new signal processor for tetralateral position sensitive detector based on single-chip microcomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang Meizhen; Shi Longzhao; Wang Yuxing

2006-08-15

An inherently nonlinear relation between the output current of the tetralateral position sensitive detector (PSD) and the position of the incident light spot has been found theoretically. Based on single-chip microcomputer and the theoretical relation between output current and position, a new signal processor capable of correcting nonlinearity and reducing position measurement deviation of tetralateral PSD was developed. A tetralateral PSD (S1200, 13x13 mm{sup 2}, Hamamatsu Photonics K.K.) was measured with the new signal processor, a linear relation between the output position of the PSD, and the incident position of the light spot was obtained. In the 60% range ofmore » a 13x13 mm{sup 2} active area, the position nonlinearity (rms) was 0.15% and the position measurement deviation (rms) was {+-}20 {mu}m. Compared with traditional analog signal processor, the new signal processor is of better compatibility, lower cost, higher precision, and easier to be interfaced.« less
Development of a new signal processor for tetralateral position sensitive detector based on single-chip microcomputer

NASA Astrophysics Data System (ADS)

Huang, Mei-Zhen; Shi, Long-Zhao; Wang, Yu-Xing; Ni, Yi; Li, Zhen-Qing; Ding, Hai-Feng

2006-08-01

An inherently nonlinear relation between the output current of the tetralateral position sensitive detector (PSD) and the position of the incident light spot has been found theoretically. Based on single-chip microcomputer and the theoretical relation between output current and position, a new signal processor capable of correcting nonlinearity and reducing position measurement deviation of tetralateral PSD was developed. A tetralateral PSD (S1200, 13×13mm2, Hamamatsu Photonics K.K.) was measured with the new signal processor, a linear relation between the output position of the PSD, and the incident position of the light spot was obtained. In the 60% range of a 13×13mm2 active area, the position nonlinearity (rms) was 0.15% and the position measurement deviation (rms) was ±20μm. Compared with traditional analog signal processor, the new signal processor is of better compatibility, lower cost, higher precision, and easier to be interfaced.
Superconducting quantum circuits at the surface code threshold for fault tolerance.

PubMed

Barends, R; Kelly, J; Megrant, A; Veitia, A; Sank, D; Jeffrey, E; White, T C; Mutus, J; Fowler, A G; Campbell, B; Chen, Y; Chen, Z; Chiaro, B; Dunsworth, A; Neill, C; O'Malley, P; Roushan, P; Vainsencher, A; Wenner, J; Korotkov, A N; Cleland, A N; Martinis, John M

2014-04-24

A quantum computer can solve hard problems, such as prime factoring, database searching and quantum simulation, at the cost of needing to protect fragile quantum states from error. Quantum error correction provides this protection by distributing a logical state among many physical quantum bits (qubits) by means of quantum entanglement. Superconductivity is a useful phenomenon in this regard, because it allows the construction of large quantum circuits and is compatible with microfabrication. For superconducting qubits, the surface code approach to quantum computing is a natural choice for error correction, because it uses only nearest-neighbour coupling and rapidly cycled entangling gates. The gate fidelity requirements are modest: the per-step fidelity threshold is only about 99 per cent. Here we demonstrate a universal set of logic gates in a superconducting multi-qubit processor, achieving an average single-qubit gate fidelity of 99.92 per cent and a two-qubit gate fidelity of up to 99.4 per cent. This places Josephson quantum computing at the fault-tolerance threshold for surface code error correction. Our quantum processor is a first step towards the surface code, using five qubits arranged in a linear array with nearest-neighbour coupling. As a further demonstration, we construct a five-qubit Greenberger-Horne-Zeilinger state using the complete circuit and full set of gates. The results demonstrate that Josephson quantum computing is a high-fidelity technology, with a clear path to scaling up to large-scale, fault-tolerant quantum circuits.
Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS

NASA Astrophysics Data System (ADS)

Lachowicz, Stefan W.; Rassau, Alexander; Lee, Seung-Minh; Eshraghian, Kamran; Lee, Mike M.

2003-04-01

Mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher quality multimedia content places heavy demands even on current ULSI (ultra large scale integration) systems, particularly for mobile applications where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using Indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters, and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents a ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, an array of analogue-to-digital converters is distributed, so that a one-bit cell is associated with one sensor. The analogue-to-digital converters are algorithmic current-mode converters. Eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.
Hyperspectral Microwave Atmospheric Sounder (HyMas) - New Capability in the CoSMIR-CoSSIR Scanhead

NASA Technical Reports Server (NTRS)

Hilliard, L. M.; Racette, P. E.; Blackwell, W.; Galbraith, C.; Thompson, E.

2015-01-01

Lincoln Laboratory and NASA's Goddard Space Flight Center have teamed to re-use an existing instrument platform, the CoSMIRCoSSIR system for atmospheric sounding, to develop a new capability in hyperspectral filtering, data collection, and display. The volume of the scanhead accomodated an intermediate frequency processor(IFP), that provides the filtering and digitization of the raw data and the interoperable remote component (IRC) adapted to CoSMIR, CoSSIR, and HyMAS that stores and archives the data with time tagged calibration and navigation data.The first element of the work is the demonstration of a hyperspectral microwave receiver subsystem that was recently shown using a comprehensive simulation study to yield performance that substantially exceeds current state-of-the-art. Hyperspectral microwave sounders with 100 channels offer temperature and humidity sounding improvements similar to those obtained when infrared sensors became hyperspectral, but with the relative insensitivity to clouds that characterizes microwave sensors. Hyperspectral microwave operation is achieved using independent RF antennareceiver arrays that sample the same areavolume of the Earths surfaceatmosphere at slightly different frequencies and therefore synthesize a set of dense, finely spaced vertical weighting functions. The second, enabling element of the proposal is the development of a compact 52-channel Intermediate Frequency processor module. A principal challenge in the development of a hyperspectral microwave system is the size of the IF filter bank required for channelization. Large bandwidths are simultaneously processed, thus complicating the use of digital back-ends with associated high complexities, costs, and power requirements. Our approach involves passive filters implemented using low-temperature co-fired ceramic (LTCC) technology to achieve an ultra-compact module that can be easily integrated with existing RF front-end technology. This IF processor is universally applicable to other microwave sensing missions requiring compact IF spectrometry.The data include 52 operational channels with low IF module volume (100cm3) and mass (300g) and linearity better than 0.3 over a 330K dynamic range.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Temperature-Adaptive Circuits on Reconfigurable Analog Arrays

NASA Technical Reports Server (NTRS)

Stoica, Adrian; Zebulum, Ricardo S.; Keymeulen, Didier; Ramesham, Rajeshuni; Neff, Joseph; Katkoori, Srinivas

2006-01-01

Demonstration of a self-reconfigurable Integrated Circuit (IC) that would operate under extreme temperature (-180 C and 120 C) and radiation (300krad), without the protection of thermal controls and radiation shields. Self-Reconfigurable Electronics platform: a) Evolutionary Processor (EP) to run reconfiguration mechanism; b) Reconfigurable chip (FPGA, FPAA, etc).
OAO-3 end of mission power subsystem evaluation

NASA Technical Reports Server (NTRS)

Tasevoli, M.

1982-01-01

End of mission tests were performed on the OAO-3 power subsystem in three component areas: solar array, nickel-cadmium batteries and the On-Board Processor (OBP) power boost operation. Solar array evaluation consisted of analyzing array performance characteristics and comparing them to earlier flight data. Measured solar array degradation of 14.1 to 17.7% after 8 1/3 years is in good agreement with theortical radiation damage losses. Battery discharge characteristics were compared to results of laboratory life cycle tests performed on similar cells. Comparison of cell voltage profils reveals close correlation and confirms the validity of real time life cycle simulation. The successful operation of the system in the OBP/power boost regulation mode demonstrates the excellent life, reliability and greater system utilization of power subsystems using maximum power trackers.

The 7.5 kW solar array simulator

NASA Technical Reports Server (NTRS)

Robson, R. R.

1975-01-01

A high power solar array simulator capable of providing the input power to simultaneously operate two 30 cm diameter ion thruster power processors was designed, fabricated, and tested. The maximum power point is set to between 150 and 7500 watts representing an open circuit voltage from 50 to 300 volts and a short circuit current from 4 to 36 amps. Illuminated solar cells are used as the control element to provide a true solar cell characteristic and permit the option of simulating changes in this characteristic due to variations in solar intensity and/or temperature of the solar array. This is accomplished by changing the illumination and/or temperature of the control cells. The response of the output to a step change in load closely approximates that of an actual solar array.
SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes

NASA Astrophysics Data System (ADS)

Homann, Holger; Laenen, Francois

2018-03-01

The numerical study of physical problems often require integrating the dynamics of a large number of particles evolving according to a given set of equations. Particles are characterized by the information they are carrying such as an identity, a position other. There are generally speaking two different possibilities for handling particles in high performance computing (HPC) codes. The concept of an Array of Structures (AoS) is in the spirit of the object-oriented programming (OOP) paradigm in that the particle information is implemented as a structure. Here, an object (realization of the structure) represents one particle and a set of many particles is stored in an array. In contrast, using the concept of a Structure of Arrays (SoA), a single structure holds several arrays each representing one property (such as the identity) of the whole set of particles. The AoS approach is often implemented in HPC codes due to its handiness and flexibility. For a class of problems, however, it is known that the performance of SoA is much better than that of AoS. We confirm this observation for our particle problem. Using a benchmark we show that on modern Intel Xeon processors the SoA implementation is typically several times faster than the AoS one. On Intel's MIC co-processors the performance gap even attains a factor of ten. The same is true for GPU computing, using both computational and multi-purpose GPUs. Combining performance and handiness, we present the library SoAx that has optimal performance (on CPUs, MICs, and GPUs) while providing the same handiness as AoS. For this, SoAx uses modern C++ design techniques such template meta programming that allows to automatically generate code for user defined heterogeneous data structures.
Autothermal and partial oxidation reformer-based fuel processor, method for improving catalyst function in autothermal and partial oxidation reformer-based processors

DOEpatents

Ahmed, Shabbir; Papadias, Dionissios D.; Lee, Sheldon H. D.; Ahluwalia, Rajesh K.

2013-01-08

The invention provides a fuel processor comprising a linear flow structure having an upstream portion and a downstream portion; a first catalyst supported at the upstream portion; and a second catalyst supported at the downstream portion, wherein the first catalyst is in fluid communication with the second catalyst. Also provided is a method for reforming fuel, the method comprising contacting the fuel to an oxidation catalyst so as to partially oxidize the fuel and generate heat; warming incoming fuel with the heat while simultaneously warming a reforming catalyst with the heat; and reacting the partially oxidized fuel with steam using the reforming catalyst.
Parallel algorithms for boundary value problems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
Analysis of DuPont and Kodak duplicating films and chemistries in a Fultron spray processor

NASA Technical Reports Server (NTRS)

Weinstein, M. S.

1972-01-01

A test program was conducted with duPont duplicating film type SR 112 and SCOLOR developer and Kodak duplicating film types 2430, 2422, and FE 2628 (SO-467) and MX-641 developer to determine sensitometric and image quality characteristics of these materials when used with a fultron spray processor. The test results show that the SCOLOR developer foams excessively in the fultron processor when used with or without the addition of an antifoaming agent. The Kodak type FE 2628 film with MX-641 chemistry had the longest linear Log E range at a 1.0 gamma. Sensitometric curves and granularity traces for all film process combinations tested are included.
Silicon nanodisk array with a fin field-effect transistor for time-domain weighted sum calculation toward massively parallel spiking neural networks

NASA Astrophysics Data System (ADS)

Tohara, Takashi; Liang, Haichao; Tanaka, Hirofumi; Igarashi, Makoto; Samukawa, Seiji; Endo, Kazuhiko; Takahashi, Yasuo; Morie, Takashi

2016-03-01

A nanodisk array connected with a fin field-effect transistor is fabricated and analyzed for spiking neural network applications. This nanodevice performs weighted sums in the time domain using rising slopes of responses triggered by input spike pulses. The nanodisk arrays, which act as a resistance of several giga-ohms, are fabricated using a self-assembly bio-nano-template technique. Weighted sums are achieved with an energy dissipation on the order of 1 fJ, where the number of inputs can be more than one hundred. This amount of energy is several orders of magnitude lower than that of conventional digital processors.
Advanced satellite communication system

NASA Technical Reports Server (NTRS)

Staples, Edward J.; Lie, Sen

1992-01-01

The objective of this research program was to develop an innovative advanced satellite receiver/demodulator utilizing surface acoustic wave (SAW) chirp transform processor and coherent BPSK demodulation. The algorithm of this SAW chirp Fourier transformer is of the Convolve - Multiply - Convolve (CMC) type, utilizing off-the-shelf reflective array compressor (RAC) chirp filters. This satellite receiver, if fully developed, was intended to be used as an on-board multichannel communications repeater. The Advanced Communications Receiver consists of four units: (1) CMC processor, (2) single sideband modulator, (3) demodulator, and (4) chirp waveform generator and individual channel processors. The input signal is composed of multiple user transmission frequencies operating independently from remotely located ground terminals. This signal is Fourier transformed by the CMC Processor into a unique time slot for each user frequency. The CMC processor is driven by a waveform generator through a single sideband (SSB) modulator. The output of the coherent demodulator is composed of positive and negative pulses, which are the envelopes of the chirp transform processor output. These pulses correspond to the data symbols. Following the demodulator, a logic circuit reconstructs the pulses into data, which are subsequently differentially decoded to form the transmitted data. The coherent demodulation and detection of BPSK signals derived from a CMC chirp transform processor were experimentally demonstrated and bit error rate (BER) testing was performed. To assess the feasibility of such advanced receiver, the results were compared with the theoretical analysis and plotted for an average BER as a function of signal-to-noise ratio. Another goal of this SBIR program was the development of a commercial product. The commercial product developed was an arbitrary waveform generator. The successful sales have begun with the delivery of the first arbitrary waveform generator.
Detection and imaging of moving objects with SAR by a joint space-time-frequency processing

NASA Astrophysics Data System (ADS)

Barbarossa, Sergio; Farina, Alfonso

This paper proposes a joint spacetime-frequency processing scheme for the detection and imaging of moving targets by Synthetic Aperture Radars (SAR). The method is based on the availability of an array antenna. The signals received by the array elements are combined, in a spacetime processor, to cancel the clutter. Then, they are analyzed in the time-frequency domain, by computing their Wigner-Ville Distribution (WVD), in order to estimate the instantaneous frequency, to be used for the successive phase compensation, necessary to produce a high resolution image.
System and method for high power diode based additive manufacturing

DOEpatents

El-Dasher, Bassem S.; Bayramian, Andrew; Demuth, James A.; Farmer, Joseph C.; Torres, Sharon G.

2018-01-02

A system is disclosed for performing an Additive Manufacturing (AM) fabrication process on a powdered material forming a substrate. The system may make use of a diode array for generating an optical signal sufficient to melt a powdered material of the substrate. A mask may be used for preventing a first predetermined portion of the optical signal from reaching the substrate, while allowing a second predetermined portion to reach the substrate. At least one processor may be used for controlling an output of the diode array.
System and method for high power diode based additive manufacturing

DOEpatents

El-Dasher, Bassem S.; Bayramian, Andrew; Demuth, James A.; Farmer, Joseph C.; Torres, Sharon G.

2016-04-12

A system is disclosed for performing an Additive Manufacturing (AM) fabrication process on a powdered material forming a substrate. The system may make use of a diode array for generating an optical signal sufficient to melt a powdered material of the substrate. A mask may be used for preventing a first predetermined portion of the optical signal from reaching the substrate, while allowing a second predetermined portion to reach the substrate. At least one processor may be used for controlling an output of the diode array.
Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Autonomous Telemetry Collection for Single-Processor Small Satellites

NASA Technical Reports Server (NTRS)

Speer, Dave

2003-01-01

For the Space Technology 5 mission, which is being developed under NASA's New Millennium Program, a single spacecraft processor will be required to do on-board real-time computations and operations associated with attitude control, up-link and down-link communications, science data processing, solid-state recorder management, power switching and battery charge management, experiment data collection, health and status data collection, etc. Much of the health and status information is in analog form, and each of the analog signals must be routed to the input of an analog-to-digital converter, converted to digital form, and then stored in memory. If the micro-operations of the analog data collection process are implemented in software, the processor may use up a lot of time either waiting for the analog signal to settle, waiting for the analog-to-digital conversion to complete, or servicing a large number of high frequency interrupts. In order to off-load a very busy processor, the collection and digitization of all analog spacecraft health and status data will be done autonomously by a field-programmable gate array that can configure the analog signal chain, control the analog-to-digital converter, and store the converted data in memory.
Breadboard linear array scan imager using LSI solid-state technology

NASA Technical Reports Server (NTRS)

Tracy, R. A.; Brennan, J. A.; Frankel, D. G.; Noll, R. E.

1976-01-01

The performance of large scale integration photodiode arrays in a linear array scan (pushbroom) breadboard was evaluated for application to multispectral remote sensing of the earth's resources. The technical approach, implementation, and test results of the program are described. Several self scanned linear array visible photodetector focal plane arrays were fabricated and evaluated in an optical bench configuration. A 1728-detector array operating in four bands (0.5 - 1.1 micrometer) was evaluated for noise, spectral response, dynamic range, crosstalk, MTF, noise equivalent irradiance, linearity, and image quality. Other results include image artifact data, temporal characteristics, radiometric accuracy, calibration experience, chip alignment, and array fabrication experience. Special studies and experimentation were included in long array fabrication and real-time image processing for low-cost ground stations, including the use of computer image processing. High quality images were produced and all objectives of the program were attained.
System and method for 100% moisture and basis weight measurement of moving paper

DOEpatents

Hernandez, Jose E.; Koo, Jackson C.

2002-01-01

A system for characterizing a set of properties for a moving substance are disclosed. The system includes: a first near-infrared linear array; a second near-infrared linear array; a first filter transparent to a first absorption wavelength emitted by the moving substance and juxtaposed between the substance and the first array; a second filter blocking the first absorption wavelength emitted by the moving substance and juxtaposed between the substance and the second array; and a computational device for characterizing data from the arrays into information on a property of the substance. The method includes the steps of: filtering out a first absorption wavelength emitted by a substance; monitoring the first absorption wavelength with a first near-infrared linear array; blocking the first wavelength from reaching a second near-infrared linear array; and characterizing data from the arrays into information on a property of the substance.
Molecular Mechanics with an Array Processor.

DTIC Science & Technology

1982-06-01

34 to be submritted. 40 B. W. Kernrihan and D M Ritchie, The CPm guniw.g Language, Prentice- Hall. Eaglewood Cliffs, New Jersey, 1978. 60 D. J. Adams , in...1400 Washington Avenue Ban e o Albany, New York 12203 La J la, California 92093 Dr. Rank Loos Professor C. A. Ansell Latuna Research Laboratory
Star sensing for an earth imaging sensor

NASA Technical Reports Server (NTRS)

Ellis, Kenneth K. (Inventor); Griffith, Paul C. (Inventor)

2012-01-01

A star sensor includes (a) a scan mirror for scanning at least one star; (b) a detector array, coupled to the scan mirror, for detecting the one star; and (c) a processor, coupled to the detector array. The processor includes a first filter configured to reduce noise spikes in the detected one star, and provide a detection mask of filtered data. Also included is a second filter configured to reduce non-contiguous samples in the detection mask. A centroid calculator is included to determine a location of the one star, after the first and second filtering. The first filter includes a median filter, followed by an averaging filter, both configured to filter the one star in an along-scan direction of the scan mirror. The first filter includes another median filter, which is configured to filter the detected one star in the cross-scan direction of the scan mirror. An adder is included to subtract (a) output data from the other median filter from (b) output data from the averaging filter and provide filtered star data to the second filter.
High-performance computing for airborne applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Quinn, Heather M; Manuzzato, Andrea; Fairbanks, Tom

2010-06-28

Recently, there has been attempts to move common satellite tasks to unmanned aerial vehicles (UAVs). UAVs are significantly cheaper to buy than satellites and easier to deploy on an as-needed basis. The more benign radiation environment also allows for an aggressive adoption of state-of-the-art commercial computational devices, which increases the amount of data that can be collected. There are a number of commercial computing devices currently available that are well-suited to high-performance computing. These devices range from specialized computational devices, such as field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), to traditional computing platforms, such as microprocessors. Even thoughmore » the radiation environment is relatively benign, these devices could be susceptible to single-event effects. In this paper, we will present radiation data for high-performance computing devices in a accelerated neutron environment. These devices include a multi-core digital signal processor, two field-programmable gate arrays, and a microprocessor. From these results, we found that all of these devices are suitable for many airplane environments without reliability problems.« less
Imer-product array processor for retrieval of stored images represented by bipolar binary (+1,-1) pixels using partial input trinary pixels represented by (+1,-1)

NASA Technical Reports Server (NTRS)

Liu, Hua-Kuang (Inventor); Awwal, Abdul A. S. (Inventor); Karim, Mohammad A. (Inventor)

1993-01-01

An inner-product array processor is provided with thresholding of the inner product during each iteration to make more significant the inner product employed in estimating a vector to be used as the input vector for the next iteration. While stored vectors and estimated vectors are represented in bipolar binary (1,-1), only those elements of an initial partial input vector that are believed to be common with those of a stored vector are represented in bipolar binary; the remaining elements of a partial input vector are set to 0. This mode of representation, in which the known elements of a partial input vector are in bipolar binary form and the remaining elements are set equal to 0, is referred to as trinary representation. The initial inner products corresponding to the partial input vector will then be equal to the number of known elements. Inner-product thresholding is applied to accelerate convergence and to avoid convergence to a negative input product.
Towards implementation of cellular automata in Microbial Fuel Cells.

PubMed

Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis

2017-01-01

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions-compared to silicon circuitry-between the different states during computation.
Towards implementation of cellular automata in Microbial Fuel Cells

PubMed Central

Adamatzky, Andrew; Sirakoulis, Georgios Ch.; Greenman, John; Ieropoulos, Ioannis

2017-01-01

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway’s Game of Life as the ‘benchmark’ CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions—compared to silicon circuitry—between the different states during computation. PMID:28498871

Development of a multikilowatt ion thruster power processor

NASA Technical Reports Server (NTRS)

Schoenfeld, A. D.; Goldin, D. S.; Biess, J. J.

1972-01-01

A feasibility study was made of the application of silicon-controlled, rectifier series, resonant inverter, power conditioning technology to electric propulsion power processing operating from a 200 to 400 Vdc solar array bus. A power system block diagram was generated to meet the electrical requirements of a 20 CM hollow cathode, mercury bombardment, ion engine. The SCR series resonant inverter was developed as a primary means of power switching and conversion, and the analog signal-to-discrete-time-interval converter control system was applied to achieve good regulation. A complete breadboard was designed, fabricated, and tested with a resistive load bank, and critical power processor areas relating to efficiency, weight, and part count were identified.
Photographic film image enhancement

NASA Technical Reports Server (NTRS)

Horner, J. L.

1975-01-01

A series of experiments were undertaken to assess the feasibility of defogging color film by the techniques of optical spatial filtering. A coherent optical processor was built using red, blue, and green laser light input and specially designed Fourier transformation lenses. An array of spatial filters was fabricated on black and white emulsion slides using the coherent optical processor. The technique was first applied to laboratory white light fogged film, and the results were successful. However, when the same technique was applied to some original Apollo X radiation fogged color negatives, the results showed no similar restoration. Examples of each experiment are presented and possible reasons for the lack of restoration in the Apollo films are discussed.
Compression of CCD raw images for digital still cameras

NASA Astrophysics Data System (ADS)

Sriram, Parthasarathy; Sudharsanan, Subramania

2005-03-01

Lossless compression of raw CCD images captured using color filter arrays has several benefits. The benefits include improved storage capacity, reduced memory bandwidth, and lower power consumption for digital still camera processors. The paper discusses the benefits in detail and proposes the use of a computationally efficient block adaptive scheme for lossless compression. Experimental results are provided that indicate that the scheme performs well for CCD raw images attaining compression factors of more than two. The block adaptive method also compares favorably with JPEG-LS. A discussion is provided indicating how the proposed lossless coding scheme can be incorporated into digital still camera processors enabling lower memory bandwidth and storage requirements.
Advanced Avionics and Processor Systems for a Flexible Space Exploration Architecture

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Adams, James H.; Smith, Leigh M.; Johnson, Michael A.; Cressler, John D.

2010-01-01

The Advanced Avionics and Processor Systems (AAPS) project, formerly known as the Radiation Hardened Electronics for Space Environments (RHESE) project, endeavors to develop advanced avionic and processor technologies anticipated to be used by NASA s currently evolving space exploration architectures. The AAPS project is a part of the Exploration Technology Development Program, which funds an entire suite of technologies that are aimed at enabling NASA s ability to explore beyond low earth orbit. NASA s Marshall Space Flight Center (MSFC) manages the AAPS project. AAPS uses a broad-scoped approach to developing avionic and processor systems. Investment areas include advanced electronic designs and technologies capable of providing environmental hardness, reconfigurable computing techniques, software tools for radiation effects assessment, and radiation environment modeling tools. Near-term emphasis within the multiple AAPS tasks focuses on developing prototype components using semiconductor processes and materials (such as Silicon-Germanium (SiGe)) to enhance a device s tolerance to radiation events and low temperature environments. As the SiGe technology will culminate in a delivered prototype this fiscal year, the project emphasis shifts its focus to developing low-power, high efficiency total processor hardening techniques. In addition to processor development, the project endeavors to demonstrate techniques applicable to reconfigurable computing and partially reconfigurable Field Programmable Gate Arrays (FPGAs). This capability enables avionic architectures the ability to develop FPGA-based, radiation tolerant processor boards that can serve in multiple physical locations throughout the spacecraft and perform multiple functions during the course of the mission. The individual tasks that comprise AAPS are diverse, yet united in the common endeavor to develop electronics capable of operating within the harsh environment of space. Specifically, the AAPS tasks for the Federal fiscal year of 2010 are: Silicon-Germanium (SiGe) Integrated Electronics for Extreme Environments, Modeling of Radiation Effects on Electronics, Radiation Hardened High Performance Processors (HPP), and and Reconfigurable Computing.
Astrophysical N-body Simulations Using Hierarchical Tree Data Structures

NASA Astrophysics Data System (ADS)

Warren, M. S.; Salmon, J. K.

The authors report on recent large astrophysical N-body simulations executed on the Intel Touchstone Delta system. They review the astrophysical motivation and the numerical techniques and discuss steps taken to parallelize these simulations. The methods scale as O(N log N), for large values of N, and also scale linearly with the number of processors. The performance sustained for a duration of 67 h, was between 5.1 and 5.4 Gflop/s on a 512-processor system.
Underwater Threat Source Localization: Processing Sensor Network TDOAs with a Terascale Optical Core Device

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Imam, Neena

2007-01-01

Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near-term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront time-difference-of-arrival (TDOA). The corresponding algorithms are implemented on the EnLight processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimizedmore » for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight 64a prototype processor as compared to a dual Intel XeonTM processor.« less
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, R.C.; Goldberg, K.Y.; Wallack, A.S.; Canny, J.

1996-08-13

A fixture process and method is provided for developing a complete set of all admissible fixture designs for a workpiece which prevents the workpiece from translating or rotating. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vice is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs. 27 figs.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor

PubMed Central

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-01-01

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation. PMID:27983714
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, Randolph C.; Goldberg, Kenneth Y.; Canny, John; Wallack, Aaron S.

1999-01-01

Methods and apparatus are provided for developing a complete set of all admissible Type I and Type II fixture designs for a workpiece. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vise is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, Randolph C.; Goldberg, Kenneth Y.; Wallack, Aaron S.; Canny, John

1996-01-01

A fixture process and method is provided for developing a complete set of all admissible fixture designs for a workpiece which prevents the workpiece from translating or rotating. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vice is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs.
Processor and method for developing a set of admissible fixture designs for a workpiece

DOEpatents

Brost, R.C.; Goldberg, K.Y.; Canny, J.; Wallack, A.S.

1999-01-05

Methods and apparatus are provided for developing a complete set of all admissible Type 1 and Type 2 fixture designs for a workpiece. The fixture processor generates the set of all admissible designs based on geometric access constraints and expected applied forces on the workpiece. For instance, the fixture processor may generate a set of admissible fixture designs for first, second and third locators placed in an array of holes on a fixture plate and a translating clamp attached to the fixture plate for contacting the workpiece. In another instance, a fixture vise is used in which first, second, third and fourth locators are used and first and second fixture jaws are tightened to secure the workpiece. The fixture process also ranks the set of admissible fixture designs according to a predetermined quality metric so that the optimal fixture design for the desired purpose may be identified from the set of all admissible fixture designs. 44 figs.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor.

PubMed

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-12-15

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation.
Multistatic Array Sampling Scheme for Fast Near-Field Image Reconstruction

DTIC Science & Technology

2016-01-01

reconstruction. The array topology samples the scene on a regular grid of phase centers, using a tiling of Boundary Arrays (BAs). Following a simple correction...hardware. Fig. 1 depicts the multistatic array topology. As seen, the topology is a tiled arrangement of Boundary Arrays (BAs). The BA is a well-known...sparse array layout comprised of two linear transmit arrays, and two linear receive arrays [6]. A slightly different tiled arrangement of BAs was used
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, Fred A.; Morel, Michael R.

1989-01-01

A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
The Data Acquisition System of the Stockholm Educational Air Shower Array

NASA Astrophysics Data System (ADS)

Hofverberg, P.; Johansson, H.; Pearce, M.; Rydstrom, S.; Wikstrom, C.

2005-12-01

The Stockholm Educational Air Shower Array (SEASA) project is deploying an array of plastic scintillator detector stations on school roofs in the Stockholm area. Signals from GPS satellites are used to time synchronise signals from the widely separated detector stations, allowing cosmic ray air showers to be identified and studied. A low-cost and highly scalable data acquisition system has been produced using embedded Linux processors which communicate station data to a central server running a MySQL database. Air shower data can be visualised in real-time using a Java-applet client. It is also possible to query the database and manage detector stations from the client. In this paper, the design and performance of the system are described
Examination of roundwood utilization rates in West Virginia

Treesearch

Shawn T. Grushecky; Jan Wiedenbeck; Curt C. Hassler

2013-01-01

Forest harvesting is an integral part of the West Virginia forest economy. This component of the supply chain supports a diverse array of primary and secondary processors. A key metric used to describe the efficiency of the roundwood extraction process is the logging utilization factor (LUF). The LUF is one way managers can discern the overall use of harvested...
Precision orbit raising trajectories. [solar electric propulsion orbital transfer program

NASA Technical Reports Server (NTRS)

Flanagan, P. F.; Horsewood, J. L.; Pines, S.

1975-01-01

A precision trajectory program has been developed to serve as a test bed for geocentric orbit raising steering laws. The steering laws to be evaluated have been developed using optimization methods employing averaging techniques. This program provides the capability of testing the steering laws in a precision simulation. The principal system models incorporated in the program are described, including the radiation environment, the solar array model, the thrusters and power processors, the geopotential, and the solar system. Steering and array orientation constraints are discussed, and the impact of these constraints on program design is considered.
The SKA1 LOW telescope: system architecture and design performance

NASA Astrophysics Data System (ADS)

Waterson, Mark F.; Labate, Maria Grazia; Schnetler, Hermine; Wagg, Jeff; Turner, Wallace; Dewdney, Peter

2016-07-01

The SKA1-LOW radio telescope will be a low-frequency (50-350 MHz) aperture array located in Western Australia. Its scientific objectives will prioritize studies of the Epoch of Reionization and pulsar physics. Development of the telescope has been allocated to consortia responsible for the aperture array front end, timing distribution, signal and data transport, correlation and beamforming signal processors, infrastructure, monitor and control systems, and science data processing. This paper will describe the system architectural design and key performance parameters of the telescope and summarize the high-level sub-system designs of the consortia.
Linear Spectral Analysis of Plume Emissions Using an Optical Matrix Processor

NASA Technical Reports Server (NTRS)

Gary, C. K.

1992-01-01

Plume spectrometry provides a means to monitor the health of a burning rocket engine, and optical matrix processors provide a means to analyze the plume spectra in real time. By observing the spectrum of the exhaust plume of a rocket engine, researchers have detected anomalous behavior of the engine and have even determined the failure of some equipment before it would normally have been noticed. The spectrum of the plume is analyzed by isolating information in the spectrum about the various materials present to estimate what materials are being burned in the engine. Scientists at the Marshall Space Flight Center (MSFC) have implemented a high resolution spectrometer to discriminate the spectral peaks of the many species present in the plume. Researchers at the Stennis Space Center Demonstration Testbed Facility (DTF) have implemented a high resolution spectrometer observing a 1200-lb. thrust engine. At this facility, known concentrations of contaminants can be introduced into the burn, allowing for the confirmation of diagnostic algorithms. While the high resolution of the measured spectra has allowed greatly increased insight into the functioning of the engine, the large data flows generated limit the ability to perform real-time processing. The use of an optical matrix processor and the linear analysis technique described below may allow for the detailed real-time analysis of the engine's health. A small optical matrix processor can perform the required mathematical analysis both quicker and with less energy than a large electronic computer dedicated to the same spectral analysis routine.
Conformal array design on arbitrary polygon surface with transformation optics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deng, Li, E-mail: dengl@bupt.edu.cn; Hong, Weijun, E-mail: hongwj@bupt.edu.cn; Zhu, Jianfeng

2016-06-15

A transformation-optics based method to design a conformal antenna array on an arbitrary polygon surface is proposed and demonstrated in this paper. This conformal antenna array can be adjusted to behave equivalently as a uniformly spaced linear array by applying an appropriate transformation medium. An typical example of general arbitrary polygon conformal arrays, not limited to circular array, is presented, verifying the proposed approach. In summary, the novel arbitrary polygon surface conformal array can be utilized in array synthesis and beam-forming, maintaining all benefits of linear array.

Restoring Low Sidelobe Antenna Patterns with Failed Elements in a Phased Array Antenna

DTIC Science & Technology

2016-02-01

optimum low sidelobes are demonstrated in several examples. Index Terms — Array signal processing, beams, linear algebra , phased arrays, shaped...represented by a linear combination of low sidelobe beamformers with no failed elements, ’s, in a neighborhood around under the constraint that the linear ...would expect that linear combinations of them in a neighborhood around would also have low sidelobes. The algorithms in this paper exploit this
Techniques for the rapid display and manipulation of 3-D biomedical data.

PubMed

Goldwasser, S M; Reynolds, R A; Talton, D A; Walsh, E S

1988-01-01

The use of fully interactive 3-D workstations with true real-time performance will become increasingly common as technology matures and economical commercial systems become available. This paper provides a comprehensive introduction to high speed approaches to the display and manipulation of 3-D medical objects obtained from tomographic data acquisition systems such as CT, MR, and PET. A variety of techniques are outlined including the use of software on conventional minicomputers, hardware assist devices such as array processors and programmable frame buffers, and special purpose computer architecture for dedicated high performance systems. While both algorithms and architectures are addressed, the major theme centers around the utilization of hardware-based approaches including parallel processors for the implementation of true real-time systems.
Integrated 3-D vision system for autonomous vehicles

NASA Astrophysics Data System (ADS)

Hou, Kun M.; Shawky, Mohamed; Tu, Xiaowei

1992-03-01

Nowadays, autonomous vehicles have become a multidiscipline field. Its evolution is taking advantage of the recent technological progress in computer architectures. As the development tools became more sophisticated, the trend is being more specialized, or even dedicated architectures. In this paper, we will focus our interest on a parallel vision subsystem integrated in the overall system architecture. The system modules work in parallel, communicating through a hierarchical blackboard, an extension of the 'tuple space' from LINDA concepts, where they may exchange data or synchronization messages. The general purpose processing elements are of different skills, built around 40 MHz i860 Intel RISC processors for high level processing and pipelined systolic array processors based on PLAs or FPGAs for low-level processing.
Advanced On-Board Processor (AOP). [for future spacecraft applications

NASA Technical Reports Server (NTRS)

1973-01-01

Advanced On-board Processor the (AOP) uses large scale integration throughout and is the most advanced space qualified computer of its class in existence today. It was designed to satisfy most spacecraft requirements which are anticipated over the next several years. The AOP design utilizes custom metallized multigate arrays (CMMA) which have been designed specifically for this computer. This approach provides the most efficient use of circuits, reduces volume, weight, assembly costs and provides for a significant increase in reliability by the significant reduction in conventional circuit interconnections. The required 69 CMMA packages are assembled on a single multilayer printed circuit board which together with associated connectors constitutes the complete AOP. This approach also reduces conventional interconnections thus further reducing weight, volume and assembly costs.
Spacecraft on-board SAR image generation for EOS-type missions

NASA Technical Reports Server (NTRS)

Liu, K. Y.; Arens, W. E.; Assal, H. M.; Vesecky, J. F.

1987-01-01

Spacecraft on-board synthetic aperture radar (SAR) image generation is an extremely difficult problem because of the requirements for high computational rates (usually on the order of Giga-operations per second), high reliability (some missions last up to 10 years), and low power dissipation and mass (typically less than 500 watts and 100 Kilograms). Recently, a JPL study was performed to assess the feasibility of on-board SAR image generation for EOS-type missions. This paper summarizes the results of that study. Specifically, it proposes a processor architecture using a VLSI time-domain parallel array for azimuth correlation. Using available space qualifiable technology to implement the proposed architecture, an on-board SAR processor having acceptable power and mass characteristics appears feasible for EOS-type applications.
CoNNeCT Baseband Processor Module

NASA Technical Reports Server (NTRS)

Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.

2011-01-01

A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.
Use of Field Programmable Gate Array Technology in Future Space Avionics

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.; Tate, Robert

2005-01-01

Fulfilling NASA's new vision for space exploration requires the development of sustainable, flexible and fault tolerant spacecraft control systems. The traditional development paradigm consists of the purchase or fabrication of hardware boards with fixed processor and/or Digital Signal Processing (DSP) components interconnected via a standardized bus system. This is followed by the purchase and/or development of software. This paradigm has several disadvantages for the development of systems to support NASA's new vision. Building a system to be fault tolerant increases the complexity and decreases the performance of included software. Standard bus design and conventional implementation produces natural bottlenecks. Configuring hardware components in systems containing common processors and DSPs is difficult initially and expensive or impossible to change later. The existence of Hardware Description Languages (HDLs), the recent increase in performance, density and radiation tolerance of Field Programmable Gate Arrays (FPGAs), and Intellectual Property (IP) Cores provides the technology for reprogrammable Systems on a Chip (SOC). This technology supports a paradigm better suited for NASA's vision. Hardware and software production are melded for more effective development; they can both evolve together over time. Designers incorporating this technology into future avionics can benefit from its flexibility. Systems can be designed with improved fault isolation and tolerance using hardware instead of software. Also, these designs can be protected from obsolescence problems where maintenance is compromised via component and vendor availability.To investigate the flexibility of this technology, the core of the Central Processing Unit and Input/Output Processor of the Space Shuttle AP101S Computer were prototyped in Verilog HDL and synthesized into an Altera Stratix FPGA.
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
SAR processing on the MPP

NASA Technical Reports Server (NTRS)

Batcher, K. E.; Eddey, E. E.; Faiss, R. O.; Gilmore, P. A.

1981-01-01

The processing of synthetic aperture radar (SAR) signals using the massively parallel processor (MPP) is discussed. The fast Fourier transform convolution procedures employed in the algorithms are described. The MPP architecture comprises an array unit (ARU) which processes arrays of data; an array control unit which controls the operation of the ARU and performs scalar arithmetic; a program and data management unit which controls the flow of data; and a unique staging memory (SM) which buffers and permutes data. The ARU contains a 128 by 128 array of bit-serial processing elements (PE). Two-by-four surarrays of PE's are packaged in a custom VLSI HCMOS chip. The staging memory is a large multidimensional-access memory which buffers and permutes data flowing with the system. Efficient SAR processing is achieved via ARU communication paths and SM data manipulation. Real time processing capability can be realized via a multiple ARU, multiple SM configuration.
Microsystem enabled photovoltaic modules and systems

DOEpatents

Nielson, Gregory N; Sweatt, William C; Okandan, Murat

2015-05-12

A microsystem enabled photovoltaic (MEPV) module including: an absorber layer; a fixed optic layer coupled to the absorber layer; a translatable optic layer; a translation stage coupled between the fixed and translatable optic layers; and a motion processor electrically coupled to the translation stage to controls motion of the translatable optic layer relative to the fixed optic layer. The absorber layer includes an array of photovoltaic (PV) elements. The fixed optic layer includes an array of quasi-collimating (QC) micro-optical elements designed and arranged to couple incident radiation from an intermediate image formed by the translatable optic layer into one of the PV elements such that it is quasi-collimated. The translatable optic layer includes an array of focusing micro-optical elements corresponding to the QC micro-optical element array. Each focusing micro-optical element is designed to produce a quasi-telecentric intermediate image from substantially collimated radiation incident within a predetermined field of view.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System.

PubMed

Zhang, Zhen; Ma, Cheng; Zhu, Rong

2017-08-23

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System

PubMed Central

Zhang, Zhen; Zhu, Rong

2017-01-01

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas. PMID:28832522
An Intrinsically Digital Amplification Scheme for Hearing Aids

NASA Astrophysics Data System (ADS)

Blamey, Peter J.; Macfarlane, David S.; Steele, Brenton R.

2005-12-01

Results for linear and wide-dynamic range compression were compared with a new 64-channel digital amplification strategy in three separate studies. The new strategy addresses the requirements of the hearing aid user with efficient computations on an open-platform digital signal processor (DSP). The new amplification strategy is not modeled on prior analog strategies like compression and linear amplification, but uses statistical analysis of the signal to optimize the output dynamic range in each frequency band independently. Using the open-platform DSP processor also provided the opportunity for blind trial comparisons of the different processing schemes in BTE and ITE devices of a high commercial standard. The speech perception scores and questionnaire results show that it is possible to provide improved audibility for sound in many narrow frequency bands while simultaneously improving comfort, speech intelligibility in noise, and sound quality.
Real-time system for measuring three-dimensional shape of solder bump array by focus using varifocal mirror

NASA Astrophysics Data System (ADS)

Ishii, Akira; Tai, Haruka; Mitsudo, Jun

2007-10-01

This paper describes a real-time system for measuring the three-dimensional shape of solder bumps arrayed on an LSI chip-size-package (CSP) board presented for inspection based on the shape-from-focus technique. It uses a copper-alloy mirror deformed by a piezoelectric actuator as a varifocal mirror enabling a simple, fast, precise focusing mechanism without moving parts to be built. A practical measuring speed of 1.69 s/package for a small CSP board (4 x 4 mm2) was achieved by incorporating an exclusive field programmable gate array processor to calculate focus measure and by constructing a domed array of LEDs as a high-intensity, uniform illumination system so that a fast (150 fps) and high-resolution (1024 x 1024 pixels/frame) CMOS image sensor could be used. Accurate measurements of bump height were also achieved with errors of 10 μm (2σ) meeting the requirements for testing the coplanarity of a bump array.
Compression dynamics of quasi-spherical wire arrays with different linear mass profiles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mitrofanov, K. N., E-mail: mitrofan@triniti.ru; Aleksandrov, V. V.; Gritsuk, A. N.

Results of experimental studies of the implosion of quasi-spherical wire (or metalized fiber) arrays are presented. The goal of the experiments was to achieve synchronous three-dimensional compression of the plasma produced in different regions of a quasi-spherical array into its geometrical center. To search for optimal synchronization conditions, quasi-spherical arrays with different initial profiles of the linear mass were used. The following dependences of the linear mass on the poloidal angle were used: m{sub l}(θ) ∝ sin{sup –1}θ and m{sub l}(θ) ∝ sin{sup –2}θ. The compression dynamics of such arrays was compared with that of quasi-spherical arrays without linear massmore » profiling, m{sub l}(θ) = const. To verify the experimental data, the spatiotemporal dynamics of plasma compression in quasi-spherical arrays was studied using various diagnostics. The experiments on three-dimensional implosion of quasi-spherical arrays made it possible to study how the frozen-in magnetic field of the discharge current penetrates into the array. By measuring the magnetic field in the plasma of a quasi-spherical array, information is obtained on the processes of plasma production and formation of plasma flows from the wire/fiber regions with and without an additionally deposited mass. It is found that penetration of the magnetic flux depends on the initial linear mass profile m{sub l}(θ) of the quasi-spherical array. From space-resolved spectral measurements and frame imaging of plasma X-ray emission, information is obtained on the dimensions and shape of the X-ray source formed during the implosion of a quasi-spherical array. The intensity of this source is estimated and compared with that of the Z-pinch formed during the implosion of a cylindrical array.« less
Accelerating scientific computations with mixed precision algorithms

NASA Astrophysics Data System (ADS)

Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire

2009-12-01

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented. Program summaryProgram title: ITER-REF Catalogue identifier: AECO_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 7211 No. of bytes in distributed program, including test data, etc.: 41 862 Distribution format: tar.gz Programming language: FORTRAN 77 Computer: desktop, server Operating system: Unix/Linux RAM: 512 Mbytes Classification: 4.8 External routines: BLAS (optional) Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Running time: seconds/minutes
Fast particles identification in programmable form at level-0 trigger by means of the 3D-Flow system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Crosetto, Dario B.

1998-10-30

The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. Based on either an FPGA or an ASIC implementation, it can address, in a fully programmable manner, applications where commercially available processors would fail because of throughput requirements. Possible applications include filtering-algorithms (pattern recognition) from the input of multiple sensors, as well as moving any input validated by these filtering-algorithms to a single output channel. Both operations can easily be implemented on a 3D-Flow system to achieve a real-time processing system with a very short lag time. This system can be built either with off-the-shelfmore » FPGAs or, for higher data rates, with CMOS chips containing 4 to 16 processors each. The basic building block of the system, a 3D-Flow processor, has been successfully designed in VHDL code written in ''Generic HDL'' (mostly made of reusable blocks that are synthesizable in different technologies, or FPGAs), to produce a netlist for a four-processor ASIC featuring 0.35 micron CBA (Ceil Base Array) technology at 3.3 Volts, 884 mW power dissipation at 60 MHz and 63.75 mm sq. die size. The same VHDL code has been targeted to three FPGA manufacturers (Altera EPF10K250A, ORCA-Lucent Technologies 0R3T165 and Xilinx XCV1000). A complete set of software tools, the 3D-Flow System Manager, equally applicable to ASIC or FPGA implementations, has been produced to provide full system simulation, application development, real-time monitoring, and run-time fault recovery. Today's technology can accommodate 16 processors per chip in a medium size die, at a cost per processor of less than $5 based on the current silicon die/size technology cost.« less
Dual-scale topology optoelectronic processor.

PubMed

Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

1991-12-15

The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
Supercomputing with TOUGH2 family codes for coupled multi-physics simulations of geologic carbon sequestration

NASA Astrophysics Data System (ADS)

Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.

2015-12-01

Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models in sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling works. Two-phase flow simulations in heterogeneous media usually require much longer computational time than that in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with more than thousands of processors become available in scientific and engineering communities. Such supercomputers may attract attentions from geoscientist and reservoir engineers for solving the large and non-linear models in higher resolutions within a reasonable time. However, for making it a useful tool, it is essential to tackle several practical obstacles to utilize large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones in a reservoir scale. The performance measurement confirmed that the both simulators exhibit excellent scalabilities showing almost linear speedup against number of processors up to over ten thousand cores. Generally this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million grid in a practical time (e.g., less than a second per time step).
3D morphology reconstruction using linear array CCD binocular stereo vision imaging system

NASA Astrophysics Data System (ADS)

Pan, Yu; Wang, Jinjiang

2018-01-01

Binocular vision imaging system, which has a small field of view, cannot reconstruct the 3-D shape of the dynamic object. We found a linear array CCD binocular vision imaging system, which uses different calibration and reconstruct methods. On the basis of the binocular vision imaging system, the linear array CCD binocular vision imaging systems which has a wider field of view can reconstruct the 3-D morphology of objects in continuous motion, and the results are accurate. This research mainly introduces the composition and principle of linear array CCD binocular vision imaging system, including the calibration, capture, matching and reconstruction of the imaging system. The system consists of two linear array cameras which were placed in special arrangements and a horizontal moving platform that can pick up objects. The internal and external parameters of the camera are obtained by calibrating in advance. And then using the camera to capture images of moving objects, the results are then matched and 3-D reconstructed. The linear array CCD binocular vision imaging systems can accurately measure the 3-D appearance of moving objects, this essay is of great significance to measure the 3-D morphology of moving objects.

Linear encoding device

NASA Technical Reports Server (NTRS)

Leviton, Douglas B. (Inventor)

1993-01-01

A Linear Motion Encoding device for measuring the linear motion of a moving object is disclosed in which a light source is mounted on the moving object and a position sensitive detector such as an array photodetector is mounted on a nearby stationary object. The light source emits a light beam directed towards the array photodetector such that a light spot is created on the array. An analog-to-digital converter, connected to the array photodetector is used for reading the position of the spot on the array photodetector. A microprocessor and memory is connected to the analog-to-digital converter to hold and manipulate data provided by the analog-to-digital converter on the position of the spot and to compute the linear displacement of the moving object based upon the data from the analog-to-digital converter.
Sub-nanosecond clock synchronization and trigger management in the nuclear physics experiment AGATA

NASA Astrophysics Data System (ADS)

Bellato, M.; Bortolato, D.; Chavas, J.; Isocrate, R.; Rampazzo, G.; Triossi, A.; Bazzacco, D.; Mengoni, D.; Recchia, F.

2013-07-01

The new-generation spectrometer AGATA, the Advanced GAmma Tracking Array, requires sub-nanosecond clock synchronization among readout and front-end electronics modules that may lie hundred meters apart. We call GTS (Global Trigger and Synchronization System) the infrastructure responsible for precise clock synchronization and for the trigger management of AGATA. It is made of a central trigger processor and nodes, connected in a tree structure by means of optical fibers operated at 2Gb/s. The GTS tree handles the synchronization and the trigger data flow, whereas the trigger processor analyses and eventually validates the trigger primitives centrally. Sub-nanosecond synchronization is achieved by measuring two different types of round-trip times and by automatically correcting for phase-shift differences. For a tree of depth two, the peak-to-peak clock jitter at each leaf is 70 ps; the mean phase difference is 180 ps, while the standard deviation over such phase difference, namely the phase equalization repeatability, is 20 ps. The GTS system has run flawlessly for the two-year long AGATA campaign, held at the INFN Legnaro National Laboratories, Italy, where five triple clusters of the AGATA sub-array were coupled with a variety of ancillary detectors.
Research in the design of high-performance reconfigurable systems

NASA Technical Reports Server (NTRS)

Slotnick, D. L.; Mcewan, S. D.; Spry, A. J.

1984-01-01

An initial design for the Bit Processor (BP) referred to in prior reports as the Processing Element or PE has been completed. Eight BP's, together with their supporting random-access memory, a 64 k x 9 ROM to perform addition, routing logic, and some additional logic, constitute the components of a single stage. An initial stage design is given. Stages may be combined to perform high-speed fixed or floating point arithmetic. Stages can be configured into a range of arithmetic modules that includes bit-serial one or two-dimensional arrays; one or two dimensional arrays fixed or floating point processors; and specialized uniprocessors, such as long-word arithmetic units. One to eight BP's represent a likely initial chip level. The Stage would then correspond to a first-level pluggable module. As both this project and VLSI CAD/CAM progress, however, it is expected that the chip level would migrate upward to the stage and, perhaps, ultimately the box level. The BP RAM, consisting of two banks, holds only operands and indices. Programs are at the box (high-level function) and system level. At the system level initial effort has been concentrated on specifying the tools needed to evaluate design alternatives.
Stereo and IMU-Assisted Visual Odometry for Small Robots

NASA Technical Reports Server (NTRS)

2012-01-01

This software performs two functions: (1) taking stereo image pairs as input, it computes stereo disparity maps from them by cross-correlation to achieve 3D (three-dimensional) perception; (2) taking a sequence of stereo image pairs as input, it tracks features in the image sequence to estimate the motion of the cameras between successive image pairs. A real-time stereo vision system with IMU (inertial measurement unit)-assisted visual odometry was implemented on a single 750 MHz/520 MHz OMAP3530 SoC (system on chip) from TI (Texas Instruments). Frame rates of 46 fps (frames per second) were achieved at QVGA (Quarter Video Graphics Array i.e. 320 240), or 8 fps at VGA (Video Graphics Array 640 480) resolutions, while simultaneously tracking up to 200 features, taking full advantage of the OMAP3530's integer DSP (digital signal processor) and floating point ARM processors. This is a substantial advancement over previous work as the stereo implementation produces 146 Mde/s (millions of disparities evaluated per second) in 2.5W, yielding a stereo energy efficiency of 58.8 Mde/J, which is 3.75 better than prior DSP stereo while providing more functionality.
A Fourier Method for Sidelobe Reduction in Equally Spaced Linear Arrays

NASA Astrophysics Data System (ADS)

Safaai-Jazi, Ahmad; Stutzman, Warren L.

2018-04-01

Uniformly excited, equally spaced linear arrays have a sidelobe level larger than -13.3 dB, which is too high for many applications. This limitation can be remedied by nonuniform excitation of array elements. We present an efficient method for sidelobe reduction in equally spaced linear arrays with low penalty on the directivity. The method involves the following steps: construction of a periodic function containing only the sidelobes of the uniformly excited array, calculation of the Fourier series of this periodic function, subtracting the series from the array factor of the original uniformly excited array after it is truncated, and finally mitigating the truncation effects which yields significant increase in sidelobe level reduction. A sidelobe reduction factor is incorporated into element currents that makes much larger sidelobe reductions possible and also allows varying the sidelobe level incrementally. It is shown that such newly formed arrays can provide sidelobe levels that are at least 22.7 dB below those of the uniformly excited arrays with the same size and number of elements. Analytical expressions for element currents are presented. Radiation characteristics of the sidelobe-reduced arrays introduced here are examined, and numerical results for directivity, sidelobe level, and half-power beam width are presented for example cases. Performance improvements over popular conventional array synthesis methods, such as Chebyshev and linear current tapered arrays, are obtained with the new method.
Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units.

PubMed

Watanabe, Yuuki; Maeno, Seiya; Aoshima, Kenji; Hasegawa, Haruyuki; Koseki, Hitoshi

2010-09-01

The real-time display of full-range, 2048?axial pixelx1024?lateral pixel, Fourier-domain optical-coherence tomography (FD-OCT) images is demonstrated. The required speed was achieved by using dual graphic processing units (GPUs) with many stream processors to realize highly parallel processing. We used a zero-filling technique, including a forward Fourier transform, a zero padding to increase the axial data-array size to 8192, an inverse-Fourier transform back to the spectral domain, a linear interpolation from wavelength to wavenumber, a lateral Hilbert transform to obtain the complex spectrum, a Fourier transform to obtain the axial profiles, and a log scaling. The data-transfer time of the frame grabber was 15.73?ms, and the processing time, which includes the data transfer between the GPU memory and the host computer, was 14.75?ms, for a total time shorter than the 36.70?ms frame-interval time using a line-scan CCD camera operated at 27.9?kHz. That is, our OCT system achieved a processed-image display rate of 27.23 frames/s.
Acousto-optic RF signal acquisition system

NASA Astrophysics Data System (ADS)

Bloxham, Laurence H.

1990-09-01

This paper describes the architecture and performance of a prototype Acousto-Optic RF Signal Acquisition System designed to intercept, automatically identify, and track communication signals in the VHF band. The system covers 28.0 to 92.0 MHz with five manually selectable, dual conversion; 12.8 MHZ bandwidth front ends. An acousto-optic spectrum analyzer (AOSA) implemented using a tellurium dioxide (Te02) Bragg cell is used to channelize the 12.8 MHz pass band into 512 25 KHz channels. Polarization switching is used to suppress optical noise. Excellent isolation and dynamic range are achieved by using a linear array of 512 custom 40/50 micron fiber optic cables to collect the light at the focal plane of the AOSA and route the light to individual photodetectors. The photodetectors are operated in the photovoltaic mode to compress the greater than 60 dB input optical dynamic range into an easily processed electrical signal. The 512 signals are multiplexed and processed as a line in a video image by a customized digital image processing system. The image processor simultaneously analyzes the channelized signal data and produces a classical waterfall display.
Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels

NASA Astrophysics Data System (ADS)

Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.

1982-04-01

This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
Absolute Position Encoders With Vertical Image Binning

NASA Technical Reports Server (NTRS)

Leviton, Douglas B.

2005-01-01

Improved optoelectronic patternrecognition encoders that measure rotary and linear 1-dimensional positions at conversion rates (numbers of readings per unit time) exceeding 20 kHz have been invented. Heretofore, optoelectronic pattern-recognition absoluteposition encoders have been limited to conversion rates <15 Hz -- too low for emerging industrial applications in which conversion rates ranging from 1 kHz to as much as 100 kHz are required. The high conversion rates of the improved encoders are made possible, in part, by use of vertically compressible or binnable (as described below) scale patterns in combination with modified readout sequences of the image sensors [charge-coupled devices (CCDs)] used to read the scale patterns. The modified readout sequences and the processing of the images thus read out are amenable to implementation by use of modern, high-speed, ultra-compact microprocessors and digital signal processors or field-programmable gate arrays. This combination of improvements makes it possible to greatly increase conversion rates through substantial reductions in all three components of conversion time: exposure time, image-readout time, and image-processing time.
Parallel algorithms for mapping pipelined and parallel computations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Advances in diagnostic ultrasonography.

PubMed

Reef, V B

1991-08-01

A wide variety of ultrasonographic equipment currently is available for use in equine practice, but no one machine is optimal for every type of imaging. Image quality is the most important factor in equipment selection once the needs of the practitioner are ascertained. The transducer frequencies available, transducer footprints, depth of field displayed, frame rate, gray scale, simultaneous electrocardiography, Doppler, and functions to modify the image are all important considerations. The ability to make measurements off of videocassette recorder playback and future upgradability should be evaluated. Linear array and sector technology are the backbone of equine ultrasonography today. Linear array technology is most useful for a high-volume broodmare practice, whereas sector technology is ideal for a more general equine practice. The curved or convex linear scanner has more applications than the standard linear array and is equipped with the linear array rectal probe, which provides the equine practitioner with a more versatile unit for equine ultrasonographic evaluations. The annular array and phased array systems have improved image quality, but each has its own limitations. The new sector scanners still provide the most versatile affordable equipment for equine general practice.
Modulation Transfer Function (MTF) measurement techniques for lenses and linear detector arrays

NASA Technical Reports Server (NTRS)

Schnabel, J. J., Jr.; Kaishoven, J. E., Jr.; Tom, D.

1984-01-01

Application is the determination of the Modulation Transfer Function (MTF) for linear detector arrays. A system set up requires knowledge of the MTF of the imaging lens. Procedure for this measurement is described for standard optical lab equipment. Given this information, various possible approaches to MTF measurement for linear arrays is described. The knife edge method is then described in detail.
Hyperspectral Microwave Atmospheric Sounder (HyMAS) - New Capability in the CoSMIR-CoSSIR Scanhead

NASA Technical Reports Server (NTRS)

Hilliard, Lawrence; Racette, Paul; Blackwell, William; Galbraith, Christopher; Thompson, Erik

2015-01-01

Lincoln Laboratory and NASA's Goddard Space Flight Center have teamed to re-use an existing instrument platform, the CoSMIR/CoSSIR system for atmospheric sounding, to develop a new capability in hyperspectral filtering, data collection, and display. The volume of the scanhead accomodated an intermediate frequency processor(IFP), that provides the filtering and digitization of the raw data and the interoperable remote component (IRC) adapted to CoSMIR, CoSSIR, and HyMAS that stores and archives the data with time tagged calibration and navigation data. The first element of the work is the demonstration of a hyperspectral microwave receiver subsystem that was recently shown using a comprehensive simulation study to yield performance that substantially exceeds current state-of-the-art. Hyperspectral microwave sounders with approximately 100 channels offer temperature and humidity sounding improvements similar to those obtained when infrared sensors became hyperspectral, but with the relative insensitivity to clouds that characterizes microwave sensors. Hyperspectral microwave operation is achieved using independent RF antenna/receiver arrays that sample the same area/volume of the Earth's surface/atmosphere at slightly different frequencies and therefore synthesize a set of dense, finely spaced vertical weighting functions. The second, enabling element of the proposal is the development of a compact 52-channel Intermediate Frequency processor module. A principal challenge in the development of a hyperspectral microwave system is the size of the IF filter bank required for channelization. Large bandwidths are simultaneously processed, thus complicating the use of digital back-ends with associated high complexities, costs, and power requirements. Our approach involves passive filters implemented using low-temperature co-fired ceramic (LTCC) technology to achieve an ultra-compact module that can be easily integrated with existing radio frequency front-end technology. This IF processor is universally applicable to other microwave sensing missions requiring compact IF spectrometry. The data include 52 operational channels with low IF module volume (less than 100 cubic centimeters) and mass (less than 300 grams) and linearity better than 0.3 percent over a 330,000 dynamic range.
Performances of multiprocessor multidisk architectures for continuous media storage

NASA Astrophysics Data System (ADS)

Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

1996-03-01

Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point- to-point architecture is scalable and able to sustain high throughputs for simultaneous compute- bound and data-bound operations.
Fault-Tolerant Software-Defined Radio on Manycore

NASA Technical Reports Server (NTRS)

Ricketts, Scott

2015-01-01

Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiationhardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload-reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
Design of infrasound-detection system via adaptive LMSTDE algorithm

NASA Technical Reports Server (NTRS)

Khalaf, C. S.; Stoughton, J. W.

1984-01-01

A proposed solution to an aviation safety problem is based on passive detection of turbulent weather phenomena through their infrasonic emission. This thesis describes a system design that is adequate for detection and bearing evaluation of infrasounds. An array of four sensors, with the appropriate hardware, is used for the detection part. Bearing evaluation is based on estimates of time delays between sensor outputs. The generalized cross correlation (GCC), as the conventional time-delay estimation (TDE) method, is first reviewed. An adaptive TDE approach, using the least mean square (LMS) algorithm, is then discussed. A comparison between the two techniques is made and the advantages of the adaptive approach are listed. The behavior of the GCC, as a Roth processor, is examined for the anticipated signals. It is shown that the Roth processor has the desired effect of sharpening the peak of the correlation function. It is also shown that the LMSTDE technique is an equivalent implementation of the Roth processor in the time domain. A LMSTDE lead-lag model, with a variable stability coefficient and a convergence criterion, is designed.
Ion propulsion cost effectivity

NASA Technical Reports Server (NTRS)

Zafran, S.; Biess, J. J.

1978-01-01

Ion propulsion modules employing 8-cm thrusters and 30-cm thrusters were studied for Multimission Modular Spacecraft (MMS) applications. Recurring and nonrecurring cost elements were generated for these modules. As a result, ion propulsion cost drivers were identified to be Shuttle charges, solar array, power processing, and thruster costs. Cost effective design approaches included short length module configurations, array power sharing, operation at reduced thruster input power, simplified power processing units, and power processor output switching. The MMS mission model employed indicated that nonrecurring costs have to be shared with other programs unless the mission model grows. Extended performance missions exhibited the greatest benefits when compared with monopropellant hydrazine propulsion.
Opto-VLSI-based photonic true-time delay architecture for broadband adaptive nulling in phased array antennas.

PubMed

Juswardy, Budi; Xiao, Feng; Alameh, Kamal

2009-03-16

This paper proposes a novel Opto-VLSI-based tunable true-time delay generation unit for adaptively steering the nulls of microwave phased array antennas. Arbitrary single or multiple true-time delays can simultaneously be synthesized for each antenna element by slicing an RF-modulated broadband optical source and routing specific sliced wavebands through an Opto-VLSI processor to a high-dispersion fiber. Experimental results are presented, which demonstrate the principle of the true-time delay unit through the generation of 5 arbitrary true-time delays of up to 2.5 ns each. (c) 2009 Optical Society of America
Multi-element germanium detectors for synchrotron applications

NASA Astrophysics Data System (ADS)

Rumaiz, A. K.; Kuczewski, A. J.; Mead, J.; Vernon, E.; Pinelli, D.; Dooryhee, E.; Ghose, S.; Caswell, T.; Siddons, D. P.; Miceli, A.; Baldwin, J.; Almer, J.; Okasinski, J.; Quaranta, O.; Woods, R.; Krings, T.; Stock, S.

2018-04-01

We have developed a series of monolithic multi-element germanium detectors, based on sensor arrays produced by the Forschungzentrum Julich, and on Application-specific integrated circuits (ASICs) developed at Brookhaven. Devices have been made with element counts ranging from 64 to 384. These detectors are being used at NSLS-II and APS for a range of diffraction experiments, both monochromatic and energy-dispersive. Compact and powerful readout systems have been developed, based on the new generation of FPGA system-on-chip devices, which provide closely coupled multi-core processors embedded in large gate arrays. We will discuss the technical details of the systems, and present some of the results from them.
Application of linear array imaging techniques to the real-time inspection of airframe structures and substructures

NASA Technical Reports Server (NTRS)

Miller, James G.

1995-01-01

Development and application of linear array imaging technologies to address specific aging-aircraft inspection issues is described. Real-time video-taped images were obtained from an unmodified commercial linear-array medical scanner of specimens constructed to simulate typical types of flaws encountered in the inspection of aircraft structures. Results suggest that information regarding the characteristics, location, and interface properties of specific types of flaws in materials and structures may be obtained from the images acquired with a linear array. Furthermore, linear array imaging may offer the advantage of being able to compare 'good' regions with 'flawed' regions simultaneously, and in real time. Real-time imaging permits the inspector to obtain image information from various views and provides the opportunity for observing the effects of introducing specific interventions. Observation of an image in real-time can offer the operator the ability to 'interact' with the inspection process, thus providing new capabilities, and perhaps, new approaches to nondestructive inspections.

A generic FPGA-based detector readout and real-time image processing board

NASA Astrophysics Data System (ADS)

Sarpotdar, Mayuresh; Mathew, Joice; Safonova, Margarita; Murthy, Jayant

2016-07-01

For space-based astronomical observations, it is important to have a mechanism to capture the digital output from the standard detector for further on-board analysis and storage. We have developed a generic (application- wise) field-programmable gate array (FPGA) board to interface with an image sensor, a method to generate the clocks required to read the image data from the sensor, and a real-time image processor system (on-chip) which can be used for various image processing tasks. The FPGA board is applied as the image processor board in the Lunar Ultraviolet Cosmic Imager (LUCI) and a star sensor (StarSense) - instruments developed by our group. In this paper, we discuss the various design considerations for this board and its applications in the future balloon and possible space flights.
Space Tug Avionics Definition Study. Volume 5: Cost and Programmatics

NASA Technical Reports Server (NTRS)

1975-01-01

The baseline avionics system features a central digital computer that integrates the functions of all the space tug subsystems by means of a redundant digital data bus. The central computer consists of dual central processor units, dual input/output processors, and a fault tolerant memory, utilizing internal redundancy and error checking. Three electronically steerable phased arrays provide downlink transmission from any tug attitude directly to ground or via TDRS. Six laser gyros and six accelerometers in a dodecahedron configuration make up the inertial measurement unit. Both a scanning laser radar and a TV system, employing strobe lamps, are required as acquisition and docking sensors. Primary dc power at a nominal 28 volts is supplied from dual lightweight, thermally integrated fuel cells which operate from propellant grade reactants out of the main tanks.
Optical recognition of statistical patterns

NASA Astrophysics Data System (ADS)

Lee, S. H.

1981-12-01

Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from large dimensional feature space to small dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a T.V. camera, digitized, and fed into a micro-computer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real time processing.
Optical recognition of statistical patterns

NASA Technical Reports Server (NTRS)

Lee, S. H.

1981-01-01

Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from large dimensional feature space to small dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a T.V. camera, digitized, and fed into a micro-computer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real time processing.
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
Integrated Sensing Processor, Phase 2

DTIC Science & Technology

2005-12-01

performance analysis for several baseline classifiers including neural nets, linear classifiers, and kNN classifiers. Use of CCDR as a preprocessing step...below the level of the benchmark non-linear classifier for this problem ( kNN ). Furthermore, the CCDR preconditioned kNN achieved a 10% improvement over...the benchmark kNN without CCDR. Finally, we found an important connection between intrinsic dimension estimation via entropic graphs and the optimal
Comparison of characteristics and downstream uniformity of linear-field and cross-field atmospheric pressure plasma jet array in He

NASA Astrophysics Data System (ADS)

Zhang, Bo; Fang, Zhi; Liu, Feng; Zhou, Renwu; Zhou, Ruoyu

2018-06-01

Using an atmospheric pressure plasma jet array is an effective way for expanding the treatment area of a single jet, and generating arrays with well downstream uniformity is of great interest for its applications. In this paper, a plasma jet array in helium is generated in a linear-field jet array with a ring-ring electrode structure excited by alternating current. The characteristics and downstream uniformity of the array and their dependence on the applied voltage and gas flow rate are investigated through optical, electrical, and Schlieren diagnostics. The results are compared with those of our reported work of a cross-field jet array with a needle-ring electrode structure. The results show that the linear-field jet array can generate relatively large-scale plasma with better uniformity and longer plumes than the cross-field case. The divergences observed in gas channels and the plasma plume trajectories are much less than those of the cross-field one. The deflection angle of lateral plumes is less than 6°, which is independent of the gas flow rate and applied voltage. The maximum downstream plumes of 23 mm can be obtained at 7 kV peak applied voltage and 4 l/min gas flow rate. The better uniformity of linear-field jet arrays is due to the effective suppression of hydrodynamic and electrical interactions among the jets in the arrays with a more uniform electric field distribution. The hydrodynamic interaction induced by the gas heating in the linear-field jet array is less than that of the cross-field one. The more uniform electric field distribution in the linear-field jet arrays can reduce the divergence of the propagation trajectories of the plasma plumes. It will generate less residual charge between the adjacent discharges and thus can reduce the accumulation effect of Coulomb force between the plasma plumes. The reported results can help design controllable and scalable plasma jet arrays with well uniformity for material surface and biomedical treatments.
Analog Ranging Modem Code Processor and Generator

DOT National Transportation Integrated Search

1974-05-01

The report details technical development efforts to implement an analog ranging modem using recently developed linear integrated circuits where possible. The breadboard hardware is capable of acquiring frequency and phase of a weak signal in a high n...
Efficient parallel architecture for highly coupled real-time linear system applications

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo

1988-01-01

A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.
Parallelization of the FLAPW method

NASA Astrophysics Data System (ADS)

Canning, A.; Mannstadt, W.; Freeman, A. J.

2000-08-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
Linearly tapered slot antenna circular array for mobile communications

NASA Technical Reports Server (NTRS)

Simons, Rainee N.; Kelly, Eron; Lee, Richard Q.; Taub, Susan R.

1993-01-01

The design, fabrication and testing of a conformal K-band circular array is presented. The array consists of sixteen linearly tapered slot antennas (LTSA). It is fed by a 1:16 microstrip line power splitter via electromagnetic coupling. The array has an omni-directional pattern in the azimuth plane. In the elevation plane the beam is displaced above the horizon.
FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.

PubMed

Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora

2013-09-01

In this paper, we present a distributed computing system, called DCMARK, aimed at solving partial differential equations at the basis of many investigation fields, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the differential equation system solving into many parallel integration operations to be executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor in order to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to study the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200 cells, Korteweg-de Vries (KdV) equation solver and perform a comparison between simulations conducted on a high performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it allows us to reduce the elaboration time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC Host. An intuitively graphical user interface allows us to change the calculation parameters and plot the results.
MAP3D: a media processor approach for high-end 3D graphics

NASA Astrophysics Data System (ADS)

Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

1999-12-01

Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
A linear refractive photovoltaic concentrator solar array flight experiment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, P.A.; Murphy, D.M.; Piszczor, M.F.

1995-12-31

Concentrator arrays deliver a number of generic benefits for space including high array efficiency, protection from space radiation effects, and minimized plasma interactions. The line focus concentrator concept delivers two added advantages: (1) low-cost mass production of the lens material and, (2) relaxation of precise array tracking requirements to only a single axis. New array designs emphasize lightweight, high stiffness, stow-ability and ease of manufacture and assembly. The linear refractive concentrator can be designed to provide an essentially flat response over a wide range of longitudinal pointing errors for satellites having only single-axis tracking capability. In this paper the authorsmore » address the current status of the SCARLET linear concentrator program with special emphasis on hardware development of an array-level linear refractive concentrator flight experiment. An aggressive, 6-month development and flight validation program, sponsored by the Ballistic Missile Defense Organization (BMDO) and NASA Lewis Research Center, will quantify and verify SCARLET benefits with in-orbit performance measurements.« less
Off-Grid Direction of Arrival Estimation Based on Joint Spatial Sparsity for Distributed Sparse Linear Arrays

PubMed Central

Liang, Yujie; Ying, Rendong; Lu, Zhenqi; Liu, Peilin

2014-01-01

In the design phase of sensor arrays during array signal processing, the estimation performance and system cost are largely determined by array aperture size. In this article, we address the problem of joint direction-of-arrival (DOA) estimation with distributed sparse linear arrays (SLAs) and propose an off-grid synchronous approach based on distributed compressed sensing to obtain larger array aperture. We focus on the complex source distribution in the practical applications and classify the sources into common and innovation parts according to whether a signal of source can impinge on all the SLAs or a specific one. For each SLA, we construct a corresponding virtual uniform linear array (ULA) to create the relationship of random linear map between the signals respectively observed by these two arrays. The signal ensembles including the common/innovation sources for different SLAs are abstracted as a joint spatial sparsity model. And we use the minimization of concatenated atomic norm via semidefinite programming to solve the problem of joint DOA estimation. Joint calculation of the signals observed by all the SLAs exploits their redundancy caused by the common sources and decreases the requirement of array size. The numerical results illustrate the advantages of the proposed approach. PMID:25420150
Micro-machined high-frequency (80 MHz) PZT thick film linear arrays.

PubMed

Zhou, Qifa; Wu, Dawei; Liu, Changgeng; Zhu, Benpeng; Djuth, Frank; Shung, K

2010-10-01

This paper presents the development of a micromachined high-frequency linear array using PZT piezoelectric thick films. The linear array has 32 elements with an element width of 24 μm and an element length of 4 mm. Array elements were fabricated by deep reactive ion etching of PZT thick films, which were prepared from spin-coating of PZT sol-gel composite. Detailed fabrication processes, especially PZT thick film etching conditions and a novel transferring-and-etching method, are presented and discussed. Array designs were evaluated by simulation. Experimental measurements show that the array had a center frequency of 80 MHz and a fractional bandwidth (-6 dB) of 60%. An insertion loss of -41 dB and adjacent element crosstalk of -21 dB were found at the center frequency.
A bunch to bucket phase detector for the RHIC LLRF upgrade platform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, K.S.; Harvey, M.; Hayes, T.

2011-03-28

As part of the overall development effort for the RHIC LLRF Upgrade Platform [1,2,3], a generic four channel 16 bit Analog-to-Digital Converter (ADC) daughter module was developed to provide high speed, wide dynamic range digitizing and processing of signals from DC to several hundred megahertz. The first operational use of this card was to implement the bunch to bucket phase detector for the RHIC LLRF beam control feedback loops. This paper will describe the design and performance features of this daughter module as a bunch to bucket phase detector, and also provide an overview of its place within the overallmore » LLRF platform architecture as a high performance digitizer and signal processing module suitable to a variety of applications. In modern digital control and signal processing systems, ADCs provide the interface between the analog and digital signal domains. Once digitized, signals are then typically processed using algorithms implemented in field programmable gate array (FPGA) logic, general purpose processors (GPPs), digital signal processors (DSPs) or a combination of these. For the recently developed and commissioned RHIC LLRF Upgrade Platform, we've developed a four channel ADC daughter module based on the Linear Technology LTC2209 16 bit, 160 MSPS ADC and the Xilinx V5FX70T FPGA. The module is designed to be relatively generic in application, and with minimal analog filtering on board, is capable of processing signals from DC to 500 MHz or more. The module's first application was to implement the bunch to bucket phase detector (BTB-PD) for the RHIC LLRF system. The same module also provides DC digitizing of analog processed BPM signals used by the LLRF system for radial feedback.« less
NRL Review 1991

DTIC Science & Technology

1991-05-01

contact between averaging of the strong nuclear dipolar interaction the components will result at the interfacial region in this sample. In contrast, tho...and a sea marker to help save survivors $1.5 million for the institution in 1916, but of disasters at sea. A thermal diffusion process wartime delays...memory for large simulations on parallel intervening medium. Accomplishing this research array processors and immediate displays of results requires
Successful development of first-generation laser device; marking China's optoelectronic technology at world class level

NASA Astrophysics Data System (ADS)

1995-04-01

Bell Laboratories has developed the world's first optical information processor. Its core device is a self-excited electrooptical effect apparatus array of symmetric operation. After being developed in the United States, this high-technology device was successfully developed by China's scientists,thus making the fact that China's optoelectronic technology is among the most advanced in the world.
Effects Of Local Oscillator Errors On Digital Beamforming

DTIC Science & Technology

2016-03-01

processor EF element factor EW electronic warfare FFM flicker frequency modulation FOV field-of-view FPGA field-programmable gate array FPM flicker...frequencies and also more difficult to measure [15]. 2. Flicker frequency modulation The source for flicker frequency modulation ( FFM ) is attributed to...a physical resonance mechanism of an oscillator or issues controlling electronic components. Some oscillators might not show FFM noise, which might

Adaptive Optoelectronic Eyes: Hybrid Sensor/Processor Architectures

DTIC Science & Technology

2006-11-13

corresponding calculated data. The width of the mirror stopband is proportional to the refractive index difference between the high and low index materials ...Silicon VLSI Neuron Unit Arrays 56 Development of a Single-Sided Flip-Chip Bonding Process 65 Development of High Refractive Index Diffractive Optical ...Elements (DOEs) 68 Development of High-Performance Antireflection Coatings for High Refractive Index DOEs 69 Design and Fabrication of Low Threshold
A MIMO-Inspired Rapidly Switchable Photonic Interconnect Architecture (Postprint)

DTIC Science & Technology

2009-07-01

capabilities of future systems. Highspeed optical processing has been looked to as a means for eliminating this interconnect bottleneck. Presented...here are the results of a study for a novel optical (integrated photonic) processor which would allow for a high-speed, secure means for arbitrarily...regarded as a Multiple Input Multiple Output (MIMO) architecture. 15. SUBJECT TERMS Free-space optical interconnects, Optical Phased Arrays, High-Speed
The Use of Field Programmable Gate Arrays (FPGA) in Small Satellite Communication Systems

NASA Technical Reports Server (NTRS)

Varnavas, Kosta; Sims, William Herbert; Casas, Joseph

2015-01-01

This paper will describe the use of digital Field Programmable Gate Arrays (FPGA) to contribute to advancing the state-of-the-art in software defined radio (SDR) transponder design for the emerging SmallSat and CubeSat industry and to provide advances for NASA as described in the TAO5 Communication and Navigation Roadmap (Ref 4). The use of software defined radios (SDR) has been around for a long time. A typical implementation of the SDR is to use a processor and write software to implement all the functions of filtering, carrier recovery, error correction, framing etc. Even with modern high speed and low power digital signal processors, high speed memories, and efficient coding, the compute intensive nature of digital filters, error correcting and other algorithms is too much for modern processors to get efficient use of the available bandwidth to the ground. By using FPGAs, these compute intensive tasks can be done in parallel, pipelined fashion and more efficiently use every clock cycle to significantly increase throughput while maintaining low power. These methods will implement digital radios with significant data rates in the X and Ka bands. Using these state-of-the-art technologies, unprecedented uplink and downlink capabilities can be achieved in a 1/2 U sized telemetry system. Additionally, modern FPGAs have embedded processing systems, such as ARM cores, integrated inside the FPGA allowing mundane tasks such as parameter commanding to occur easily and flexibly. Potential partners include other NASA centers, industry and the DOD. These assets are associated with small satellite demonstration flights, LEO and deep space applications. MSFC currently has an SDR transponder test-bed using Hardware-in-the-Loop techniques to evaluate and improve SDR technologies.
Parallel eigenanalysis of finite element models in a completely connected architecture

NASA Technical Reports Server (NTRS)

Akl, F. A.; Morel, M. R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
a Real-Time Computer Music Synthesis System

NASA Astrophysics Data System (ADS)

Lent, Keith Henry

A real time sound synthesis system has been developed at the Computer Music Center of The University of Texas at Austin. This system consists of several stand alone processors that were constructed jointly with White Instruments in Austin. These processors can be programmed as general purpose computers, but are provided with a number of specialized interfaces including: MIDI, 8 bit parallel, high speed serial, 2 channels analog input (18 bit A/Ds, 48kHz sample rate), and 4 channels analog output (18 bit D/As). In addition, a basic music synthesis language (Music56000) has been written in assembly code. On top of this, a symbolic compiler (PatchWork) has been developed to enable algorithms which run in these processors to be created graphically. And finally, a number of efficient time domain numerical models have been developed to enable the construction, simulation, control, and synthesis of many musical acoustics systems in real time on these processors. Specifically, assembly language models for cylindrical and conical horn sections, dissipative losses, tone holes, bells, and a number of linear and nonlinear boundary conditions have been developed.
Linear antenna array optimization using flower pollination algorithm.

PubMed

Saxena, Prerna; Kothari, Ashwin

2016-01-01

Flower pollination algorithm (FPA) is a new nature-inspired evolutionary algorithm used to solve multi-objective optimization problems. The aim of this paper is to introduce FPA to the electromagnetics and antenna community for the optimization of linear antenna arrays. FPA is applied for the first time to linear array so as to obtain optimized antenna positions in order to achieve an array pattern with minimum side lobe level along with placement of deep nulls in desired directions. Various design examples are presented that illustrate the use of FPA for linear antenna array optimization, and subsequently the results are validated by benchmarking along with results obtained using other state-of-the-art, nature-inspired evolutionary algorithms such as particle swarm optimization, ant colony optimization and cat swarm optimization. The results suggest that in most cases, FPA outperforms the other evolutionary algorithms and at times it yields a similar performance.
Out-Phased Array Linearized Signaling (OPALS): A Practical Approach to Physical Layer Encryption

DTIC Science & Technology

2015-10-26

Out-Phased Array Linearized Signaling ( OPALS ): A Practical Approach to Physical Layer Encryption Eric Tollefson, Bruce R. Jordan Jr., and Joseph D... OPALS ) which provides a practical approach to physical-layer encryption through spatial masking. Our approach modifies just the transmitter to employ...of the channel. With Out-Phased Array Linearized Signaling ( OPALS ), we propose a new masking technique that has some advantages of each of the
Field-Programmable Gate Array Computer in Structural Analysis: An Initial Exploration

NASA Technical Reports Server (NTRS)

Singleterry, Robert C., Jr.; Sobieszczanski-Sobieski, Jaroslaw; Brown, Samuel

2002-01-01

This paper reports on an initial assessment of using a Field-Programmable Gate Array (FPGA) computational device as a new tool for solving structural mechanics problems. A FPGA is an assemblage of binary gates arranged in logical blocks that are interconnected via software in a manner dependent on the algorithm being implemented and can be reprogrammed thousands of times per second. In effect, this creates a computer specialized for the problem that automatically exploits all the potential for parallel computing intrinsic in an algorithm. This inherent parallelism is the most important feature of the FPGA computational environment. It is therefore important that if a problem offers a choice of different solution algorithms, an algorithm of a higher degree of inherent parallelism should be selected. It is found that in structural analysis, an 'analog computer' style of programming, which solves problems by direct simulation of the terms in the governing differential equations, yields a more favorable solution algorithm than current solution methods. This style of programming is facilitated by a 'drag-and-drop' graphic programming language that is supplied with the particular type of FPGA computer reported in this paper. Simple examples in structural dynamics and statics illustrate the solution approach used. The FPGA system also allows linear scalability in computing capability. As the problem grows, the number of FPGA chips can be increased with no loss of computing efficiency due to data flow or algorithmic latency that occurs when a single problem is distributed among many conventional processors that operate in parallel. This initial assessment finds the FPGA hardware and software to be in their infancy in regard to the user conveniences; however, they have enormous potential for shrinking the elapsed time of structural analysis solutions if programmed with algorithms that exhibit inherent parallelism and linear scalability. This potential warrants further development of FPGA-tailored algorithms for structural analysis.
Real-Time Symbol Extraction From Grey-Level Images

NASA Astrophysics Data System (ADS)

Massen, R.; Simnacher, M.; Rosch, J.; Herre, E.; Wuhrer, H. W.

1988-04-01

A VME-bus image pipeline processor for extracting vectorized contours from grey-level images in real-time is presented. This 3 Giga operation per second processor uses large kernel convolvers and new non-linear neighbourhood processing algorithms to compute true 1-pixel wide and noise-free contours without thresholding even from grey-level images with quite varying edge sharpness. The local edge orientation is used as an additional cue to compute a list of vectors describing the closed and open contours in real-time and to dump a CAD-like symbolic image description into a symbol memory at pixel clock rate.
Iterative algorithms for tridiagonal matrices on a WSI-multiprocessor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gajski, D.D.; Sameh, A.H.; Wisniewski, J.A.

1982-01-01

With the rapid advances in semiconductor technology, the construction of Wafer Scale Integration (WSI)-multiprocessors consisting of a large number of processors is now feasible. We illustrate the implementation of some basic linear algebra algorithms on such multiprocessors.
Supercomputing on massively parallel bit-serial architectures

NASA Technical Reports Server (NTRS)

Iobst, Ken

1985-01-01

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Isotropic-resolution linear-array-based photoacoustic computed tomography through inverse Radon transform

NASA Astrophysics Data System (ADS)

Li, Guo; Xia, Jun; Li, Lei; Wang, Lidai; Wang, Lihong V.

2015-03-01

Linear transducer arrays are readily available for ultrasonic detection in photoacoustic computed tomography. They offer low cost, hand-held convenience, and conventional ultrasonic imaging. However, the elevational resolution of linear transducer arrays, which is usually determined by the weak focus of the cylindrical acoustic lens, is about one order of magnitude worse than the in-plane axial and lateral spatial resolutions. Therefore, conventional linear scanning along the elevational direction cannot provide high-quality three-dimensional photoacoustic images due to the anisotropic spatial resolutions. Here we propose an innovative method to achieve isotropic resolutions for three-dimensional photoacoustic images through combined linear and rotational scanning. In each scan step, we first elevationally scan the linear transducer array, and then rotate the linear transducer array along its center in small steps, and scan again until 180 degrees have been covered. To reconstruct isotropic three-dimensional images from the multiple-directional scanning dataset, we use the standard inverse Radon transform originating from X-ray CT. We acquired a three-dimensional microsphere phantom image through the inverse Radon transform method and compared it with a single-elevational-scan three-dimensional image. The comparison shows that our method improves the elevational resolution by up to one order of magnitude, approaching the in-plane lateral-direction resolution. In vivo rat images were also acquired.
Moving-Article X-Ray Imaging System and Method for 3-D Image Generation

NASA Technical Reports Server (NTRS)

Fernandez, Kenneth R. (Inventor)

2012-01-01

An x-ray imaging system and method for a moving article are provided for an article moved along a linear direction of travel while the article is exposed to non-overlapping x-ray beams. A plurality of parallel linear sensor arrays are disposed in the x-ray beams after they pass through the article. More specifically, a first half of the plurality are disposed in a first of the x-ray beams while a second half of the plurality are disposed in a second of the x-ray beams. Each of the parallel linear sensor arrays is oriented perpendicular to the linear direction of travel. Each of the parallel linear sensor arrays in the first half is matched to a corresponding one of the parallel linear sensor arrays in the second half in terms of an angular position in the first of the x-ray beams and the second of the x-ray beams, respectively.
Micro-Machined High-Frequency (80 MHz) PZT Thick Film Linear Arrays

PubMed Central

Zhou, Qifa; Wu, Dawei; Liu, Changgeng; Zhu, Benpeng; Djuth, Frank; Shung, K. Kirk

2010-01-01

This paper presents the development of a micro-machined high-frequency linear array using PZT piezoelectric thick films. The linear array has 32 elements with an element width of 24 μm and an element length of 4 mm. Array elements were fabricated by deep reactive ion etching of PZT thick films, which were prepared from spin-coating of PZT solgel composite. Detailed fabrication processes, especially PZT thick film etching conditions and a novel transferring-and-etching method, are presented and discussed. Array designs were evaluated by simulation. Experimental measurements show that the array had a center frequency of 80 MHz and a fractional bandwidth (−6 dB) of 60%. An insertion loss of −41 dB and adjacent element crosstalk of −21 dB were found at the center frequency. PMID:20889407
Reliability of Central Adiposity Assessments Using B-Mode Ultrasound: A Comparison of Linear and Curved Array Transducers.

PubMed

Stoner, Lee; Geoffron, Morgane; Cornwall, Jon; Chinn, Victoria; Gram, Martin; Credeur, Daniel; Fryer, Simon

2016-12-01

Recently, it was reported that intra-abdominal thickness (IAT) assessments using ultrasound are most reliable if measured from the linea alba to the anterior vertebral column. These 2 anatomical sites can be simultaneously visualized using a linear array transducer. Linear array transducers have different operational characteristics when compared with conventional curved array transducers and are more reliable for some ultrasound-derived measures such as abdominal subcutaneous fat thickness. However, it is unknown whether linear array transducers facilitate more reliable IAT measurements than curved array transducers. The purpose of the current study was to (1) compare the reliability of linear and curved array transducer assessments of IAT and maximal abdominal ratio (MAR) and (2) use the findings to update central adiposity measurement guidelines. Fifteen healthy adults (mean [SD], 27 [10] years; 60% female) with a range of somatotypes (body mass index: mean [SD], 24 [4]; range, 19-33 kg/m; waist circumference: mean [SD], 75 [11]; range, 61-96 cm) were tested on 3 mornings under standardized conditions. Intra-abdominal thickness was assessed 2 cm above the umbilicus (transverse plane), measuring from linea alba to the anterior vertebral column. Maximal abdominal ratio was defined as the ratio of IAT to abdominal subcutaneous fat thickness. The IAT range was 25 to 87 mm, and the MAR range was 0.15 to 0.77. Between-day intraclass correlation coefficient values for IAT measurements made were comparable (0.96-0.97) for both transducers, as were MAR values (0.95). In conclusion, while both transducers provided equally reliable measurement of IAT, the use of a single linear array transducer simplifies the assessment of central adiposity.
Adaptive and mobile ground sensor array.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holzrichter, Michael Warren; O'Rourke, William T.; Zenner, Jennifer

The goal of this LDRD was to demonstrate the use of robotic vehicles for deploying and autonomously reconfiguring seismic and acoustic sensor arrays with high (centimeter) accuracy to obtain enhancement of our capability to locate and characterize remote targets. The capability to accurately place sensors and then retrieve and reconfigure them allows sensors to be placed in phased arrays in an initial monitoring configuration and then to be reconfigured in an array tuned to the specific frequencies and directions of the selected target. This report reviews the findings and accomplishments achieved during this three-year project. This project successfully demonstrated autonomousmore » deployment and retrieval of a payload package with an accuracy of a few centimeters using differential global positioning system (GPS) signals. It developed an autonomous, multisensor, temporally aligned, radio-frequency communication and signal processing capability, and an array optimization algorithm, which was implemented on a digital signal processor (DSP). Additionally, the project converted the existing single-threaded, monolithic robotic vehicle control code into a multi-threaded, modular control architecture that enhances the reuse of control code in future projects.« less
Wide-field microscopy using microcamera arrays

NASA Astrophysics Data System (ADS)

Marks, Daniel L.; Youn, Seo Ho; Son, Hui S.; Kim, Jungsang; Brady, David J.

2013-02-01

A microcamera is a relay lens paired with image sensors. Microcameras are grouped into arrays to relay overlapping views of a single large surface to the sensors to form a continuous synthetic image. The imaged surface may be curved or irregular as each camera may independently be dynamically focused to a different depth. Microcamera arrays are akin to microprocessors in supercomputers in that both join individual processors by an optoelectronic routing fabric to increase capacity and performance. A microcamera may image ten or more megapixels and grouped into an array of several hundred, as has already been demonstrated by the DARPA AWARE Wide-Field program with multiscale gigapixel photography. We adapt gigapixel microcamera array architectures to wide-field microscopy of irregularly shaped surfaces to greatly increase area imaging over 1000 square millimeters at resolutions of 3 microns or better in a single snapshot. The system includes a novel relay design, a sensor electronics package, and a FPGA-based networking fabric. Biomedical applications of this include screening for skin lesions, wide-field and resolution-agile microsurgical imaging, and microscopic cytometry of millions of cells performed in situ.
Multi-terabyte EIDE disk arrays running Linux RAID5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sanders, D.A.; Cremaldi, L.M.; Eschenburg, V.

2004-11-01

High-energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle this data and make analysis at universities possible. Grid Computing is one method; however, the data must be cached at the various Grid nodes. We examine some storage techniques that exploit recent developments in commodity hardware. Disk arrays using RAID level 5 (RAID-5) include both parity and striping. The striping improves access speed. The parity protects data in the event of a single disk failure, but not in the case ofmore » multiple disk failures. We report on tests of dual-processor Linux Software RAID-5 arrays and Hardware RAID-5 arrays using a 12-disk 3ware controller, in conjunction with 250 and 300 GB disks, for use in offline high-energy physics data analysis. The price of IDE disks is now less than $1/GB. These RAID-5 disk arrays can be scaled to sizes affordable to small institutions and used when fast random access at low cost is important.« less
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Distributed Computation of the knn Graph for Large High-Dimensional Point Sets

PubMed Central

Plaku, Erion; Kavraki, Lydia E.

2009-01-01

High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318

A Low-Power ASIC Signal Processor for a Vestibular Prosthesis.

PubMed

Töreyin, Hakan; Bhatti, Pamela T

2016-06-01

A low-power ASIC signal processor for a vestibular prosthesis (VP) is reported. Fabricated with TI 0.35 μm CMOS technology and designed to interface with implanted inertial sensors, the digitally assisted analog signal processor operates extensively in the CMOS subthreshold region. During its operation the ASIC encodes head motion signals captured by the inertial sensors as electrical pulses ultimately targeted for in-vivo stimulation of vestibular nerve fibers. To achieve this, the ASIC implements a coordinate system transformation to correct for misalignment between natural sensors and implanted inertial sensors. It also mimics the frequency response characteristics and frequency encoding mappings of angular and linear head motions observed at the peripheral sense organs, semicircular canals and otolith. Overall the design occupies an area of 6.22 mm (2) and consumes 1.24 mW when supplied with ± 1.6 V.
A Low-Power ASIC Signal Processor for a Vestibular Prosthesis

PubMed Central

Töreyin, Hakan; Bhatti, Pamela T.

2017-01-01

A low-power ASIC signal processor for a vestibular prosthesis (VP) is reported. Fabricated with TI 0.35 μm CMOS technology and designed to interface with implanted inertial sensors, the digitally assisted analog signal processor operates extensively in the CMOS subthreshold region. During its operation the ASIC encodes head motion signals captured by the inertial sensors as electrical pulses ultimately targeted for in-vivo stimulation of vestibular nerve fibers. To achieve this, the ASIC implements a coordinate system transformation to correct for misalignment between natural sensors and implanted inertial sensors. It also mimics the frequency response characteristics and frequency encoding mappings of angular and linear head motions observed at the peripheral sense organs, semicircular canals and otolith. Overall the design occupies an area of 6.22 mm2 and consumes 1.24 mW when supplied with ± 1.6 V. PMID:26800546
Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

NASA Technical Reports Server (NTRS)

Su, Ernesto; Lain, Antonio; Ramaswamy, Shankar; Palermo, Daniel J.; Hodges, Eugene W., IV; Banerjee, Prithviraj

1995-01-01

The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. .A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and variable number of processors, provided that arrays were single or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica. This paper describes some of the Mathematica routines that performs various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.
SWARM: A 32 GHz Correlator and VLBI Beamformer for the Submillimeter Array

NASA Astrophysics Data System (ADS)

Primiani, Rurik A.; Young, Kenneth H.; Young, André; Patel, Nimesh; Wilson, Robert W.; Vertatschitsch, Laura; Chitwood, Billie B.; Srinivasan, Ranjani; MacMahon, David; Weintroub, Jonathan

2016-03-01

A 32GHz bandwidth VLBI capable correlator and phased array has been designed and deployeda at the Smithsonian Astrophysical Observatory’s Submillimeter Array (SMA). The SMA Wideband Astronomical ROACH2 Machine (SWARM) integrates two instruments: a correlator with 140kHz spectral resolution across its full 32GHz band, used for connected interferometric observations, and a phased array summer used when the SMA participates as a station in the Event Horizon Telescope (EHT) very long baseline interferometry (VLBI) array. For each SWARM quadrant, Reconfigurable Open Architecture Computing Hardware (ROACH2) units shared under open-source from the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER) are equipped with a pair of ultra-fast analog-to-digital converters (ADCs), a field programmable gate array (FPGA) processor, and eight 10 Gigabit Ethernet (GbE) ports. A VLBI data recorder interface designated the SWARM digital back end, or SDBE, is implemented with a ninth ROACH2 per quadrant, feeding four Mark6 VLBI recorders with an aggregate recording rate of 64 Gbps. This paper describes the design and implementation of SWARM, as well as its deployment at SMA with reference to verification and science data.
Optimized Hyper Beamforming of Linear Antenna Arrays Using Collective Animal Behaviour

PubMed Central

Ram, Gopi; Mandal, Durbadal; Kar, Rajib; Ghoshal, Sakti Prasad

2013-01-01

A novel optimization technique which is developed on mimicking the collective animal behaviour (CAB) is applied for the optimal design of hyper beamforming of linear antenna arrays. Hyper beamforming is based on sum and difference beam patterns of the array, each raised to the power of a hyperbeam exponent parameter. The optimized hyperbeam is achieved by optimization of current excitation weights and uniform interelement spacing. As compared to conventional hyper beamforming of linear antenna array, real coded genetic algorithm (RGA), particle swarm optimization (PSO), and differential evolution (DE) applied to the hyper beam of the same array can achieve reduction in sidelobe level (SLL) and same or less first null beam width (FNBW), keeping the same value of hyperbeam exponent. Again, further reductions of sidelobe level (SLL) and first null beam width (FNBW) have been achieved by the proposed collective animal behaviour (CAB) algorithm. CAB finds near global optimal solution unlike RGA, PSO, and DE in the present problem. The above comparative optimization is illustrated through 10-, 14-, and 20-element linear antenna arrays to establish the optimization efficacy of CAB. PMID:23970843
A scalable geometric multigrid solver for nonsymmetric elliptic systems with application to variable-density flows

NASA Astrophysics Data System (ADS)

Esmaily, M.; Jofre, L.; Mani, A.; Iaccarino, G.

2018-03-01

A geometric multigrid algorithm is introduced for solving nonsymmetric linear systems resulting from the discretization of the variable density Navier-Stokes equations on nonuniform structured rectilinear grids and high-Reynolds number flows. The restriction operation is defined such that the resulting system on the coarser grids is symmetric, thereby allowing for the use of efficient smoother algorithms. To achieve an optimal rate of convergence, the sequence of interpolation and restriction operations are determined through a dynamic procedure. A parallel partitioning strategy is introduced to minimize communication while maintaining the load balance between all processors. To test the proposed algorithm, we consider two cases: 1) homogeneous isotropic turbulence discretized on uniform grids and 2) turbulent duct flow discretized on stretched grids. Testing the algorithm on systems with up to a billion unknowns shows that the cost varies linearly with the number of unknowns. This O (N) behavior confirms the robustness of the proposed multigrid method regarding ill-conditioning of large systems characteristic of multiscale high-Reynolds number turbulent flows. The robustness of our method to density variations is established by considering cases where density varies sharply in space by a factor of up to 104, showing its applicability to two-phase flow problems. Strong and weak scalability studies are carried out, employing up to 30,000 processors, to examine the parallel performance of our implementation. Excellent scalability of our solver is shown for a granularity as low as 104 to 105 unknowns per processor. At its tested peak throughput, it solves approximately 4 billion unknowns per second employing over 16,000 processors with a parallel efficiency higher than 50%.
Microstrip antenna developments at JPL

NASA Technical Reports Server (NTRS)

Huang, John

1991-01-01

The in-house development of microstrip antennas, initiated in 1981, when a spaceborne lightweight and low-profile planar array was needed for a satellite communication system, is described. The work described covers the prediction of finite-ground-plane effects by the geometric theory of diffraction, higher-order-mode circularly polarized circular patch antennas, circularly polarized microstrip arrays with linearly polarized elements, an impedance-matching teardrop-shaped probe feed, a dual-polarized microstrip array with high isolation and low cross-polarization, a planar microstrip Yagi array, a microstrip reflectarray, a Ka-band MMIC array, and a series-fed linear arrays.
Hiding the Disk and Network Latency of Out-of-Core Visualization

NASA Technical Reports Server (NTRS)

Ellsworth, David

2001-01-01

This paper describes an algorithm that improves the performance of application-controlled demand paging for out-of-core visualization by hiding the latency of reading data from both local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The paper includes measurements that show that the new multithreaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by two thirds. Visualization runs using data from remote disk actually ran faster than ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
Compute Element and Interface Box for the Hazard Detection System

NASA Technical Reports Server (NTRS)

Villalpando, Carlos Y.; Khanoyan, Garen; Stern, Ryan A.; Some, Raphael R.; Bailey, Erik S.; Carson, John M.; Vaughan, Geoffrey M.; Werner, Robert A.; Salomon, Phil M.; Martin, Keith E.;

2013-01-01

The Autonomous Landing and Hazard Avoidance Technology (ALHAT) program is building a sensor that enables a spacecraft to evaluate autonomously a potential landing area to generate a list of hazardous and safe landing sites. It will also provide navigation inputs relative to those safe sites. The Hazard Detection System Compute Element (HDS-CE) box combines a field-programmable gate array (FPGA) board for sensor integration and timing, with a multicore computer board for processing. The FPGA does system-level timing and data aggregation, and acts as a go-between, removing the real-time requirements from the processor and labeling events with a high resolution time. The processor manages the behavior of the system, controls the instruments connected to the HDS-CE, and services the "heavy lifting" computational requirements for analyzing the potential landing spots.

MOSAIC - A space-multiplexing technique for optical processing of large images

NASA Technical Reports Server (NTRS)

Athale, Ravindra A.; Astor, Michael E.; Yu, Jeffrey

1993-01-01

A technique for Fourier processing of images larger than the space-bandwidth products of conventional or smart spatial light modulators and two-dimensional detector arrays is described. The technique involves a spatial combination of subimages displayed on individual spatial light modulators to form a phase-coherent image, which is subsequently processed with Fourier optical techniques. Because of the technique's similarity with the mosaic technique used in art, the processor used is termed an optical MOSAIC processor. The phase accuracy requirements of this system were studied by computer simulation. It was found that phase errors of less than lambda/8 did not degrade the performance of the system and that the system was relatively insensitive to amplitude nonuniformities. Several schemes for implementing the subimage combination are described. Initial experimental results demonstrating the validity of the mosaic concept are also presented.
General-purpose interface bus for multiuser, multitasking computer system

NASA Technical Reports Server (NTRS)

Generazio, Edward R.; Roth, Don J.; Stang, David B.

1990-01-01

The architecture of a multiuser, multitasking, virtual-memory computer system intended for the use by a medium-size research group is described. There are three central processing units (CPU) in the configuration, each with 16 MB memory, and two 474 MB hard disks attached. CPU 1 is designed for data analysis and contains an array processor for fast-Fourier transformations. In addition, CPU 1 shares display images viewed with the image processor. CPU 2 is designed for image analysis and display. CPU 3 is designed for data acquisition and contains 8 GPIB channels and an analog-to-digital conversion input/output interface with 16 channels. Up to 9 users can access the third CPU simultaneously for data acquisition. Focus is placed on the optimization of hardware interfaces and software, facilitating instrument control, data acquisition, and processing.
A novel parallel architecture for local histogram equalization

NASA Astrophysics Data System (ADS)

Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan

2005-07-01

Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
Fast neural net simulation with a DSP processor array.

PubMed

Muller, U A; Gunzinger, A; Guggenbuhl, W

1995-01-01

This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.
Achieving supercomputer performance for neural net simulation with an array of digital signal processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Muller, U.A.; Baumle, B.; Kohler, P.

1992-10-01

Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
RANS Simulations using OpenFOAM Software

DTIC Science & Technology

2016-01-01

Averaged Navier- Stokes (RANS) simulations is described and illustrated by applying the simpleFoam solver to two case studies; two dimensional flow...to run in parallel over large processor arrays. The purpose of this report is to illustrate and test the use of the steady-state Reynolds Averaged ...Group in the Maritime Platforms Division he has been simulating fluid flow around ships and submarines using finite element codes, Lagrangian vortex
Microdot - A Four-Bit Microcontroller Designed for Distributed Low-End Computing in Satellites

NASA Astrophysics Data System (ADS)

2002-03-01

Many satellites are an integrated collection of sensors and actuators that require dedicated real-time control. For single processor systems, additional sensors require an increase in computing power and speed to provide the multi-tasking capability needed to service each sensor. Faster processors cost more and consume more power, which taxes a satellite's power resources and may lead to shorter satellite lifetimes. An alternative design approach is a distributed network of small and low power microcontrollers designed for space that handle the computing requirements of each individual sensor and actuator. The design of microdot, a four-bit microcontroller for distributed low-end computing, is presented. The design is based on previous research completed at the Space Electronics Branch, Air Force Research Laboratory (AFRL/VSSE) at Kirtland AFB, NM, and the Air Force Institute of Technology at Wright-Patterson AFB, OH. The Microdot has 29 instructions and a 1K x 4 instruction memory. The distributed computing architecture is based on the Philips Semiconductor I2C Serial Bus Protocol. A prototype was implemented and tested using an Altera Field Programmable Gate Array (FPGA). The prototype was operable to 9.1 MHz. The design was targeted for fabrication in a radiation-hardened-by-design gate-array cell library for the TSMC 0.35 micrometer CMOS process.
Mobile and replicated alignment of arrays in data-parallel programs

NASA Technical Reports Server (NTRS)

Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

1993-01-01

When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.
Chrestenson transform FPGA embedded factorizations.

PubMed

Corinthios, Michael J

2016-01-01

Chrestenson generalized Walsh transform factorizations for parallel processing imbedded implementations on field programmable gate arrays are presented. This general base transform, sometimes referred to as the Discrete Chrestenson transform, has received special attention in recent years. In fact, the Discrete Fourier transform and Walsh-Hadamard transform are but special cases of the Chrestenson generalized Walsh transform. Rotations of a base-p hypercube, where p is an arbitrary integer, are shown to produce dynamic contention-free memory allocation, in processor architecture. The approach is illustrated by factorizations involving the processing of matrices of the transform which are function of four variables. Parallel operations are implemented matrix multiplications. Each matrix, of dimension N × N, where N = p (n) , n integer, has a structure that depends on a variable parameter k that denotes the iteration number in the factorization process. The level of parallelism, in the form of M = p (m) processors can be chosen arbitrarily by varying m between zero to its maximum value of n - 1. The result is an equation describing the generalised parallelism factorization as a function of the four variables n, p, k and m. Applications of the approach are shown in relation to configuring field programmable gate arrays for digital signal processing applications.
Digitally programmable microfluidic automaton for multiscale combinatorial mixing and sample processing†

PubMed Central

Jensen, Erik C.; Stockton, Amanda M.; Chiesl, Thomas N.; Kim, Jungkyu; Bera, Abhisek; Mathies, Richard A.

2013-01-01

A digitally programmable microfluidic Automaton consisting of a 2-dimensional array of pneumatically actuated microvalves is programmed to perform new multiscale mixing and sample processing operations. Large (µL-scale) volume processing operations are enabled by precise metering of multiple reagents within individual nL-scale valves followed by serial repetitive transfer to programmed locations in the array. A novel process exploiting new combining valve concepts is developed for continuous rapid and complete mixing of reagents in less than 800 ms. Mixing, transfer, storage, and rinsing operations are implemented combinatorially to achieve complex assay automation protocols. The practical utility of this technology is demonstrated by performing automated serial dilution for quantitative analysis as well as the first demonstration of on-chip fluorescent derivatization of biomarker targets (carboxylic acids) for microchip capillary electrophoresis on the Mars Organic Analyzer. A language is developed to describe how unit operations are combined to form a microfluidic program. Finally, this technology is used to develop a novel microfluidic 6-sample processor for combinatorial mixing of large sets (>26 unique combinations) of reagents. The digitally programmable microfluidic Automaton is a versatile programmable sample processor for a wide range of process volumes, for multiple samples, and for different types of analyses. PMID:23172232
Optimized smith waterman processor design for breast cancer early diagnosis

NASA Astrophysics Data System (ADS)

Nurdin, D. S.; Isa, M. N.; Ismail, R. C.; Ahmad, M. I.

2017-09-01

This paper presents an optimized design of Processing Element (PE) of Systolic Array (SA) which implements affine gap penalty Smith Waterman (SW) algorithm on the Xilinx Virtex-6 XC6VLX75T Field Programmable Gate Array (FPGA) for Deoxyribonucleic Acid (DNA) sequence alignment. The PE optimization aims to reduce PE logic resources to increase number of PEs in FPGA for higher degree of parallelism during alignment matrix computations. This is useful for aligning long DNA-based disease sequence such as Breast Cancer (BC) for early diagnosis. The optimized PE architecture has the smallest PE area with 15 slices in a PE and 776 PEs implemented in the Virtex - 6 FPGA.

DBSAR's First Multimode Flight Campaign

NASA Technical Reports Server (NTRS)

Rincon, Rafael F.; Vega, Manuel; Buenfil, Manuel; Geist, Alessandro; Hilliard, Lawrence; Racette, Paul

2010-01-01

The Digital Beamforming SAR (DBSAR) is an airborne imaging radar system that combines phased array technology, reconfigurable on-board processing and waveform generation, and advances in signal processing to enable techniques not possible with conventional SARs. The system exploits the versatility inherently in phased-array technology with a state-of-the-art data acquisition and real-time processor in order to implement multi-mode measurement techniques in a single radar system. Operational modes include scatterometry over multiple antenna beams, Synthetic Aperture Radar (SAR) over several antenna beams, or Altimetry. The radar was flight tested in October 2008 on board of the NASA P3 aircraft over the Delmarva Peninsula, MD. The results from the DBSAR system performance is presented.
A Survey of Plasmas and Their Applications

NASA Technical Reports Server (NTRS)

Eastman, Timothy E.; Grabbe, C. (Editor)

2006-01-01

Plasmas are everywhere and relevant to everyone. We bath in a sea of photons, quanta of electromagnetic radiation, whose sources (natural and artificial) are dominantly plasma-based (stars, fluorescent lights, arc lamps.. .). Plasma surface modification and materials processing contribute increasingly to a wide array of modern artifacts; e.g., tiny plasma discharge elements constitute the pixel arrays of plasma televisions and plasma processing provides roughly one-third of the steps to produce semiconductors, essential elements of our networking and computing infrastructure. Finally, plasmas are central to many cutting edge technologies with high potential (compact high-energy particle accelerators; plasma-enhanced waste processors; high tolerance surface preparation and multifuel preprocessors for transportation systems; fusion for energy production).
A 400 KHz line rate 2048-pixel stitched SWIR linear array

NASA Astrophysics Data System (ADS)

Anchlia, Ankur; Vinella, Rosa M.; Gielen, Daphne; Wouters, Kristof; Vervenne, Vincent; Hooylaerts, Peter; Deroo, Pieter; Ruythooren, Wouter; De Gaspari, Danny; Das, Jo; Merken, Patrick

2016-05-01

Xenics has developed a family of stitched SWIR long linear arrays that operate up to 400 KHz of line rate. These arrays serve medical and industrial applications that require high line rates as well as space applications that require long linear arrays. The arrays are based on a modular ROIC design concept: modules of 512 pixels are stitched during fabrication to achieve 512, 1024 and 2048 pixel arrays. Each 512-pixel module has its own on-chip digital sequencer, analog readout chain and 4 output buffers. This modular concept enables a long array to run at a high line rates irrespective of the array length, which limits the line rate in a traditional linear array. The ROIC is flip-chipped with InGaAs detector arrays. The FPA has a pixel pitch of 12.5μm and has two pixel flavors: square (12.5μm) and rectangular (250μm). The frontend circuit is based on Capacitive Trans-impedance Amplifier (CTIA) to attain stable detector bias, and good linearity and signal integrity, especially at high speeds. The CTIA has an input auto-zero mechanism that allows to have low detector bias (<20mV). An on-chip Correlated Double Sample (CDS) facilitates removal of CTIA KTC and 1/f noise, and other offsets, achieving low noise performance. There are five gain modes in the FPA giving the full well range from 85Ke- to 40Me-. The measured input referred noise is 35e-rms in the highest gain mode. The FPA operates in Integrate While Read mode and, at a master clock rate of 60MHz and a minimum integration time of 1.4μs, achieves the highest line rate of 400 KHz. In this paper, design details and measurements results are presented in order to demonstrate the array performance.
A Low-Power Wearable Stand-Alone Tongue Drive System for People With Severe Disabilities.

PubMed

Jafari, Ali; Buswell, Nathanael; Ghovanloo, Maysam; Mohsenin, Tinoosh

2018-02-01

This paper presents a low-power stand-alone tongue drive system (sTDS) used for individuals with severe disabilities to potentially control their environment such as computer, smartphone, and wheelchair using their voluntary tongue movements. A low-power local processor is proposed, which can perform signal processing to convert raw magnetic sensor signals to user-defined commands, on the sTDS wearable headset, rather than sending all raw data out to a PC or smartphone. The proposed sTDS significantly reduces the transmitter power consumption and subsequently increases the battery life. Assuming the sTDS user issues one command every 20 ms, the proposed local processor reduces the data volume that needs to be wirelessly transmitted by a factor of 64, from 9.6 to 0.15 kb/s. The proposed processor consists of three main blocks: serial peripheral interface bus for receiving raw data from magnetic sensors, external magnetic interference attenuation to attenuate external magnetic field from the raw magnetic signal, and a machine learning classifier for command detection. A proof-of-concept prototype sTDS has been implemented with a low-power IGLOO-nano field programmable gate array (FPGA), bluetooth low energy, battery and magnetic sensors on a headset, and tested. At clock frequency of 20 MHz, the processor takes 6.6 s and consumes 27 nJ for detecting a command with a detection accuracy of 96.9%. To further reduce power consumption, an application-specified integrated circuit processor for the sTDS is implemented at the postlayout level in 65-nm CMOS technology with 1-V power supply, and it consumes 0.43 mW, which is 10 lower than FPGA power consumption and occupies an area of only 0.016 mm.
Simulation of an array-based neural net model

NASA Technical Reports Server (NTRS)

Barnden, John A.

1987-01-01

Research in cognitive science suggests that much of cognition involves the rapid manipulation of complex data structures. However, it is very unclear how this could be realized in neural networks or connectionist systems. A core question is: how could the interconnectivity of items in an abstract-level data structure be neurally encoded? The answer appeals mainly to positional relationships between activity patterns within neural arrays, rather than directly to neural connections in the traditional way. The new method was initially devised to account for abstract symbolic data structures, but it also supports cognitively useful spatial analogue, image-like representations. As the neural model is based on massive, uniform, parallel computations over 2D arrays, the massively parallel processor is a convenient tool for simulation work, although there are complications in using the machine to the fullest advantage. An MPP Pascal simulation program for a small pilot version of the model is running.
Target-in-the-loop high-power adaptive phase-locked fiber laser array using single-frequency dithering technique

NASA Astrophysics Data System (ADS)

Tao, R.; Ma, Y.; Si, L.; Dong, X.; Zhou, P.; Liu, Z.

2011-11-01

We present a theoretical and experimental study of a target-in-the-loop (TIL) high-power adaptive phase-locked fiber laser array. The system configuration of the TIL adaptive phase-locked fiber laser array is introduced, and the fundamental theory for TIL based on the single-dithering technique is deduced for the first time. Two 10-W-level high-power fiber amplifiers are set up and adaptive phase locking of the two fiber amplifiers is accomplished successfully by implementing a single-dithering algorithm on a signal processor. The experimental results demonstrate that the optical phase noise for each beam channel can be effectively compensated by the TIL adaptive optics system under high-power applications and the fringe contrast on a remotely located extended target is advanced from 12% to 74% for the two 10-W-level fiber amplifiers.
Digital algorithms for parallel pipelined single-detector homodyne fringe counting in laser interferometry

NASA Astrophysics Data System (ADS)

Rerucha, Simon; Sarbort, Martin; Hola, Miroslava; Cizek, Martin; Hucl, Vaclav; Cip, Ondrej; Lazar, Josef

2016-12-01

The homodyne detection with only a single detector represents a promising approach in the interferometric application which enables a significant reduction of the optical system complexity while preserving the fundamental resolution and dynamic range of the single frequency laser interferometers. We present the design, implementation and analysis of algorithmic methods for computational processing of the single-detector interference signal based on parallel pipelined processing suitable for real time implementation on a programmable hardware platform (e.g. the FPGA - Field Programmable Gate Arrays or the SoC - System on Chip). The algorithmic methods incorporate (a) the single detector signal (sine) scaling, filtering, demodulations and mixing necessary for the second (cosine) quadrature signal reconstruction followed by a conic section projection in Cartesian plane as well as (a) the phase unwrapping together with the goniometric and linear transformations needed for the scale linearization and periodic error correction. The digital computing scheme was designed for bandwidths up to tens of megahertz which would allow to measure the displacements at the velocities around half metre per second. The algorithmic methods were tested in real-time operation with a PC-based reference implementation that employed the advantage pipelined processing by balancing the computational load among multiple processor cores. The results indicate that the algorithmic methods are suitable for a wide range of applications [3] and that they are bringing the fringe counting interferometry closer to the industrial applications due to their optical setup simplicity and robustness, computational stability, scalability and also a cost-effectiveness.
Design, Fabrication and Characterization of A Bi-Frequency Co-Linear Array

PubMed Central

Wang, Zhuochen; Li, Sibo; Czernuszewicz, Tomasz J; Gallippi, Caterina M.; Liu, Ruibin; Geng, Xuecang

2016-01-01

Ultrasound imaging with high resolution and large penetration depth has been increasingly adopted in medical diagnosis, surgery guidance, and treatment assessment. Conventional ultrasound works at a particular frequency, with a −6 dB fractional bandwidth of ~70 %, limiting the imaging resolution or depth of field. In this paper, a bi-frequency co-linear array with resonant frequencies of 8 MHz and 20 MHz was investigated to meet the requirements of resolution and penetration depth for a broad range of ultrasound imaging applications. Specifically, a 32-element bi-frequency co-linear array was designed and fabricated, followed by element characterization and real-time sectorial scan (S-scan) phantom imaging using a Verasonics system. The bi-frequency co-linear array was tested in four different modes by switching between low and high frequencies on transmit and receive. The four modes included the following: (1) transmit low, receive low, (2) transmit low, receive high, (3) transmit high, receive low, (4) transmit high, receive high. After testing, the axial and lateral resolutions of all modes were calculated and compared. The results of this study suggest that bi-frequency co-linear arrays are potential aids for wideband fundamental imaging and harmonic/sub-harmonic imaging. PMID:26661069
Integrated circuit for SAW and MEMS sensors

NASA Astrophysics Data System (ADS)

Fischer, Wolf-Joachim; Koenig, Peter; Ploetner, Matthias; Hermann, Rudiger; Stab, Helmut

2001-11-01

The sensor processor circuit has been developed for hand-held devices used in industrial and environmental applications, such as on-line process monitoring. Thereby devices with SAW sensors or MEMS resonators will benefit from this processor especially. Up to 8 sensors can be connected to the circuit as multisensors or sensor arrays. Two sensor processors SP1 and SP2 for different applications are presented in this paper. The SP-1 chip has a PCMCIA interface which can be used for the program and data transfer. SAW sensors which are working in the frequency range from 80 MHz to 160 MHz can be connected to the processor directly. It is possible to use the new SP-2 chip fabricated in a 0.5(mu) CMOS process for SAW devices with a maximum frequency of 600 MHz. An on-chip analog-digital-converter (ADC) and 6 PWM modules support the development of high-miniaturized intelligent sensor systems We have developed a multi-SAW sensor system with this ASIC that manages the requirements on control as well as signal generation and storage and provides an interface to the PC and electronic devices on the board. Its low power consumption and its PCMCIA plug fulfil the requirements of small size and mobility. For this application sensors have been developed to detect hazardous gases in ambient air. Sensors with differently modified copper-phthalocyanine films are capable of detecting NO2 and O3, whereas those with a hyperbranched polyester film respond to NH3.
Novel processor architecture for onboard infrared sensors

NASA Astrophysics Data System (ADS)

Hihara, Hiroki; Iwasaki, Akira; Tamagawa, Nobuo; Kuribayashi, Mitsunobu; Hashimoto, Masanori; Mitsuyama, Yukio; Ochi, Hiroyuki; Onodera, Hidetoshi; Kanbara, Hiroyuki; Wakabayashi, Kazutoshi; Tada, Munehiro

2016-09-01

Infrared sensor system is a major concern for inter-planetary missions that investigate the nature and the formation processes of planets and asteroids. The infrared sensor system requires signal preprocessing functions that compensate for the intensity of infrared image sensors to get high quality data and high compression ratio through the limited capacity of transmission channels towards ground stations. For those implementations, combinations of Field Programmable Gate Arrays (FPGAs) and microprocessors are employed by AKATSUKI, the Venus Climate Orbiter, and HAYABUSA2, the asteroid probe. On the other hand, much smaller size and lower power consumption are demanded for future missions to accommodate more sensors. To fulfill this future demand, we developed a novel processor architecture which consists of reconfigurable cluster cores and programmable-logic cells with complementary atom switches. The complementary atom switches enable hardware programming without configuration memories, and thus soft-error on logic circuit connection is completely eliminated. This is a noteworthy advantage for space applications which cannot be found in conventional re-writable FPGAs. Almost one-tenth of lower power consumption is expected compared to conventional re-writable FPGAs because of the elimination of configuration memories. The proposed processor architecture can be reconfigured by behavioral synthesis with higher level language specification. Consequently, compensation functions are implemented in a single chip without accommodating program memories, which is accompanied with conventional microprocessors, while maintaining the comparable performance. This enables us to embed a processor element on each infrared signal detector output channel.
A hybrid optic-fiber sensor network with the function of self-diagnosis and self-healing

NASA Astrophysics Data System (ADS)

Xu, Shibo; Liu, Tiegen; Ge, Chunfeng; Chen, Cheng; Zhang, Hongxia

2014-11-01

We develop a hybrid wavelength division multiplexing optical fiber network with distributed fiber-optic sensors and quasi-distributed FBG sensor arrays which detect vibrations, temperatures and strains at the same time. The network has the ability to locate the failure sites automatically designated as self-diagnosis and make protective switching to reestablish sensing service designated as self-healing by cooperative work of software and hardware. The processes above are accomplished by master-slave processors with the help of optical and wireless telemetry signals. All the sensing and optical telemetry signals transmit in the same fiber either working fiber or backup fiber. We take wavelength 1450nm as downstream signal and wavelength 1350nm as upstream signal to control the network in normal circumstances, both signals are sent by a light emitting node of the corresponding processor. There is also a continuous laser wavelength 1310nm sent by each node and received by next node on both working and backup fibers to monitor their healthy states, but it does not carry any message like telemetry signals do. When fibers of two sensor units are completely damaged, the master processor will lose the communication with the node between the damaged ones.However we install RF module in each node to solve the possible problem. Finally, the whole network state is transmitted to host computer by master processor. Operator could know and control the network by human-machine interface if needed.
Implementation of 4-way Superscalar Hash MIPS Processor Using FPGA

NASA Astrophysics Data System (ADS)

Sahib Omran, Safaa; Fouad Jumma, Laith

2018-05-01

Due to the quick advancements in the personal communications systems and wireless communications, giving data security has turned into a more essential subject. This security idea turns into a more confounded subject when next-generation system requirements and constant calculation speed are considered in real-time. Hash functions are among the most essential cryptographic primitives and utilized as a part of the many fields of signature authentication and communication integrity. These functions are utilized to acquire a settled size unique fingerprint or hash value of an arbitrary length of message. In this paper, Secure Hash Algorithms (SHA) of types SHA-1, SHA-2 (SHA-224, SHA-256) and SHA-3 (BLAKE) are implemented on Field-Programmable Gate Array (FPGA) in a processor structure. The design is described and implemented using a hardware description language, namely VHSIC “Very High Speed Integrated Circuit” Hardware Description Language (VHDL). Since the logical operation of the hash types of (SHA-1, SHA-224, SHA-256 and SHA-3) are 32-bits, so a Superscalar Hash Microprocessor without Interlocked Pipelines (MIPS) processor are designed with only few instructions that were required in invoking the desired Hash algorithms, when the four types of hash algorithms executed sequentially using the designed processor, the total time required equal to approximately 342 us, with a throughput of 4.8 Mbps while the required to execute the same four hash algorithms using the designed four-way superscalar is reduced to 237 us with improved the throughput to 5.1 Mbps.
A 4.8 kbps code-excited linear predictive coder

NASA Technical Reports Server (NTRS)

Tremain, Thomas E.; Campbell, Joseph P., Jr.; Welch, Vanoy C.

1988-01-01

A secure voice system STU-3 capable of providing end-to-end secure voice communications (1984) was developed. The terminal for the new system will be built around the standard LPC-10 voice processor algorithm. The performance of the present STU-3 processor is considered to be good, its response to nonspeech sounds such as whistles, coughs and impulse-like noises may not be completely acceptable. Speech in noisy environments also causes problems with the LPC-10 voice algorithm. In addition, there is always a demand for something better. It is hoped that LPC-10's 2.4 kbps voice performance will be complemented with a very high quality speech coder operating at a higher data rate. This new coder is one of a number of candidate algorithms being considered for an upgraded version of the STU-3 in late 1989. The problems of designing a code-excited linear predictive (CELP) coder to provide very high quality speech at a 4.8 kbps data rate that can be implemented on today's hardware are considered.
An empirical determination of the effects of sea state bias on Seasat altimetry

NASA Technical Reports Server (NTRS)

Born, G. H.; Richards, M. A.; Rosborough, G. W.

1982-01-01

A linear empirical model has been developed for the correction of sea state bias effects, in Seasat altimetry data altitude measurements, that are due to (1) electromagnetic bias caused by the fact that ocean wave troughs reflect the altimeter signal more strongly than the crests, shifting the apparent mean sea level toward the wave troughs, and (2) an independent instrument-related bias resulting from the inability of height corrections applied in the ground processor to compensate for simplifying assumptions made for the processor aboard Seasat. After applying appropriate corrections to the altimetry data, an empirical model for the sea state bias is obtained by differencing significant wave height and height measurements from coincident ground tracks. Height differences are minimized by solving for the coefficient of a linear relationship between height differences and wave height differences that minimize the height differences. In more than 50% of the 36 cases examined, 7% of the value of significant wave height should be subtracted for sea state bias correction.
Parallelization of the FLAPW method and comparison with the PPW method

NASA Astrophysics Data System (ADS)

Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

2000-03-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.
Study of Far—Field Directivity Pattern for Linear Arrays

NASA Astrophysics Data System (ADS)

Ana-Maria, Chiselev; Luminita, Moraru; Laura, Onose

2011-10-01

A model to calculate directivity pattern in far field is developed in this paper. Based on this model, the three-dimensional beam pattern is introduced and analyzed in order to investigate geometric parameters of linear arrays and their influences on the directivity pattern. Simulations in azimuthal plane are made to highlight the influence of transducers parameters, including number of elements and inter-element spacing. It is true that these parameters are important factors that influence the directivity pattern and the appearance of side-lobes for linear arrays.
Wireless Source Localization and Signal Collection from an Airborne Symmetric Line Array Sensor Network

DTIC Science & Technology

2014-09-01

band signal samples by taking the ratio of (166) and (165) as     2 2 /2 /2 sin sin coscos g g g g gg cQ cI eE n E n e...processors,” EEE Trans. Acoust. Speech Signal Process., vol. 31, no. 6, pp. 1378–1393, Dec. 1983. [10] J. Li, P. Stoica and Z. Wang, “On robust
Electronic Neural Networks

NASA Technical Reports Server (NTRS)

Thakoor, Anil

1990-01-01

Viewgraphs on electronic neural networks for space station are presented. Topics covered include: electronic neural networks; electronic implementations; VLSI/thin film hybrid hardware for neurocomputing; computations with analog parallel processing; features of neuroprocessors; applications of neuroprocessors; neural network hardware for terrain trafficability determination; a dedicated processor for path planning; neural network system interface; neural network for robotic control; error backpropagation algorithm for learning; resource allocation matrix; global optimization neuroprocessor; and electrically programmable read only thin-film synaptic array.
Knowledge-Based Transformational Synthesis of Efficient Structures for Concurrent Computation.

DTIC Science & Technology

1985-09-30

this wire network to a smaller wire network , creation of subnetworks to replace an overly-broad fanout network , virtualization which is the creation of...dependencies among the values they contain, reduction of this wire network to a smaller wire network , " creation of subnetworks to replace an overly-broad...fanout network , "rtualization which is the creation of additional array elements and processors to reflect the internal enumera- -4 tions that
Low-Cost Space Hardware and Software

NASA Technical Reports Server (NTRS)

Shea, Bradley Franklin

2013-01-01

The goal of this project is to demonstrate and support the overall vision of NASA's Rocket University (RocketU) through the design of an electrical power system (EPS) monitor for implementation on RUBICS (Rocket University Broad Initiatives CubeSat), through the support for the CHREC (Center for High-Performance Reconfigurable Computing) Space Processor, and through FPGA (Field Programmable Gate Array) design. RocketU will continue to provide low-cost innovations even with continuous cuts to the budget.

Feasibility of a special-purpose computer to solve the Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Gritton, E. C.; King, W. S.; Sutherland, I.; Gaines, R. S.; Gazley, C., Jr.; Grosch, C.; Juncosa, M.; Petersen, H.

1978-01-01

Orders-of-magnitude improvements in computer performance can be realized with a parallel array of thousands of fast microprocessors. In this architecture, wiring congestion is minimized by limiting processor communication to nearest neighbors. When certain standard algorithms are applied to a viscous flow problem and existing LSI technology is used, performance estimates of this conceptual design show a dramatic decrease in computational time when compared to the CDC 7600.
Spacecube V2.0 Micro Single Board Computer

NASA Technical Reports Server (NTRS)

Petrick, David J. (Inventor); Geist, Alessandro (Inventor); Lin, Michael R. (Inventor); Crum, Gary R. (Inventor)

2017-01-01

A single board computer system radiation hardened for space flight includes a printed circuit board having a top side and bottom side; a reconfigurable field programmable gate array (FPGA) processor device disposed on the top side; a connector disposed on the top side; a plurality of peripheral components mounted on the bottom side; and wherein a size of the single board computer system is not greater than approximately 7 cm.times.7 cm.
An ANSERLIN array for mobile satellite applications

NASA Technical Reports Server (NTRS)

Colomb, F. Y.; Kunkee, D. B.; Mayes, P. E.; Smith, D. W.; Jamnejad, V.

1990-01-01

Design, analysis, construction, and test of linear arrays of ANSERLIN (annular sector, radiating line) elements are reported and discussed. Due to feeding simplicity and easy construction as well as good CP performance, a planar array composed of a number of such linear arrays each producing a shaped beam tilted in elevation, is a good candidate as a vehicle-mounted mechanically steered antenna for mobile satellite applications. A single level construction technique was developed that makes this type of array very cost competitive with other low-profile arrays. An asymmetric 19.5 inch long four-element array was fabricated and tested with reasonable performance. A smaller five-element symmetric array (16 inch long) was also designed and tested capable of operating in either sense of circular polarization. Efforts were made to successfully reduce this effect.
Microfabricated linear Paul-Straubel ion trap

DOEpatents

Mangan, Michael A [Albuquerque, NM; Blain, Matthew G [Albuquerque, NM; Tigges, Chris P [Albuquerque, NM; Linker, Kevin L [Albuquerque, NM

2011-04-19

An array of microfabricated linear Paul-Straubel ion traps can be used for mass spectrometric applications. Each ion trap comprises two parallel inner RF electrodes and two parallel outer DC control electrodes symmetric about a central trap axis and suspended over an opening in a substrate. Neighboring ion traps in the array can share a common outer DC control electrode. The ions confined transversely by an RF quadrupole electric field potential well on the ion trap axis. The array can trap a wide array of ions.
NOTE: MCDE: a new Monte Carlo dose engine for IMRT

NASA Astrophysics Data System (ADS)

Reynaert, N.; DeSmedt, B.; Coghe, M.; Paelinck, L.; Van Duyse, B.; DeGersem, W.; DeWagter, C.; DeNeve, W.; Thierens, H.

2004-07-01

A new accurate Monte Carlo code for IMRT dose computations, MCDE (Monte Carlo dose engine), is introduced. MCDE is based on BEAMnrc/DOSXYZnrc and consequently the accurate EGSnrc electron transport. DOSXYZnrc is reprogrammed as a component module for BEAMnrc. In this way both codes are interconnected elegantly, while maintaining the BEAM structure and only minimal changes to BEAMnrc.mortran are necessary. The treatment head of the Elekta SLiplus linear accelerator is modelled in detail. CT grids consisting of up to 200 slices of 512 × 512 voxels can be introduced and up to 100 beams can be handled simultaneously. The beams and CT data are imported from the treatment planning system GRATIS via a DICOM interface. To enable the handling of up to 50 × 106 voxels the system was programmed in Fortran95 to enable dynamic memory management. All region-dependent arrays (dose, statistics, transport arrays) were redefined. A scoring grid was introduced and superimposed on the geometry grid, to be able to limit the number of scoring voxels. The whole system uses approximately 200 MB of RAM and runs on a PC cluster consisting of 38 1.0 GHz processors. A set of in-house made scripts handle the parallellization and the centralization of the Monte Carlo calculations on a server. As an illustration of MCDE, a clinical example is discussed and compared with collapsed cone convolution calculations. At present, the system is still rather slow and is intended to be a tool for reliable verification of IMRT treatment planning in the case of the presence of tissue inhomogeneities such as air cavities.
A new approach for implementation of associative memory using volume holographic materials

NASA Astrophysics Data System (ADS)

Habibi, Mohammad; Pashaie, Ramin

2012-02-01

Associative memory, also known as fault tolerant or content-addressable memory, has gained considerable attention in last few decades. This memory possesses important advantages over the more common random access memories since it provides the capability to correct faults and/or partially missing information in a given input pattern. There is general consensus that optical implementation of connectionist models and parallel processors including associative memory has a better record of success compared to their electronic counterparts. In this article, we describe a novel optical implementation of associative memory which not only has the advantage of all optical learning and recalling capabilities, it can also be realized easily. We present a new approach, inspired by tomographic imaging techniques, for holographic implementation of associative memories. In this approach, a volume holographic material is sandwiched within a matrix of inputs (optical point sources) and outputs (photodetectors). The memory capacity is realized by the spatial modulation of refractive index of the holographic material. Constructing the spatial distribution of the refractive index from an array of known inputs and outputs is formulated as an inverse problem consisting a set of linear integral equations.
Jet Noise Source Localization Using Linear Phased Array

NASA Technical Reports Server (NTRS)

Agboola, Ferni A.; Bridges, James

2004-01-01

A study was conducted to further clarify the interpretation and application of linear phased array microphone results, for localizing aeroacoustics sources in aircraft exhaust jet. Two model engine nozzles were tested at varying power cycles with the array setup parallel to the jet axis. The array position was varied as well to determine best location for the array. The results showed that it is possible to resolve jet noise sources with bypass and other components separation. The results also showed that a focused near field image provides more realistic noise source localization at low to mid frequencies.
Multiband selection with linear array detectors

NASA Technical Reports Server (NTRS)

Richard, H. L.; Barnes, W. L.

1985-01-01

Several techniques that can be used in an earth-imaging system to separate the linear image formed after the collecting optics into the desired spectral band are examined. The advantages and disadvantages of the Multispectral Linear Array (MLA) multiple optics, the MLA adjacent arrays, the imaging spectrometer, and the MLA beam splitter are discussed. The beam-splitter design approach utilizes, in addition to relatively broad spectral region separation, a movable Multiband Selection Device (MSD), placed between the exit ports of the beam splitter and a linear array detector, permitting many bands to be selected. The successful development and test of the MSD is described. The device demonstrated the capacity to provide a wide field of view, visible-to-near IR/short-wave IR and thermal IR capability, and a multiplicity of spectral bands and polarization measuring means, as well as a reasonable size and weight at minimal cost and risk compared to a spectrometer design approach.
Fiber bundle phase conjugate mirror

DOEpatents

Ward, Benjamin G.

2012-05-01

An improved method and apparatus for passively conjugating the phases of a distorted wavefronts resulting from optical phase mismatch between elements of a fiber laser array are disclosed. A method for passively conjugating a distorted wavefront comprises the steps of: multiplexing a plurality of probe fibers and a bundle pump fiber in a fiber bundle array; passing the multiplexed output from the fiber bundle array through a collimating lens and into one portion of a non-linear medium; passing the output from a pump collection fiber through a focusing lens and into another portion of the non-linear medium so that the output from the pump collection fiber mixes with the multiplexed output from the fiber bundle; adjusting one or more degrees of freedom of one or more of the fiber bundle array, the collimating lens, the focusing lens, the non-linear medium, or the pump collection fiber to produce a standing wave in the non-linear medium.
Finite element computation on nearest neighbor connected machines

NASA Technical Reports Server (NTRS)

Mcaulay, A. D.

1984-01-01

Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Design of video processing and testing system based on DSP and FPGA

NASA Astrophysics Data System (ADS)

Xu, Hong; Lv, Jun; Chen, Xi'ai; Gong, Xuexia; Yang, Chen'na

2007-12-01

Based on high speed Digital Signal Processor (DSP) and Field Programmable Gate Array (FPGA), a video capture, processing and display system is presented, which is of miniaturization and low power. In this system, a triple buffering scheme was used for the capture and display, so that the application can always get a new buffer without waiting; The Digital Signal Processor has an image process ability and it can be used to test the boundary of workpiece's image. A video graduation technology is used to aim at the position which is about to be tested, also, it can enhance the system's flexibility. The character superposition technology realized by DSP is used to display the test result on the screen in character format. This system can process image information in real time, ensure test precision, and help to enhance product quality and quality management.
The economics of data acquisition computers for ST and MST radars

NASA Technical Reports Server (NTRS)

Watkins, B. J.

1983-01-01

Some low cost options for data acquisition computers for ST (stratosphere, troposphere) and MST (mesosphere, stratosphere, troposphere) are presented. The particular equipment discussed reflects choices made by the University of Alaska group but of course many other options exist. The low cost microprocessor and array processor approach presented here has several advantages because of its modularity. An inexpensive system may be configured for a minimum performance ST radar, whereas a multiprocessor and/or a multiarray processor system may be used for a higher performance MST radar. This modularity is important for a network of radars because the initial cost is minimized while future upgrades will still be possible at minimal expense. This modularity also aids in lowering the cost of software development because system expansions should rquire little software changes. The functions of the radar computer will be to obtain Doppler spectra in near real time with some minor analysis such as vector wind determination.
Hardware Architecture Study for NASA's Space Software Defined Radios

NASA Technical Reports Server (NTRS)

Reinhart, Richard C.; Scardelletti, Maximilian C.; Mortensen, Dale J.; Kacpura, Thomas J.; Andro, Monty; Smith, Carl; Liebetreu, John

2008-01-01

This study defines a hardware architecture approach for software defined radios to enable commonality among NASA space missions. The architecture accommodates a range of reconfigurable processing technologies including general purpose processors, digital signal processors, field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs) in addition to flexible and tunable radio frequency (RF) front-ends to satisfy varying mission requirements. The hardware architecture consists of modules, radio functions, and and interfaces. The modules are a logical division of common radio functions that comprise a typical communication radio. This paper describes the architecture details, module definitions, and the typical functions on each module as well as the module interfaces. Trade-offs between component-based, custom architecture and a functional-based, open architecture are described. The architecture does not specify the internal physical implementation within each module, nor does the architecture mandate the standards or ratings of the hardware used to construct the radios.
Space Telecommunications Radio Systems (STRS) Hardware Architecture Standard: Release 1.0 Hardware Section

NASA Technical Reports Server (NTRS)

Reinhart, Richard C.; Kacpura, Thomas J.; Smith, Carl R.; Liebetreu, John; Hill, Gary; Mortensen, Dale J.; Andro, Monty; Scardelletti, Maximilian C.; Farrington, Allen

2008-01-01

This report defines a hardware architecture approach for software-defined radios to enable commonality among NASA space missions. The architecture accommodates a range of reconfigurable processing technologies including general-purpose processors, digital signal processors, field programmable gate arrays, and application-specific integrated circuits (ASICs) in addition to flexible and tunable radiofrequency front ends to satisfy varying mission requirements. The hardware architecture consists of modules, radio functions, and interfaces. The modules are a logical division of common radio functions that compose a typical communication radio. This report describes the architecture details, the module definitions, the typical functions on each module, and the module interfaces. Tradeoffs between component-based, custom architecture and a functional-based, open architecture are described. The architecture does not specify a physical implementation internally on each module, nor does the architecture mandate the standards or ratings of the hardware used to construct the radios.
Prototyping the HPDP Chip on STM 65 NM Process

NASA Astrophysics Data System (ADS)

Papadas, C.; Dramitinos, G.; Syed, M.; Helfers, T.; Dedes, G.; Schoellkopf, J.-P.; Dugoujon, L.

2011-08-01

Currently Astrium GmbH is involved in the of the High Performance Data Processor (HPDP) development programme for telecommunication applications under a DLR contract. The HPDP project targets the implementation of the commercially available reconfigurable array processor IP (XPP from the company PACT XPP Technologies) in a radiation hardened technology.In the current complementary development phase funded under the Greek Industry Incentive scheme, it is planned to prototype the HPDP chip in commercial STM 65 nm technology. In addition it is also planned to utilise the preliminary radiation hardened components of this library wherever possible.This abstract gives an overview of the HPDP chip architecture, the basic details of the STM 65 nm process and the design flow foreseen for the prototyping. The paper will discuss the development and integration issues involved in using the STM 65 nm process (also including the available preliminary radiation hardened components) for designs targeted to be used in space applications.
Automation of Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

2001-01-01

The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, to achieve a good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestions. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestions in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
Degree-of-Freedom Strengthened Cascade Array for DOD-DOA Estimation in MIMO Array Systems.

PubMed

Yao, Bobin; Dong, Zhi; Zhang, Weile; Wang, Wei; Wu, Qisheng

2018-05-14

In spatial spectrum estimation, difference co-array can provide extra degrees-of-freedom (DOFs) for promoting parameter identifiability and parameter estimation accuracy. For the sake of acquiring as more DOFs as possible with a given number of physical sensors, we herein design a novel sensor array geometry named cascade array. This structure is generated by systematically connecting a uniform linear array (ULA) and a non-uniform linear array, and can provide more DOFs than some exist array structures but less than the upper-bound indicated by minimum redundant array (MRA). We further apply this cascade array into multiple input multiple output (MIMO) array systems, and propose a novel joint direction of departure (DOD) and direction of arrival (DOA) estimation algorithm, which is based on a reduced-dimensional weighted subspace fitting technique. The algorithm is angle auto-paired and computationally efficient. Theoretical analysis and numerical simulations prove the advantages and effectiveness of the proposed array structure and the related algorithm.
Very high frequency (beyond 100 MHz) PZT kerfless linear arrays.

PubMed

Wu, Da-Wei; Zhou, Qifa; Geng, Xuecang; Liu, Chang-Geng; Djuth, Frank; Shung, K Kirk

2009-10-01

This paper presents the design, fabrication, and measurements of very high frequency kerfless linear arrays prepared from PZT film and PZT bulk material. A 12-microm PZT thick film fabricated from PZT-5H powder/solution composite and a piece of 15-microm PZT-5H sheet were used to fabricate 32-element kerfless high-frequency linear arrays with photolithography. The PZT thick film was prepared by spin-coating of PZT sol-gel composite solution. The thin PZT-5H sheet sample was prepared by lapping a PZT-5H ceramic with a precision lapping machine. The measured results of the 2 arrays were compared. The PZT film array had a center frequency of 120 MHz, a bandwidth of 60% with a parylene matching layer, and an insertion loss of 41 dB. The PZT ceramic sheet array was found to have a center frequency of 128 MHz with a poorer bandwidth (40% with a parylene matching layer) but a better sensitivity (28 dB insertion loss).
Very High Frequency (Beyond 100 MHz) PZT Kerfless Linear Arrays

PubMed Central

Wu, Da-Wei; Zhou, Qifa; Geng, Xuecang; Liu, Chang-Geng; Djuth, Frank; Shung, K. Kirk

2010-01-01

This paper presents the design, fabrication, and measurements of very high frequency kerfless linear arrays prepared from PZT film and PZT bulk material. A 12-µm PZT thick film fabricated from PZT-5H powder/solution composite and a piece of 15-µm PZT-5H sheet were used to fabricate 32-element kerfless high-frequency linear arrays with photolithography. The PZT thick film was prepared by spin-coating of PZT sol-gel composite solution. The thin PZT-5H sheet sample was prepared by lapping a PZT-5H ceramic with a precision lapping machine. The measured results of the 2 arrays were compared. The PZT film array had a center frequency of 120 MHz, a bandwidth of 60% with a parylene matching layer, and an insertion loss of 41 dB. The PZT ceramic sheet array was found to have a center frequency of 128 MHz with a poorer bandwidth (40% with a parylene matching layer) but a better sensitivity (28 dB insertion loss). PMID:19942516
Sparse array angle estimation using reduced-dimension ESPRIT-MUSIC in MIMO radar.

PubMed

Zhang, Chaozhu; Pang, Yucai

2013-01-01

Sparse linear arrays provide better performance than the filled linear arrays in terms of angle estimation and resolution with reduced size and low cost. However, they are subject to manifold ambiguity. In this paper, both the transmit array and receive array are sparse linear arrays in the bistatic MIMO radar. Firstly, we present an ESPRIT-MUSIC method in which ESPRIT algorithm is used to obtain ambiguous angle estimates. The disambiguation algorithm uses MUSIC-based procedure to identify the true direction cosine estimate from a set of ambiguous candidate estimates. The paired transmit angle and receive angle can be estimated and the manifold ambiguity can be solved. However, the proposed algorithm has high computational complexity due to the requirement of two-dimension search. Further, the Reduced-Dimension ESPRIT-MUSIC (RD-ESPRIT-MUSIC) is proposed to reduce the complexity of the algorithm. And the RD-ESPRIT-MUSIC only demands one-dimension search. Simulation results demonstrate the effectiveness of the method.

Extraction and Propagation of an Intense Rotating Electron Beam,

DTIC Science & Technology

1982-10-01

radiochromic foils positioned at z = 25 cm. The equal transmission density contours are ranked in linear order of increasing exposure (increasing current...flux encircled by the cathode e = %rc2Bc. Linearizing the equation of motion around the equilibrium, we can find the wavelength of small radial...the beam rotation. The mask which precedes the scint- illator is a linear array of dots while the projection is made up of two disjoint linear arrays
Multi-threaded parallel simulation of non-local non-linear problems in ultrashort laser pulse propagation in the presence of plasma

NASA Astrophysics Data System (ADS)

Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger

2011-06-01

We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
Design of Small MEMS Microphone Array Systems for Direction Finding of Outdoors Moving Vehicles

PubMed Central

Zhang, Xin; Huang, Jingchang; Song, Enliang; Liu, Huawei; Li, Baoqing; Yuan, Xiaobing

2014-01-01

In this paper, a MEMS microphone array system scheme is proposed which implements real-time direction of arrival (DOA) estimation for moving vehicles. Wind noise is the primary source of unwanted noise on microphones outdoors. A multiple signal classification (MUSIC) algorithm is used in this paper for direction finding associated with spatial coherence to discriminate between the wind noise and the acoustic signals of a vehicle. The method is implemented in a SHARC DSP processor and the real-time estimated DOA is uploaded through Bluetooth or a UART module. Experimental results in different places show the validity of the system and the deviation is no bigger than 6° in the presence of wind noise. PMID:24603636
Multi-element germanium detectors for synchrotron applications

DOE PAGES

Rumaiz, A. K.; Kuczewski, A. J.; Mead, J.; ...

2018-04-27

In this paper, we have developed a series of monolithic multi-element germanium detectors, based on sensor arrays produced by the Forschungzentrum Julich, and on Application-specific integrated circuits (ASICs) developed at Brookhaven. Devices have been made with element counts ranging from 64 to 384. These detectors are being used at NSLS-II and APS for a range of diffraction experiments, both monochromatic and energy-dispersive. Compact and powerful readout systems have been developed, based on the new generation of FPGA system-on-chip devices, which provide closely coupled multi-core processors embedded in large gate arrays. Finally, we will discuss the technical details of the systems,more » and present some of the results from them.« less
Design of small MEMS microphone array systems for direction finding of outdoors moving vehicles.

PubMed

Zhang, Xin; Huang, Jingchang; Song, Enliang; Liu, Huawei; Li, Baoqing; Yuan, Xiaobing

2014-03-05

In this paper, a MEMS microphone array system scheme is proposed which implements real-time direction of arrival (DOA) estimation for moving vehicles. Wind noise is the primary source of unwanted noise on microphones outdoors. A multiple signal classification (MUSIC) algorithm is used in this paper for direction finding associated with spatial coherence to discriminate between the wind noise and the acoustic signals of a vehicle. The method is implemented in a SHARC DSP processor and the real-time estimated DOA is uploaded through Bluetooth or a UART module. Experimental results in different places show the validity of the system and the deviation is no bigger than 6° in the presence of wind noise.
Time-delayed directional beam phased array antenna

DOEpatents

Fund, Douglas Eugene; Cable, John William; Cecil, Tony Myron

2004-10-19

An antenna comprising a phased array of quadrifilar helix or other multifilar antenna elements and a time-delaying feed network adapted to feed the elements. The feed network can employ a plurality of coaxial cables that physically bridge a microstrip feed circuitry to feed power signals to the elements. The cables provide an incremental time delay which is related to their physical lengths, such that replacing cables having a first set of lengths with cables having a second set of lengths functions to change the time delay and shift or steer the antenna's main beam. Alternatively, the coaxial cables may be replaced with a programmable signal processor unit adapted to introduce the time delay using signal processing techniques applied to the power signals.
Multi-element germanium detectors for synchrotron applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rumaiz, A. K.; Kuczewski, A. J.; Mead, J.

In this paper, we have developed a series of monolithic multi-element germanium detectors, based on sensor arrays produced by the Forschungzentrum Julich, and on Application-specific integrated circuits (ASICs) developed at Brookhaven. Devices have been made with element counts ranging from 64 to 384. These detectors are being used at NSLS-II and APS for a range of diffraction experiments, both monochromatic and energy-dispersive. Compact and powerful readout systems have been developed, based on the new generation of FPGA system-on-chip devices, which provide closely coupled multi-core processors embedded in large gate arrays. Finally, we will discuss the technical details of the systems,more » and present some of the results from them.« less
Digital micromirror devices: principles and applications in imaging.

PubMed

Bansal, Vivek; Saggau, Peter

2013-05-01

A digital micromirror device (DMD) is an array of individually switchable mirrors that can be used in many advanced optical systems as a rapid spatial light modulator. With a DMD, several implementations of confocal microscopy, hyperspectral imaging, and fluorescence lifetime imaging can be realized. The DMD can also be used as a real-time optical processor for applications such as the programmable array microscope and compressive sensing. Advantages and disadvantages of the DMD for these applications as well as methods to overcome some of the limitations will be discussed in this article. Practical considerations when designing with the DMD and sample optical layouts of a completely DMD-based imaging system and one in which acousto-optic deflectors (AODs) are used in the illumination pathway are also provided.
Development of a ground signal processor for digital synthetic array radar data

NASA Technical Reports Server (NTRS)

Griffin, C. R.; Estes, J. M.

1981-01-01

A modified APQ-102 sidelooking array radar (SLAR) in a B-57 aircraft test bed is used, with other optical and infrared sensors, in remote sensing of Earth surface features for various users at NASA Johnson Space Center. The video from the radar is normally recorded on photographic film and subsequently processed photographically into high resolution radar images. Using a high speed sampling (digitizing) system, the two receiver channels of cross-and co-polarized video are recorded on wideband magnetic tape along with radar and platform parameters. These data are subsequently reformatted and processed into digital synthetic aperture radar images with the image data available on magnetic tape for subsequent analysis by investigators. The system design and results obtained are described.
Impact localization on composite structures using time difference and MUSIC approach

NASA Astrophysics Data System (ADS)

Zhong, Yongteng; Xiang, Jiawei

2017-05-01

1-D uniform linear array (ULA) has the shortcoming of the half-plane mirror effect, which does not allow discriminating between a target placed above the array and a target placed below the array. This paper presents time difference (TD) and multiple signal classification (MUSIC) based omni-directional impact localization on a large stiffened composite structure using improved linear array, which is able to perform omni-directional 360° localization. This array contains 2M+3 PZT sensors, where 2M+1 PZT sensors are arranged as a uniform linear array, and the other two PZT sensors are placed above and below the array. Firstly, the arrival times of impact signals observed by the other two sensors are determined using the wavelet transform. Compared with each other, the direction range of impact source can be decided in general, 0°to 180° or 180°to 360°. And then, two dimensional multiple signal classification (2D-MUSIC) based spatial spectrum formula using the uniform linear array is applied for impact localization by the general direction range. When the arrival times of impact signals observed by upper PZT is equal to that of lower PZT, the direction can be located in x axis (0°or 180°). And time difference based MUSIC method is present to locate impact position. To verify the proposed approach, the proposed approach is applied to a composite structure. The localization results are in good agreement with the actual impact occurring positions.
Linear CCD attitude measurement system based on the identification of the auxiliary array CCD

NASA Astrophysics Data System (ADS)

Hu, Yinghui; Yuan, Feng; Li, Kai; Wang, Yan

2015-10-01

Object to the high precision flying target attitude measurement issues of a large space and large field of view, comparing existing measurement methods, the idea is proposed of using two array CCD to assist in identifying the three linear CCD with multi-cooperative target attitude measurement system, and to address the existing nonlinear system errors and calibration parameters and more problems with nine linear CCD spectroscopic test system of too complicated constraints among camera position caused by excessive. The mathematical model of binocular vision and three linear CCD test system are established, co-spot composition triangle utilize three red LED position light, three points' coordinates are given in advance by Cooperate Measuring Machine, the red LED in the composition of the three sides of a triangle adds three blue LED light points as an auxiliary, so that array CCD is easier to identify three red LED light points, and linear CCD camera is installed of a red filter to filter out the blue LED light points while reducing stray light. Using array CCD to measure the spot, identifying and calculating the spatial coordinates solutions of red LED light points, while utilizing linear CCD to measure three red LED spot for solving linear CCD test system, which can be drawn from 27 solution. Measured with array CCD coordinates auxiliary linear CCD has achieved spot identification, and has solved the difficult problems of multi-objective linear CCD identification. Unique combination of linear CCD imaging features, linear CCD special cylindrical lens system is developed using telecentric optical design, the energy center of the spot position in the depth range of convergence in the direction is perpendicular to the optical axis of the small changes ensuring highprecision image quality, and the entire test system improves spatial object attitude measurement speed and precision.
Thermal-Independent Properties of PIN-PMN-PT Single-Crystal Linear-Array Ultrasonic Transducers

PubMed Central

Chen, Ruimin; Wu, Jinchuan; Lam, Kwok Ho; Yao, Liheng; Zhou, Qifa; Tian, Jian; Han, Pengdi; Shung, K. Kirk

2013-01-01

In this paper, low-frequency 32-element linear-array ultrasonic transducers were designed and fabricated using both ternary Pb(In1/2Nb1/2)–Pb(Mg1/3Nb2/3)–PbTiO3 (PIN-PMN-PT) and binary Pb(Mg1/3Nb2/3)–PbTiO3 (PMN-PT) single crystals. Performance of the array transducers was characterized as a function of temperature ranging from room temperature to 160°C. It was found that the array transducers fabricated using the PIN-PMN-PT single crystal were capable of satisfactory performance at 160°C, having a −6-dB bandwidth of 66% and an insertion loss of 37 dB. The results suggest that the potential of PIN-PMN-PT linear-array ultrasonic transducers for high-temperature ultrasonic transducer applications is promising. PMID:23221227
Experiment on a three-beam adaptive array for EHF frequency-hopped signals using a fast algorithm, phase-D

NASA Astrophysics Data System (ADS)

Yen, J. L.; Kremer, P.; Amin, N.; Fung, J.

1989-05-01

The Department of National Defence (Canada) has been conducting studies into multi-beam adaptive arrays for extremely high frequency (EHF) frequency hopped signals. A three-beam 43 GHz adaptive antenna and a beam control processor is under development. An interactive software package for the operation of the array, capable of applying different control algorithms is being written. A maximum signal to jammer plus noise ratio (SJNR) was found to provide superior performance in preventing degradation of user signals in the presence of nearby jammers. A new fast algorithm using a modified conjugate gradient approach was found to be a very efficient way to implement anti-jamming arrays based on maximum SJNR criterion. The present study was intended to refine and simplify this algorithm and to implement the algorithm on an experimental array for real-time evaluation of anti-jamming performance. A three-beam adaptive array was used. A simulation package was used in the evaluation of multi-beam systems using more than three beams and different user-jammer scenarios. An attempt to further reduce the computation burden through continued analysis of maximum SJNR met with limited success. A method to acquire and track an incoming laser beam is proposed.
Experiment on a three-beam adaptive array for EHF frequency-hopped signals using a fast algorithm, phase E

NASA Astrophysics Data System (ADS)

Yen, J. L.; Kremer, P.; Fung, J.

1990-05-01

The Department of National Defence (Canada) has been conducting studies into multi-beam adaptive arrays for extremely high frequency (EHF) frequency hopped signals. A three-beam 43 GHz adaptive antenna and a beam control processor is under development. An interactive software package for the operation of the array, capable of applying different control algorithms is being written. A maximum signal to jammer plus noise ratio (SJNR) has been found to provide superior performance in preventing degradation of user signals in the presence of nearby jammers. A new fast algorithm using a modified conjugate gradient approach has been found to be a very efficient way to implement anti-jamming arrays based on maximum SJNR criterion. The present study was intended to refine and simplify this algorithm and to implement the algorithm on an experimental array for real-time evaluation of anti-jamming performance. A three-beam adaptive array was used. A simulation package was used in the evaluation of multi-beam systems using more than three beams and different user-jammer scenarios. An attempt to further reduce the computation burden through further analysis of maximum SJNR met with limited success. The investigation of a new angle detector for spatial tracking in heterodyne laser space communications was completed.
Dragon Ears airborne acoustic array: CSP analysis applied to cross array to compute real-time 2D acoustic sound field

NASA Astrophysics Data System (ADS)

Cerwin, Steve; Barnes, Julie; Kell, Scott; Walters, Mark

2003-09-01

This paper describes development and application of a novel method to accomplish real-time solid angle acoustic direction finding using two 8-element orthogonal microphone arrays. The developed prototype system was intended for localization and signature recognition of ground-based sounds from a small UAV. Recent advances in computer speeds have enabled the implementation of microphone arrays in many audio applications. Still, the real-time presentation of a two-dimensional sound field for the purpose of audio target localization is computationally challenging. In order to overcome this challenge, a crosspower spectrum phase1 (CSP) technique was applied to each 8-element arm of a 16-element cross array to provide audio target localization. In this paper, we describe the technique and compare it with two other commonly used techniques; Cross-Spectral Matrix2 and MUSIC3. The results show that the CSP technique applied to two 8-element orthogonal arrays provides a computationally efficient solution with reasonable accuracy and tolerable artifacts, sufficient for real-time applications. Additional topics include development of a synchronized 16-channel transmitter and receiver to relay the airborne data to the ground-based processor and presentation of test data demonstrating both ground-mounted operation and airborne localization of ground-based gunshots and loud engine sounds.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
An efficient parallel-processing method for transposing large matrices in place.

PubMed

Portnoff, M R

1999-01-01

We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Gate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented as well as an extension for matrices large enough to require virtual memory.
Long linear MWIR and LWIR HgCdTe infrared detection arrays for high resolution imaging

NASA Astrophysics Data System (ADS)

Chamonal, Jean-Paul; Audebert, Patrick; Medina, Philippe; Destefanis, Gérard; Deschamps, Joel R.; Girard, Michel; Chatard, Jean-Pierre

2018-04-01

This paper, "Long linear MWIR and LWIR HgCdTe infrared detection arrays for high resolution imaging," was presented as part of International Conference on Space Optics—ICSO 1997, held in Toulouse, France.
New fabrication of high-frequency (100-MHz) ultrasound PZT film kerfless linear array.

PubMed

Zhu, Benpeng; Chan, Ngai Yui; Dai, Jiyan; Shung, K Kirk; Takeuchi, Shinichi; Zhou, Qifa

2013-04-01

The paper describes the design, fabrication, and measurements of a high-frequency ultrasound kerfless linear array prepared from hydrothermal lead zirconate titanate (PZT) thick film. The 15-μm hydrothermal PZT thick film with an area of 1 × 1 cm, obtained through a self-separation process from Ti substrate, was used to fabricate a 32-element 100-MHz kerfless linear array with photolithography. The bandwidth at -6 dB without matching layer, insertion loss around center frequency, and crosstalk between adjacent elements were measured to be 39%, -30 dB, and -15 dB, respectively.
Linear-array based full-view high-resolution photoacoustic computed tomography of whole mouse brain functions in vivo

NASA Astrophysics Data System (ADS)

Li, Lei; Zhang, Pengfei; Wang, Lihong V.

2018-02-01

Photoacoustic computed tomography (PACT) is a non-invasive imaging technique offering high contrast, high resolution, and deep penetration in biological tissues. We report a photoacoustic computed tomography (PACT) system equipped with a high frequency linear array for anatomical and functional imaging of the mouse whole brain. The linear array was rotationally scanned in the coronal plane to achieve the full-view coverage. We investigated spontaneous neural activities in the deep brain by monitoring the hemodynamics and observed strong interhemispherical correlations between contralateral regions, both in the cortical layer and in the deep regions.

A performance analysis of advanced I/O architectures for PC-based network file servers

NASA Astrophysics Data System (ADS)

Huynh, K. D.; Khoshgoftaar, T. M.

1994-12-01

In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we would like to discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of same type, same data capacity, and same cost) operating independently, not in parallel as in a disk array.
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Detection Capability of Linear-and-Power Processor for Random Burst Signals of Unknown Location

DTIC Science & Technology

1997-08-25

Technology Directorate of the Office of Naval Research, Ronald Tipper (ONR-322B). The technical reviewer for this report was Brian F. Harrison (Code 2121...Research Laboratory, B. Adams R. Fiddler E. Franchi R. Wagstaff Naval Oceanographic Office, MS Naval Personnel Research and Development Center
Adapting a Navier-Stokes code to the ICL-DAP

NASA Technical Reports Server (NTRS)

Grosch, C. E.

1985-01-01

The results of an experiment are reported, i.c., to adapt a Navier-Stokes code, originally developed on a serial computer, to concurrent processing on the CL Distributed Array Processor (DAP). The algorithm used in solving the Navier-Stokes equations is briefly described. The architecture of the DAP and DAP FORTRAN are also described. The modifications of the algorithm so as to fit the DAP are given and discussed. Finally, performance results are given and conclusions are drawn.
Readout and DAQ for Pixel Detectors

NASA Astrophysics Data System (ADS)

Platkevic, Michal

2010-01-01

Data readout and acquisition control of pixel detectors demand the transfer of significantly a large amounts of bits between the detector and the computer. For this purpose dedicated interfaces are used which are designed with focus on features like speed, small dimensions or flexibility of use such as digital signal processors, field-programmable gate arrays (FPGA) and USB communication ports. This work summarizes the readout and DAQ system built for state-of-the-art pixel detectors of the Medipix family.
Workshop on Future Directions for Optical Information Processing.

DTIC Science & Technology

1981-03-01

h . The i reference point source simultaneously illuminates the i member of a family of n phase-encoding Aiffusers (e.g. shower glass , ground glass ...diffuser (ground glass ) section illuminated with a plane wave [35.37). The n(n-1) - 4(3) - 12 crosstalk terms have been distributed into the noise...for 2x2 input Fig. 6. Outnut of processor analogous to that array, l.Sx magnifier, ground glass diffuser of Fig. 5, but using spherical wavefront and
Radiation Hardened Low Power Digital Signal Processor

DTIC Science & Technology

2005-04-15

Image Figure 53.0 Point Spread Function PSF Figure 54.0 Restored Image and Restored PSF Figure 55.0 Newly Created Array Figure 56.0 Deblurred Image and... noise and interference rejection. WOA’s of 32-taps and greater are easily managed by the TCSP. An architecture that could efficiently perform filter...to quickly calculate a Remez filter impulse response to be used in place of the window function. Using the Remez exchange algorithm to calculate the
ALMA Correlator Real-Time Data Processor

NASA Astrophysics Data System (ADS)

Pisano, J.; Amestica, R.; Perez, J.

2005-10-01

The design of a real-time Linux application utilizing Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non real-time external computers. The designed computer system - the Correlator Data Processor or CDP, consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.
QuBiLS-MIDAS: a parallel free-software for molecular descriptors computation based on multilinear algebraic maps.

PubMed

García-Jacas, César R; Marrero-Ponce, Yovani; Acevedo-Martínez, Liesner; Barigye, Stephen J; Valdés-Martiní, José R; Contreras-Torres, Ernesto

2014-07-05

The present report introduces the QuBiLS-MIDAS software belonging to the ToMoCoMD-CARDD suite for the calculation of three-dimensional molecular descriptors (MDs) based on the two-linear (bilinear), three-linear, and four-linear (multilinear or N-linear) algebraic forms. Thus, it is unique software that computes these tensor-based indices. These descriptors, establish relations for two, three, and four atoms by using several (dis-)similarity metrics or multimetrics, matrix transformations, cutoffs, local calculations and aggregation operators. The theoretical background of these N-linear indices is also presented. The QuBiLS-MIDAS software was developed in the Java programming language and employs the Chemical Development Kit library for the manipulation of the chemical structures and the calculation of the atomic properties. This software is composed by a desktop user-friendly interface and an Abstract Programming Interface library. The former was created to simplify the configuration of the different options of the MDs, whereas the library was designed to allow its easy integration to other software for chemoinformatics applications. This program provides functionalities for data cleaning tasks and for batch processing of the molecular indices. In addition, it offers parallel calculation of the MDs through the use of all available processors in current computers. The studies of complexity of the main algorithms demonstrate that these were efficiently implemented with respect to their trivial implementation. Lastly, the performance tests reveal that this software has a suitable behavior when the amount of processors is increased. Therefore, the QuBiLS-MIDAS software constitutes a useful application for the computation of the molecular indices based on N-linear algebraic maps and it can be used freely to perform chemoinformatics studies. Copyright © 2014 Wiley Periodicals, Inc.
Thermopile Detector Arrays for Space Science Applications

NASA Technical Reports Server (NTRS)

Foote, M. C.; Kenyon, M.; Krueger, T. R.; McCann, T. A.; Chacon, R.; Jones, E. W.; Dickie, M. R.; Schofield, J. T.; McCleese, D. J.; Gaalema, S.

2004-01-01

Thermopile detectors are widely used in uncooled applications where small numbers of detectors are required, particularly in low-cost commercial applications or applications requiring accurate radiometry. Arrays of thermopile detectors, however, have not been developed to the extent of uncooled bolometer and pyroelectric/ferroelectric arrays. Efforts at JPL seek to remedy this deficiency by developing high performance thin-film thermopile detectors in both linear and two-dimensional formats. The linear thermopile arrays are produced by bulk micromachining and wire bonded to separate CMOS readout electronic chips. Such arrays are currently being fabricated for the Mars Climate Sounder instrument, scheduled for launch in 2005. Progress is also described towards realizing a two-dimensional thermopile array built over CMOS readout circuitry in the substrate.
MTF measurement and analysis of linear array HgCdTe infrared detectors

NASA Astrophysics Data System (ADS)

Zhang, Tong; Lin, Chun; Chen, Honglei; Sun, Changhong; Lin, Jiamu; Wang, Xi

2018-01-01

The slanted-edge technique is the main method for measurement detectors MTF, however this method is commonly used on planar array detectors. In this paper the authors present a modified slanted-edge method to measure the MTF of linear array HgCdTe detectors. Crosstalk is one of the major factors that degrade the MTF value of such an infrared detector. This paper presents an ion implantation guard-ring structure which was designed to effectively absorb photo-carriers that may laterally defuse between adjacent pixels thereby suppressing crosstalk. Measurement and analysis of the MTF of the linear array detectors with and without a guard-ring were carried out. The experimental results indicated that the ion implantation guard-ring structure effectively suppresses crosstalk and increases MTF value.
Sonography of the chest using linear-array versus sector transducers: Correlation with auscultation, chest radiography, and computed tomography.

PubMed

Tasci, Ozlem; Hatipoglu, Osman Nuri; Cagli, Bekir; Ermis, Veli

2016-07-08

The primary purpose of our study was to compare the efficacies of two sonographic (US) probes, a high-frequency linear-array probe and a lower-frequency phased-array sector probe in the diagnosis of basic thoracic pathologies. The secondary purpose was to compare the diagnostic performance of thoracic US with auscultation and chest radiography (CXR) using thoracic CT as a gold standard. In total, 55 consecutive patients scheduled for thoracic CT were enrolled in this prospective study. Four pathologic entities were evaluated: pneumothorax, pleural effusion, consolidation, and interstitial syndrome. A portable US scanner was used with a 5-10-MHz linear-array probe and a 1-5-MHz phased-array sector probe. The first probe used was chosen randomly. US, CXR, and auscultation results were compared with the CT results. The linear-array probe had the highest performance in the identification of pneumothorax (83% sensitivity, 100% specificity, and 99% diagnostic accuracy) and pleural effusion (100% sensitivity, 97% specificity, and 98% diagnostic accuracy); the sector probe had the highest performance in the identification of consolidation (89% sensitivity, 100% specificity, and 95% diagnostic accuracy) and interstitial syndrome (94% sensitivity, 93% specificity, and 94% diagnostic accuracy). For all pathologies, the performance of US was superior to those of CXR and auscultation. The linear probe is superior to the sector probe for identifying pleural pathologies, whereas the sector probe is superior to the linear probe for identifying parenchymal pathologies. Thoracic US has better diagnostic performance than CXR and auscultation for the diagnosis of common pathologic conditions of the chest. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 44:383-389, 2016. © 2016 Wiley Periodicals, Inc.
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

1999-01-01

The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

NASA Astrophysics Data System (ADS)

Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

2014-03-01

List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
Bitstream decoding processor for fast entropy decoding of variable length coding-based multiformat videos

NASA Astrophysics Data System (ADS)

Jo, Hyunho; Sim, Donggyu

2014-06-01

We present a bitstream decoding processor for entropy decoding of variable length coding-based multiformat videos. Since most of the computational complexity of entropy decoders comes from bitstream accesses and table look-up process, the developed bitstream processing unit (BsPU) has several designated instructions to access bitstreams and to minimize branch operations in the table look-up process. In addition, the instruction for bitstream access has the capability to remove emulation prevention bytes (EPBs) of H.264/AVC without initial delay, repeated memory accesses, and additional buffer. Experimental results show that the proposed method for EPB removal achieves a speed-up of 1.23 times compared to the conventional EPB removal method. In addition, the BsPU achieves speed-ups of 5.6 and 3.5 times in entropy decoding of H.264/AVC and MPEG-4 Visual bitstreams, respectively, compared to an existing processor without designated instructions and a new table mapping algorithm. The BsPU is implemented on a Xilinx Virtex5 LX330 field-programmable gate array. The MPEG-4 Visual (ASP, Level 5) and H.264/AVC (Main Profile, Level 4) are processed using the developed BsPU with a core clock speed of under 250 MHz in real time.
A K-Band Linear Phased Array Antenna Based on Ba(0.60)Sr(0.40)TiO3 Thin Film Phase Shifters

NASA Technical Reports Server (NTRS)

Romanofsky, R.; Bernhard, J.; Washington, G.; VanKeuls, F.; Miranda, F.; Cannedy, C.

2000-01-01

This paper summarizes the development of a 23.675 GHz linear 16-element scanning phased array antenna based on thin ferroelectric film coupled microstripline phase shifters and microstrip patch radiators.
Linear scaling computation of the Fock matrix. VI. Data parallel computation of the exchange-correlation matrix

NASA Astrophysics Data System (ADS)

Gan, Chee Kwan; Challacombe, Matt

2003-05-01

Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
Micromachined poly-SiGe bolometer arrays for infrared imaging and spectroscopy

NASA Astrophysics Data System (ADS)

Leonov, Vladimir N.; Perova, Natalia A.; De Moor, Piet; Du Bois, Bert; Goessens, Claus; Grietens, Bob; Verbist, Agnes; Van Hoof, Chris A.; Vermeiren, Jan P.

2003-03-01

The state-of-the-art characteristics of micromachined polycrystalline SiGe microbolometer arrays are reported. An average NETD of 85 mK at a time constant of 14 ms is already achievable on typical self-supported 50 μm pixels in a linear 64-element array. In order to reach these values, the design optimization was performed based on the performance characteristics of linear 32-, 64- and 128-element arrays of 50-, 60- and 75-μm-pixel bolometers on several detector lots. The infrared and thermal modeling accounting for the read-out properties and self-heating effect in bolometers resulted in improved designs and competitive NETD values of 80 mK on 50 μm pixels in a 160x128 format at standard frame rates and f-number of 1. In parallel, the TCR-to-1/f noise ratio and the mechanical design of the pixels were improved making poly-SiGe a good candidate for a low-cost uncooled thermal array. The technological CMOS-based process possesses an attractive balance between characteristics and price, and allows the micromachining of thin structures, less than 0.2 μm. The resistance and TCR non-uniformity with σ/μ better than 0.2% combined with 99.93% yield are demonstrated. The first lots of fully processed linear arrays have already come from the IMEC process line and the results of characterization are presented. Next year, the first linear and small 2D arrays will be introduced on the market.
Co-Prime Frequency and Aperture Design for HF Surveillance, Wideband Radar Imaging, and Nonstationary Array Processing

DTIC Science & Technology

2018-03-01

offset designs . Particularly, the proposed CA-CFO is compared with uniform linear array and uniform frequency offset (ULA-UFO). Uniform linear array...and Aperture Design for HF Surveillance, Wideband Radar Imaging, and Nonstationary Array Processing (Grant No. N00014-13-1-0061) Submitted to...Contents 1. Executive Summary …………………………………………………………………………. 1 1.1. Generalized Co-Prime Array Design ………………………………………………… 1 1.2. Wideband
Mixed Linear/Square-Root Encoded Single Slope Ramp Provides a Fast, Low Noise Analog to Digital Converter with Very High Linearity for Focal Plane Arrays

NASA Technical Reports Server (NTRS)

Wrigley, Christopher James (Inventor); Hancock, Bruce R. (Inventor); Cunningham, Thomas J. (Inventor); Newton, Kenneth W. (Inventor)

2014-01-01

An analog-to-digital converter (ADC) converts pixel voltages from a CMOS image into a digital output. A voltage ramp generator generates a voltage ramp that has a linear first portion and a non-linear second portion. A digital output generator generates a digital output based on the voltage ramp, the pixel voltages, and comparator output from an array of comparators that compare the voltage ramp to the pixel voltages. A return lookup table linearizes the digital output values.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.