optical matrix-vector processors: Topics by Science.gov

Sample records for optical matrix-vector processors

Real-time optical laboratory solution of parabolic differential equations

NASA Technical Reports Server (NTRS)

Casasent, David; Jackson, James

1988-01-01

An optical laboratory matrix-vector processor is used to solve parabolic differential equations (the transient diffusion equation with two space variables and time) by an explicit algorithm. This includes optical matrix-vector nonbase-2 encoded laboratory data, the combination of nonbase-2 and frequency-multiplexed data on such processors, a high-accuracy optical laboratory solution of a partial differential equation, new data partitioning techniques, and a discussion of a multiprocessor optical matrix-vector architecture.
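The explicit algorithm mentioned in this abstract can be illustrated, under assumptions, as a repeated matrix-vector product. The sketch below is not the authors' implementation: the grid size, the ratio `r = D*dt/h**2`, the zero boundary values, and the helper names `build_step_matrix` and `matvec` are all hypothetical choices made for illustration.

```python
# Explicit (forward-Euler) time stepping for the 2-D diffusion equation
# u_t = D * (u_xx + u_yy), written as u_next = M * u so that each time step
# is a single matrix-vector product. All parameters here are assumptions.

def build_step_matrix(n, r):
    """Return M = I + r*L for an n x n interior grid flattened row by row.

    L is the 5-point Laplacian with zero (Dirichlet) boundary values and
    r = D*dt/h**2. The explicit scheme is stable for r <= 1/4.
    """
    size = n * n
    M = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            k = i * n + j
            M[k][k] = 1.0 - 4.0 * r
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    M[k][ii * n + jj] = r
    return M


def matvec(M, v):
    """Plain matrix-vector product (the operation the optical processor performs)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


n, r = 3, 0.2
M = build_step_matrix(n, r)
u = [0.0] * (n * n)
u[4] = 1.0                      # unit heat spike at the centre of the grid
for _ in range(5):              # five explicit time steps
    u = matvec(M, u)
```

Because the step matrix is fixed, only the vector changes between iterations, which is why this class of algorithm maps naturally onto a processor whose matrix is loaded once.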
Implementation and Assessment of Advanced Analog Vector-Matrix Processor

NASA Technical Reports Server (NTRS)

Gary, Charles K.; Bualat, Maria G.; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

This paper discusses the design and implementation of an analog optical vector-matrix coprocessor with a throughput of 128 Mops for a personal computer. Vector matrix calculations are inherently parallel, providing a promising domain for the use of optical calculators. However, to date, digital optical systems have proven too cumbersome to replace electronics, and analog processors have not demonstrated sufficient accuracy in large scale systems. The goal of the work described in this paper is to demonstrate a viable optical coprocessor for linear operations. The analog optical processor presented has been integrated with a personal computer to provide full functionality and is the first demonstration of an optical linear algebra processor with a throughput greater than 100 Mops. The optical vector matrix processor consists of a laser diode source, an acoustooptical modulator array to input the vector information, a liquid crystal spatial light modulator to input the matrix information, an avalanche photodiode array to read out the result vector of the vector matrix multiplication, as well as transport optics and the electronics necessary to drive the optical modulators and interface to the computer. The intent of this research is to provide a low cost, highly energy efficient coprocessor for linear operations. Measurements of the analog accuracy of the processor performing 128 Mops are presented along with an assessment of the implications for future systems. A range of noise sources, including cross-talk, source amplitude fluctuations, shot noise at the detector, and non-linearities of the optoelectronic components are measured and compared to determine the most significant source of error. The possibilities for reducing these sources of error are discussed. Also, the total error is compared with that expected from a statistical analysis of the individual components and their relation to the vector-matrix operation.
The sufficiency of the measured accuracy of the processor is compared with that required for a range of typical problems. Calculations resolving alloy concentrations from spectral plume data of rocket engines are implemented on the optical processor, demonstrating its sufficiency for this problem. We also show how this technology can be easily extended to a 100 x 100 10 MHz (200 Gops) processor.
A high-accuracy optical linear algebra processor for finite element applications

NASA Technical Reports Server (NTRS)

Casasent, D.; Taylor, B. K.

1984-01-01

Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
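Multiplication by digital convolution, as named in this abstract, can be sketched as follows. This is a minimal illustration, not the report's architecture: the function names and the choice of base 2 are assumptions. The digit strings of the two operands are convolved without carries, so each output sample is a small integer that an analog system needs to resolve only coarsely; the carries are resolved afterwards by weighting each sample with its power of the base.

```python
def to_digits(x, base=2):
    """Digits of a non-negative integer, least significant first."""
    digits = []
    while x:
        digits.append(x % base)
        x //= base
    return digits or [0]


def convolve(a, b):
    """Carry-free discrete convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def multiply_by_convolution(x, y, base=2):
    """Multiply two integers by convolving their digits, then applying weights."""
    mixed = convolve(to_digits(x, base), to_digits(y, base))
    return sum(d * base ** k for k, d in enumerate(mixed))


mixed_radix = convolve(to_digits(13), to_digits(11))   # no sample exceeds 4 here
product = multiply_by_convolution(13, 11)
```

For two 4-bit operands, no convolution sample can exceed 4, which is the sense in which digital encoding trades dynamic range for extra samples.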
Rational calculation accuracy in acousto-optical matrix-vector processor

NASA Astrophysics Data System (ADS)

Oparin, V. V.; Tigin, Dmitry V.

1994-01-01

The high speed of parallel computation, combined with a comparatively small processor size and acceptable power consumption, makes the acousto-optic matrix-vector multiplier (AOMVM) attractive for processing large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. Reduced accuracy requirements allow considerable simplification of the AOMVM architecture and relax the demands on its components.
Optical systolic array processor using residue arithmetic

NASA Technical Reports Server (NTRS)

Jackson, J.; Casasent, D.

1983-01-01

The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
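A matrix-vector product carried out entirely in residue notation can be sketched as below. The moduli (5, 7, 9), the test data, and the function names are assumptions chosen for illustration; the abstract does not state which moduli or conversion method were used. Each modulus forms an independent channel whose values never exceed the modulus, and the integer result is recovered at the end with the Chinese remainder theorem.

```python
MODULI = (5, 7, 9)   # pairwise coprime, so results in 0..314 are representable


def matvec_mod(A, x, m):
    """Matrix-vector product with every operation reduced modulo m."""
    return [sum((a % m) * (xi % m) for a, xi in zip(row, x)) % m for row in A]


def from_residues(residues):
    """Chinese-remainder reconstruction of an integer from its residues."""
    M = 1
    for m in MODULI:
        M *= m
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # modular inverse (Python 3.8+)
    return total % M


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [3, 1, 2]
channels = [matvec_mod(A, x, m) for m in MODULI]    # one result vector per modulus
y = [from_residues(res) for res in zip(*channels)]  # decode each output element
```

The result is exact only while every true output stays below the product of the moduli (315 here); larger problems would need more or larger moduli.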
Matrix-vector multiplication using digital partitioning for more accurate optical computing

NASA Technical Reports Server (NTRS)

Gary, C. K.

1992-01-01

Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable with or greater than electronic computers as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient algorithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
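One plausible reading of digital partitioning is sketched below; the abstract gives no implementation details, so the slice width, the number of slices, and the function names are assumptions. Each operand is split into low-precision slices, every pair of slices is multiplied as an ordinary (low dynamic range) matrix-vector product, and the partial results are combined digitally by shifting.

```python
def split(value, bits, parts):
    """Split a non-negative integer into `parts` slices of `bits` bits, low slice first."""
    mask = (1 << bits) - 1
    return [(value >> (bits * p)) & mask for p in range(parts)]


def partitioned_matvec(A, x, bits=4, parts=2):
    """Compute A*x from low-precision partial products combined by shifts."""
    A_slices = [[[split(a, bits, parts)[p] for a in row] for row in A]
                for p in range(parts)]
    x_slices = [[split(v, bits, parts)[q] for v in x] for q in range(parts)]
    y = [0] * len(A)
    for p in range(parts):
        for q in range(parts):
            # This inner product is the step an analog processor would perform.
            partial = [sum(a * v for a, v in zip(row, x_slices[q]))
                       for row in A_slices[p]]
            for i, value in enumerate(partial):
                y[i] += value << (bits * (p + q))   # digital recombination
    return y


A = [[200, 17], [5, 255]]
x = [100, 250]
y = partitioned_matvec(A, x)   # 8-bit operands handled as two 4-bit slices
```

With two slices per operand, four analog passes replace one, which is the throughput cost paid for the added precision.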
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Dual-scale topology optoelectronic processor.

PubMed

Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

1991-12-15

The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Analysis of the precision parameters of an optoelectronic vector-matrix processor of digital information

NASA Astrophysics Data System (ADS)

Odinokov, S. B.; Petrov, A. V.

1995-10-01

Mathematical models of components of a vector-matrix optoelectronic multiplier are considered. Perturbing factors influencing a real optoelectronic system (noise and errors of radiation sources and detectors, nonlinearity of an analogue-digital converter, nonideal optical systems) are taken into account. Analytic expressions are obtained for relating the precision of such a multiplier to the probability of an error amounting to one bit, to the parameters describing the quality of the multiplier components, and to the quality of the optical system of the processor. Various methods of increasing the dynamic range of a multiplier are considered at the technical systems level.
Iterative color-multiplexed, electro-optical processor.

PubMed

Psaltis, D; Casasent, D; Carlotto, M

1979-11-01

A noncoherent optical vector-matrix multiplier using a linear LED source array and a linear P-I-N photodiode detector array has been combined with a 1-D adder in a feedback loop. The resultant iterative optical processor and its use in solving simultaneous linear equations are described. Operation on complex data is provided by a novel color-multiplexing system.
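The feedback arrangement described here (a vector-matrix multiplier followed by an adder, with the output fed back to the input) can be illustrated with a Richardson-type iteration. The abstract does not say which iteration the authors used, so this scheme, the step size `omega`, and the test system are assumptions made for illustration.

```python
def matvec(A, x):
    """Vector-matrix multiplication step (performed optically in the paper)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def richardson_solve(A, b, omega, iterations):
    """Iterate x <- x + omega * (b - A x), starting from x = 0.

    Converges for symmetric positive-definite A when 0 < omega < 2 / lambda_max.
    """
    x = [0.0] * len(b)
    for _ in range(iterations):
        Ax = matvec(A, x)                                   # multiplier
        x = [xi + omega * (bi - axi)                        # adder + feedback
             for xi, bi, axi in zip(x, b, Ax)]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = richardson_solve(A, b, omega=0.25, iterations=100)   # exact answer: [1/11, 7/11]
```

Each pass through the loop uses one multiplication and one addition stage, which matches the multiplier-plus-adder structure of the feedback loop.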
Optical laboratory solution and error model simulation of a linear time-varying finite element equation

NASA Technical Reports Server (NTRS)

Taylor, B. K.; Casasent, D. P.

1989-01-01

The use of simplified error models to accurately simulate and evaluate the performance of an optical linear-algebra processor is described. The optical architecture used to perform banded matrix-vector products is reviewed, along with a linear dynamic finite-element case study. The laboratory hardware and ac-modulation technique used are presented. The individual processor error-source models and their simulator implementation are detailed. Several significant simplifications are introduced to ease the computational requirements and complexity of the simulations. The error models are verified with a laboratory implementation of the processor, and are used to evaluate its potential performance.
Implementation of a digital optical matrix-vector multiplier using a holographic look-up table and residue arithmetic

NASA Technical Reports Server (NTRS)

Habiby, Sarry F.

1987-01-01

The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. The objective is to demonstrate the operation of an optical processor designed to minimize computation time in performing a practical computing application. This is done by using the large array of processing elements in a Hughes liquid crystal light valve, and relying on the residue arithmetic representation, a holographic optical memory, and position coded optical look-up tables. In the design, all operations are performed in effectively one light valve response time regardless of matrix size. The features of the design allowing fast computation include the residue arithmetic representation, the mapping approach to computation, and the holographic memory. In addition, other features of the work include a practical light valve configuration for efficient polarization control, a model for recording multiple exposures in silver halides with equal reconstruction efficiency, and using light from an optical fiber for a reference beam source in constructing the hologram. The design can be extended to implement larger matrix arrays without increasing computation time.
Modulated error diffusion CGHs for neural nets

NASA Astrophysics Data System (ADS)

Vermeulen, Pieter J. E.; Casasent, David P.

1990-05-01

New modulated error diffusion CGHs (computer generated holograms) for optical computing are considered. Specific attention is given to their use in optical matrix-vector, associative processor, neural net and optical interconnection architectures. We consider lensless CGH systems (many CGHs use an external Fourier transform (FT) lens), the Fresnel sampling requirements, the effects of finite CGH apertures (sample and hold inputs), dot size correction (for laser recorders), and new applications for this novel encoding method (that devotes attention to quantization noise effects).
An efficient optical architecture for sparsely connected neural networks

NASA Technical Reports Server (NTRS)

Hine, Butler P., III; Downie, John D.; Reid, Max B.

1990-01-01

An architecture for a general-purpose optical neural network processor is presented in which the interconnections and weights are formed by directing coherent beams holographically, thereby making use of the space-bandwidth products of the recording medium for sparsely interconnected networks more efficiently than the commonly used vector-matrix multiplier, since all of the hologram area is in use. An investigation is made of the use of computer-generated holograms recorded on such updatable media as thermoplastic materials, in order to define the interconnections and weights of a neural network processor; attention is given to limits on interconnection densities, diffraction efficiencies, and weighting accuracies possible with such an updatable thin film holographic device.
Matrix preconditioning: a robust operation for optical linear algebra processors.

PubMed

Ghosh, A; Paparao, P

1987-07-15

Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
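The effect of preconditioning on the condition number can be shown with a small example. The abstract does not specify its preconditioning algorithm; the sketch below swaps in simple symmetric diagonal (Jacobi) scaling, and the 2 x 2 test matrix and function names are assumptions.

```python
import math


def cond_sym2(A):
    """2-norm condition number of a symmetric positive-definite 2 x 2 matrix."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean + radius) / (mean - radius)   # lambda_max / lambda_min


def jacobi_scale(A):
    """Symmetric diagonal scaling D^(-1/2) A D^(-1/2), with D = diag(A)."""
    n = len(A)
    s = [1.0 / math.sqrt(A[i][i]) for i in range(n)]
    return [[s[i] * A[i][j] * s[j] for j in range(n)] for i in range(n)]


A = [[100.0, 1.0], [1.0, 1.0]]
P = jacobi_scale(A)            # [[1.0, 0.1], [0.1, 1.0]]
before, after = cond_sym2(A), cond_sym2(P)
```

Here the condition number drops from roughly 101 to roughly 1.2, and a small perturbation of the scale factors would change that outcome very little, which is consistent with the abstract's point that preconditioning tolerates analog inaccuracy.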
A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals.

PubMed

Soto-Quiros, Pablo

2015-01-01

This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that using multicore processors and a parallel computing environment is advantageous for reducing the high execution time. Additionally, speedup increases when the number of logical processors and the length of the signal increase.
A sparse matrix algorithm on the Boolean vector machine

NASA Technical Reports Server (NTRS)

Wagner, Robert A.; Patrick, Merrell L.

1988-01-01

VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via a cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for sparse matrices on the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
Algorithms for solving large sparse systems of simultaneous linear equations on vector processors

NASA Technical Reports Server (NTRS)

David, R. E.

1984-01-01

Very efficient algorithms for solving large sparse systems of simultaneous linear equations have been developed for serial processing computers. These involve a reordering of matrix rows and columns in order to obtain a near triangular pattern of nonzero elements. Then an LU factorization is developed to represent the matrix inverse in terms of a sequence of elementary Gaussian eliminations, or pivots. In this paper it is shown how these algorithms are adapted for efficient implementation on vector processors. Results obtained on the CYBER 200 Model 205 are presented for a series of large test problems which show the comparative advantages of the triangularization and vector processing algorithms.
Optical computing using optical flip-flops in Fourier processors: use in matrix multiplication and discrete linear transforms.

PubMed

Ando, S; Sekine, S; Mita, M; Katsuo, S

1989-12-15

An architecture and the algorithms for matrix multiplication using optical flip-flops (OFFs) in optical processors are proposed based on residue arithmetic. The proposed system is capable of processing all elements of matrices in parallel utilizing the information retrieving ability of optical Fourier processors. The employment of OFFs enables bidirectional data flow leading to a simpler architecture and the burden of residue-to-decimal (or residue-to-binary) conversion to operation time can be largely reduced by processing all elements in parallel. The calculated characteristics of operation time suggest a promising use of the system in a real time 2-D linear transform.
Acousto-Optical Vector Matrix Product Processor: Implementation Issues

DTIC Science & Technology

1989-04-25

power by a factor of 3.8. The acoustic velocity in longitudinal TeO2 is 4200 m/s, almost the same as the 4100 m/s acoustic velocity in dense flint glass ...field via an Interaction Model AOD150 dense flint glass Bragg Cell. The cell’s specifications are listed in the table below. BRAGG CELL SPECIFICATIONS...39 ns intervals). Since the speed of sound in dense flint glass is 4100 m/s, the acoustic field generated in a 10 μs interval is distributed over a 4.1

Optoelectronic switch matrix as a look-up table for residue arithmetic.

PubMed

Macdonald, R I

1987-10-01

The use of optoelectronic matrix switches to perform look-up table functions in residue arithmetic processors is proposed. In this application, switchable detector arrays give the advantage of a greatly reduced requirement for optical sources by comparison with previous optoelectronic residue processors.
Optical systolic solutions of linear algebraic equations

NASA Technical Reports Server (NTRS)

Neuman, C. P.; Casasent, D.

1984-01-01

The philosophy and data encoding possible in a systolic array optical processor (SAOP) are reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as: matrix-decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
Multitasking 3-D forward modeling using high-order finite difference methods on the Cray X-MP/416

DOE Office of Scientific and Technical Information (OSTI.GOV)

Terki-Hassaine, O.; Leiss, E.L.

1988-01-01

The CRAY X-MP/416 was used to multitask 3-D forward modeling by the high-order finite difference method. Flowtrace analysis reveals that the most expensive operation in the unitasked program is a matrix vector multiplication. The in-core and out-of-core versions of a reentrant subroutine can perform any fraction of the matrix vector multiplication independently, a pattern compatible with multitasking. The matrix vector multiplication routine can be distributed over two to four processors. The rest of the program utilizes the microtasking feature that lets the system treat independent iterations of DO-loops as subtasks to be performed by any available processor. The availability of the Solid-State Storage Device (SSD) meant the I/O wait time was virtually zero. A performance study determined a theoretical speedup, taking into account the multitasking overhead. Multitasking programs utilizing both macrotasking and microtasking features obtained actual speedups that were approximately 80% of the ideal speedup.
Evaluation of the Xeon phi processor as a technology for the acceleration of real-time control in high-order adaptive optics systems

NASA Astrophysics Data System (ADS)

Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah; Vick, Andy; Schnetler, Hermine

2014-08-01

We present wavefront reconstruction acceleration of high-order AO systems using an Intel Xeon Phi processor. The Xeon Phi is a coprocessor providing many integrated cores and designed for accelerating compute intensive, numerical codes. Unlike other accelerator technologies, it allows virtually unchanged C/C++ to be recompiled to run on the Xeon Phi, giving the potential of making development, upgrade and maintenance faster and less complex. We benchmark the Xeon Phi in the context of AO real-time control by running a matrix vector multiply (MVM) algorithm. We investigate variability in execution time and demonstrate a substantial speed-up in loop frequency. We examine the integration of a Xeon Phi into an existing RTC system and show that performance improvements can be achieved with limited development effort.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm²) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today.
The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
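The peak throughput quoted for the optical core follows from the matrix size and clock rate given in the abstract. The short check below assumes that each multiply-and-add is counted as two operations, which is the convention that reproduces both the 128K and the 16 TeraOPS figures; the function name is hypothetical.

```python
def peak_ops_per_second(rows, cols, clock_hz, ops_per_mac=2):
    """Peak rate of a processor doing one full matrix-vector product per clock."""
    return rows * cols * ops_per_mac * clock_hz


ops_per_cycle = 256 * 256 * 2                                 # 131072, i.e. 128K
enlight_peak = peak_ops_per_second(256, 256, 125_000_000)     # about 1.64e13
```

The result, 16,384,000,000,000 operations per second, rounds to the 16 TeraOPS stated in the abstract.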
Parallel-vector unsymmetric Eigen-Solver on high performance computers

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Jiangning, Qin

1993-01-01

The popular QR algorithm for computing all eigenvalues of an unsymmetric matrix is reviewed. This study concludes that, among the basic components of the QR algorithm, the reduction of an unsymmetric matrix to Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases indicate that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods increases as the problem size increases.
Implementation theory of distortion-invariant pattern recognition for optical and digital signal processing systems

NASA Astrophysics Data System (ADS)

Lhamon, Michael Earl

A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single real-valued filters. This introduces increased computation burden but also introduces a higher level of parallelism that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as vector inner product operators, that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures.
Typically, optical systems (like the 4f correlators) are limited to phase-only implementation with lower detection performance than full complex electronic systems. Our study includes pseudo-random pixel encoding techniques for approximating full complex filtering. Optical filter bank implementation is possible and they have the advantage of time averaging the entire filter bank at real time rates. Time-averaged optical filtering is computationally comparable to billions of digital operations-per-second. For this reason, we believe future trends in high speed pattern recognition will involve hybrid architectures of both optical and DSP elements.
Linear Spectral Analysis of Plume Emissions Using an Optical Matrix Processor

NASA Technical Reports Server (NTRS)

Gary, C. K.

1992-01-01

Plume spectrometry provides a means to monitor the health of a burning rocket engine, and optical matrix processors provide a means to analyze the plume spectra in real time. By observing the spectrum of the exhaust plume of a rocket engine, researchers have detected anomalous behavior of the engine and have even determined the failure of some equipment before it would normally have been noticed. The spectrum of the plume is analyzed by isolating information in the spectrum about the various materials present to estimate what materials are being burned in the engine. Scientists at the Marshall Space Flight Center (MSFC) have implemented a high resolution spectrometer to discriminate the spectral peaks of the many species present in the plume. Researchers at the Stennis Space Center Demonstration Testbed Facility (DTF) have implemented a high resolution spectrometer observing a 1200-lb. thrust engine. At this facility, known concentrations of contaminants can be introduced into the burn, allowing for the confirmation of diagnostic algorithms. While the high resolution of the measured spectra has allowed greatly increased insight into the functioning of the engine, the large data flows generated limit the ability to perform real-time processing. The use of an optical matrix processor and the linear analysis technique described below may allow for the detailed real-time analysis of the engine's health. A small optical matrix processor can perform the required mathematical analysis both quicker and with less energy than a large electronic computer dedicated to the same spectral analysis routine.
Design and experimental verification for optical module of optical vector-matrix multiplier.

PubMed

Zhu, Weiwei; Zhang, Lei; Lu, Yangyang; Zhou, Ping; Yang, Lin

2013-06-20

Optical computing is a new method to implement signal processing functions. The multiplication between a vector and a matrix is an important arithmetic algorithm in the signal processing domain. The optical vector-matrix multiplier (OVMM) is an optoelectronic system to carry out this operation, which consists of an electronic module and an optical module. In this paper, we propose an optical module for OVMM. To eliminate the cross talk and make full use of the optical elements, an elaborately designed structure that involves spherical lenses and cylindrical lenses is utilized in this optical system. The optical design software package ZEMAX is used to optimize the parameters and simulate the whole system. Finally, experimental data are obtained to evaluate the overall performance of the system. The results of both simulation and experiment indicate that the system constructed can successfully implement the multiplication between a matrix with dimensions of 16 by 16 and a vector with a dimension of 16.
Optical implementation of systolic array processing

NASA Technical Reports Server (NTRS)

Caulfield, H. J.; Rhodes, W. T.; Foster, M. J.; Horvitz, S.

1981-01-01

Algorithms for matrix-vector multiplication are implemented using acousto-optic cells for multiplication and input data transfer and using charge-coupled device detector arrays for accumulation and output of the results. No two-dimensional matrix mask is required; matrix changes are implemented electronically. A system for multiplying a 50-component nonnegative real vector by a 50 by 50 nonnegative real matrix is described. Modifications for bipolar real and complex valued processing are possible, as are extensions to matrix-matrix multiplication and multiplication of a vector by multiple matrices.
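As a rough illustration of the accumulation idea in the record above, the following Python sketch steps through time: at each step one vector component is presented together with the matching matrix column, and a bank of accumulators (standing in for the detector array) integrates the products. The function name and the simplified, unskewed data flow are assumptions made for this sketch, not the authors' architecture.

```python
def systolic_matvec(matrix, vector):
    """Time-stepped sketch of y = A x with accumulating detectors.

    Hypothetical helper: inputs are restricted to nonnegative reals,
    as in the system described in the abstract.
    """
    if any(v < 0 for v in vector) or any(a < 0 for row in matrix for a in row):
        raise ValueError("this sketch only handles nonnegative real data")
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix column count must match vector length")

    accumulators = [0] * len(matrix)      # one accumulator per output element
    for t, x_t in enumerate(vector):      # one vector component per time step
        for i, row in enumerate(matrix):  # column t of the matrix is fed in
            accumulators[i] += row[t] * x_t
    return accumulators
```

Because the matrix is supplied step by step rather than stored in a fixed mask, changing it only means feeding different values, which is the point the abstract makes about electronic matrix changes.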
An efficient parallel algorithm for matrix-vector multiplication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Plimpton, S.

The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
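To make the n/√p term in the record above more concrete, here is a small serial Python simulation of a generic two-dimensional block decomposition on a q-by-q processor grid (p = q²). It is not the authors' algorithm; the function name and layout are assumptions chosen only to show that each simulated processor needs just n/q = n/√p entries of the input vector.

```python
def block_matvec(matrix, vector, q):
    """Simulate y = A x on a q-by-q grid of processors (p = q * q).

    Hypothetical sketch: processor (r, c) owns one (n/q)-by-(n/q) block of A
    and only needs the c-th slice of x, which has n/q = n/sqrt(p) entries.
    """
    n = len(vector)
    if n % q != 0:
        raise ValueError("q must divide the vector length in this sketch")
    b = n // q                                   # block edge length
    y = [0] * n
    for r in range(q):                           # grid row
        for c in range(q):                       # grid column
            x_block = vector[c * b:(c + 1) * b]  # the slice this processor reads
            for i in range(b):
                row = matrix[r * b + i]
                local = sum(row[c * b + j] * x_block[j] for j in range(b))
                # Adding into y stands in for the sum-reduction along grid row r.
                y[r * b + i] += local
    return y
```

In a real distributed code the final accumulation would be a reduction among the processors of each grid row; here it is collapsed into a simple in-place sum.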
Using a multifrontal sparse solver in a high performance, finite element code

NASA Technical Reports Server (NTRS)

King, Scott D.; Lucas, Robert; Raefsky, Arthur

1990-01-01

We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
Frequency-multiplexed and pipelined iterative optical systolic array processors

NASA Technical Reports Server (NTRS)

Casasent, D.; Jackson, J.; Neuman, C.

1983-01-01

Optical matrix processors using acoustooptic transducers are described, with emphasis on new systolic array architectures using frequency multiplexing in addition to space and time multiplexing. A Kalman filtering application is considered in a case study from which the operations required on such a system can be defined. This also serves as a new and powerful application for iterative optical processors. The importance of pipelining the data flow and the ordering of the operations performed in a specific application of such a system are also noted. Several examples of how to effectively achieve this are included. A new technique for handling bipolar data on such architectures is also described.
Implementation of kernels on the Maestro processor

NASA Astrophysics Data System (ADS)

Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.

Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.
Integrated optic vector-matrix multiplier

DOEpatents

Watts, Michael R [Albuquerque, NM]

2011-09-27

A vector-matrix multiplier is disclosed which uses N different wavelengths of light that are modulated with amplitudes representing elements of an N×1 vector and combined to form an input wavelength-division multiplexed (WDM) light stream. The input WDM light stream is split into N streamlets from which each wavelength of the light is individually coupled out and modulated for a second time using an input signal representing elements of an M×N matrix, and is then coupled into an output waveguide for each streamlet to form an output WDM light stream which is detected to generate a product of the vector and matrix. The vector-matrix multiplier can be formed as an integrated optical circuit using either waveguide amplitude modulators or ring resonator amplitude modulators.
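The two-stage modulation described in the record above can be mimicked numerically. In the Python sketch below, each wavelength index carries one vector element, each streamlet applies one row of matrix weights, and a detector sums over wavelengths. Splitting losses are ignored, and the one-streamlet-per-output-element layout and the function name are simplifying assumptions of this sketch rather than details taken from the patent.

```python
def wdm_vector_matrix(matrix, vector):
    """Idealized numerical model of a WDM vector-matrix product.

    Hypothetical sketch: amplitudes are plain numbers, and power-splitting
    losses between streamlets are ignored.
    """
    # First modulation: wavelength j carries vector element j.
    stream = {j: amplitude for j, amplitude in enumerate(vector)}

    outputs = []
    for row in matrix:  # one streamlet per output element (an assumption)
        if len(row) != len(stream):
            raise ValueError("each matrix row needs one weight per wavelength")
        # Second modulation: each wavelength is scaled by its matrix weight.
        modulated = [row[j] * stream[j] for j in stream]
        # Detection: the photodetector sums over all wavelengths.
        outputs.append(sum(modulated))
    return outputs
```

For example, `wdm_vector_matrix([[1, 0, 2], [0, 3, 1]], [4, 5, 6])` models three wavelengths feeding two detectors.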
Architecture studies and system demonstrations for optical parallel processor for AI and NI

NASA Astrophysics Data System (ADS)

Lee, Sing H.

1988-03-01

In solving deterministic AI problems the data search for matching the arguments of a PROLOG expression causes a serious bottleneck when implemented sequentially by electronic systems. To overcome this bottleneck we have developed the concepts for an optical expert system based on a matrix-algebraic formulation, which will be suitable for parallel optical implementation. The optical AI system based on the matrix-algebraic formulation will offer distinct advantages for parallel search, adult learning, etc.
Optimization of the Brillouin operator on the KNL architecture

NASA Astrophysics Data System (ADS)

Dürr, Stephan

2018-03-01

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing

NASA Astrophysics Data System (ADS)

Johl, John T.; Baker, Nick C.

1988-10-01

The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
Implementation of a fast digital optical matrix-vector multiplier using a holographic look-up table and residue arithmetic

NASA Technical Reports Server (NTRS)

Habiby, Sarry F.; Collins, Stuart A., Jr.

1987-01-01

The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. A Hughes liquid crystal light valve, the residue arithmetic representation, and a holographic optical memory are used to construct position coded optical look-up tables. All operations are performed in effectively one light valve response time with a potential for a high information density.
Implementation of a fast digital optical matrix-vector multiplier using a holographic look-up table and residue arithmetic.

PubMed

Habiby, S F; Collins, S A

1987-11-01

The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. A Hughes liquid crystal light valve, the residue arithmetic representation, and a holographic optical memory are used to construct position coded optical look-up tables. All operations are performed in effectively one light valve response time with a potential for a high information density.
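The residue-arithmetic idea in the two records above can be sketched in software: numbers are represented by their remainders modulo several small coprime moduli, each multiply and add becomes a small table look-up per modulus, and the result is rebuilt with the Chinese remainder theorem. The moduli, table layout, and function names below are illustrative assumptions and are not taken from the paper; the dictionaries stand in for the holographic look-up tables.

```python
from math import prod

MODULI = (5, 7, 9, 11)   # pairwise coprime; an illustrative choice
RANGE = prod(MODULI)     # results are only unique below this value

# Precomputed look-up tables, one pair per modulus (stand-ins for optical tables).
MUL = {m: [[(a * b) % m for b in range(m)] for a in range(m)] for m in MODULI}
ADD = {m: [[(a + b) % m for b in range(m)] for a in range(m)] for m in MODULI}


def to_residues(x):
    """Encode a nonnegative integer as its residues."""
    return tuple(x % m for m in MODULI)


def from_residues(residues):
    """Decode residues with the Chinese remainder theorem (Python 3.8+)."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = RANGE // m
        total += r * partial * pow(partial, -1, m)  # modular inverse
    return total % RANGE


def matvec_residue(matrix, vector):
    """Matrix-vector product using only table look-ups per modulus."""
    vec_res = [to_residues(x) for x in vector]
    result = []
    for row in matrix:
        acc = [0] * len(MODULI)
        for a, x_res in zip(row, vec_res):
            a_res = to_residues(a)
            for k, m in enumerate(MODULI):
                product = MUL[m][a_res[k]][x_res[k]]
                acc[k] = ADD[m][acc[k]][product]
        result.append(from_residues(acc))
    return result
```

The decoded values are correct only while every true output stays below `RANGE`; no carries propagate between moduli, which is what lets each residue channel work independently.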

Detection Performance of Horizontal Linear Hydrophone Arrays in Shallow Water.

DTIC Science & Technology

1980-12-15

random phase G gain G angle interval covariance matrix h processor vector H matrix matched filter; generalized beamformer I unity matrix 4 SACLANTCEN SR...omnidirectional sensor is h*Ph P G = - h [Eq. 47] G = h* Q h P s The following two sections evaluate a few examples of application of the OLP. Following the...At broadside the signal covariance matrix reduces to a dyadic: P 󈧬 s s*;therefore, the gain (e.g. Eq. 37) becomes tr(H* P H) Pn * -1 Q -1 Pn G ~OQp
Noise Analysis of Spatial Phase coding in analog Acoustooptic Processors

NASA Technical Reports Server (NTRS)

Gary, Charles K.; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

Optical beams can carry information in their amplitude and phase; however, optical analog numerical calculators such as an optical matrix processor use incoherent light to achieve linear operation. Thus, the phase information is lost and only the magnitude can be used. This limits such processors to the representation of positive real numbers. Many systems have been devised to overcome this deficit through the use of digital number representations, but they all operate at a greatly reduced efficiency in contrast to analog systems. The most widely accepted method to achieve sign coding in analog optical systems has been the use of an offset for the zero level. Unfortunately, this results in increased noise sensitivity for small numbers. In this paper, we examine the use of spatially coherent sign coding in acoustooptical processors, a method first developed for digital calculations by D. V. Tigin. This coding technique uses spatial coherence for the representation of signed numbers, while temporal incoherence allows for linear analog processing of the optical information. We show how spatial phase coding reduces noise sensitivity for signed analog calculations.
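For context on the offset method that the record above compares against, the Python sketch below encodes signed values in [-1, 1] as nonnegative intensities by shifting the zero level, multiplies the encoded quantities, and then removes the bias terms. The particular mapping (x + 1)/2 and the function names are assumptions made for this sketch; the abstract only states that an offset for the zero level is used.

```python
def encode(value):
    """Map a signed value in [-1, 1] to a nonnegative level in [0, 1]."""
    return (value + 1.0) / 2.0


def offset_matvec(matrix, vector):
    """Signed matrix-vector product computed from nonnegative encoded data.

    With a' = (a + 1)/2 and x' = (x + 1)/2:
        sum_j a'_ij x'_j = (sum_j a_ij x_j + sum_j a_ij + sum_j x_j + n) / 4
    so the signed result is recovered by subtracting the three bias terms.
    """
    n = len(vector)
    encoded_x = [encode(x) for x in vector]
    sum_x = sum(vector)
    result = []
    for row in matrix:
        encoded_row = [encode(a) for a in row]
        # What an intensity detector would see: a sum of nonnegative products.
        detected = sum(a * x for a, x in zip(encoded_row, encoded_x))
        result.append(4.0 * detected - sum(row) - sum_x - n)
    return result
```

The recovered value is a small difference of larger detected quantities, which is one way to see why noise on the detected signal matters more for small numbers under this coding.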
Limit characteristics of digital optoelectronic processor

NASA Astrophysics Data System (ADS)

Kolobrodov, V. G.; Tymchik, G. S.; Kolobrodov, M. S.

2018-01-01

In this article, the limiting characteristics of a digital optoelectronic processor (DOEP) are explored. The limits are defined by diffraction effects and the matrix structure of the devices for input and output of optical signals. The purpose of the present research is to optimize the parameters of the processor's components. The developed physical and mathematical model of the DOEP made it possible to establish the limit characteristics of the processor, restricted by diffraction effects and the array structure of the equipment for input and output of optical signals, as well as to optimize the parameters of the processor's components. The diameter of the entrance pupil of the Fourier lens is determined by the size of the SLM and the pixel size of the modulator. To determine the spectral resolution, it is proposed to use the concept of an optimum phase, in which the resolved diffraction maxima coincide with the pixel centers of the radiation detector.
Solving large sparse eigenvalue problems on supercomputers

NASA Technical Reports Server (NTRS)

Philippe, Bernard; Saad, Youcef

1988-01-01

An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
Optical computing and image processing using photorefractive gallium arsenide

NASA Technical Reports Server (NTRS)

Cheng, Li-Jen; Liu, Duncan T. H.

1990-01-01

Recent experimental results on matrix-vector multiplication and multiple four-wave mixing using GaAs are presented. Attention is given to a simple concept of using two overlapping holograms in GaAs to do two matrix-vector multiplication processes operating in parallel with a common input vector. This concept can be used to construct high-speed, high-capacity, reconfigurable interconnection and multiplexing modules, important for optical computing and neural-network applications.
High-performance ultra-low power VLSI analog processor for data compression

NASA Technical Reports Server (NTRS)

Tawel, Raoul (Inventor)

1996-01-01

An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
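The codebook search described in the record above amounts to vector quantization: every input vector is compared with every codebook vector and replaced by the index of the closest one. The Python sketch below does this serially; squared Euclidean distance is used as the closeness measure, which is an assumption of this sketch (the abstract does not name the metric), as are the function names.

```python
def closest_codebook_index(input_vector, codebook):
    """Return the index of the codebook vector closest to input_vector.

    Hypothetical helper: closeness is squared Euclidean distance here.
    """
    best_index = 0
    best_distance = float("inf")
    for index, code in enumerate(codebook):
        # Per-component comparison (one processor cell each in the apparatus),
        # followed by the combination step for the whole column.
        distance = sum((a - b) ** 2 for a, b in zip(input_vector, code))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def compress(vectors, codebook):
    """Replace each input vector by its codebook index."""
    return [closest_codebook_index(v, codebook) for v in vectors]
```

In the apparatus the comparisons for all N codebook vectors happen at once in the analog array; the loops here only reproduce the result, not the parallelism.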
The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs

NASA Astrophysics Data System (ADS)

Nemes, Csaba; Barcza, Gergely; Nagy, Zoltán; Legeza, Örs; Szolgay, Péter

2014-06-01

In the numerical analysis of strongly correlated quantum lattice models one of the leading algorithms developed to balance the size of the effective Hilbert space and the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, in which the run-time is dominated by the iterative diagonalization of the Hamilton operator. As the most time-dominant step of the diagonalization can be expressed as a list of dense matrix operations, the DMRG is an appealing candidate to fully utilize the computing power residing in novel kilo-processor architectures. In the paper a smart hybrid CPU-GPU implementation is presented, which exploits the power of both CPU and GPU and tolerates problems exceeding the GPU memory size. Furthermore, a new CUDA kernel has been designed for asymmetric matrix-vector multiplication to accelerate the rest of the diagonalization. Besides the evaluation of the GPU implementation, the practical limits of an FPGA implementation are also discussed.
A Electro-Optical Image Algebra Processing System for Automatic Target Recognition

NASA Astrophysics Data System (ADS)

Coffield, Patrick Cyrus

The proposed electro-optical image algebra processing system is designed specifically for image processing and other related computations. The design is a hybridization of an optical correlator and a massively paralleled, single instruction multiple data processor. The architecture of the design consists of three tightly coupled components: a spatial configuration processor (the optical analog portion), a weighting processor (digital), and an accumulation processor (digital). The systolic flow of data and image processing operations are directed by a control buffer and pipelined to each of the three processing components. The image processing operations are defined in terms of basic operations of an image algebra developed by the University of Florida. The algebra is capable of describing all common image-to-image transformations. The merit of this architectural design is how it implements the natural decomposition of algebraic functions into spatially distributed, point-wise operations. The effect of this particular decomposition allows convolution type operations to be computed strictly as a function of the number of elements in the template (mask, filter, etc.) instead of the number of picture elements in the image. Thus, a substantial increase in throughput is realized. The implementation of the proposed design may be accomplished in many ways. While a hybrid electro-optical implementation is of primary interest, the benefits and design issues of an all digital implementation are also discussed. The potential utility of this architectural design lies in its ability to control a large variety of the arithmetic and logic operations of the image algebra's generalized matrix product. The generalized matrix product is the most powerful fundamental operation in the algebra, thus allowing a wide range of applications. No other known device or design has made this claim of processing speed and general implementation of a heterogeneous image algebra.
Gain in computational efficiency by vectorization in the dynamic simulation of multi-body systems

NASA Technical Reports Server (NTRS)

Amirouche, F. M. L.; Shareef, N. H.

1991-01-01

An improved technique for the identification and extraction of the exact quantities associated with the degrees of freedom at the element as well as the flexible body level is presented. It is implemented in the dynamic equations of motions based on the recursive formulation of Kane et al. (1987) and presented in a matrix form, integrating the concepts of strain energy, the finite-element approach, modal analysis, and reduction of equations. This technique eliminates the CPU intensive matrix multiplication operations in the code's hot spots for the dynamic simulation of the interconnected rigid and flexible bodies. A study of a simple robot with flexible links is presented by comparing the execution times on a scalar machine and a vector-processor with and without vector options. Performance figures demonstrating the substantial gains achieved by the technique are plotted.
Integrated optical circuits for numerical computation

NASA Technical Reports Server (NTRS)

Verber, C. M.; Kenan, R. P.

1983-01-01

The development of integrated optical circuits (IOC) for numerical-computation applications is reviewed, with a focus on the use of systolic architectures. The basic architecture criteria for optical processors are shown to be the same as those proposed by Kung (1982) for VLSI design, and the advantages of IOCs over bulk techniques are indicated. The operation and fabrication of electrooptic grating structures are outlined, and the application of IOCs of this type to an existing 32-bit, 32-Mbit/sec digital correlator, a proposed matrix multiplier, and a proposed pipeline processor for polynomial evaluation is discussed. The problems arising from the inherent nonlinearity of electrooptic gratings are considered. Diagrams and drawings of the application concepts are provided.
Optical modular arithmetic

NASA Astrophysics Data System (ADS)

Pavlichin, Dmitri S.; Mabuchi, Hideo

2014-06-01

Nanoscale integrated photonic devices and circuits offer a path to ultra-low power computation at the few-photon level. Here we propose an optical circuit that performs a ubiquitous operation: the controlled, random-access readout of a collection of stored memory phases or, equivalently, the computation of the inner product of a vector of phases with a binary "selector" vector, where the arithmetic is done modulo 2π and the result is encoded in the phase of a coherent field. This circuit, a collection of cascaded interferometers driven by a coherent input field, demonstrates the use of coherence as a computational resource, and of the use of recently-developed mathematical tools for modeling optical circuits with many coupled parts. The construction extends in a straightforward way to the computation of matrix-vector and matrix-matrix products, and, with the inclusion of an optical feedback loop, to the computation of a "weighted" readout of stored memory phases. We note some applications of these circuits for error correction and for computing tasks requiring fast vector inner products, e.g. statistical classification and some machine learning algorithms.
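The operation named in the record above, an inner product of stored phases with a binary selector taken modulo 2π, can be written down directly. The Python sketch below computes it two ways: by summing the selected phases, and by multiplying unit-amplitude complex fields so that the answer appears as the phase of the product. The function names are hypothetical, and the code models only the arithmetic, not the interferometer circuit.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def phase_inner_product(phases, selector):
    """Sum the selected phases, modulo 2*pi."""
    return sum(p * s for p, s in zip(phases, selector)) % TWO_PI


def phase_from_field(phases, selector):
    """Same quantity, read from the phase of a product of unit fields."""
    field = 1 + 0j
    for p, s in zip(phases, selector):
        if s:  # the selector bit decides whether this phase is applied
            field *= cmath.exp(1j * p)
    # cmath.phase returns a value in (-pi, pi]; shift it into [0, 2*pi).
    return cmath.phase(field) % TWO_PI
```

Both functions agree up to floating-point rounding, which illustrates why encoding the result in the phase of a coherent field performs the modular reduction automatically.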
Feedback controlled optics with wavefront compensation

NASA Technical Reports Server (NTRS)

Breckenridge, William G. (Inventor); Redding, David C. (Inventor)

1993-01-01

The sensitivity model of a complex optical system obtained by linear ray tracing is used to compute a control gain matrix by imposing the mathematical condition for minimizing the total wavefront error at the optical system's exit pupil. The most recent deformations or error states of the controlled segments or optical surfaces of the system are then assembled as an error vector, and the error vector is transformed by the control gain matrix to produce the exact control variables which will minimize the total wavefront error at the exit pupil of the optical system. These exact control variables are then applied to the actuators controlling the various optical surfaces in the system causing the immediate reduction in total wavefront error observed at the exit pupil of the optical system.
Two-dimensional acousto-optic processor using circular antenna array with a Butler matrix

NASA Astrophysics Data System (ADS)

Lee, Jim P.

1992-09-01

A two-dimensional acousto-optic signal processor is shown to be useful for providing simultaneous spectrum analysis and direction finding of radar signals over an instantaneous field of view of 360 deg. A system analysis with emphasis on the direction-finding aspect of this new architecture is presented. The peak location of the optical pattern provides a direct measure of bearing, independent of signal frequency. In addition, the sidelobe levels of the pattern can be effectively reduced using amplitude weighting. Performance parameters, such as mainlobe beamwidth, peak-sidelobe level, and pointing error, are analyzed as a function of the Gaussian laser illumination profile and the number of channels. Finally, a comparison with a linear antenna array architecture is also discussed.
Reducing adaptive optics latency using Xeon Phi many-core processors

NASA Astrophysics Data System (ADS)

Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah

2015-11-01

The next generation of Extremely Large Telescopes (ELTs) for astronomy will rely heavily on the performance of their adaptive optics (AO) systems. Real-time control is at the heart of the critical technologies that will enable telescopes to deliver the best possible science and will require a very significant extrapolation from current AO hardware existing for 4-10 m telescopes. Investigating novel real-time computing architectures and testing their eligibility against anticipated challenges is one of the main priorities of technology development for the ELTs. This paper investigates the suitability of the Intel Xeon Phi, which is a commercial off-the-shelf hardware accelerator. We focus on wavefront reconstruction performance, implementing a straightforward matrix-vector multiplication (MVM) algorithm. We present benchmarking results of the Xeon Phi on a real-time Linux platform, both as a standalone processor and integrated into an existing real-time controller (RTC). The performance of single and multiple Xeon Phis is investigated. We show that this technology has the potential of greatly reducing the mean latency and variations in execution time (jitter) of large AO systems. We present both a detailed performance analysis of the Xeon Phi for a typical E-ELT first-light instrument along with a more general approach that enables us to extend to any AO system size. We show that systematic and detailed performance analysis is an essential part of testing novel real-time control hardware to guarantee optimal science results.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

DOE PAGES

Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

1995-01-01

In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
Simple and practical approach for computing the ray Hessian matrix in geometrical optics.

PubMed

Lin, Psang Dain

2018-02-01

A method is proposed for simplifying the computation of the ray Hessian matrix in geometrical optics by replacing the angular variables in the system variable vector with their equivalent cosine and sine functions. The variable vector of a boundary surface is similarly defined in such a way as to exclude any angular variables. It is shown that the proposed formulations reduce the computation time of the Hessian matrix by around 10 times compared to the previous method reported by the current group in Advanced Geometrical Optics (2016). Notably, the method proposed in this study involves only polynomial differentiation, i.e., trigonometric function calls are not required. As a consequence, the computation complexity is significantly reduced. Five illustrative examples are given. The first three examples show that the proposed method is applicable to the determination of the Hessian matrix for any pose matrix, irrespective of the order in which the rotation and translation motions are specified. The last two examples demonstrate the use of the proposed Hessian matrix in determining the axial and lateral chromatic aberrations of a typical optical system.
Implementation of biological tissue Mueller matrix for polarization-sensitive optical coherence tomography based on LabVIEW

NASA Astrophysics Data System (ADS)

Lin, Yongping; Zhang, Xiyang; He, Youwu; Cai, Jianyong; Li, Hui

2018-02-01

The Jones matrix and the Mueller matrix are the main tools to study polarization devices. The Mueller matrix can also be used for biological tissue research to get complete tissue properties, while the commercial optical coherence tomography system does not provide the relevant analysis function. Based on LabVIEW, a near-real-time display method for the Mueller matrix image of biological tissue is developed, and it gives the corresponding phase retardance image simultaneously. A quarter-wave plate was placed at 45° in the sample arm. Experimental results of the two orthogonal channels show that the phase retardance based on incident light vector fixed mode and the Mueller matrix based on incident light vector dynamic mode can provide an effective analysis method for the existing system.
E-beam generated holographic masks for optical vector-matrix multiplication

NASA Technical Reports Server (NTRS)

Arnold, S. M.; Case, S. K.

1981-01-01

An optical vector-matrix multiplication scheme that encodes the matrix elements as a holographic mask consisting of linear diffraction gratings is proposed. The binary, chrome-on-glass masks are fabricated by e-beam lithography. This approach results in a fairly simple optical system that promises both large numerical range and high accuracy. A partitioned computer-generated hologram mask was fabricated and tested. This hologram has diagonally separated outputs, compact facets and symmetry about the axis. The resultant diffraction pattern at the output plane is shown. Since the grating fringes are written at 45 deg relative to the facet boundaries, the many on-axis sidelobes from each output are seen to be diagonally separated from the adjacent output signals.
Molecular Optics Nonlinear Optical Processes in Organic and Polymeric Crystals and Films. Part 1

DTIC Science & Technology

1991-11-01

Cyclo-Octatetraene ... 93 Figure 3.3: THG Dispersion Curve for Cyclo-Octatetraene ... 94 Figure 3.4: Bloch Vector in Pauli Matrix Space... Jung, P. and Hanggi, P., Phys. Rev. Lett. 61, 11 (1989) [90] Guckenheimer, J. and Holmes, P., Nonlinear Oscillations, Dynamical Systems, and...identity matrix and Pauli matrices. $\rho(t) = \frac{1}{2}\left(1 + \vec{r}(t) \cdot \vec{\sigma}\right)$ (3.5.6) where the 3-vector $\vec{r}$ is the linear coefficients of the Pauli matrices and is
Multithreading in vector processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.

Lanczos eigensolution method for high-performance computers

NASA Technical Reports Server (NTRS)

Bostic, Susan W.

1991-01-01

The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degrees of freedom was on the Cray Y-MP using an average of 3.63 processors.
Improved parallel data partitioning by nested dissection with applications to information retrieval.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar

The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it is a good option for data partitioning in information retrieval. We also show partitioning time can be substantially reduced by using the SCOTCH software, and quality improves in some cases, too.
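The communication-volume objective mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple 1D row-wise distribution (not the 2D nested dissection method of the paper), in which the process that owns row i also computes y[i], and each vector entry x[j] has a single owner. The function name and the example matrix are hypothetical.

```python
def communication_volume(nonzeros, row_part, vec_part):
    """Count vector entries that must be sent for one y = A x.

    nonzeros : iterable of (i, j) positions of nonzero matrix entries
    row_part : row_part[i] is the process that owns row i
    vec_part : vec_part[j] is the process that owns x[j]
    Each (entry, receiving process) pair is counted once.
    """
    needed = set()
    for i, j in nonzeros:
        receiver = row_part[i]
        if vec_part[j] != receiver:
            needed.add((j, receiver))
    return len(needed)

# Hypothetical 4 x 4 tridiagonal sparsity pattern.
NONZEROS = [(i, j) for i in range(4) for j in range(4) if abs(i - j) <= 1]

contiguous = communication_volume(NONZEROS, [0, 0, 1, 1], [0, 0, 1, 1])
interleaved = communication_volume(NONZEROS, [0, 1, 0, 1], [0, 1, 0, 1])
```

For this pattern the contiguous split needs fewer transfers than the interleaved one, which is the kind of difference a partitioner tries to exploit.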
Hypercluster - Parallel processing for computational mechanics

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1988-01-01

An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Optical information processing for NASA's space exploration

NASA Technical Reports Server (NTRS)

Chao, Tien-Hsin; Ochoa, Ellen; Juday, Richard

1990-01-01

The status of optical processing techniques under development at NASA-JPL, NASA-Ames, and NASA-Johnson is evaluated with a view to their potential applications in future NASA planetary exploration missions. It is projected that such optical processing systems can yield major reductions in mass, volume, and power requirements relative to exclusively electronic systems of comparable processing capabilities. Attention is given to high-order neural networks for distortion-invariant classification and pattern recognition, multispectral imaging using an acoustooptic tunable filter, and an optical matrix processor for control problems.
HO2 rovibrational eigenvalue studies for nonzero angular momentum

NASA Astrophysics Data System (ADS)

Wu, Xudong T.; Hayes, Edward F.

1997-08-01

An efficient parallel algorithm is reported for determining all bound rovibrational energy levels for the HO2 molecule for nonzero angular momentum values, J=1, 2, and 3. Performance tests on the CRAY T3D indicate that the algorithm scales almost linearly when up to 128 processors are used. Sustained performance levels of up to 3.8 Gflops have been achieved using 128 processors for J=3. The algorithm uses a direct product discrete variable representation (DVR) basis and the implicitly restarted Lanczos method (IRLM) of Sorensen to compute the eigenvalues of the polyatomic Hamiltonian. Since the IRLM is an iterative method, it does not require storage of the full Hamiltonian matrix—it only requires the multiplication of the Hamiltonian matrix by a vector. When the IRLM is combined with a formulation such as DVR, which produces a very sparse matrix, both memory and computation times can be reduced dramatically. This algorithm has the potential to achieve even higher performance levels for larger values of the total angular momentum.
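The point that a Lanczos-type method needs only matrix-vector products, never the stored full matrix, can be sketched as follows. This is the plain Lanczos recurrence without restarting or reorthogonalization, not the implicitly restarted method (IRLM) used in the paper, and the small diagonal operator is a hypothetical stand-in for a sparse DVR Hamiltonian.

```python
import math

def lanczos(matvec, v0, steps):
    """Run up to `steps` Lanczos iterations using only matvec(v).

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix whose eigenvalues approximate those of the operator.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(v)                      # the only access to the matrix
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(wi * wi for wi in w))
        if beta < 1e-12:                   # invariant subspace reached
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas

# Hypothetical sparse symmetric operator stored as {row: {col: value}}.
SPARSE = {0: {0: 1.0}, 1: {1: 2.0}, 2: {2: 3.0}}

def sparse_matvec(v):
    """Multiply the sparse operator by v, touching only stored nonzeros."""
    return [sum(val * v[j] for j, val in SPARSE[i].items()) for i in range(len(v))]

alphas, betas = lanczos(sparse_matvec, [1.0, 1.0, 1.0], 3)
```

Because `lanczos` receives a callback rather than a matrix, memory use is set by the sparse storage and a few work vectors, which is the property the abstract highlights.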
Optoelectronic Inner-Product Neural Associative Memory

NASA Technical Reports Server (NTRS)

Liu, Hua-Kuang

1993-01-01

Optoelectronic apparatus acts as artificial neural network performing associative recall of binary images. Recall process is iterative one involving optical computation of inner products between binary input vector and one or more reference binary vectors in memory. Inner-product method requires far less memory space than matrix-vector method.
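The memory saving claimed for the inner-product method can be illustrated with a generic software model. The sketch below assumes bipolar (+1/-1) encoding of the binary vectors and an outer-product weight matrix for the matrix-vector variant; neither detail is given in the abstract, and the stored patterns are hypothetical.

```python
def sign(value):
    """Threshold to a bipolar value."""
    return 1 if value >= 0 else -1

def recall_inner_product(memory, x, iterations=5):
    """Iterative recall that stores only the M reference vectors (M x N values).

    Each pass computes M inner products <v_m, x>, then rebuilds the estimate
    as the thresholded weighted sum of the reference vectors.
    """
    for _ in range(iterations):
        weights = [sum(a * b for a, b in zip(v, x)) for v in memory]
        x_new = [sign(sum(w * v[i] for w, v in zip(weights, memory)))
                 for i in range(len(x))]
        if x_new == x:
            break
        x = x_new
    return x

def recall_matrix_vector(memory, x, iterations=5):
    """Same recall through an explicit N x N outer-product matrix W."""
    n = len(x)
    W = [[sum(v[i] * v[j] for v in memory) for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        x_new = [sign(sum(W[i][j] * x[j] for j in range(n))) for i in range(n)]
        if x_new == x:
            break
        x = x_new
    return x

MEMORY = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
noisy = [1, 1, 1, -1, -1, -1, -1, -1]      # first pattern with one element flipped

recovered_ip = recall_inner_product(MEMORY, noisy)
recovered_mv = recall_matrix_vector(MEMORY, noisy)
```

Both routines return the same vector here, but the first keeps 2 x 8 stored values while the second builds an 8 x 8 matrix.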
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

DOE PAGES

Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

2017-06-01

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
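For readers unfamiliar with the SpMM kernel, a minimal reference sketch in the CSR baseline format is given below. It is not the CSB implementation described in the abstract and makes no attempt at cache or thread optimization; the array names (`indptr`, `indices`, `data`) follow common CSR usage and the small matrix is hypothetical.

```python
def csr_spmm(indptr, indices, data, X):
    """Y = A X for A in CSR form and X a dense block of vectors (list of rows)."""
    k = len(X[0])
    Y = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * k
        for p in range(indptr[row], indptr[row + 1]):
            a, x_row = data[p], X[indices[p]]
            for c in range(k):
                acc[c] += a * x_row[c]
        Y.append(acc)
    return Y

def csr_spmm_transpose(indptr, indices, data, X, n_cols):
    """Y = A^T X without forming A^T: scatter each nonzero's contribution."""
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_cols)]
    for row in range(len(indptr) - 1):
        for p in range(indptr[row], indptr[row + 1]):
            a, y_row = data[p], Y[indices[p]]
            for c in range(k):
                y_row[c] += a * X[row][c]
    return Y

# A = [[2, 0, 1],
#      [0, 3, 0]]
indptr, indices, data = [0, 2, 3], [0, 2, 1], [2.0, 1.0, 3.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # 3 x 2 block of vectors
Y = csr_spmm(indptr, indices, data, X)
Z = csr_spmm_transpose(indptr, indices, data, [[1.0, 2.0], [3.0, 4.0]], 3)
```

The transpose product scatters into output rows rather than gathering from input rows, which is one reason a row-oriented format handles A and A^T with different efficiency.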
Shift-, rotation-, and scale-invariant shape recognition system using an optical Hough transform

NASA Astrophysics Data System (ADS)

Schmid, Volker R.; Bader, Gerhard; Lueder, Ernst H.

1998-02-01

We present a hybrid shape recognition system with an optical Hough transform processor. The features of the Hough space offer a separate cancellation of distortions caused by translations and rotations. Scale invariance is also provided by suitable normalization. The proposed system extends the capabilities of Hough transform based detection from only straight lines to areas bounded by edges. A very compact optical design is achieved by a microlens array processor that accepts incoherent light as direct optical input and realizes the computationally expensive connections in a massively parallel fashion. Our newly developed algorithm extracts rotation and translation invariant normalized patterns of bright spots on a 2D grid. A neural network classifier maps the 2D features via a nonlinear hidden layer onto the classification output vector. We propose initialization of the connection weights according to regions of activity specifically assigned to each neuron in the hidden layer using a competitive network. The presented system is designed for industrial inspection applications. So far we have demonstrated detection of six different machined parts in real time. Our method yields very promising detection results, with more than 96% of parts correctly classified.
Stokes-vector and Mueller-matrix polarimetry [Invited].

PubMed

Azzam, R M A

2016-07-01

This paper reviews the current status of instruments for measuring the full 4×1 Stokes vector S, which describes the state of polarization (SOP) of totally or partially polarized light, and the 4×4 Mueller matrix M, which determines how the SOP is transformed as light interacts with a material sample or an optical element or system. The principle of operation of each instrument is briefly explained by using the Stokes-Mueller calculus. The development of fast, automated, imaging, and spectroscopic instruments over the last 50 years has greatly expanded the range of applications of optical polarimetry and ellipsometry in almost every branch of science and technology. Current challenges and future directions of this important branch of optics are also discussed.
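As a small worked instance of the Stokes-Mueller calculus named in this abstract, the sketch below transforms a 4×1 Stokes vector by a 4×4 Mueller matrix. The function names are hypothetical; the polarizer matrix is the standard one for an ideal horizontal linear polarizer.

```python
import math

def apply_mueller(M, S):
    """Return the output Stokes vector S' = M S (4x4 matrix times 4x1 vector)."""
    return [sum(M[i][j] * S[j] for j in range(4)) for i in range(4)]

def degree_of_polarization(S):
    """DOP = sqrt(S1^2 + S2^2 + S3^2) / S0, ranging from 0 to 1."""
    return math.sqrt(S[1] ** 2 + S[2] ** 2 + S[3] ** 2) / S[0]

# Mueller matrix of an ideal horizontal linear polarizer.
POLARIZER_H = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

unpolarized = [1.0, 0.0, 0.0, 0.0]          # unit intensity, DOP = 0
out = apply_mueller(POLARIZER_H, unpolarized)
```

Unpolarized light of unit intensity leaves the polarizer with half the intensity and is fully horizontally polarized, so its degree of polarization rises from 0 to 1.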
The CSM testbed matrix processors internal logic and dataflow descriptions

NASA Technical Reports Server (NTRS)

Regelbrugge, Marc E.; Wright, Mary A.

1988-01-01

This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.
Optical information processing at NASA Ames Research Center

NASA Technical Reports Server (NTRS)

Reid, Max B.; Bualat, Maria G.; Cho, Young C.; Downie, John D.; Gary, Charles K.; Ma, Paul W.; Ozcan, Meric; Pryor, Anna H.; Spirkovska, Lilly

1993-01-01

The combination of analog optical processors with digital electronic systems offers the potential of tera-OPS computational performance, while often requiring less power and weight relative to all-digital systems. NASA is working to develop and demonstrate optical processing techniques for on-board, real time science and mission applications. Current research areas and applications under investigation include optical matrix processing for space structure vibration control and the analysis of Space Shuttle Main Engine plume spectra, optical correlation-based autonomous vision for robotic vehicles, analog computation for robotic path planning, free-space optical interconnections for information transfer within digital electronic computers, and multiplexed arrays of fiber optic interferometric sensors for acoustic and vibration measurements.
Implicit, nonswitching, vector-oriented algorithm for steady transonic flow

NASA Technical Reports Server (NTRS)

Lottati, I.

1983-01-01

A rapid computation of a sequence of transonic flow solutions has to be performed in many areas of aerodynamic technology. The use of low-cost vector array processors makes such calculations economically feasible. However, to fully utilize the new hardware, the algorithms developed must take advantage of the special characteristics of the vector array processor. The present investigation aims to develop an efficient algorithm for solving transonic flow problems governed by mixed partial differential equations on an array processor.
SC'11 Poster: A Highly Efficient MGPT Implementation for LAMMPS; with Strong Scaling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oppelstrup, T; Stukowski, A; Marian, J

2011-12-07

The MGPT potential has been implemented as a drop-in package for the general molecular dynamics code LAMMPS. We implement an improved communication scheme that shrinks the communication layer thickness and improves load balancing. This results in unprecedented strong scaling, with speedup continuing beyond 1/8 atom/core. In addition, we have optimized the small-matrix linear algebra with generic blocking (for all processors) and specific SIMD intrinsics for vectorization on Intel, AMD, and BlueGene CPUs.
Massively parallel processor networks with optical express channels

DOEpatents

Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

1999-08-24

An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.
Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture

PubMed Central

Guzmán, Pablo; Díaz, Javier; Agís, Rodrigo; Ros, Eduardo

2010-01-01

The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection onto the camera plane of the 3-D motion information present in the world scene. This motion representation is widely known and applied in the scientific community to solve a wide variety of problems. Most applications based on motion estimation must work in real time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimating the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32-bit NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages, where the early processing (focal plane) stage mainly pre-processes the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains. PMID:22319283
A GaAs vector processor based on parallel RISC microprocessors

NASA Astrophysics Data System (ADS)

Misko, Tim A.; Rasset, Terry L.

A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Code Samples Used for Complexity and Control

NASA Astrophysics Data System (ADS)

Ivancevic, Vladimir G.; Reid, Darryn J.

2015-11-01

The following sections are included: * Mathematica® Code * Generic Chaotic Simulator * Vector Differential Operators * NLS Explorer * C++ Code * C++ Lambda Functions for Real Calculus * Accelerometer Data Processor * Simple Predictor-Corrector Integrator * Solving the BVP with the Shooting Method * Linear Hyperbolic PDE Solver * Linear Elliptic PDE Solver * Method of Lines for a Set of the NLS Equations * C# Code * Iterative Equation Solver * Simulated Annealing: A Function Minimum * Simple Nonlinear Dynamics * Nonlinear Pendulum Simulator * Lagrangian Dynamics Simulator * Complex-Valued Crowd Attractor Dynamics * Freeform Fortran Code * Lorenz Attractor Simulator * Complex Lorenz Attractor * Simple SGE Soliton * Complex Signal Presentation * Gaussian Wave Packet * Hermitian Matrices * Euclidean L2-Norm * Vector/Matrix Operations * Plain C-Code: Levenberg-Marquardt Optimizer * Free Basic Code: 2D Crowd Dynamics with 3000 Agents

System balance analysis for vector computers

NASA Technical Reports Server (NTRS)

Knight, J. C.; Poole, W. G., Jr.; Voight, R. G.

1975-01-01

The availability of vector processors capable of sustaining computing rates of 10^8 arithmetic results per second raised the question of whether peripheral storage devices representing current technology can keep such processors supplied with data. By examining the solution of a large banded linear system on these computers, it was found that even under ideal conditions, the processors will frequently be waiting for problem data.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

NASA Technical Reports Server (NTRS)

Downie, John D.; Goodman, Joseph W.

1989-01-01

The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
High precision computing with charge domain devices and a pseudo-spectral method therefor

NASA Technical Reports Server (NTRS)

Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor); Fijany, Amir (Inventor); Zak, Michail (Inventor)

1997-01-01

The present invention enhances the bit resolution of a CCD/CID MVM processor by storing each bit of each matrix element as a separate CCD charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. In another aspect of the invention, such arrays are employed in a pseudo-spectral method of the invention, in which partial differential equations are solved by expressing each derivative analytically as matrices, and the state function is updated at each computation cycle by multiplying it by the matrices. The matrices are treated as synaptic arrays of a neural network and the state function vector elements are treated as neurons. In a further aspect of the invention, moving target detection is performed by driving the soliton equation with a vector of detector outputs. The neural architecture consists of two synaptic arrays corresponding to the two differential terms of the soliton-equation and an adder connected to the output thereof and to the output of the detector array to drive the soliton equation.
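The bit-level decomposition described in this abstract can be mimicked in software. The sketch below assumes non-negative integer matrix and vector elements of a fixed bit width; it multiplies every bit plane of the matrix by every bit plane of the input vector and recombines the 1-bit partial products with the appropriate powers of two. It models only the arithmetic, not the CCD/CID charge-packet hardware, and the function names are hypothetical.

```python
def bit_planes(values, n_bits):
    """Split non-negative integers into bit planes, least significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(n_bits)]

def bit_sliced_matvec(A, x, n_bits):
    """Compute y = A x by combining 1-bit by 1-bit partial products."""
    x_planes = bit_planes(x, n_bits)
    y = []
    for row in A:
        row_planes = bit_planes(row, n_bits)
        total = 0
        for i, a_bits in enumerate(row_planes):        # bit i of the matrix row
            for j, x_bits in enumerate(x_planes):      # bit j of the input vector
                # A 1-bit partial product is an AND followed by a count.
                partial = sum(a & b for a, b in zip(a_bits, x_bits))
                total += partial << (i + j)            # weight by 2^(i+j)
        y.append(total)
    return y

A = [[3, 5], [7, 2]]
x = [4, 6]
y = bit_sliced_matvec(A, x, n_bits=3)
```

Each partial product involves only single bits, so the precision of the final result is set by the number of bit planes rather than by the analog accuracy of any one multiply.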
Least-squares analysis of the Mueller matrix.

PubMed

Reimer, Michael; Yevick, David

2006-08-15

In a single-mode fiber excited by light with a fixed polarization state, the output polarizations obtained at two different optical frequencies are related by a Mueller matrix. We examine least-squares procedures for estimating this matrix from repeated measurements of the output Stokes vector for a random set of input polarization states. We then apply these methods to the determination of polarization mode dispersion and polarization-dependent loss in an optical fiber. We find that a relatively simple formalism leads to results that are comparable with those of far more involved techniques.
Compute Server Performance Results

NASA Technical Reports Server (NTRS)

Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)

1994-01-01

Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single processor rate of any vendor. However, if the price-performance ratio (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPRs of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite.
CUDAICA: GPU Optimization of Infomax-ICA EEG Analysis

PubMed Central

Raimondo, Federico; Kamienkowski, Juan E.; Sigman, Mariano; Fernandez Slezak, Diego

2012-01-01

In recent years, Independent Component Analysis (ICA) has become a standard method for identifying relevant dimensions of the data in neuroscience. ICA is a very reliable method for analyzing data, but it is computationally very costly. This cost makes the use of ICA for online analysis of the data, as in brain-computer interfaces, almost completely prohibitive. We show a speedup of ICA of about 25-fold at almost no cost (a fast video card). The EEG data, which is a repetition of many independent signals in multiple channels, is very suitable for processing using the vector processors included in graphics units. We profiled the implementation of this algorithm and detected two main types of operations responsible for the processing bottleneck, taking almost 80% of computing time: vector-matrix and matrix-matrix multiplications. Simply replacing calls to basic linear algebra functions with the standard CUBLAS routines provided by GPU manufacturers does not increase performance, due to CUDA kernel launch overhead. Instead, we developed a GPU-based solution that, compared with the original BLAS and CUBLAS versions, obtains a 25x increase in performance for the ICA calculation. PMID:22811699
Special purpose parallel computer architecture for real-time control and simulation in robotic applications

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1993-01-01

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Application of a VLSI vector quantization processor to real-time speech coding

NASA Technical Reports Server (NTRS)

Davidson, G.; Gersho, A.

1986-01-01

Attention is given to a working vector quantization processor for speech coding that is based on a first-generation VLSI chip which efficiently performs the pattern-matching operation needed for the codebook search process (CPS). Using this chip, the CPS architecture has been successfully incorporated into a compact, single-board Vector PCM implementation operating at 7-18 kbits/sec. A real time Adaptive Vector Predictive Coder system using the CPS has also been implemented.
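The codebook search that the chip accelerates is, at its core, a nearest-codeword pattern match. The following is only a minimal software sketch of that idea, not the chip's method: the squared-Euclidean distortion measure, the function names, and the tiny codebook in the usage note are all assumptions, since the abstract does not state them.

```python
from typing import List, Sequence, Tuple


def nearest_codeword(vector: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, distortion) of the codeword closest to `vector`.

    Distortion is the squared Euclidean distance (an assumption; the
    abstract does not name the distortion measure).
    """
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((x - c) ** 2 for x, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def encode(samples: Sequence[float],
           codebook: Sequence[Sequence[float]],
           dim: int) -> List[int]:
    """Split `samples` into blocks of `dim` values and emit one codebook
    index per block. An incomplete trailing block is dropped."""
    indices = []
    for start in range(0, len(samples) - dim + 1, dim):
        index, _ = nearest_codeword(samples[start:start + dim], codebook)
        indices.append(index)
    return indices
```

With the hypothetical codebook `[(0, 0), (1, 1), (4, 4)]`, calling `encode([0, 0, 1, 1, 4, 4, 9], codebook, 2)` yields one index per two-sample block. The exhaustive loop over codewords is the part a dedicated pattern-matching chip would perform in hardware.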
Optical Processing Techniques For Pseudorandom Sequence Prediction

NASA Astrophysics Data System (ADS)

Gustafson, Steven C.

1983-11-01

Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
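To make the prediction problem concrete, here is a minimal sketch of a linear feedback shift register sequence and of why it is predictable from error-free past samples once the recurrence is known. It does not reproduce the optical techniques of the paper. The tap offsets `(3, 4)`, corresponding to the primitive polynomial x^4 + x + 1, and the function names are illustrative assumptions.

```python
from typing import List, Sequence


def lfsr_sequence(seed: Sequence[int], taps: Sequence[int], length: int) -> List[int]:
    """Generate `length` bits of a linear feedback shift register sequence.

    `seed` holds the initial bits, oldest first. Each new bit is the XOR of
    the bits found `t` positions back from the end, for every `t` in `taps`
    (1 = most recent bit). The seed must be at least max(taps) bits long.
    """
    bits = list(seed)
    while len(bits) < length:
        new_bit = 0
        for t in taps:
            new_bit ^= bits[-t]
        bits.append(new_bit)
    return bits[:length]


def predict_next(past: Sequence[int], taps: Sequence[int]) -> int:
    """Predict the next bit from error-free past samples, assuming the
    same linear recurrence that generated the sequence is known."""
    prediction = 0
    for t in taps:
        prediction ^= past[-t]
    return prediction
```

For example, `lfsr_sequence([1, 0, 0, 0], (3, 4), 30)` repeats with period 15, and `predict_next` applied to any window of at least four past bits reproduces the following bit. Nonlinear generators and error-prone samples, which the abstract also requires, are not handled by this sketch.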
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processors with scanning CCD photodetectors

NASA Astrophysics Data System (ADS)

Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.

1995-10-01

Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete-analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.
Conditions for space invariance in optical data processors used with coherent or noncoherent light.

PubMed

Arsenault, H R

1972-10-01

The conditions for space invariance in coherent and noncoherent optical processors are considered. All linear optical processors are shown to belong to one of two types. The conditions for space invariance are more stringent for noncoherent processors than for coherent processors, so that a system that is linear in coherent light may be nonlinear in noncoherent light. However, any processor that is linear in noncoherent light is also linear in the coherent limit.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Parallel processors and nonlinear structural dynamics algorithms and software

NASA Technical Reports Server (NTRS)

Belytschko, Ted

1990-01-01

Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Semiconductor-laser Fourier processors of electric signals

NASA Astrophysics Data System (ADS)

Blok, A. S.; Bukhenskii, A. F.; Krupitskii, É. I.; Morozov, S. V.; Pelevin, V. Yu; Sergeenko, T. N.; Yakovlev, V. I.

1995-10-01

An investigation is reported of acousto-optical and fibre-optic Fourier processors of electric signals, based on semiconductor lasers. A description is given of practical acousto-optical processors with an analysis band 120 MHz wide, a resolution of 200 kHz, and 7 cm × 8 cm × 18 cm dimensions. Fibre-optic Fourier processors are considered: they represent a new class of devices which are promising for the processing of gigahertz signals.
An optical/digital processor - Hardware and applications

NASA Technical Reports Server (NTRS)

Casasent, D.; Sterling, W. M.

1975-01-01

A real-time two-dimensional hybrid processor consisting of a coherent optical system, an optical/digital interface, and a PDP-11/15 control minicomputer is described. The input electrical-to-optical transducer is an electron-beam addressed potassium dideuterium phosphate (KD2PO4) light valve. The requirements and hardware for the output optical-to-digital interface, which is constructed from modular computer building blocks, are presented. Initial experimental results demonstrating the operation of this hybrid processor in phased-array radar data processing, synthetic-aperture image correlation, and text correlation are included. The applications chosen emphasize the role of the interface in the analysis of data from an optical processor and possible extensions to the digital feedback control of an optical processor.
Hybrid Electro-Optic Processor

DTIC Science & Technology

1991-07-01

This report describes the design of a hybrid electro-optic processor to perform adaptive interference cancellation in radar systems. The processor is...modulator is reported. Included in this report is a discussion of the design, partial fabrication in the laboratory, and partial testing of the hybrid electro ... optic processor. A follow-on effort is planned to complete the construction and testing of the processor. The work described in this report is the
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Vectorization with SIMD extensions speeds up reconstruction in electron tomography.

PubMed

Agulleiro, J I; Garzón, E M; García, I; Fernández, J J

2010-06-01

Electron tomography allows structural studies of cellular structures at molecular detail. Large 3D reconstructions are needed to meet the resolution requirements. The processing time to compute these large volumes may be considerable, and so high performance computing techniques have traditionally been used. This work presents a vector approach to tomographic reconstruction that relies on the exploitation of the SIMD extensions available in modern processors in combination with other single-processor optimization techniques. This approach succeeds in producing full resolution tomograms with an important reduction in processing time, as evaluated with the most common reconstruction algorithms, namely WBP and SIRT. The main advantage stems from the fact that this approach runs on standard computers without the need for specialized hardware, which facilitates the development, use and management of programs. Future trends in processor design open excellent opportunities for vector processing with the processor's SIMD extensions in the field of 3D electron microscopy.
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.

PubMed

Glaser, I

1980-10-01

We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.
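As a reference point for what such a processor computes, the two-dimensional Walsh-Hadamard transformation can be written as a pair of matrix products, Y = H X H. The sketch below is a plain digital version under stated assumptions: it uses the Sylvester construction (natural ordering, unnormalized), requires a square input whose side is a power of two, and its function names are illustrative. It says nothing about the lenslet-array implementation itself.

```python
from typing import List

Matrix = List[List[int]]


def hadamard(n: int) -> Matrix:
    """Sylvester-construction Hadamard matrix of order n (a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def wht2d(image: Matrix) -> Matrix:
    """Unnormalized 2-D Walsh-Hadamard transform Y = H X H of a square image.

    H is symmetric, so no explicit transpose is needed.
    """
    h = hadamard(len(image))
    return matmul(matmul(h, image), h)
```

Because H H = n I for an order-n Hadamard matrix, applying `wht2d` twice returns the input scaled by n squared, which gives a quick self-check of the transform.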
Floating point only SIMD instruction set architecture including compare, select, Boolean, and alignment operations

DOEpatents

Gschwind, Michael K [Chappaqua, NY]

2011-03-01

Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.

Statistics of partially-polarized fields: beyond the Stokes vector and coherence matrix

NASA Astrophysics Data System (ADS)

Charnotskii, Mikhail

2017-08-01

Traditionally, partially-polarized light is characterized by the four Stokes parameters. An equivalent description is also provided by the correlation tensor of the optical field. These statistics specify only the second moments of the complex amplitudes of the narrow-band two-dimensional electric field of the optical wave. The electric field vector of a random quasi-monochromatic wave is a nonstationary oscillating two-dimensional real random variable. We introduce a novel statistical description of these partially polarized waves: the Period-Averaged Probability Density Function (PA-PDF) of the field. The PA-PDF contains more information on the polarization state of the field than the Stokes vector. In particular, in addition to the conventional distinction between the polarized and depolarized components of the field, the PA-PDF makes it possible to separate the coherent and fluctuating components of the field. We present several model examples of fields with identical Stokes vectors and very distinct shapes of PA-PDF. In the simplest case of the nonstationary, oscillating normal 2-D probability distribution of the real electrical field and stationary 4-D probability distribution of the complex amplitudes, the newly-introduced PA-PDF is determined by 13 parameters that include the first moments and covariance matrix of the quadrature components of the oscillating vector field.
Effective Vectorization with OpenMP 4.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham

This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
Optical computation using residue arithmetic.

PubMed

Huang, A; Tsunoda, Y; Goodman, J W; Ishihara, S

1979-01-15

Using residue arithmetic it is possible to perform additions, subtractions, multiplications, and polynomial evaluation without the necessity for carry operations. Calculations can, therefore, be performed in a fully parallel manner. Several different optical methods for performing residue arithmetic operations are described. A possible combination of such methods to form a matrix vector multiplier is considered. The potential advantages of optics in performing these kinds of operations are discussed.
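The carry-free property is easy to see in software. In the sketch below, each number is held as a tuple of residues and every residue channel is processed independently; the result is recovered with the Chinese remainder theorem. The moduli `(5, 7, 9)` (range 0 to 314) and all function names are assumptions chosen for illustration, and the code models only the arithmetic, not any optical method from the paper.

```python
from math import prod
from typing import Sequence, Tuple

MODULI = (5, 7, 9)  # hypothetical pairwise-coprime moduli; range is 5*7*9 = 315

Residues = Tuple[int, ...]


def to_residues(x: int) -> Residues:
    """Encode an integer as its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def add(a: Residues, b: Residues) -> Residues:
    """Channel-wise addition: no carries pass between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def mul(a: Residues, b: Residues) -> Residues:
    """Channel-wise multiplication, equally carry-free."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_residues(r: Residues) -> int:
    """Decode with the Chinese remainder theorem (needs Python 3.8+ for
    the modular inverse via pow(x, -1, m))."""
    big_m = prod(MODULI)
    total = 0
    for residue, m in zip(r, MODULI):
        partial = big_m // m
        total += residue * partial * pow(partial, -1, m)
    return total % big_m


def residue_dot(row: Sequence[int], vec: Sequence[int]) -> int:
    """One row of a matrix-vector product, computed entirely in residues."""
    acc = to_residues(0)
    for a, x in zip(row, vec):
        acc = add(acc, mul(to_residues(a), to_residues(x)))
    return from_residues(acc)
```

The result is correct only while the true value stays inside the range set by the product of the moduli, so a practical design would choose moduli large enough for the largest expected inner product.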
Signed-negabinary-arithmetic-based optical computing by use of a single liquid-crystal-display panel.

PubMed

Datta, Asit K; Munshi, Soumika

2002-03-10

Based on the negabinary number representation, parallel one-step arithmetic operations (that is, addition and subtraction), logical operations, and matrix-vector multiplication on data have been optically implemented, by use of a two-dimensional spatial-encoding technique. For addition and subtraction, one of the operands in decimal form is converted into the unsigned negabinary form, whereas the other decimal number is represented in the signed negabinary form. The result of operation is obtained in the mixed negabinary form and is converted back into decimal. Matrix-vector multiplication for unsigned negabinary numbers is achieved through the convolution technique. Both of the operands for logical operation are converted to their signed negabinary forms. All operations are implemented by use of a unique optical architecture. The use of a single liquid-crystal-display panel to spatially encode the input data, operational kernels, and decoding masks have simplified the architecture as well as reduced the cost and complexity.
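For readers unfamiliar with the representation, negabinary is positional notation in base -2, which encodes positive and negative integers without a separate sign bit. The sketch below shows only plain conversion to and from base -2. The signed and mixed negabinary forms used in the paper are not reproduced, and the function names are illustrative.

```python
def to_negabinary(n: int) -> str:
    """Return the base -2 digit string of an integer (digits are 0 or 1)."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, remainder = divmod(n, -2)
        if remainder < 0:
            # Python's remainder takes the divisor's sign; shift it to 0 or 1.
            n, remainder = n + 1, remainder + 2
        digits.append(str(remainder))
    return "".join(reversed(digits))


def from_negabinary(digits: str) -> int:
    """Evaluate a base -2 digit string, most significant digit first."""
    return sum(int(d) * (-2) ** i for i, d in enumerate(reversed(digits)))
```

For instance, decimal 2 is `110` (4 - 2 + 0) and decimal -3 is `1101` (-8 + 4 + 0 + 1).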
Reduction of solar vector magnetograph data using a microMSP array processor

NASA Technical Reports Server (NTRS)

Kineke, Jack

1990-01-01

The processing of raw data obtained by the solar vector magnetograph at NASA-Marshall requires extensive arithmetic operations on large arrays of real numbers. The objectives of this summer faculty fellowship study are to: (1) learn the programming language of the MicroMSP Array Processor and adapt some existing data reduction routines to exploit its capabilities; and (2) identify other applications and/or existing programs which lend themselves to array processor utilization which can be developed by undergraduate student programmers under the provisions of project JOVE.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

NASA Technical Reports Server (NTRS)

Downie, John D.

1990-01-01

A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle will take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.
Image Matrix Processor for Volumetric Computations Final Report CRADA No. TSB-1148-95

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roberson, G. Patrick; Browne, Jolyon

The development of an Image Matrix Processor (IMP) was proposed that would provide an economical means to perform rapid ray-tracing processes on volume "Giga Voxel" data sets. This was a multi-phased project. The objective of the first phase of the IMP project was to evaluate the practicality of implementing a workstation-based Image Matrix Processor for use in volumetric reconstruction and rendering using hardware simulation techniques. Additionally, ARACOR and LLNL worked together to identify and pursue further funding sources to complete a second phase of this project.
High-performance image processing architecture

NASA Astrophysics Data System (ADS)

Coffield, Patrick C.

1992-04-01

The proposed architecture is a logical design specifically for image processing and other related computations. The design is a hybrid electro-optical concept consisting of three tightly coupled components: a spatial configuration processor (the optical analog portion), a weighting processor (digital), and an accumulation processor (digital). The systolic flow of data and image processing operations are directed by a control buffer and pipelined to each of the three processing components. The image processing operations are defined by an image algebra developed by the University of Florida. The algebra is capable of describing all common image-to-image transformations. The merit of this architectural design is how elegantly it handles the natural decomposition of algebraic functions into spatially distributed, point-wise operations. The effect of this particular decomposition allows convolution type operations to be computed strictly as a function of the number of elements in the template (mask, filter, etc.) instead of the number of picture elements in the image. Thus, a substantial increase in throughput is realized. The logical architecture may take any number of physical forms. While a hybrid electro-optical implementation is of primary interest, the benefits and design issues of an all digital implementation are also discussed. The potential utility of this architectural design lies in its ability to control all the arithmetic and logic operations of the image algebra's generalized matrix product. This is the most powerful fundamental formulation in the algebra, thus allowing a wide range of applications.
Luneburg lens and optical matrix algebra research

NASA Technical Reports Server (NTRS)

Wood, V. E.; Busch, J. R.; Verber, C. M.; Caulfield, H. J.

1984-01-01

Planar, as opposed to channelized, integrated optical circuits (IOCs) were stressed as the basis for computational devices. Both fully-parallel and systolic architectures are considered and the tradeoffs between the two device types are discussed. The Kalman filter approach is a most important computational method for many NASA problems. This approach to deriving a best-fit estimate for the state vector describing a large system leads to matrix sizes which are beyond the predicted capacities of planar IOCs. This problem is overcome by matrix partitioning, and several architectures for accomplishing this are described. The Luneburg lens work has involved development of lens design techniques, design of mask arrangements for producing lenses of desired shape, investigation of optical and chemical properties of arsenic trisulfide films, deposition of lenses both by thermal evaporation and by RF sputtering, optical testing of these lenses, modification of lens properties through ultraviolet irradiation, and comparison of measured lens properties with those expected from ray trace analyses.
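Matrix partitioning, as mentioned above, lets a large matrix-vector product be assembled from products small enough for a fixed-size device. The following is a minimal sketch of that idea only: the `tile` size, the helper `tile_matvec` standing in for one small processor, and the accumulation order are assumptions, not the architectures described in the report.

```python
from typing import List, Sequence


def tile_matvec(block: Sequence[Sequence[float]], x_part: Sequence[float]) -> List[float]:
    """Stand-in for one small fixed-size matrix-vector processor."""
    return [sum(a * b for a, b in zip(row, x_part)) for row in block]


def partitioned_matvec(a: Sequence[Sequence[float]],
                       x: Sequence[float],
                       tile: int) -> List[float]:
    """Compute y = A x using only sub-products of size at most tile x tile.

    A is cut into blocks; each block multiplies the matching slice of x,
    and the partial results are accumulated into the output rows.
    """
    n_rows, n_cols = len(a), len(x)
    y = [0] * n_rows
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            block = [row[c0:c0 + tile] for row in a[r0:r0 + tile]]
            partial = tile_matvec(block, x[c0:c0 + tile])
            for i, value in enumerate(partial):
                y[r0 + i] += value
    return y
```

The accumulation of partial results across column blocks is the extra step that partitioning introduces; in a hybrid system that step would typically fall to the supporting electronics.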
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

NASA Technical Reports Server (NTRS)

Overman, Andrea L.; Poole, Eugene L.

1991-01-01

A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
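For orientation, the underlying factorization A = L L^T and the two triangular solves that follow it can be sketched in a few lines. This is a dense, serial, textbook version under stated assumptions: it does not reproduce the variable-band storage scheme, the cache and vector-register tuning, or the multitasking described in the report, and the function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return lower-triangular L with A = L L^T for a symmetric
    positive definite matrix A (dense storage, column by column)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = sqrt(pivot)
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by factorization, then forward and back substitution."""
    l = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x
```

For banded structural matrices, most entries of L outside the band are zero, which is what a band-oriented storage scheme exploits to skip work that this dense version performs.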
Ultra-Reliable Digital Avionics (URDA) processor

NASA Astrophysics Data System (ADS)

Branstetter, Reagan; Ruszczyk, William; Miville, Frank

1994-10-01

Texas Instruments Incorporated (TI) developed the URDA processor design under contract with the U.S. Air Force Wright Laboratory and the U.S. Army Night Vision and Electro-Sensors Directorate. TI's approach couples advanced packaging solutions with advanced integrated circuit (IC) technology to provide a high-performance (200 MIPS/800 MFLOPS) modular avionics processor module for a wide range of avionics applications. TI's processor design integrates two Ada-programmable, URDA basic processor modules (BPM's) with a JIAWG-compatible PiBus and TMBus on a single F-22 common integrated processor-compatible form-factor SEM-E avionics card. A separate, high-speed (25-MWord/second 32-bit word) input/output bus is provided for sensor data. Each BPM provides a peak throughput of 100 MIPS scalar concurrent with 400-MFLOPS vector processing in a removable multichip module (MCM) mounted to a liquid-flowthrough (LFT) core and interfacing to a processor interface module printed wiring board (PWB). Commercial RISC technology coupled with TI's advanced bipolar complementary metal oxide semiconductor (BiCMOS) application specific integrated circuit (ASIC) and silicon-on-silicon packaging technologies are used to achieve the high performance in a miniaturized package. A MIPS R4000-family reduced instruction set computer (RISC) processor and a TI 100-MHz BiCMOS vector coprocessor (VCP) ASIC provide, respectively, the 100 MIPS of scalar processor throughput and 400 MFLOPS of vector processing throughput for each BPM. The TI Aladdin ASIC chipset was developed on the TI Aladdin Program under contract with the U.S. Army Communications and Electronics Command and was sponsored by the Advanced Research Projects Agency with technical direction from the U.S. Army Night Vision and Electro-Sensors Directorate.
Optical backplane interconnect switch for data processors and computers

NASA Technical Reports Server (NTRS)

Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

1989-01-01

An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
Mathematical Methods for Optical Physics and Engineering

NASA Astrophysics Data System (ADS)

Gbur, Gregory J.

2011-01-01

1. Vector algebra; 2. Vector calculus; 3. Vector calculus in curvilinear coordinate systems; 4. Matrices and linear algebra; 5. Advanced matrix techniques and tensors; 6. Distributions; 7. Infinite series; 8. Fourier series; 9. Complex analysis; 10. Advanced complex analysis; 11. Fourier transforms; 12. Other integral transforms; 13. Discrete transforms; 14. Ordinary differential equations; 15. Partial differential equations; 16. Bessel functions; 17. Legendre functions and spherical harmonics; 18. Orthogonal functions; 19. Green's functions; 20. The calculus of variations; 21. Asymptotic techniques; Appendices; References; Index.
Hybrid Optical Processor

DTIC Science & Technology

1990-08-01

RADC-TR-90-256, Final Technical Report, August 1990, AD-A227 163. Hybrid Optical Processor. Dove Electronics, Inc.; J.F. Dove, F.T.S. Yu, C. Eldering. Contract F19628-87-C-0086; PE 61102F; PR 2305; TA J7. Contents fragments: ...LCTVs); 2.14 Joint Fourier Transform Processor; 2.15 Holographic Associative Memory Using a Micro...
Conceptual design of an on-board optical processor with components

NASA Technical Reports Server (NTRS)

Walsh, J. R.; Shackelford, R. G.

1977-01-01

The specification of components for a spacecraft on-board optical processor was investigated. A space oriented application of optical data processing and the investigation of certain aspects of optical correlators were examined. The investigation confirmed that real-time optical processing has made significant advances over the past few years, but that there are still critical components which will require further development for use in an on-board optical processor. The devices evaluated were the coherent light valve, the readout optical modulator, the liquid crystal modulator, and the image forming light modulator.
Vector processing efficiency of plasma MHD codes by use of the FACOM 230-75 APU

NASA Astrophysics Data System (ADS)

Matsuura, T.; Tanaka, Y.; Naraoka, K.; Takizuka, T.; Tsunematsu, T.; Tokuda, S.; Azumi, M.; Kurita, G.; Takeda, T.

1982-06-01

In the framework of pipelined vector architecture, the efficiency of vector processing is assessed with respect to plasma MHD codes in nuclear fusion research. By using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6, 6, and 4 times faster than the highly optimized scalar version are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of the pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of "concurrency within pipelined parallelism" is emphasized.
3D polarisation speckle as a demonstration of tensor version of the van Cittert-Zernike theorem for stochastic electromagnetic beams

NASA Astrophysics Data System (ADS)

Ma, Ning; Zhao, Juan; Hanson, Steen G.; Takeda, Mitsuo; Wang, Wei

2016-10-01

Laser speckle has been studied extensively, both for its basic properties and for its applications. In the majority of research on speckle phenomena, the random optical field has been treated as a scalar optical field, and the main interest has concentrated on the statistical properties and applications of its intensity distribution. Recently, statistical properties of random electric vector fields, referred to as polarization speckle, have attracted new interest because of their importance in a variety of areas with practical applications such as biomedical optics and optical metrology. Statistical phenomena of random electric vector fields are closely related to the theories of speckle, polarization and coherence. In this paper, we investigate the correlation tensor for stochastic electromagnetic fields modulated by a depolarizer consisting of a rough-surfaced retardation plate. Under the assumption that the microstructure of the scattering surface on the depolarizer is so fine as to be unresolvable in our observation region, we have derived a relationship between the polarization matrix/coherency matrix for the modulated electric fields behind the rough-surfaced retardation plate and the coherence matrix under the free space geometry. This relation is regarded as entirely analogous to the van Cittert-Zernike theorem of classical coherence theory. Within the paraxial approximation as represented by the ABCD-matrix formalism, the three-dimensional structure of the generated polarization speckle is investigated based on the correlation tensor, indicating a typical carrot structure with a much longer axial dimension than the extent in its transverse dimension.
Quantitative tissue polarimetry using polar decomposition of 3 x 3 Mueller matrix

NASA Astrophysics Data System (ADS)

Swami, M. K.; Manhas, S.; Buddhiwant, P.; Ghosh, N.; Uppal, A.; Gupta, P. K.

2007-05-01

Polarization properties of any optical system are completely described by a sixteen-element (4 x 4) matrix called the Mueller matrix, which transforms the Stokes vector describing the polarization properties of incident light to the Stokes vector of scattered light. Measurement of all the elements of the matrix requires a minimum of sixteen measurements involving both linearly and circularly polarized light. However, for many diagnostic applications, it would be useful if all the polarization parameters of the medium (depolarization (Δ), differential attenuation of two orthogonal polarizations, that is, diattenuation (d), and differential phase retardance of two orthogonal polarizations, that is, retardance (δ)) could be quantified with linear polarization measurements alone. In this paper we show that for a turbid medium, like biological tissue, where the depolarization of linearly polarized light arises primarily from the randomization of the field vector's direction by multiple scattering, the polarization parameters of the medium can be obtained from the nine Mueller matrix elements involving linear polarization measurements only. Use of the approach for measurement of polarization parameters (Δ, d and δ) of normal and malignant (squamous cell carcinoma) tissues resected from the human oral cavity is presented.
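As a minimal illustration of the Mueller-Stokes relation described in this abstract, the following Python sketch applies a 4 x 4 Mueller matrix to a Stokes vector and extracts the 3 x 3 block that couples only the linear-polarization components (I, Q, U). The function names and the ideal-polarizer example are illustrative assumptions, not taken from the paper, and the sketch does not attempt the polar decomposition itself.

```python
# Illustrative sketch only: function names and the example matrix are
# assumptions made for this listing, not the authors' code.

def apply_mueller(mueller, stokes):
    """Return the output Stokes vector S_out = M * S_in."""
    return [sum(m * s for m, s in zip(row, stokes)) for row in mueller]


def linear_block(mueller):
    """Return the 3 x 3 block of M that couples only I, Q and U."""
    return [row[:3] for row in mueller[:3]]


# Textbook Mueller matrix of an ideal horizontal linear polarizer.
HORIZONTAL_POLARIZER = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

unpolarized = [1.0, 0.0, 0.0, 0.0]  # Stokes vector (I, Q, U, V)
transmitted = apply_mueller(HORIZONTAL_POLARIZER, unpolarized)

# The nine-element block needs only linear polarization states to measure.
block = linear_block(HORIZONTAL_POLARIZER)
transmitted_linear = apply_mueller(block, unpolarized[:3])
```

Here the polarizer passes half of the unpolarized intensity and leaves it fully horizontally polarized; the 3 x 3 block reproduces the same I, Q and U values without any reference to circular polarization.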
DOE Office of Scientific and Technical Information (OSTI.GOV)

Collins, Benjamin S.

The Futility package contains the following: 1) Definition of the size of integers and real numbers; 2) A generic unit test harness; 3) Definitions for some basic extensions to the Fortran language: arbitrary length strings, a parameter list construct, exception handlers, command line processor, timers; 4) Geometry definitions: point, line, plane, box, cylinder, polyhedron; 5) File wrapper functions: standard Fortran input/output files, Fortran binary files, HDF5 files; 6) Parallel wrapper functions: MPI and OpenMP abstraction layers, partitioning algorithms; 7) Math utilities: BLAS, Matrix and Vector definitions, Linear Solver methods and wrappers for other TPLs (PETSc, MKL, etc.), preconditioner classes; 8) Misc: random number generator, water saturation properties, sorting algorithms.
A parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix

NASA Technical Reports Server (NTRS)

Swarztrauber, Paul N.

1993-01-01

A parallel algorithm, called polysection, is presented for computing the eigenvalues of a symmetric tridiagonal matrix. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The signs of the polynomials at the interval endpoints are determined a priori and used to guarantee that all zeros are found. The use of finite-precision arithmetic may result in multiple zeros; however, in this case, the intervals coalesce and their number determines exactly the multiplicity of the zero. For an N x N matrix the eigenvalues can be determined in O(log-squared N) time with N-squared processors and O(N) time with N processors. The method is compared with a parallel variant of bisection that requires O(N-squared) time on a single processor, O(N) time with N processors, and O(log N) time with N-squared processors.
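The bisection baseline mentioned in this abstract can be sketched in a few lines of Python. This is not the polysection algorithm; it is a serial version of classical Sturm-sequence bisection, shown only to make the comparison concrete. The function names, the Gershgorin starting interval, the tolerance, and the zero-pivot guard are assumptions of this sketch.

```python
# Sketch of classical bisection (not polysection) for a symmetric
# tridiagonal matrix with diagonal `diag` and off-diagonal `off`.
# All names, the tolerance and the zero-pivot guard are illustrative.

def count_below(diag, off, x):
    """Count eigenvalues smaller than x via the signs of the LDL^T pivots."""
    count = 0
    q = 1.0
    for i, d in enumerate(diag):
        q = d - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-300  # nudge an exactly zero pivot to keep the recurrence going
        if q < 0.0:
            count += 1
    return count


def gershgorin_bounds(diag, off):
    """Return an interval guaranteed to contain every eigenvalue."""
    n = len(diag)
    lo, hi = float("inf"), float("-inf")
    for i in range(n):
        radius = (abs(off[i - 1]) if i > 0 else 0.0) + \
                 (abs(off[i]) if i < n - 1 else 0.0)
        lo = min(lo, diag[i] - radius)
        hi = max(hi, diag[i] + radius)
    return lo, hi


def kth_eigenvalue(diag, off, k, tol=1e-12):
    """Return the k-th smallest eigenvalue (k starts at 0) by bisection."""
    lo, hi = gershgorin_bounds(diag, off)
    for _ in range(200):  # hard cap so the loop always terminates
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid  # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Each k is independent, so different processors could take different k.
eigenvalues = [kth_eigenvalue([2.0, 2.0, 2.0], [-1.0, -1.0], k) for k in range(3)]
```

Because each eigenvalue is located independently of the others, the calls for different k can run on different processors, which is the sense in which bisection parallelizes.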

Photorefractive Integrators and Correlators

DTIC Science & Technology

1992-12-01

The use of photorefractive crystals as optically addressed time integrating spatial light modulators in acousto-optic signal processing applications...adaptive acousto-optic processor. These results demonstrated the feasibility of using photorefractives for such applications.... Photorefractive, Acousto-optic processor.
General optical discrete z transform: design and application.

PubMed

Ngo, Nam Quoc

2016-12-20

This paper presents a generalization of the discrete z transform algorithm. It is shown that the GOD-ZT algorithm is a generalization of several important conventional discrete transforms. Based on the GOD-ZT algorithm, a tunable general optical discrete z transform (GOD-ZT) processor is synthesized using the silica-based finite impulse response transversal filter. To demonstrate the effectiveness of the method, the design and simulation of a tunable optical discrete Fourier transform (ODFT) processor as a special case of the synthesized GOD-ZT processor is presented. It is also shown that the ODFT processor can function as a real-time optical spectrum analyzer. The tunable ODFT has an important potential application as a tunable optical demultiplexer at the receiver end of an optical orthogonal frequency-division multiplexing transmission system.
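To make the relation between the discrete z transform and the discrete Fourier transform concrete, here is a small Python sketch. It uses the textbook definition X(z) = sum over n of x[n] z^(-n) and evaluates it at the N-th roots of unity, which yields the DFT. The exact form of the GOD-ZT is not given in the abstract, so the function names and the choice of evaluation points are assumptions of this sketch rather than the paper's algorithm.

```python
import cmath

# Sketch under stated assumptions: textbook z transform, hypothetical names.

def z_transform(samples, z_points):
    """Evaluate X(z) = sum_n x[n] * z**(-n) at each point in z_points."""
    return [sum(x * z ** (-n) for n, x in enumerate(samples)) for z in z_points]


def dft_via_z_transform(samples):
    """Sample X(z) at the N-th roots of unity, which gives the DFT."""
    n_points = len(samples)
    unit_circle = [cmath.exp(2j * cmath.pi * k / n_points) for k in range(n_points)]
    return z_transform(samples, unit_circle)


impulse_spectrum = dft_via_z_transform([1.0, 0.0, 0.0, 0.0])   # flat spectrum
constant_spectrum = dft_via_z_transform([1.0, 1.0, 1.0, 1.0])  # energy in bin 0 only
```

Choosing other evaluation points (for example, points off the unit circle) gives other transforms from the same routine, which is the kind of tunability the abstract refers to.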
Practical somewhat-secure quantum somewhat-homomorphic encryption with coherent states

NASA Astrophysics Data System (ADS)

Tan, Si-Hui; Ouyang, Yingkai; Rohde, Peter P.

2018-04-01

We present a scheme for implementing homomorphic encryption on coherent states encoded using phase-shift keys. The encryption operations require only rotations in phase space, which commute with computations in the code space performed via passive linear optics, and with generalized nonlinear phase operations that are polynomials of the photon-number operator in the code space. This encoding scheme can thus be applied to any computation with coherent-state inputs, and the computation proceeds via a combination of passive linear optics and generalized nonlinear phase operations. An example of such a computation is matrix multiplication, whereby a vector representing coherent-state amplitudes is multiplied by a matrix representing a linear optics network, yielding a new vector of coherent-state amplitudes. By finding an orthogonal partitioning of the support of our encoded states, we quantify the security of our scheme via the indistinguishability of the encrypted code words. While we focus on coherent-state encodings, we expect that this phase-key encoding technique could apply to any continuous-variable computation scheme where the phase-shift operator commutes with the computation.
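The commutation property at the heart of this scheme can be checked numerically. The Python sketch below assumes, for illustration only, that the same phase key is applied to every mode and that the computation is a single 50:50 beam splitter; the names and numerical values are hypothetical, and the sketch models classical complex amplitudes rather than a full quantum simulation.

```python
import cmath
import math

# Sketch under stated assumptions: one shared phase key for all modes,
# a 2 x 2 beam-splitter matrix as the "computation", hypothetical names.

def apply_network(matrix, amplitudes):
    """Multiply the vector of coherent-state amplitudes by the network matrix."""
    return [sum(m * a for m, a in zip(row, amplitudes)) for row in matrix]


def rotate_phase(amplitudes, theta):
    """Rotate every amplitude by the same angle theta in phase space."""
    phase = cmath.exp(1j * theta)
    return [phase * a for a in amplitudes]


s = 1.0 / math.sqrt(2.0)
BEAM_SPLITTER = [[s, s], [s, -s]]

plain = [1.0 + 0.0j, 0.5j]  # input amplitudes
key = 0.7                   # secret phase, in radians

# Encrypt, compute on the encrypted amplitudes, then decrypt.
encrypted_output = apply_network(BEAM_SPLITTER, rotate_phase(plain, key))
decrypted_output = rotate_phase(encrypted_output, -key)

# Reference: the same computation on the unencrypted amplitudes.
reference_output = apply_network(BEAM_SPLITTER, plain)
```

The decrypted result agrees with the reference because a common phase factor passes straight through any linear map of the amplitudes.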
Jones matrix analysis for a polarization-sensitive optical coherence tomography system using fiber-optic components.

PubMed

Park, B Hyle; Pierce, Mark C; Cense, Barry; de Boer, Johannes F

2004-11-01

We present an analysis for polarization-sensitive optical coherence tomography that facilitates the unrestricted use of fiber and fiber-optic components throughout an interferometer and yields sample birefringence, diattenuation, and relative optic axis orientation. We use a novel Jones matrix approach that compares the polarization states of light reflected from the sample surface with those reflected from within a biological sample for pairs of depth scans. The incident polarization alternated between two states that are perpendicular in a Poincaré sphere representation to ensure proper detection of tissue birefringence regardless of optical fiber contributions. The method was validated by comparing the calculated diattenuation of a polarizing sheet, chicken tendon, and muscle with that obtained by independent measurement. The relative importance of diattenuation versus birefringence to angular displacement of Stokes vectors on a Poincaré sphere was quantified.
Vectorized program architectures for supercomputer-aided circuit design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rizzoli, V.; Ferlito, M.; Neri, A.

1986-01-01

Vector processors (supercomputers) can be effectively employed in MIC or MMIC applications to solve problems of large numerical size such as broad-band nonlinear design or statistical design (yield optimization). In order to fully exploit the capabilities of vector hardware, any program architecture must be structured accordingly. This paper presents a possible approach to the "semantic" vectorization of microwave circuit design software. Speed-up factors of the order of 50 can be obtained on a typical vector processor (Cray X-MP), with respect to the most powerful scalar computers (CDC 7600), with cost reductions of more than one order of magnitude. This could broaden the horizon of microwave CAD techniques to include problems that are practically out of the reach of conventional systems.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems.

PubMed

Downie, J D; Goodman, J W

1989-10-15

A ground-based adaptive optics imaging telescope system attempts to improve image quality by measuring and correcting for atmospherically induced wavefront aberrations. The necessary control computations during each cycle will take a finite amount of time, which adds to the residual error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper investigates this possibility by studying the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for adaptive optics use.
Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

1994-01-01

An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
2-D Acousto-Optic Signal Processors for Simultaneous Spectrum Analysis and Direction Finding

DTIC Science & Technology

1990-11-01

National Defence / Défense nationale. 2-D Acousto-Optic Signal Processors for Simultaneous Spectrum Analysis and Direction Finding (U), by Jim P.Y... Processing, J.T. Tippet et al., Eds., Chapter 38, pp. 715-748, MIT Press, Cambridge 1965. [6] A.E. Spezio, "Acousto-optics for Electronic Warfare ...
Vector generator scan converter

DOEpatents

Moore, James M.; Leighton, James F.

1990-01-01

High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O (input/output) channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardware for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold.
Vector generator scan converter

DOEpatents

Moore, J.M.; Leighton, J.F.

1988-02-05

High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardware for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold. 7 figs.
Programmable optical processor chips: toward photonic RF filters with DSP-level flexibility and MHz-band selectivity

NASA Astrophysics Data System (ADS)

Xie, Yiwei; Geng, Zihan; Zhuang, Leimeng; Burla, Maurizio; Taddei, Caterina; Hoekman, Marcel; Leinse, Arne; Roeloffzen, Chris G. H.; Boller, Klaus-J.; Lowery, Arthur J.

2017-12-01

Integrated optical signal processors have been identified as a powerful engine for optical processing of microwave signals. They enable wideband and stable signal processing operations on miniaturized chips with ultimate control precision. As a promising application, such processors enable photonic implementations of reconfigurable radio frequency (RF) filters with wide design flexibility, large bandwidth, and high-frequency selectivity. This is a key technology for photonic-assisted RF front ends that opens a path to overcoming the bandwidth limitation of current digital electronics. Here, the recent progress of integrated optical signal processors for implementing such RF filters is reviewed. We highlight the use of a low-loss, high-index-contrast stoichiometric silicon nitride waveguide which promises to serve as a practical material platform for realizing high-performance optical signal processors and points toward photonic RF filters with digital signal processing (DSP)-level flexibility, hundreds-GHz bandwidth, MHz-band frequency selectivity, and full system integration on a chip scale.
Radiance and polarization of multiple scattered light from haze and clouds.

PubMed

Kattawar, G W; Plass, G N

1968-08-01

The radiance and polarization of multiple scattered light is calculated from the Stokes' vectors by a Monte Carlo method. The exact scattering matrix for a typical haze and for a cloud whose spherical drops have an average radius of 12 μm is calculated from the Mie theory. The Stokes' vector is transformed in a collision by this scattering matrix and the rotation matrix. The two angles that define the photon direction after scattering are chosen by a random process that correctly simulates the actual distribution functions for both angles. The Monte Carlo results for Rayleigh scattering compare favorably with well known tabulated results. Curves are given of the reflected and transmitted radiances and polarizations for both the haze and cloud models and for several solar angles, optical thicknesses, and surface albedos. The dependence on these various parameters is discussed.
Optical interconnection using polyimide waveguide for multichip module

NASA Astrophysics Data System (ADS)

Koyanagi, Mitsumasa

1996-01-01

We have developed a parallel processor system with 152 RISC processor chips specific for Monte-Carlo analysis. This system has the ring-bus architecture. The performance of several Gflops is expected in this system according to the computer simulation. However, it was revealed that the data transfer speed of the bus has to be increased more dramatically in order to further increase the performance. Then, we propose to introduce the optical interconnection into the parallel processor system to increase the data transfer speed of the buses. The double ring-bus architecture is employed in this new parallel processor system with optical interconnection. The free-space optical interconnection and the optical waveguide are used for the optical ring-bus. Thin polyimide film was used to form the optical waveguide. A relatively low propagation loss was achieved in the polyimide optical waveguide. In addition, it was confirmed that the propagation direction of signal light can be easily changed by using a micro-mirror.
Vectorization of a classical trajectory code on a Floating Point Systems, Inc. Model 164 attached processor.

PubMed

Kraus, Wayne A; Wagner, Albert F

1986-04-01

A triatomic classical trajectory code has been modified by extensive vectorization of the algorithms to achieve much improved performance on an FPS 164 attached processor. Extensive timings on both the FPS 164 and a VAX 11/780 with floating point accelerator are presented as a function of the number of trajectories simultaneously run. The timing tests involve a potential energy surface of the LEPS variety and trajectories with 1000 time steps. The results indicate that vectorization results in timing improvements on both the VAX and the FPS. For larger numbers of trajectories run simultaneously, up to a factor of 25 improvement in speed occurs between VAX and FPS vectorized code. Copyright © 1986 John Wiley & Sons, Inc.
A hybrid algorithm for parallel molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Mangiardi, Chris M.; Meyer, R.

2017-10-01

This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests

NASA Technical Reports Server (NTRS)

Casasent, D.; Jackson, J.

1986-01-01

A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.
Parallel matrix multiplication on the Connection Machine

NASA Technical Reports Server (NTRS)

Tichy, Walter F.

1988-01-01

Matrix multiplication is a computation and communication intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n^2 log n), O(n log n), O(n), and O(log n), and require n, n^2, n^2, and n^3 processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
Scattering matrix analysis for evaluating the photocurrent in hydrogenated-amorphous-silicon-based thin film solar cells.

PubMed

Shin, Myunghun; Lee, Seong Hyun; Lim, Jung Wook; Yun, Sun Jin

2014-11-01

A scattering matrix (S-matrix) analysis method was developed for evaluating hydrogenated amorphous silicon (a-Si:H)-based thin film solar cells. In this approach, light wave vectors A and B represent the incoming and outgoing behaviors of the incident solar light, respectively, in terms of coherent wave and incoherent intensity components. The S-matrix determines the relation between A and B according to optical effects such as reflection and transmission, as described by the Fresnel equations, scattering at the boundary surfaces, or scattering within the propagation medium, as described by the Beer-Lambert law and the change in the phase of the propagating light wave. This matrix can be used to evaluate the behavior of angle-incident coherent and incoherent light simultaneously, and takes into account not only the light scattering process at material boundaries (haze effects) but also nonlinear optical processes within the material. The optical parameters in the S-matrix were determined by modeling both a 2%-gallium-doped zinc oxide transparent conducting oxide and germanium-compounded a-Si:H (a-SiGe:H). Using the S-matrix equations, the photocurrent for an a-Si:H/a-SiGe:H tandem cell and the optical loss in semitransparent a-Si:H solar cells for use in building-integrated photovoltaic applications were analyzed. The developed S-matrix method can also be used as a general analysis tool for various thin film solar cells.
SPAR reference manual

NASA Technical Reports Server (NTRS)

Whetstone, W. D.

1976-01-01

The functions and operating rules of the SPAR system, which is a group of computer programs used primarily to perform stress, buckling, and vibrational analyses of linear finite element systems, were given. The following subject areas were discussed: basic information, structure definition, format system matrix processors, utility programs, static solutions, stresses, sparse matrix eigensolver, dynamic response, graphics, and substructure processors.

The Unified Floating Point Vector Coprocessor for Reconfigurable Hardware

NASA Astrophysics Data System (ADS)

Kathiara, Jainik

There has been an increased interest recently in using embedded cores on FPGAs. Many of the applications that make use of these cores have floating point operations. Due to the complexity and expense of floating point hardware, these algorithms are usually converted to fixed point operations or implemented using floating-point emulation in software. As the technology advances, more and more homogeneous computational resources and fixed function embedded blocks are added to FPGAs and hence implementation of floating point hardware becomes a feasible option. In this research we have implemented a high performance, autonomous floating point vector coprocessor (FPVC) that works independently within an embedded processor system. We have presented a unified approach to vector and scalar computation, using a single register file for both scalar operands and vector elements. The hybrid vector/SIMD computational model of FPVC results in greater overall performance for most applications along with improved peak performance compared to other approaches. By parameterizing vector length and the number of vector lanes, we can design an application specific FPVC and take optimal advantage of the FPGA fabric. For this research we have also initiated designing a software library for various computational kernels, each of which adapts FPVC's configuration and provides maximal performance. The kernels implemented are from the area of linear algebra and include matrix multiplication and QR and Cholesky decomposition. We have demonstrated the operation of FPVC on a Xilinx Virtex 5 using the embedded PowerPC.
Holographic implementation of a binary associative memory for improved recognition

NASA Astrophysics Data System (ADS)

Bandyopadhyay, Somnath; Ghosh, Ajay; Datta, Asit K.

1998-03-01

Neural network associative memory has found wide application in pattern recognition techniques. We propose an associative memory model for binary character recognition. The interconnection strengths of the memory are binary valued. The concept of sparse coding is used to enhance the storage efficiency of the model. The question of imposed preconditioning of pattern vectors, which is inherent in a sparsely coded conventional memory, is eliminated by using a multistep correlation technique, and the ability of correct association is enhanced in a real-time application. A potential optoelectronic implementation of the proposed associative memory is also described. Learning and recall are possible by using digital optical matrix-vector multiplication, where full use of the parallelism and connectivity of optics is made. A hologram is used in the experiment as a long-term memory (LTM) for storing all input information. The short-term memory or the interconnection weight matrix required during the recall process is configured by retrieving the necessary information from the holographic LTM.
Inner-product array processor for retrieval of stored images represented by bipolar binary (+1,-1) pixels using partial input trinary pixels represented by (+1,-1)

NASA Technical Reports Server (NTRS)

Liu, Hua-Kuang (Inventor); Awwal, Abdul A. S. (Inventor); Karim, Mohammad A. (Inventor)

1993-01-01

An inner-product array processor is provided with thresholding of the inner product during each iteration to make more significant the inner product employed in estimating a vector to be used as the input vector for the next iteration. While stored vectors and estimated vectors are represented in bipolar binary (1,-1), only those elements of an initial partial input vector that are believed to be common with those of a stored vector are represented in bipolar binary; the remaining elements of a partial input vector are set to 0. This mode of representation, in which the known elements of a partial input vector are in bipolar binary form and the remaining elements are set equal to 0, is referred to as trinary representation. The initial inner products corresponding to the partial input vector will then be equal to the number of known elements. Inner-product thresholding is applied to accelerate convergence and to avoid convergence to a negative input product.
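The retrieval loop described in this abstract can be sketched in Python as follows. Stored vectors are bipolar (+1, -1) and the partial input is trinary (+1, 0, -1), as in the text. The specific thresholding rule used here (keep only the largest positive inner product), the stopping test, and all names are assumptions of this sketch; the patent abstract does not specify them.

```python
# Sketch under stated assumptions: the thresholding rule, the stopping
# test and all names are illustrative, not taken from the patent.

def inner_product(u, v):
    """Return the inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recall(stored, partial, max_iters=10):
    """Iteratively estimate the stored bipolar vector matching a trinary input."""
    estimate = list(partial)
    for _ in range(max_iters):
        products = [inner_product(v, estimate) for v in stored]
        threshold = max(products)
        # Thresholding: suppress every inner product below the largest one,
        # and never let a non-positive product contribute.
        kept = [p if p >= threshold and p > 0 else 0 for p in products]
        sums = [sum(k * v[i] for k, v in zip(kept, stored))
                for i in range(len(estimate))]
        # Take the sign of each weighted sum; keep the old element on a tie.
        updated = [1 if s > 0 else -1 if s < 0 else e
                   for s, e in zip(sums, estimate)]
        if updated == estimate:
            break
        estimate = updated
    return estimate


STORED = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]

# Only the first three elements are known; the rest are set to 0.
partial_input = [1, -1, 1, 0, 0, 0, 0, 0]
first_product = inner_product(STORED[0], partial_input)
retrieved = recall(STORED, partial_input)
```

With three known elements, the inner product with the matching stored vector equals 3, the number of known elements, which is the property the abstract points out; thresholding then lets that vector dominate the next estimate.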
Computationally Efficient Modeling and Simulation of Large Scale Systems

NASA Technical Reports Server (NTRS)

Jain, Jitesh (Inventor); Koh, Cheng-Kok (Inventor); Balakrishnan, Vankataramanan (Inventor); Cauley, Stephen F (Inventor); Li, Hong (Inventor)

2014-01-01

A system for simulating operation of a VLSI interconnect structure having capacitive and inductive coupling between nodes thereof, including a processor, and a memory, the processor configured to perform obtaining a matrix X and a matrix Y containing different combinations of passive circuit element values for the interconnect structure, the element values for each matrix including inductance L and inverse capacitance P, obtaining an adjacency matrix A associated with the interconnect structure, storing the matrices X, Y, and A in the memory, and performing numerical integration to solve first and second equations.
Ring-array processor distribution topology for optical interconnects

NASA Technical Reports Server (NTRS)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
Efficient Parallel Formulations of Hierarchical Methods and Their Applications

NASA Astrophysics Data System (ADS)

Grama, Ananth Y.

1996-01-01

Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product from O(n^2) to O(n log n) and the memory requirement from O(n^2) to O(n). We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance are expected to increase for bigger problems. For the 200K node problem, our code delivers about 5 GFLOPS of performance on a 256 processor T3D. This is impressive considering the fact that the problem has floating point divides and roots, and very little locality resulting in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 TeraBytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory. We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning.
We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K unknown problem takes about 10 minutes on a 64 processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
The Engineer Topographic Laboratories /ETL/ hybrid optical/digital image processor

NASA Astrophysics Data System (ADS)

Benton, J. R.; Corbett, F.; Tuft, R.

1980-01-01

An optical-digital processor for generalized image enhancement and filtering is described. The optical subsystem is a two-PROM Fourier filter processor. Input imagery is isolated, scaled, and imaged onto the first PROM; this input plane acts like a liquid gate and serves as an incoherent-to-coherent converter. The image is transformed onto a second PROM which also serves as a filter medium; filters are written onto the second PROM with a laser scanner in real time. A solid state CCTV camera records the filtered image, which is then digitized and stored in a digital image processor. The operator can then manipulate the filtered image using the gray scale and color remapping capabilities of the video processor as well as the digital processing capabilities of the minicomputer.
Lenslet array processors.

PubMed

Glaser, I

1982-04-01

By combining a lenslet array with masks it is possible to obtain a noncoherent optical processor capable of computing in parallel generalized 2-D discrete linear transformations. We present here an analysis of such lenslet array processors (LAP). The effects of several errors, including optical aberrations, diffraction, vignetting, and geometrical and mask errors, are calculated, and guidelines for the optical design of LAP are derived. Using these results, both ultimate and practical performances of LAP are compared with those of competing techniques.
Computer Sciences and Data Systems, volume 2

NASA Technical Reports Server (NTRS)

1987-01-01

Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
Method and apparatus for optimized processing of sparse matrices

DOEpatents

Taylor, Valerie E.

1993-01-01

A computer architecture for processing a sparse matrix is disclosed. The apparatus stores a value-row vector corresponding to nonzero values of a sparse matrix. Each of the nonzero values is located at a defined row and column position in the matrix. The value-row vector includes a first vector including nonzero values and delimiting characters indicating a transition from one column to another. The value-row vector also includes a second vector which defines row position values in the matrix corresponding to the nonzero values in the first vector and column position values in the matrix corresponding to the column position of the nonzero values in the first vector. The architecture also includes a circuit for detecting a special character within the value-row vector. Matrix-vector multiplication is executed on the value-row vector. This multiplication is performed by multiplying an index value of the first vector value by a column value from a second matrix to form a matrix-vector product which is added to a previous matrix-vector product.
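
The column-by-column storage described in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the `END_OF_COLUMN` delimiter, the two parallel lists, and the function names are hypothetical choices made for illustration, loosely mirroring the idea of a value vector with column-transition markers and a companion vector of row positions.

```python
# Hypothetical delimiter marking the transition from one column to the next.
END_OF_COLUMN = None


def encode_columns(dense):
    """Pack the nonzeros of a dense matrix column by column.

    Returns (values, rows): two parallel lists in which END_OF_COLUMN
    closes each column.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    values, rows = [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                rows.append(i)
        values.append(END_OF_COLUMN)
        rows.append(END_OF_COLUMN)
    return values, rows


def sparse_matvec(values, rows, x, n_rows):
    """Compute y = A @ x, accumulating one partial product per nonzero."""
    y = [0.0] * n_rows
    col = 0
    for v, r in zip(values, rows):
        if v is END_OF_COLUMN:
            col += 1            # delimiter detected: move to the next column
            continue
        y[r] += v * x[col]      # add the new product to the running sum
    return y


A = [[4, 0, 0],
     [0, 0, 2],
     [1, 3, 0]]
values, rows = encode_columns(A)
y = sparse_matvec(values, rows, [1.0, 2.0, 3.0], n_rows=3)
```

Only the nonzero entries are visited, so the work scales with the number of nonzeros rather than with the full matrix size.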
Method and system to estimate variables in an integrated gasification combined cycle (IGCC) plant

DOEpatents

Kumar, Aditya; Shi, Ruijie; Dokucu, Mustafa

2013-09-17

System and method to estimate variables in an integrated gasification combined cycle (IGCC) plant are provided. The system includes a sensor suite to measure respective plant input and output variables. An extended Kalman filter (EKF) receives sensed plant input variables and includes a dynamic model to generate a plurality of plant state estimates and a covariance matrix for the state estimates. A preemptive-constraining processor is configured to preemptively constrain the state estimates and covariance matrix to be free of constraint violations. A measurement-correction processor may be configured to correct constrained state estimates and a constrained covariance matrix based on processing of sensed plant output variables. The measurement-correction processor is coupled to update the dynamic model with corrected state estimates and a corrected covariance matrix. The updated dynamic model may be configured to estimate values for at least one plant variable not originally sensed by the sensor suite.
Cylindrical Vector Beams for Rapid Polarization-Dependent Measurements in Atomic Systems

DTIC Science & Technology

2011-12-05

www.opticsinfobase.org/abstract.cfm?URI=oe-18-24-25035. 16. S. Tripathi and K. C. Toussaint, Jr., “Rapid Mueller matrix polarimetry based on parallelized...optical trapping [11], atom guiding [12], laser machining [13], charged particle acceleration [14,15], and polarimetry [16]. Yet despite numerous
Class network routing

DOEpatents

Bhanot, Gyan [Princeton, NJ]; Blumrich, Matthias A [Ridgefield, CT]; Chen, Dong [Croton On Hudson, NY]; Coteus, Paul W [Yorktown Heights, NY]; Gara, Alan G [Mount Kisco, NY]; Giampapa, Mark E [Irvington, NY]; Heidelberger, Philip [Cortlandt Manor, NY]; Steinmacher-Burow, Burkhard D [Mount Kisco, NY]; Takken, Todd E [Mount Kisco, NY]; Vranas, Pavlos M [Bedford Hills, NY]

2009-09-08

Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
Optical chirp z-transform processor with a simplified architecture.

PubMed

Ngo, Nam Quoc

2014-12-29

Using a simplified chirp z-transform (CZT) algorithm based on the discrete-time convolution method, this paper presents the synthesis of a simplified architecture of a reconfigurable optical chirp z-transform (OCZT) processor based on silica-based planar lightwave circuit (PLC) technology. In the simplified architecture of the reconfigurable OCZT, the required number of optical components is small and there are no waveguide crossings, which makes fabrication easy. The design of a novel type of optical discrete Fourier transform (ODFT) processor as a special case of the synthesized OCZT is then presented to demonstrate its effectiveness. The designed ODFT can potentially be used as an optical demultiplexer at the receiver of an optical fiber orthogonal frequency division multiplexing (OFDM) transmission system.
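
The relationship between the CZT, its convolution form, and the DFT special case can be checked numerically. The sketch below is a generic textbook formulation (the Bluestein identity), not the paper's optical architecture; the function names and the test signal are assumptions made for illustration.

```python
import cmath


def czt(x, m, w, a):
    """Chirp z-transform via the convolution (Bluestein) identity.

    Evaluates X[k] = sum_n x[n] * a**(-n) * w**(n*k) for k = 0..m-1 using
    n*k = (n**2 + k**2 - (k - n)**2) / 2, which turns the sum into a
    convolution with an inverse chirp.
    """
    log_w = cmath.log(w)

    def chirp(t):
        # w**(t**2 / 2), computed through the complex logarithm of w
        return cmath.exp(log_w * (t * t) / 2.0)

    # Pre-multiply the input by a**(-n) and by a chirp.
    pre = [x[n] * a ** (-n) * chirp(n) for n in range(len(x))]
    out = []
    for k in range(m):
        # Convolve with the inverse chirp, then post-multiply by a chirp.
        acc = sum(pre[n] / chirp(k - n) for n in range(len(x)))
        out.append(chirp(k) * acc)
    return out


def dft(x):
    """Direct discrete Fourier transform, used here only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
n_points = len(signal)
# The DFT is the special case a = 1, w = exp(-2*pi*i/N), m = N.
via_czt = czt(signal, n_points, cmath.exp(-2j * cmath.pi / n_points), 1.0)
reference = dft(signal)
```

The inner sum is written as a plain loop for clarity; a practical implementation would evaluate the convolution with FFTs.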
Advances in optical information processing IV; Proceedings of the Meeting, Orlando, FL, Apr. 18-20, 1990

NASA Astrophysics Data System (ADS)

Pape, Dennis R.

1990-09-01

The present conference discusses topics in optical image processing, optical signal processing, acoustooptic spectrum analyzer systems and components, and optical computing. Attention is given to tradeoffs in nonlinearly recorded matched filters, miniature spatial light modulators, detection and classification using higher-order statistics of optical matched filters, rapid traversal of an image data base using binary synthetic discriminant filters, wideband signal processing for emitter location, an acoustooptic processor for autonomous SAR guidance, and sampling of Fresnel transforms. Also discussed are an acoustooptic RF signal-acquisition system, scanning acoustooptic spectrum analyzers, the effects of aberrations on acoustooptic systems, fast optical digital arithmetic processors, information utilization in analog and digital processing, optical processors for smart structures, and a self-organizing neural network for unsupervised learning.
The increase in the starting torque of PMSM motor by applying of FOC method

NASA Astrophysics Data System (ADS)

Plachta, Kamil

2017-05-01

The article presents a field oriented control (FOC) method for a permanent magnet synchronous motor (PMSM) equipped with optical sensors. This method allows wide-range regulation of the torque and rotational speed of the electric motor. The paper presents a mathematical model of the electric motor and the vector control method. Optical sensors have a shorter response time than inductive sensors, which allows the electronic control system to respond faster to changes in motor load. The motor driver is based on a digital signal processor, which performs advanced mathematical operations in real time. The application of the Clarke and Park transformations in the software defines the angle of the rotor position. The presented solution provides smooth adjustment of the rotational speed in the first operating zone and reduces the dead zone of the torque in the second and third operating zones.
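
The two transformations named in this abstract are standard in vector control and can be sketched briefly. The code below assumes the amplitude-invariant form of the Clarke transform and a d-axis aligned with phase a at zero angle; the function names and the sample currents are illustrative assumptions, not taken from the paper.

```python
import math


def clarke(a, b, c):
    """Amplitude-invariant Clarke transform: three phases -> (alpha, beta)."""
    alpha = (2.0 * a - b - c) / 3.0
    beta = (b - c) / math.sqrt(3.0)
    return alpha, beta


def park(alpha, beta, theta):
    """Park transform: rotate (alpha, beta) into the rotor frame (d, q)."""
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q


# Balanced three-phase currents of amplitude 10 A at electrical angle theta.
theta = math.radians(40.0)
amplitude = 10.0
ia = amplitude * math.cos(theta)
ib = amplitude * math.cos(theta - 2.0 * math.pi / 3.0)
ic = amplitude * math.cos(theta + 2.0 * math.pi / 3.0)

d, q = park(*clarke(ia, ib, ic), theta)
```

For balanced currents the rotating-frame components are constant (here d equals the amplitude and q is zero), which is what lets a controller regulate torque with DC-like quantities.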
FPGA wavelet processor design using language for instruction-set architectures (LISA)

NASA Astrophysics Data System (ADS)

Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

2007-04-01

The design of a microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for Instruction-Set Architectures (LISA) allows a microprocessor to be modeled not only from the instruction set but also from an architecture description, including pipelining behavior, which gives design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform, a.k.a. CoWare Processor Designer, we present in this paper three microprocessor designs that implement an 8/8 wavelet transform processor of the kind typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concepts like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processors (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with an indirect addressing operation were the key to achieving higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
A polarization measurement method for the quantification of retardation in optic nerve fiber layer

NASA Astrophysics Data System (ADS)

Fukuma, Yasufumi; Okazaki, Yoshio; Shioiri, Takashi; Iida, Yukio; Kikuta, Hisao; Ohnuma, Kazuhiko

2008-02-01

The thickness measurement of the optic nerve fiber layer is one of the most important evaluations for carrying out glaucoma diagnosis. Because the optic nerve fiber layer has birefringence, the thickness can be measured by illuminating eye optics with circularly polarized light and analyzing the elliptical rate of the detected polarized light reflected from the optic nerve fiber layer. In this method, the scattering light from the background and the retardation caused by the cornea disturb the precise measurement. If the Stokes vector expressing the whole state of polarization can be detected, we can eliminate numerically the influence of the background scattering and of the retardation caused by the cornea. Because the retardation process of the eye optics can be represented by a numerical equation using the retardation matrix of each component and also the nonpolarized background scattering light, it can be calculated by using the Stokes vector. We applied a polarization analysis system that can detect the Stokes vector to the fundus camera. The polarization analysis system is constructed with a CCD area image sensor, a linear polarizing plate, a micro phase plate array, and a circularly polarized light illumination unit. With this simply constructed system, we can calculate the retardation caused only by the optic nerve fiber layer and it can predict the thickness of the optic nerve fiber layer. We report the method and the results graphically showing the retardation of the optic nerve fiber layer without the retardation of the cornea.
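
The claim that a full Stokes measurement allows unpolarized background light to be removed numerically can be illustrated with a toy model. The sketch below assumes a single ideal linear retarder with a horizontal fast axis, ignores the cornea and the reflection geometry, and uses one common sign convention for the Mueller matrix; the retardance and background values are made up for illustration.

```python
import math


def retarder_mueller(delta):
    """Mueller matrix of an ideal linear retarder, fast axis horizontal.

    Sign conventions differ between references; this is one common choice.
    """
    c, s = math.cos(delta), math.sin(delta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, c, s],
            [0.0, 0.0, -s, c]]


def apply_mueller(m, stokes):
    """Multiply a 4x4 Mueller matrix by a Stokes vector."""
    return [sum(m[i][j] * stokes[j] for j in range(4)) for i in range(4)]


circular_in = [1.0, 0.0, 0.0, 1.0]      # circularly polarized illumination
true_delta = math.radians(20.0)          # hypothetical layer retardance
polarized = apply_mueller(retarder_mueller(true_delta), circular_in)

# Unpolarized background scattering adds only to the intensity term S0.
background = [0.3, 0.0, 0.0, 0.0]
measured = [p + b for p, b in zip(polarized, background)]

# S2 and S3 are untouched by the background, so the retardance is recoverable.
estimated_delta = math.atan2(measured[2], measured[3])
```

An intensity-only measurement would mix the background into the result, whereas the ratio of the last two Stokes components does not depend on it in this model.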
Satellite on-board real-time SAR processor prototype

NASA Astrophysics Data System (ADS)

Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

2017-11-01

A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. 
Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and size are reviewed.
Structuring Stokes correlation functions using vector-vortex beam

NASA Astrophysics Data System (ADS)

Kumar, Vijay; Anwar, Ali; Singh, R. P.

2018-01-01

Higher order statistical correlations of the optical vector speckle field, formed due to scattering of a vector-vortex beam, are explored. Here, we report on the experimental construction of the Stokes parameters covariance matrix, consisting of all possible spatial Stokes parameters correlation functions. We also propose and experimentally realize new Stokes correlation functions called Stokes field auto correlation functions. It is observed that the Stokes correlation functions of the vector-vortex beam will be reflected in the respective Stokes correlation functions of the corresponding vector speckle field. The major advantages of the Stokes correlation functions are that they can be easily tuned by manipulating the polarization of the vector-vortex beam used to generate the vector speckle field, and that phase information can be obtained directly from intensity measurements. Moreover, this approach leads to a complete experimental Stokes characterization of a broad range of random fields.

A Survey of Parallel Sorting Algorithms.

DTIC Science & Technology

1981-12-01

see that, in this algorithm, each Processor i, for 1 ≤ i ≤ p-2, interacts directly only with Processors i+1 and i-1. Processor 0 only interacts with...Chan76] Chandra, A.K., "Maximal Parallelism in Matrix Multiplication," IBM Report RC. 6193, Watson Research Center, Yorktown Heights, N.Y., October 1976
Simulation of continuously logical base cells (CL BC) with advanced functions for analog-to-digital converters and image processors

NASA Astrophysics Data System (ADS)

Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.

2017-10-01

The paper considers the design and modeling of continuously logical base cells (CL BC) based on current mirrors (CM), with functions of preliminary analog processing and subsequent analog-to-digital processing, for creating sensor multichannel analog-to-digital converters (SMC ADCs) and image processors (IP). IP and SMC ADCs with vector or matrix parallel inputs and outputs require active basic photosensitive cells with an extended electronic circuit, which are considered in the paper. Such basic cells and the ADCs based on them have a number of advantages: high speed and reliability, simplicity, low power consumption, and a high integration level for linear and matrix structures. We show the design of the CL BC and of ADCs of photocurrents, their various possible implementations, and their simulations. We consider CL BC for methods of selection and rank preprocessing, and linear arrays of ADCs with conversion to binary codes and Gray codes. In contrast to our previous works, here we dwell more on analog preprocessing schemes for signals of neighboring cells. We show how the introduction of simple nodes based on current mirrors extends the range of functions performed by the image processor. Each channel of the structure consists of several digital-analog cells (DC) of 15-35 CMOS transistors. The number of DCs does not exceed the number of digits of the formed code, and for an iteration type only one DC, complemented by the device of selection and holding (SHD), is required. One channel of an ADC with iteration is based on one DC-(G) and an SHD, and it has only 35 CMOS transistors. In such ADCs, parallel code can easily be realized, as can serial-parallel output code. The circuits and the results of their simulation with OrCAD are shown. The supply voltage of the DC is 1.8÷3.3 V, the range of the input photocurrent is 0.1÷24 μA, and the transformation time is 20÷30 ns for 6-8 bit binary or Gray codes.
The total power consumption of the ADC with iteration is only 50÷100 μW if the maximum input current is 4 μA. Such a simple structure of a linear array of ADCs with low power consumption and a supply voltage of 3.3 V, and at the same time with good dynamic characteristics (the digitization frequency even for 1.5 μm CMOS technologies is 40÷50 MHz, and can be increased up to 10 times) and accuracy characteristics, is shown. The SMC ADCs based on CL BC and CM open new prospects for the realization of linear and matrix IP and photo-electronic structures with matrix operands, which are necessary for neural networks, digital optoelectronic processors, and neural-fuzzy controllers.
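
The abstract mentions ADC outputs in both binary and Gray codes. As a software illustration only (the paper realizes the conversion in current-mirror circuits, not in code), the standard reflected-binary Gray mapping can be sketched as follows; the function names are assumptions.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its reflected-binary Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Invert the Gray mapping by XOR-ing all right shifts of the code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# All codes of an 8-bit converter, the upper end of the 6-8 bit range quoted.
codes = [binary_to_gray(n) for n in range(256)]
```

Neighboring input levels map to codes that differ in exactly one bit, which limits the error caused by sampling the output during a transition.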
Right-Brain/Left-Brain Integrated Associative Processor Employing Convertible Multiple-Instruction-Stream Multiple-Data-Stream Elements

NASA Astrophysics Data System (ADS)

Hayakawa, Hitoshi; Ogawa, Makoto; Shibata, Tadashi

2005-04-01

A very large scale integrated circuit (VLSI) architecture for a multiple-instruction-stream multiple-data-stream (MIMD) associative processor has been proposed. The processor employs an architecture that enables seamless switching from associative operations to arithmetic operations. The MIMD element is convertible to a regular central processing unit (CPU) while maintaining its high performance as an associative processor. Therefore, the MIMD associative processor can perform not only on-chip perception, i.e., searching for the vector most similar to an input vector throughout the on-chip cache memory, but also arithmetic and logic operations similar to those in ordinary CPUs, both simultaneously in parallel processing. Three key technologies have been developed to generate the MIMD element: associative-operation-and-arithmetic-operation switchable calculation units, a versatile register control scheme within the MIMD element for flexible operations, and a short instruction set for minimizing the memory size for program storage. Key circuit blocks were designed and fabricated using 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. As a result, the full-featured MIMD element is estimated to be 3 mm², showing the feasibility of an 8-parallel-MIMD-element associative processor in a single chip of 5 mm × 5 mm.
Solving the Cauchy-Riemann equations on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Photorefractive optical fuzzy-logic processor based on grating degeneracy

NASA Astrophysics Data System (ADS)

Wu, Weishu; Yang, Changxi; Campbell, Scott; Yeh, Pochi

1995-04-01

A novel optical fuzzy-logic processor using light-induced gratings in photorefractive crystals is proposed and demonstrated. By exploiting grating degeneracy, one can easily implement parallel fuzzy-logic functions in disjunctive normal form.
A high performance linear equation solver on the VPP500 parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi

1994-12-31

This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500: (1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
An optical processor for object recognition and tracking

NASA Technical Reports Server (NTRS)

Sloan, J.; Udomkesmalee, S.

1987-01-01

The design and development of a miniaturized optical processor that performs real time image correlation are described. The optical correlator utilizes the Vander Lugt matched spatial filter technique. The correlation output, a focused beam of light, is imaged onto a CMOS photodetector array. In addition to performing target recognition, the device also tracks the target. The hardware, composed of optical and electro-optical components, occupies only 590 cu cm of volume. A complete correlator system would also include an input imaging lens. This optical processing system is compact, rugged, requires only 3.5 watts of operating power, and weighs less than 3 kg. It represents a major achievement in miniaturizing optical processors. When considered as a special-purpose processing unit, it is an attractive alternative to conventional digital image recognition processing. It is conceivable that the combined technology of both optical and digital processing could result in a very advanced robot vision system.
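
The correlation-peak behavior described here has a simple digital analogue. The sketch below computes a spatial-domain cross-correlation directly rather than through a Fourier-plane filter as a Vander Lugt correlator does; the tiny scene, the template, and the function names are illustrative assumptions.

```python
def correlate(scene, template):
    """Cross-correlate a template with a scene over all valid offsets."""
    t_rows, t_cols = len(template), len(template[0])
    rows = len(scene) - t_rows + 1
    cols = len(scene[0]) - t_cols + 1
    return [[sum(template[i][j] * scene[r + i][c + j]
                 for i in range(t_rows)
                 for j in range(t_cols))
             for c in range(cols)]
            for r in range(rows)]


def peak_location(surface):
    """Return the (row, column) offset of the largest correlation value."""
    offsets = ((r, c)
               for r in range(len(surface))
               for c in range(len(surface[0])))
    return max(offsets, key=lambda rc: surface[rc[0]][rc[1]])


template = [[1, 2],
            [3, 4]]

# Empty 6x6 scene with one copy of the target placed at row 2, column 3.
scene = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        scene[2 + i][3 + j] = template[i][j]

surface = correlate(scene, template)
peak = peak_location(surface)
```

Recognition corresponds to the height of the peak and tracking to its position, which shifts as the target moves through the scene.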
The Level 0 Pixel Trigger system for the ALICE experiment

NASA Astrophysics Data System (ADS)

Aglieri Rinella, G.; Kluge, A.; Krivda, M.; ALICE Silicon Pixel Detector project

2007-01-01

The ALICE Silicon Pixel Detector contains 1200 readout chips. Fast-OR signals indicate the presence of at least one hit in the 8192 pixel matrix of each chip. The 1200 bits are transmitted every 100 ns on 120 data readout optical links using the G-Link protocol. The Pixel Trigger System extracts and processes them to deliver an input signal to the Level 0 trigger processor targeting a latency of 800 ns. The system is compact, modular and based on FPGA devices. The architecture allows the user to define and implement various trigger algorithms. The system uses advanced 12-channel parallel optical fiber modules operating at 1310 nm as optical receivers and 12 deserializer chips closely packed in small area receiver boards. Alternative solutions with multi-channel G-Link deserializers implemented directly in programmable hardware devices were investigated. The design of the system and the progress of the ALICE Pixel Trigger project are described in this paper.
Fixed weight Hopfield Neural Network based on optical implementation of all-optical MZI-XNOR logic gate

NASA Astrophysics Data System (ADS)

Nugamesh Mutter, Kussay; Mat Jafri, Mohd Zubir; Abdul Aziz, Azlan

2010-05-01

Much research has been conducted to improve Hopfield Neural Network (HNN) performance, especially speed and memory capacity, using different approaches. However, there is still significant scope for developing HNN using optical logic gates. We propose here a new model of HNN based on all-optical XNOR logic gates for real time color image recognition. First, we improved HNN toward optimum learning and converging operations. We considered each unipolar image as a set of small blocks of 3 pixels, used as vectors for HNN. This makes it possible to store a large number of images in the net while best reaching global minima, and because there are only eight fixed states of weights, only a single iteration is performed to construct a vector with a stable state at minimum energy. HNN is useless in dealing with data not in bipolar representation. Therefore, HNN failed to work with color images. In RGB, each band represents different values of brightness; a d-bit RGB image simply consists of d unipolar layers. Each layer is treated as a single unipolar image for HNN. In addition, the weight matrices with a stability of unity at the diagonal converge clearly in comparison with an architecture without self-connections. Synchronously, each matrix-matrix multiplication operation would run optically in the second part, since we propose an array of all-optical XOR gates, which uses Mach-Zehnder Interferometers (MZI) for the neuron setup and a controlling system to distribute timed signals with inversion to achieve the XNOR function. The primary operation and simulation of the proposed HNN are demonstrated.
IMPLEMENTATION OF THE SMOKE EMISSION DATA PROCESSOR AND SMOKE TOOL INPUT DATA PROCESSOR IN MODELS-3

EPA Science Inventory

The U.S. Environmental Protection Agency has implemented Version 1.3 of SMOKE (Sparse Matrix Object Kernel Emission) processor for preparation of area, mobile, point, and biogenic sources emission data within Version 4.1 of the Models-3 air quality modeling framework. The SMOK...
Programmable Remapper with Single Flow Architecture

NASA Technical Reports Server (NTRS)

Fisher, Timothy E. (Inventor)

1993-01-01

An apparatus for image processing comprising a camera for receiving an original visual image and transforming the original visual image into an analog image, a first converter for transforming the analog image of the camera to a digital image, a processor having a single flow architecture for receiving the digital image and producing, with a single algorithm, an output image, a second converter for transforming the digital image of the processor to an analog image, and a viewer for receiving the analog image, transforming the analog image into a transformed visual image for observing the transformations applied to the original visual image. The processor comprises one or more subprocessors for the parallel reception of a digital image for producing an output matrix of the transformed visual image. More particularly, the processor comprises a plurality of subprocessors for receiving in parallel and transforming the digital image for producing a matrix of the transformed visual image, and an output interface means for receiving the respective portions of the transformed visual image from the respective subprocessor for producing an output matrix of the transformed visual image.
Does the Coherent Lidar System Corroborate Non-Interaction of Waves (NIW)?

NASA Technical Reports Server (NTRS)

Prasad, Narasimha S.; Roychoudhari, Chandrasekhar

2013-01-01

The NIW (non-interaction of waves) property has been proposed by one of the coauthors. The NIW property states that in the absence of any "obstructing" detectors, all the Huygens-Fresnel secondary wavelets will continue to propagate unhindered and without interacting (interfering) with each other. Since a coherent lidar system incorporates complex behaviors of optical components with different polarizations, including circular polarization for the transmitted radiation, the question arises whether the NIW principle accommodates elliptical polarization of light. Elliptical polarization presumes the summation of orthogonally polarized electric field vectors, which contradicts the NIW principle. In this paper, we present the working of a coherent lidar system using the Jones matrix formulation. The Jones matrix elements represent the anisotropic dipolar properties of molecules of optical components. Accordingly, when we use the Jones matrix methodology to analyze the coherent lidar system, we find that the system behavior is congruent with the NIW property.
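
The Jones matrix formulation mentioned in this abstract can be illustrated with a single component. The sketch below is a generic textbook example, not the lidar model of the paper: it assumes an ideal quarter-wave plate with a horizontal fast axis (global phase dropped) acting on 45-degree linearly polarized light, and the helper names are hypothetical.

```python
import cmath
import math


def apply_jones(matrix, vector):
    """Multiply a 2x2 Jones matrix by a 2-component Jones vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(2)) for i in range(2)]


def compose(second, first):
    """Jones matrix of two elements in series (light meets `first` first)."""
    return [[sum(second[i][k] * first[k][j] for k in range(2))
             for j in range(2)]
            for i in range(2)]


# Ideal quarter-wave plate, fast axis horizontal, global phase dropped.
QUARTER_WAVE = [[1, 0],
                [0, 1j]]

linear_45 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
out = apply_jones(QUARTER_WAVE, linear_45)

# Equal amplitudes with a 90 degree phase offset: circular polarization.
phase_difference = cmath.phase(out[1]) - cmath.phase(out[0])

# Two quarter-wave plates in series act as a half-wave plate.
half_wave = compose(QUARTER_WAVE, QUARTER_WAVE)
```

Whether the output is called left- or right-circular depends on the sign convention in use, so the sketch only checks the amplitude and phase relationship.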
Three-dimensional polarization algebra.

PubMed

R Sheppard, Colin J; Castello, Marco; Diaspro, Alberto

2016-10-01

If light is focused or collected with a high numerical aperture lens, as may occur in imaging and optical encryption applications, polarization should be considered in three dimensions (3D). The matrix algebra of polarization behavior in 3D is discussed. It is useful to convert between the Mueller matrix and two different Hermitian matrices, representing an optical material or system, which are in the literature. Explicit transformation matrices for converting the column vector form of these different matrices are extended to the 3D case, where they are large (81×81) but can be generated using simple rules. It is found that there is some advantage in using a generalization of the Chandrasekhar phase matrix treatment, rather than that based on Gell-Mann matrices, as the resultant matrices are of simpler form and reduce to the two-dimensional case more easily. Explicit expressions are given for 3D complex field components in terms of Chandrasekhar-Stokes parameters.
Propagation of hollow Gaussian beam through a misaligned first-order optical system and its propagation properties

NASA Astrophysics Data System (ADS)

Zhao, Cheng Liang; Lu, Xuan Hui

2007-06-01

The propagation properties of a hollow Gaussian beam through a misaligned first-order ABCD system are studied using the generalized Huygens-Fresnel diffraction integral and the augmented matrix. It is shown that, as a hollow Gaussian beam passes through the misaligned first-order ABCD system, the beam shape is not preserved, and the output beams differ from one misaligned optical system to another. The size of the dark region can be adjusted by adjusting the misaligned transverse vector E.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yeung, Yu-Hong; Pothen, Alex; Halappanavar, Mahantesh

We present an augmented matrix approach to update the solution to a linear system of equations when the coefficient matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to perform $N-x$ contingency analysis, i.e., determine the state of the system when up to $x$ links from $N$ fail. Our algorithms augment the coefficient matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms, a direct method and a hybrid direct-iterative method, for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the coefficient matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude), and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy. Our algorithms are capable of computing $N-x$ contingency analysis on a $778K$ bus grid, updating a solution with $x=20$ elements in $1.6 \times 10^{-2}$ seconds on an Intel Xeon processor.
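The idea of solving a modified system without refactoring can be sketched with a small dense example. The code below is one possible augmented formulation, assumed for illustration and not necessarily the authors' algorithm: the change to the principal submatrix is written as H·Δ·Hᵀ, the system is augmented with an auxiliary unknown y, and only solves with the original matrix are used. The names `solve_after_principal_update`, `solve_A`, `idx`, and `delta` are hypothetical.

```python
import numpy as np

def solve_after_principal_update(solve_A, b, idx, delta):
    """Solve (A + H @ delta @ H.T) x = b using only solves with the original A.

    solve_A : callable returning A^{-1} @ rhs (stands in for a stored factorization)
    idx     : indices of the modified principal submatrix
    delta   : (m, m) array of changes added to A[idx, idx]
    """
    n, m = b.shape[0], len(idx)
    H = np.zeros((n, m))
    H[idx, np.arange(m)] = 1.0  # columns of the identity that select idx

    # Augmented system:  [A            H ] [x]   [b]
    #                    [delta @ H.T  -I] [y] = [0]
    # Eliminating x leaves a small m-by-m Schur complement system for y.
    A_inv_b = solve_A(b)
    A_inv_H = solve_A(H)
    schur = np.eye(m) + delta @ A_inv_H[idx, :]
    y = np.linalg.solve(schur, delta @ A_inv_b[idx])
    return A_inv_b - A_inv_H @ y

# Small dense demonstration with made-up data.
rng = np.random.default_rng(0)
n = 8
A = rng.random((n, n)) + n * np.eye(n)      # diagonally dominant, hence nonsingular
b = rng.random(n)
idx = [1, 4, 6]
delta = 0.1 * rng.standard_normal((3, 3))   # change to the principal submatrix

x_new = solve_after_principal_update(lambda rhs: np.linalg.solve(A, rhs), b, idx, delta)

A_mod = A.copy()
A_mod[np.ix_(idx, idx)] += delta            # the modified matrix, built only for checking
```

In a sparse setting, `solve_A` would reuse an existing factorization of A, so the extra cost is a handful of triangular solves plus one small dense solve.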
Fiber optic sensors for gas turbine control

NASA Technical Reports Server (NTRS)

Shu, Emily Yixie (Inventor); Petrucco, Louis Jacob (Inventor); Daum, Wolfgang (Inventor)

2005-01-01

An apparatus for detecting flashback occurrences in a premixed combustor system having at least one fuel nozzle includes at least one photodetector and at least one fiber optic element coupled between the at least one photodetector and a test region of the combustor system wherein a respective flame of the fuel nozzle is not present under normal operating conditions. A signal processor monitors a signal of the photodetector. The fiber optic element can include at least one optical fiber positioned within a protective tube. The fiber optic element can include two fiber optic elements coupled to the test region. The optical fiber and the protective tube can have lengths sufficient to situate the photodetector outside of an engine compartment. A plurality of fuel nozzles and a plurality of fiber optic elements can be used with the fiber optic elements being coupled to respective fuel nozzles and either to the photodetector or, wherein a plurality of photodetectors are used, to respective ones of the plurality of photodetectors. The signal processor can include a digital signal processor.
Fiber optic sensors for gas turbine control

NASA Technical Reports Server (NTRS)

Shu, Emily Yixie (Inventor); Brown, Dale Marius (Inventor); Petrucco, Louis Jacob (Inventor); Lovett, Jeffery Allan (Inventor); Daum, Wolfgang (Inventor); Dunki-Jacobs, Robert John (Inventor)

2003-01-01

An apparatus for detecting flashback occurrences in a premixed combustor system having at least one fuel nozzle includes at least one photodetector and at least one fiber optic element coupled between the at least one photodetector and a test region of the combustor system wherein a respective flame of the fuel nozzle is not present under normal operating conditions. A signal processor monitors a signal of the photodetector. The fiber optic element can include at least one optical fiber positioned within a protective tube. The fiber optic element can include two fiber optic elements coupled to the test region. The optical fiber and the protective tube can have lengths sufficient to situate the photodetector outside of an engine compartment. A plurality of fuel nozzles and a plurality of fiber optic elements can be used with the fiber optic elements being coupled to respective fuel nozzles and either to the photodetector or, wherein a plurality of photodetectors are used, to respective ones of the plurality of photodetectors. The signal processor can include a digital signal processor.
Fiber optic sensors for gas turbine control

NASA Technical Reports Server (NTRS)

Shu, Emily Yixie (Inventor); Brown, Dale Marius (Inventor); Petrucco, Louis Jacob (Inventor); Lovett, Jeffery Allan (Inventor); Daum, Wolfgang (Inventor); Dunki-Jacobs, Robert John (Inventor)

1999-01-01

An apparatus for detecting flashback occurrences in a premixed combustor system having at least one fuel nozzle includes at least one photodetector and at least one fiber optic element coupled between the at least one photodetector and a test region of the combustor system wherein a respective flame of the fuel nozzle is not present under normal operating conditions. A signal processor monitors a signal of the photodetector. The fiber optic element can include at least one optical fiber positioned within a protective tube. The fiber optic element can include two fiber optic elements coupled to the test region. The optical fiber and the protective tube can have lengths sufficient to situate the photodetector outside of an engine compartment. A plurality of fuel nozzles and a plurality of fiber optic elements can be used with the fiber optic elements being coupled to respective fuel nozzles and either to the photodetector or, wherein a plurality of photodetectors are used, to respective ones of the plurality of photodetectors. The signal processor can include a digital signal processor.
Finite elements and the method of conjugate gradients on a concurrent processor

NASA Technical Reports Server (NTRS)

Lyzenga, G. A.; Raefsky, A.; Hager, G. H.

1985-01-01

An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90 percent for sufficiently large problems.
Finite elements and the method of conjugate gradients on a concurrent processor

NASA Technical Reports Server (NTRS)

Lyzenga, G. A.; Raefsky, A.; Hager, B. H.

1984-01-01

An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90% for sufficiently large problems.
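For readers unfamiliar with the method of conjugate gradients named in this record, a serial sketch is given below. It is the textbook iteration for a symmetric positive-definite system and does not reproduce the element-based decomposition or the hypercube implementation described in the abstract; the test matrix (a 1-D stiffness-like tridiagonal matrix) is an assumption chosen for brevity.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A by conjugate gradients."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p           # the only matrix-vector product per iteration
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Stiffness-like test matrix: tridiagonal (-1, 2, -1), which is SPD.
n = 20
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u = conjugate_gradient(K, f)
```

Because each iteration needs only one matrix-vector product and a few dot products, the work divides naturally among processors that each own part of the mesh.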

Real-Time Symbol Extraction From Grey-Level Images

NASA Astrophysics Data System (ADS)

Massen, R.; Simnacher, M.; Rosch, J.; Herre, E.; Wuhrer, H. W.

1988-04-01

A VME-bus image pipeline processor for extracting vectorized contours from grey-level images in real-time is presented. This 3 Giga operation per second processor uses large kernel convolvers and new non-linear neighbourhood processing algorithms to compute true 1-pixel wide and noise-free contours without thresholding even from grey-level images with quite varying edge sharpness. The local edge orientation is used as an additional cue to compute a list of vectors describing the closed and open contours in real-time and to dump a CAD-like symbolic image description into a symbol memory at pixel clock rate.
General linear codes for fault-tolerant matrix operations on processor arrays

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Abraham, J. A.

1988-01-01

Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. Here, a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
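The simplest checksum code of the kind this abstract builds on can be sketched as follows: a row of column sums is appended to one operand and a column of row sums to the other, so that their product carries checksums that expose a faulty element. This is the basic checksum scheme only, not the general linear codes identified in the paper; the function names and the tolerance `tol`, which separates roundoff from real faults, are assumptions.

```python
import numpy as np

def checksum_matmul(A, B):
    """Multiply A and B with checksum encoding; returns the full-checksum product."""
    A_c = np.vstack([A, A.sum(axis=0)])                  # extra row: column sums of A
    B_r = np.hstack([B, B.sum(axis=1, keepdims=True)])   # extra column: row sums of B
    return A_c @ B_r

def locate_faults(C_full, tol=1e-8):
    """Return (rows, cols) whose checksums disagree by more than tol."""
    C = C_full[:-1, :-1]
    row_err = np.abs(C.sum(axis=1) - C_full[:-1, -1])
    col_err = np.abs(C.sum(axis=0) - C_full[-1, :-1])
    return np.flatnonzero(row_err > tol), np.flatnonzero(col_err > tol)

rng = np.random.default_rng(0)
A = rng.random((4, 3))
B = rng.random((3, 5))
C_full = checksum_matmul(A, B)

# Simulate a physical fault in one element of the result.
C_faulty = C_full.copy()
C_faulty[2, 1] += 5.0
bad_rows, bad_cols = locate_faults(C_faulty)
```

The tolerance illustrates the difficulty the abstract raises: if `tol` is too small, roundoff is misread as a fault, and if it is too large, small faults go unnoticed.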
Direct RF A-O Processor Spectrum Analyzer.

DTIC Science & Technology

1981-08-01

The primary objective was to develop and demonstrate design approach, along with the associated processing technologies, for a wideband acousto optic Bragg...cell spectrum analyzer. The signal processor used to demonstrate feasibility of the technical approach consisted of two bulk wave acousto optic deflectors
The design and implementation of cost-effective algorithms for direct solution of banded linear systems on the vector processor system 32 supercomputer

NASA Technical Reports Server (NTRS)

Samba, A. S.

1985-01-01

The problem of solving banded linear systems by direct (non-iterative) techniques on the Vector Processor System (VPS) 32 supercomputer is considered. Two efficient direct methods for solving banded linear systems on the VPS 32 are described. The vector cyclic reduction (VCR) algorithm is discussed in detail. The performance of the VCR on a three parameter model problem is also illustrated. The VCR is an adaptation of the conventional point cyclic reduction algorithm. The second direct method is the Customized Reduction of Augmented Triangles (CRAT). CRAT has the dominant characteristics of an efficient VPS 32 algorithm. CRAT is tailored to the pipeline architecture of the VPS 32 and as a consequence the algorithm is implicitly vectorizable.
Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1989-01-01

The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1, 2 and 3 processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization outlines are reproduced on the target system.
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
CYBER-205 Devectorizer

NASA Technical Reports Server (NTRS)

Lakeotes, Christopher D.

1990-01-01

DEVECT (CYBER-205 Devectorizer) is a CYBER-205 FORTRAN source-language preprocessor program that reduces vector statements to standard FORTRAN. In addition, DEVECT has many other standard and optional features that simplify the conversion of vector-processor programs for the CYBER 200 to other computers. Written in FORTRAN IV.
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

PubMed

Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

2018-05-03

Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms.
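The inter-sequence layout described above, in which one vector lane is assigned to each alignment, can be imitated in NumPy. The sketch below is not SeqAn code and uses none of its API; it computes global (Needleman-Wunsch) scores with a linear gap penalty for a batch of equal-length pairs, with the batch dimension playing the role of the SIMD lanes. The function name and the scoring values are assumptions.

```python
import numpy as np

def batched_global_scores(seqs_a, seqs_b, match=1, mismatch=-1, gap=-1):
    """Global alignment scores for many pairs at once.

    All sequences in seqs_a must share one length, and likewise for seqs_b,
    so that every pair walks through the same DP cells in lockstep.
    """
    a = np.array([[ord(ch) for ch in s] for s in seqs_a])
    b = np.array([[ord(ch) for ch in s] for s in seqs_b])
    batch, len_a = a.shape
    len_b = b.shape[1]

    # prev[k, j] holds the DP value of pair k in the previous row, column j.
    prev = np.tile(gap * np.arange(len_b + 1), (batch, 1))
    for i in range(1, len_a + 1):
        curr = np.empty_like(prev)
        curr[:, 0] = gap * i
        for j in range(1, len_b + 1):
            # One "vector instruction" per DP cell, applied to every pair.
            sub = np.where(a[:, i - 1] == b[:, j - 1], match, mismatch)
            diag = prev[:, j - 1] + sub
            up = prev[:, j] + gap
            left = curr[:, j - 1] + gap
            curr[:, j] = np.maximum(diag, np.maximum(up, left))
        prev = curr
    return prev[:, -1]

scores = batched_global_scores(["ACGT", "AAAA", "ACGT"],
                               ["ACGT", "TTTT", "AGGT"])
```

Because the lanes never depend on each other, this layout avoids the data dependencies that complicate vectorizing a single alignment along its anti-diagonals.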
Optical apparatus for forming correlation spectrometers and optical processors

DOEpatents

Butler, Michael A.; Ricco, Antonio J.; Sinclair, Michael B.; Senturia, Stephen D.

1999-01-01

Optical apparatus for forming correlation spectrometers and optical processors. The optical apparatus comprises one or more diffractive optical elements formed on a substrate for receiving light from a source and processing the incident light. The optical apparatus includes an addressing element for alternately addressing each diffractive optical element thereof to produce for one unit of time a first correlation with the incident light, and to produce for a different unit of time a second correlation with the incident light that is different from the first correlation. In preferred embodiments of the invention, the optical apparatus is in the form of a correlation spectrometer; and in other embodiments, the apparatus is in the form of an optical processor. In some embodiments, the optical apparatus comprises a plurality of diffractive optical elements on a common substrate for forming first and second gratings that alternately intercept the incident light for different units of time. In other embodiments, the optical apparatus includes an electrically-programmable diffraction grating that may be alternately switched between a plurality of grating states thereof for processing the incident light. The optical apparatus may be formed, at least in part, by a micromachining process.
Optical apparatus for forming correlation spectrometers and optical processors

DOEpatents

Butler, M.A.; Ricco, A.J.; Sinclair, M.B.; Senturia, S.D.

1999-05-18

Optical apparatus is disclosed for forming correlation spectrometers and optical processors. The optical apparatus comprises one or more diffractive optical elements formed on a substrate for receiving light from a source and processing the incident light. The optical apparatus includes an addressing element for alternately addressing each diffractive optical element thereof to produce for one unit of time a first correlation with the incident light, and to produce for a different unit of time a second correlation with the incident light that is different from the first correlation. In preferred embodiments of the invention, the optical apparatus is in the form of a correlation spectrometer; and in other embodiments, the apparatus is in the form of an optical processor. In some embodiments, the optical apparatus comprises a plurality of diffractive optical elements on a common substrate for forming first and second gratings that alternately intercept the incident light for different units of time. In other embodiments, the optical apparatus includes an electrically-programmable diffraction grating that may be alternately switched between a plurality of grating states thereof for processing the incident light. The optical apparatus may be formed, at least in part, by a micromachining process. 24 figs.
Implementation of an ADI method on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Implementation of an ADI method on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

In this paper the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
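Each half-step of an ADI scheme reduces to many independent tridiagonal systems, one per grid line. Gaussian elimination specialized to a tridiagonal matrix (often called the Thomas algorithm) can be sketched as below. This is the serial textbook form, not the parallel implementations the record discusses; the coefficient `r` in the demonstration is an assumed diffusion number.

```python
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Gaussian elimination without pivoting.

    lower[i-1] = A[i, i-1], diag[i] = A[i, i], upper[i] = A[i, i+1].
    Assumes the matrix is diagonally dominant, as in implicit diffusion steps.
    """
    n = len(diag)
    c = np.zeros(n - 1)   # modified super-diagonal
    d = np.zeros(n)       # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                     # forward elimination
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# One implicit diffusion sweep along a grid line: (I - r * second_difference) u = rhs.
n, r = 6, 0.5
lower = -r * np.ones(n - 1)
upper = -r * np.ones(n - 1)
diag = (1.0 + 2.0 * r) * np.ones(n)
rhs = np.arange(n, dtype=float)
u_line = thomas_solve(lower, diag, upper, rhs)
```

The forward and backward loops are inherently sequential along one line, which is why an SIMD machine such as the MPP favors a cyclic elimination variant, while MIMD machines can run many lines at once.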
Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sitaraman, Hariswaran; Grout, Ray W

This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine grain parallelism and through extensive use of vendor supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~ 3 speed-up using Intel 2013 compiler and ~ 1.5 using Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.
SU-G-TeP1-15: Toward a Novel GPU Accelerated Deterministic Solution to the Linear Boltzmann Transport Equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, R; Fallone, B; Cross Cancer Institute, Edmonton, AB

Purpose: To develop a Graphic Processor Unit (GPU) accelerated deterministic solution to the Linear Boltzmann Transport Equation (LBTE) for accurate dose calculations in radiotherapy (RT). A deterministic solution yields the potential for major speed improvements due to the sparse matrix-vector and vector-vector multiplications and would thus be of benefit to RT. Methods: In order to leverage the massively parallel architecture of GPUs, the first order LBTE was reformulated as a second order self-adjoint equation using the Least Squares Finite Element Method (LSFEM). This produces a symmetric positive-definite matrix which is efficiently solved using a parallelized conjugate gradient (CG) solver. The LSFEM formalism is applied in space, discrete ordinates is applied in angle, and the Multigroup method is applied in energy. The final linear system of equations produced is tightly coupled in space and angle. Our code written in CUDA-C was benchmarked on an Nvidia GeForce TITAN-X GPU against an Intel i7-6700K CPU. A spatial mesh of 30,950 tetrahedral elements was used with an S4 angular approximation. Results: To avoid repeating a full computationally intensive finite element matrix assembly at each Multigroup energy, a novel mapping algorithm was developed which minimized the operations required at each energy. Additionally, a parallelized memory mapping for the Kronecker product between the sparse spatial and angular matrices, including Dirichlet boundary conditions, was created. Atomicity is preserved by graph-coloring overlapping nodes into separate kernel launches. The one-time mapping calculations for matrix assembly, Kronecker product, and boundary condition application took 452±1ms on GPU. Matrix assembly for 16 energy groups took 556±3s on CPU, and 358±2ms on GPU using the mappings developed. The CG solver took 93±1s on CPU, and 468±2ms on GPU.
Conclusion: Three computationally intensive subroutines in deterministically solving the LBTE have been formulated on GPU, resulting in two orders of magnitude speedup. Funding support from Natural Sciences and Engineering Research Council and Alberta Innovates Health Solutions. Dr. Fallone is a co-founder and CEO of MagnetTx Oncology Solutions (under discussions to license Alberta bi-planar linac MR for commercialization).
Noise limitations in optical linear algebra processors.

PubMed

Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

1990-05-10

A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.
Loop Mirror Laser Neural Network with a Fast Liquid-Crystal Display

NASA Astrophysics Data System (ADS)

Mos, Evert C.; Schleipen, Jean J. H. B.; de Waardt, Huug; Khoe, Djan G. D.

1999-07-01

In our laser neural network (LNN) all-optical threshold action is obtained by application of controlled optical feedback to a laser diode. Here an extended experimental LNN is presented with as many as 32 neurons and 12 inputs. In the setup we use a fast liquid-crystal display to implement an optical matrix vector multiplier. This display, based on ferroelectric liquid-crystal material, enables us to present 125 training examples/s to the LNN. To maximize the optical feedback efficiency of the setup, a loop mirror is introduced. We use a δ-rule learning algorithm to train the network to perform a number of functions toward the application area of telecommunication data switching.
Acceleration of linear stationary iterative processes in multiprocessor computers. II

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romm, Ya.E.

1982-05-01

For pt. I, see Kibernetika, vol. 18, no. 1, p. 47 (1982) and Cybernetics, vol. 18, no. 1, p. 54 (1982). Considers a reduced system of linear algebraic equations $x = Ax + b$, where $A = (a_{ij})$ is a real $n \times n$ matrix and $b$ is a real vector, with the common Euclidean norm. The existence and uniqueness of the solution are supposed to be given, i.e., $\det(E - A) \neq 0$, where $E$ is the unit matrix. The linear iterative process converging to $x$ is $x^{(k+1)} = f x^{(k)}$, $k = 0, 1, 2, \ldots$, where the operator $f$ maps $R^n$ into $R^n$. In considering implementation of the iterative process (IP) in a multiprocessor system, it is assumed that the number of processors is constant, and various values of the latter are investigated; it is assumed in addition that the processors perform the elementary binary arithmetic operations of addition and multiplication, and the estimates only include the time of execution of arithmetic operations. With any paralleling of an individual iteration, the execution time of the IP is proportional to the number of sequential steps $k+1$. The author sets the task of reducing the number of sequential steps in the IP so as to execute it in a time proportional to a value smaller than $k+1$. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the IP, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.
A fast reconstruction algorithm for fluorescence optical diffusion tomography based on preiteration.

PubMed

Song, Xiaolei; Xiong, Xiaoyun; Bai, Jing

2007-01-01

Fluorescence optical diffusion tomography in the near-infrared (NIR) bandwidth is considered to be one of the most promising ways for noninvasive molecular-based imaging. Many reconstructive approaches to it utilize iterative methods for data inversion. However, they are time-consuming and far from meeting real-time imaging demands. In this work, a fast preiteration algorithm based on the generalized inverse matrix is proposed. This method needs only one step of matrix-vector multiplication online, because the iteration process is pushed offline. In the preiteration process, the second-order iterative format is employed to exponentially accelerate the convergence. Simulations based on an analytical diffusion model show that the distribution of fluorescent yield can be well estimated by this algorithm and that the reconstruction speed is remarkably increased.
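The abstract does not spell out its second-order iterative format. One well-known iteration that converges quadratically to the generalized (Moore-Penrose) inverse is the Newton-Schulz scheme $X_{k+1} = X_k (2I - A X_k)$, and the sketch below uses it as a stand-in assumption to show the offline/online split: the iteration runs once offline, and each online reconstruction is then a single matrix-vector product. The function name, the small test matrix, and the measurement vector are all hypothetical.

```python
import numpy as np

def precompute_generalized_inverse(A, tol=1e-12, max_iter=100):
    """Offline step: Newton-Schulz iteration converging to the pseudoinverse of A."""
    # Starting value A^T / (||A||_1 * ||A||_inf) keeps the iteration convergent,
    # because the product of these norms bounds the squared largest singular value.
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    identity = np.eye(A.shape[0])
    for _ in range(max_iter):
        X_next = X @ (2.0 * identity - A @ X)
        if np.linalg.norm(X_next - X) <= tol * np.linalg.norm(X_next):
            return X_next
        X = X_next
    return X

# Hypothetical forward model (4 measurements, 3 unknowns) and measurement vector.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
A_pinv = precompute_generalized_inverse(A)      # done once, offline

measurement = np.array([1.0, 2.0, 3.0, 4.0])
estimate = A_pinv @ measurement                 # online: one matrix-vector product
```

The error roughly squares at every step, which is the sense in which a second-order scheme accelerates convergence exponentially compared with a first-order iteration.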
Optical linear algebra processors: noise and error-source modeling.

PubMed

Casasent, D; Ghosh, A

1985-06-01

The modeling of system and component noise and error sources in optical linear algebra processors (OLAP's) is considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Optical linear algebra processors - Noise and error-source modeling

NASA Technical Reports Server (NTRS)

Casasent, D.; Ghosh, A.

1985-01-01

The modeling of system and component noise and error sources in optical linear algebra processors (OLAPs) is considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
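A digital simulator of an analog matrix-vector product can be sketched in a few lines. The noise model below is an assumption made for illustration (independent Gaussian multiplicative noise on each product and additive Gaussian noise on each detector sum) and is not the model derived in the paper; `sigma_mult` and `sigma_add` are hypothetical parameters.

```python
import numpy as np

def noisy_matvec(M, v, sigma_mult=0.0, sigma_add=0.0, rng=None):
    """Simulate y = M v with per-product multiplicative and per-output additive noise."""
    rng = np.random.default_rng() if rng is None else rng
    products = M * v                                   # products[i, j] = M[i, j] * v[j]
    products = products * (1.0 + sigma_mult * rng.standard_normal(M.shape))
    sums = products.sum(axis=1)                        # summation at each detector
    return sums + sigma_add * rng.standard_normal(M.shape[0])

M_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
v_demo = np.array([0.5, -1.0])
y_ideal = noisy_matvec(M_demo, v_demo)                 # zero noise: exact product
y_noisy = noisy_matvec(M_demo, v_demo, 0.05, 0.01, np.random.default_rng(0))
```

Under these assumptions the variance of output i is sigma_mult² Σ_j (M[i, j] v[j])² + sigma_add², which a Monte Carlo run of the simulator can confirm.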

Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

DOEpatents

Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

2013-11-05

Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicating the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
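The sequence of operations named in this abstract (vector load, load and splat, multiply add, accumulation of partial products) can be mimicked with NumPy arrays standing in for vector registers. The sketch is an illustration of the data flow only, not the patented mechanism; the register width of 4 and the function name are assumptions.

```python
import numpy as np

def matmul_load_splat(A, B, width=4):
    """Compute A @ B using register-sized column slices and splatted scalars."""
    m, inner = A.shape
    n = B.shape[1]
    assert m % width == 0, "rows of A must fill whole vector registers"
    C = np.zeros((m, n))
    for j in range(n):
        for i0 in range(0, m, width):
            acc = np.zeros(width)                      # accumulator register
            for k in range(inner):
                a_vec = A[i0:i0 + width, k]            # vector load of `width` elements
                b_splat = np.full(width, B[k, j])      # load one element and replicate it
                acc = acc + a_vec * b_splat            # multiply add: one partial product
            C[i0:i0 + width, j] = acc                  # store the accumulated result
    return C

rng = np.random.default_rng(0)
A_demo = rng.random((8, 3))
B_demo = rng.random((3, 5))
C_demo = matmul_load_splat(A_demo, B_demo)
```

Splatting the scalar lets every lane of the multiply add do useful work without any shuffling of the first operand.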
A generalized graph-theoretical matrix of heterosystems and its application to the VMV procedure.

PubMed

Mozrzymas, Anna

2011-12-14

The extensions of generalized (molecular) graph-theoretical matrix and vector-matrix-vector procedure are considered. The elements of the generalized matrix are redefined in order to describe molecules containing heteroatoms and multiple bonds. The adjacency, distance, detour and reciprocal distance matrices of heterosystems, and corresponding vectors are derived from newly defined generalized graph matrix. The topological indices, which are most widely used in predicting physicochemical and biological properties/activities of various compounds, can be calculated from the new generalized vector-matrix-vector invariant.
Acousto-optic time- and space-integrating spotlight-mode SAR processor

NASA Astrophysics Data System (ADS)

Haney, Michael W.; Levy, James J.; Michael, Robert R., Jr.

1993-09-01

The technical approach and recent experimental results for the acousto-optic time- and space- integrating real-time SAR image formation processor program are reported. The concept overcomes the size and power consumption limitations of electronic approaches by using compact, rugged, and low-power analog optical signal processing techniques for the most computationally taxing portions of the SAR imaging problem. Flexibility and performance are maintained by the use of digital electronics for the critical low-complexity filter generation and output image processing functions. The results include a demonstration of the processor's ability to perform high-resolution spotlight-mode SAR imaging by simultaneously compensating for range migration and range/azimuth coupling in the analog optical domain, thereby avoiding a highly power-consuming digital interpolation or reformatting operation usually required in all-electronic approaches.
Vectorial approach of determining the wave propagation at metasurfaces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, Daniel, E-mail: D.Smith1966@outlook.com; Campbell, Michael, E-mail: mhl.campbell@gmail.com; Bergmann, Andreas, E-mail: a.bergmann@hotmail.com

2015-10-15

A vector approach often benefits optical engineers and physicists, and a vector formulation of the laws of reflection and refraction has been studied (Tkaczyk, 2012). However, the conventional reflection and refraction laws may be violated in the presence of a metasurface, where reflection and refraction obey generalized laws (Yu et al., 2011). In this letter, the vectorial laws of reflection and refraction at the metasurface are derived, and the matrix formulation of these vectorial laws is also obtained. These results enable highly efficient and unambiguous computations in ray-tracing problems that involve a metasurface.
Study on diagnosis of micro-biomechanical structure using optical coherence tomography

NASA Astrophysics Data System (ADS)

Saeki, Souichi; Hashimoto, Youhei; Saito, Takashi; Hiro, Takafumi; Matsuzaki, Masunori

2007-02-01

Acute coronary syndromes, e.g. myocardial infarctions, are caused by the rupture of unstable plaques on coronary arteries. It is therefore crucial to diagnose the stability of plaque, which depends on the biomechanical properties of the fibrous cap. Recently, Optical Coherence Tomography (OCT) has been developed as a cross-sectional imaging method for microstructural biological tissue with a high resolution of 1~10 μm. Multi-functional OCT systems are promising, e.g. as estimators of biomechanical characteristics. It has been difficult, however, to estimate biomechanical characteristics, because OCT images contain only speckle patterns produced by light back-scattered from tissue. This study presents Optical Coherence Straingraphy (OCS), built on an OCT system, which can diagnose tissue strain distribution. It is based on the Recursive Cross-correlation technique (RC), which can provide a displacement vector distribution with high resolution. Furthermore, Adjacent Cross-correlation Multiplication (ACM) is introduced as a speckle noise reduction method. Multiplying adjacent correlation maps can eliminate anomalies caused by speckle noise and thereby enhance the S/N in determining the maximum correlation coefficient. Error propagation can be further prevented by introducing it into the recursive algorithm (RC). In addition, spatial vector interpolation by a local least-squares method is introduced to remove erroneous vectors and smooth the vector distribution. The method was applied numerically to compressed elastic heterogeneous tissue samples to verify its accuracy. It was quantitatively confirmed that the accuracy of the displacement vectors and strain matrix components was enhanced compared with the conventional method. The proposed method was thus validated by identifying different elastic objects at a resolution nearly as high as that defined by the optical system.
Adaptive Control Of Woofer-Tweeter Adaptive Optics

DTIC Science & Technology

2009-03-01

the actuator geometry and the matrix F describes the lowpass filter. The columns of T form a set of basis vectors in the space of the master...set equal to the simulated aperture size of 76 cm. The tweeter DM has 39 actuators across the aperture with a spacing of 2 cm for a total of 1521...actuators over the square aperture. The
High speed optical object recognition processor with massive holographic memory

NASA Technical Reports Server (NTRS)

Chao, T.; Zhou, H.; Reyes, G.

2002-01-01

Real-time object recognition using a compact grayscale optical correlator will be introduced. A holographic memory module for storing a large bank of optimum correlation filters, to accommodate the large data throughput rate needed for many real-world applications, has also been developed. System architecture of the optical processor and the holographic memory will be presented. Application examples of this object recognition technology will also be demonstrated.
Prototype Focal-Plane-Array Optoelectronic Image Processor

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Shaw, Timothy; Yu, Jeffrey

1995-01-01

Prototype very-large-scale integrated (VLSI) planar array of optoelectronic processing elements combines speed of optical input and output with flexibility of reconfiguration (programmability) of electronic processing medium. Basic concept of processor described in "Optical-Input, Optical-Output Morphological Processor" (NPO-18174). Performs binary operations on binary (black and white) images. Each processing element corresponds to one picture element of image and located at that picture element. Includes input-plane photodetector in form of parasitic phototransistor part of processing circuit. Output of each processing circuit used to modulate one picture element in output-plane liquid-crystal display device. Intended to implement morphological processing algorithms that transform image into set of features suitable for high-level processing; e.g., recognition.
Finding a Hadamard matrix by simulated annealing of spin vectors

NASA Astrophysics Data System (ADS)

Bayu Suksmono, Andriyan

2017-05-01

Reformulation of a combinatorial problem into optimization of a statistical-mechanics system enables finding a better solution using heuristics derived from a physical process, such as simulated annealing (SA). In this paper, we present a Hadamard matrix (H-matrix) searching method based on SA on an Ising model. By equivalence, an H-matrix can be converted into a seminormalized Hadamard (SH) matrix, whose first column is the unit vector and whose remaining columns are vectors with equal numbers of -1 and +1, called SH-vectors. We define SH spin vectors as representations of the SH vectors, which play a role similar to that of the spins in the Ising model. The topology of the lattice is generalized into a graph, whose edges represent orthogonality relationships among the SH spin vectors. Starting from a randomly generated quasi H-matrix Q, which is a matrix similar to the SH-matrix without imposing orthogonality, we perform the SA. The transitions of Q are conducted by random exchange of a {+, -} spin pair within the SH-spin vectors, following the Metropolis update rule. Upon transition toward zero energy, the Q-matrix evolves following a Markov chain toward an orthogonal matrix, at which point the H-matrix is said to be found. We demonstrate the capability of the proposed method to find some low-order H-matrices, including the ones that cannot trivially be constructed by the Sylvester method.
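The search loop in this abstract can be sketched roughly as follows. The abstract does not give the energy function, so the one used here (the sum of absolute dot products over all column pairs) is an assumption, as are the function names, the geometric cooling schedule, and the reading of the first column as all +1. This is a sketch of the general idea, not the author's implementation.

```python
import math
import random


def energy(columns):
    """Assumed energy: sum of |dot product| over all distinct column pairs.

    It is zero exactly when all columns are mutually orthogonal.
    """
    total = 0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            total += abs(sum(a * b for a, b in zip(columns[i], columns[j])))
    return total


def swap_spin_pair(column, rng):
    """Return a copy of a balanced +/-1 column with one +1 and one -1 exchanged."""
    plus = [i for i, s in enumerate(column) if s == 1]
    minus = [i for i, s in enumerate(column) if s == -1]
    i, j = rng.choice(plus), rng.choice(minus)
    new = list(column)
    new[i], new[j] = new[j], new[i]
    return new


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis rule: always accept downhill moves, sometimes accept uphill ones."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)


def anneal(columns, temperature, cooling, n_steps, rng):
    """Anneal a quasi H-matrix given as a list of columns.

    columns[0] is kept fixed (all +1); the other columns stay balanced
    because every move exchanges one +1 with one -1 inside a column.
    """
    current = [list(c) for c in columns]
    e = energy(current)
    for _ in range(n_steps):
        if e == 0:
            break                                  # orthogonal: H-matrix found
        k = rng.randrange(1, len(current))         # never touch column 0
        candidate = current[:k] + [swap_spin_pair(current[k], rng)] + current[k + 1:]
        e_new = energy(candidate)
        if metropolis_accept(e_new - e, temperature, rng):
            current, e = candidate, e_new
        temperature = max(temperature * cooling, 1e-9)  # geometric cooling with a floor
    return current, e
```

The returned energy tells the caller whether an orthogonal matrix was reached; a run that ends with nonzero energy simply did not converge within `n_steps`.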
Electro-optic voltage sensor with Multiple Beam Splitting

DOEpatents

Woods, Gregory K.; Renak, Todd W.; Crawford, Thomas M.; Davidson, James R.

2000-01-01

A miniature electro-optic voltage sensor system capable of accurate operation at high voltages without use of the dedicated voltage dividing hardware. The invention achieves voltage measurement without significant error contributions from neighboring conductors or environmental perturbations. The invention employs a transmitter, a sensor, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor. Within the sensor the beam undergoes the Pockels electro-optic effect. The electro-optic effect produces a modulation of the beam's polarization, which is in turn converted to a pair of independent conversely-amplitude-modulated signals, from which the voltage of the E-field is determined by the signal processor. The use of converse AM signals enables the signal processor to better distinguish signal from noise. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured.
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Cargo Movement Operations System (CMOS). Requirements Traceability Matrix Increment II

DTIC Science & Technology

1990-05-17

NO [ ] COMMENT DISPOSITION: ACCEPT [ ] REJECT [ ] COMMENT STATUS: OPEN [ ] CLOSED [ ] Cmnt Page Paragraph No. No. Number Comment 1. C-i SS0-3 Change "workstation" to "processor". 2. C-2 SS0009 Change "workstation" to "processor". SS0016 3. C-6 SS0032 Change "workstation" to "processor". SS0035 4. C-9 SS0063 Add comma after "e.g." 5. C-i SS0082 Change "workstation" to "processor". 6. C-17 SS0131 Change "workstation" to "processor". SS0132 7. C-28 SS0242 Change "workstation"
Feasibility of optically interconnected parallel processors using wavelength division multiplexing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deri, R.J.; De Groot, A.J.; Haigh, R.E.

1996-03-01

New national security demands require enhanced computing systems for nearly ab initio simulations of extremely complex systems and analyzing unprecedented quantities of remote sensing data. This computational performance is being sought using parallel processing systems, in which many less powerful processors are ganged together to achieve high aggregate performance. Such systems require increased capability to communicate information between individual processor and memory elements. As it is likely that the limited performance of today's electronic interconnects will prevent the system from achieving its ultimate performance, there is great interest in using fiber optic technology to improve interconnect communication. However, little information is available to quantify the requirements on fiber optical hardware technology for this application. Furthermore, we have sought to explore interconnect architectures that use the complete communication richness of the optical domain rather than using optics as a simple replacement for electronic interconnects. These considerations have led us to study the performance of a moderate size parallel processor with optical interconnects using multiple optical wavelengths. We quantify the bandwidth, latency, and concurrency requirements which allow a bus-type interconnect to achieve scalable computing performance using up to 256 nodes, each operating at GFLOP performance. Our key conclusion is that scalable performance, to approximately 150 GFLOPS, is achievable for several scientific codes using an optical bus with a small number of WDM channels (8 to 32), only one WDM channel received per node, and achievable optoelectronic bandwidth and latency requirements. 21 refs., 10 figs.
Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

DOEpatents

Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

2014-02-11

Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
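A minimal sketch of the complex load-and-splat and cross multiply-add steps, using a Python list with interleaved real and imaginary parts as a stand-in for a vector register. The interleaved layout and the function names (`complex_splat`, `cross_multiply_add`) are illustrative assumptions, not details confirmed by the patent.

```python
def complex_splat(re, im, n_pairs):
    """Replicate one complex value (re, im) across a simulated register."""
    return [re, im] * n_pairs


def cross_multiply_add(acc, va, vb):
    """Accumulate the complex products of paired lanes of va and vb into acc.

    All three registers hold interleaved [re0, im0, re1, im1, ...] values.
    """
    out = list(acc)
    for i in range(0, len(va), 2):
        a_re, a_im = va[i], va[i + 1]
        b_re, b_im = vb[i], vb[i + 1]
        out[i] += a_re * b_re - a_im * b_im      # real part of the product
        out[i + 1] += a_re * b_im + a_im * b_re  # imaginary part: the cross terms
    return out


# First operand holds (1+2j, 3+4j); the second operand element 5+6j is splatted.
va = [1, 2, 3, 4]
vb = complex_splat(5, 6, n_pairs=2)
partial = cross_multiply_add([0, 0, 0, 0], va, vb)
```

Calling `cross_multiply_add` repeatedly with the same accumulator sums successive partial products, mirroring the accumulation into a result register described above.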
Time evolution of photon-pulse propagation in scattering and absorbing media: The dynamic radiative transfer system

NASA Astrophysics Data System (ADS)

Georgakopoulos, A.; Politopoulos, K.; Georgiou, E.

2018-03-01

A new dynamic-system approach to the problem of radiative transfer inside scattering and absorbing media is presented, directly based on first-hand physical principles. This method, the Dynamic Radiative Transfer System (DRTS), employs a dynamical system formality using a global sparse matrix, which characterizes the physical, optical and geometrical properties of the material-volume of interest. The new system state is generated by the above time-independent matrix, using simple matrix-vector multiplication for each subsequent time step. DRTS is capable of calculating accurately the time evolution of photon propagation in media of complex structure and shape. The flexibility of DRTS allows the integration of time-dependent sources, boundary conditions, different media and several optical phenomena like reflection and refraction in a unified and consistent way. Various examples of DRTS simulation results are presented for ultra-fast light pulse 3-D propagation, demonstrating greatly reduced computational cost and resource requirements compared to other methods.
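The time-stepping idea, in which a fixed sparse matrix is applied to the state vector once per step, can be sketched as below. The dictionary-of-rows sparse format, the function names, and the three-cell toy matrix are assumptions for illustration only; the toy coefficients are not physical values from the paper.

```python
def sparse_matvec(matrix, state):
    """Multiply a sparse matrix {row: {col: value}} by a dense state vector."""
    new_state = [0.0] * len(state)
    for row, cols in matrix.items():
        new_state[row] = sum(value * state[col] for col, value in cols.items())
    return new_state


def evolve(matrix, state, n_steps):
    """Apply the same time-independent matrix once per time step."""
    history = [state]
    for _ in range(n_steps):
        state = sparse_matvec(matrix, state)
        history.append(state)
    return history


# Toy 1-D medium with three cells: per step, half of a cell's energy stays,
# a quarter moves to the right-hand neighbour, and the rest is absorbed.
toy = {0: {0: 0.5}, 1: {0: 0.25, 1: 0.5}, 2: {1: 0.25, 2: 0.5}}
history = evolve(toy, [1.0, 0.0, 0.0], n_steps=2)
```

Because the matrix never changes, the cost of each step is one sparse matrix-vector product, proportional to the number of nonzero entries.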
Reconfigurable lattice mesh designs for programmable photonic processors.

PubMed

Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A

2016-05-30

We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.
Parallel optical information, concept, and response evolver: POINCARE

NASA Astrophysics Data System (ADS)

Caulfield, H. John; Caulfield, Kimberly

1991-08-01

It is now possible to build a nonlinear adaptive system which will incorporate many of the properties of the human mind, such as true originality in such skills as reasoning by analogy and reasoning by retrodiction, including literally unpredictable thoughts; and development of individual styles, personalities, expertise, etc. Like humans, these optical processors will have a rich "subconscious" experience. Like humans, they will be clonable, but clones will develop differently as they experience the world differently, make different decisions, develop different habits, etc. In short, powerful optical processors with some of the properties normally associated with human intelligence can be made. This approach can result in a powerful optical processor with those properties. A demonstration chosen for simplicity of implementation is suggested. This could be the first computer of any type which uses quantum indeterminacy in an integral and important way.
A wideband software reconfigurable modem

NASA Astrophysics Data System (ADS)

Turner, J. H., Jr.; Vickers, H.

A wideband modem is described which provides signal processing capability for four Lx-band signals employing QPSK, MSK and PPM waveforms and employs a software reconfigurable architecture for maximum system flexibility and graceful degradation. The current processor uses a 2901 and two 8086 microprocessors per channel and performs acquisition, tracking, and data demodulation for JTIDS, GPS, IFF and TACAN systems. The next generation processor will be implemented using a VHSIC chip set employing a programmable complex array vector processor module, a GP computer module, customized gate array modules, and a digital array correlator. This integrated processor has application to a wide number of diverse system waveforms, and will bring the benefits of VHSIC technology insertion into avionic antijam communications systems.
MR imaging of hand and wrist with a dedicated 0.1-T low-field imaging system.

PubMed

Gries, P; Constantinesco, A; Brunot, B; Facello, A

1991-01-01

We describe the first results of a new magnetic resonance imaging (MRI) system specially developed for hand and wrist imaging. The system uses a small resistive water-cooled magnet with a vertical magnetic field of 0.1 T in an air gap of 15 cm. The console is based on a microcomputer with a vector signal processor and an image-processing board. There is actually no Faraday cage. For the whole hand, the in-plane spatial resolution is less than 1 mm in the 128 x 128-pixels format for typical slice thicknesses of 3 to 5 mm. Solenoidal volume coils for fingers were developed, giving, in the same matrix format, an in-plane high spatial resolution of 0.22 mm for a typical slice thickness of 3 mm.
On improving linear solver performance: a block variant of GMRES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, A H; Dennis, J M; Jessup, E R

2004-05-10

The increasing gap between processor performance and memory access time warrants the re-examination of data movement in iterative linear solver algorithms. For this reason, we explore and establish the feasibility of modifying a standard iterative linear solver algorithm in a manner that reduces the movement of data through memory. In particular, we present an alternative to the restarted GMRES algorithm for solving a single right-hand side linear system Ax = b based on solving the block linear system AX = B. Algorithm performance, i.e. time to solution, is improved by using the matrix A in operations on groups of vectors. Experimental results demonstrate the importance of implementation choices on data movement as well as the effectiveness of the new method on a variety of problems from different application areas.
Automated vector selection of SIVQ and parallel computing integration MATLAB™: Innovations supporting large-scale and high-throughput image analysis studies.

PubMed

Cheng, Jerome; Hipp, Jason; Monaco, James; Lucas, David R; Madabhushi, Anant; Balis, Ulysses J

2011-01-01

Spatially invariant vector quantization (SIVQ) is a texture and color-based image matching algorithm that queries the image space through the use of ring vectors. In prior studies, the selection of one or more optimal vectors for a particular feature of interest required a manual process, with the user initially stochastically selecting candidate vectors and subsequently testing them upon other regions of the image to verify the vector's sensitivity and specificity properties (typically by reviewing a resultant heat map). In carrying out the prior efforts, the SIVQ algorithm was noted to exhibit highly scalable computational properties, where each region of analysis can take place independently of others, making a compelling case for the exploration of its deployment on high-throughput computing platforms, with the hypothesis that such an exercise will result in performance gains that scale linearly with increasing processor count. An automated process was developed for the selection of optimal ring vectors to serve as the predicate matching operator in defining histopathological features of interest. Briefly, candidate vectors were generated from every possible coordinate origin within a user-defined vector selection area (VSA) and subsequently compared against user-identified positive and negative "ground truth" regions on the same image. Each vector from the VSA was assessed for its goodness-of-fit to both the positive and negative areas via the use of the receiver operating characteristic (ROC) transfer function, with each assessment resulting in an associated area-under-the-curve (AUC) figure of merit. Use of the above-mentioned automated vector selection process was demonstrated in two cases of use: First, to identify malignant colonic epithelium, and second, to identify soft tissue sarcoma. For both examples, a very satisfactory optimized vector was identified, as defined by the AUC metric. 
Finally, as an additional effort directed towards attaining high-throughput capability for the SIVQ algorithm, we demonstrated the successful incorporation of it with the MATrix LABoratory (MATLAB™) application interface. The SIVQ algorithm is suitable for automated vector selection settings and high throughput computation.
Beyond core count: a look at new mainstream computing platforms for HEP workloads

NASA Astrophysics Data System (ADS)

Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.

2014-06-01

As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
Photonic Breast Tomography and Tumor Aggressiveness Assessment

DTIC Science & Technology

2011-07-01

incorporates, in optical domain, the vector subspace classification method, Multiple Signal Classification ( MUSIC ). MUSIC was developed by Devaney...and co-workers for finding the location of scattering targets whose size is smaller than the wavelength of acoustic waves or electromagnetic waves...general area of array processing for acoustic and radar time-reversal imaging [12]. The eigenvalue equation of TR matrix is solved, and the signal and
Fourier transform-wavefront reconstruction for the pyramid wavefront sensor

NASA Astrophysics Data System (ADS)

Quirós-Pacheco, Fernando; Correia, Carlos; Esposito, Simone

The application of Fourier-transform reconstruction techniques to the pyramid wavefront sensor has been investigated. A preliminary study based on end-to-end simulations of an adaptive optics system with ≈40x40 subapertures and actuators shows that the performance of the Fourier-transform reconstructor (FTR) is of the same order of magnitude as that obtained with a conventional matrix-vector multiply (MVM) method.
Opto-microwave, Butler matrixes based front-end for a multi-beam large direct radiating array antenna

NASA Astrophysics Data System (ADS)

Piqueras, M. A.; Mengual, T.; Navasquillo, O.; Sotom, M.; Caille, G.

2017-11-01

The evolution of broadband communication satellites shows a clear trend towards beam-forming and beam-switching systems with efficient multiple access schemes and wide bandwidths; for these to be economically viable, the communication price must be as low as possible. In such applications, the most demanding antenna concept is the Direct Radiating Array (DRA), since its use allows flexible power allocation between beams and can tolerate failures in its active chains with low impact on the antenna radiating pattern. Forming multiple antenna beams, as for "multimedia via satellite" missions, can be done mainly in three ways: in the microwave domain, by digital processors, or by optical processors. Microwave beam-formers are strongly constrained by the mass and volume of microwave devices and waveguides. The bandwidth of digital processors is limited by power consumption and complexity constraints. Microwave photonics is an enabling technology that can improve the performance of the antenna feeding network, overcoming the limitations of traditional technology in the more demanding scenarios, and may overcome the conventional RF beam-former issues by accurately generating the very numerous time delays or phase shifts required in a DRA with a large number of beams and radiating elements. Integrated optics technology can play a crucial role as an alternative technology for implementing beam-forming structures for satellite applications thanks to its well-known advantages, such as low volume and weight, huge electrical bandwidth, immunity to electromagnetic interference, low consumption, remote delivery capability with low attenuation (by carrying all microwave signals over optical fibres), and the robustness and precision that integrated optics exhibits.
Under ESA contract 4000105095/12/NL/RA, the consortium formed by DAS Photonics, Thales Alenia Space and the Nanophotonic Technology Center of Valencia is developing a three-dimensional Optical Beamforming Network (OBFN) based on integrated photonics, with fibre-optic remote antenna feeding capabilities, that addresses the requirements of SoA DRA antennas in space communications and is able to feed potentially hundreds of antenna elements with hundreds of simultaneous, orthogonal beams. The core of this OBFN is a Photonic Integrated Circuit (PIC) implementing a passive Butler matrix similar to the structure well known to the RF community, but overcoming the issues of scalability, size, compactness and manufacturability associated with addressing hundreds of elements. This fully integrated beam-former solution also overcomes the opto-mechanical issues and environmental sensitivity of other free-space-based OBFNs.
Polarization-analyzing circuit on InP for integrated Stokes vector receiver.

PubMed

Ghosh, Samir; Kawabata, Yuto; Tanemura, Takuo; Nakano, Yoshiaki

2017-05-29

Stokes vector modulation and direct detection (SVM/DD) has immense potential to reduce the cost burden of next-generation short-reach optical communication networks. In this paper, we propose and demonstrate an InGaAsP/InP waveguide-based polarization-analyzing circuit for an integrated Stokes vector (SV) receiver. By transforming the input state-of-polarization (SOP) and projecting its SV onto three different vectors on the Poincaré sphere, we show that the actual SOP can be retrieved by simple calculation. We also reveal that this projection matrix is flexible and that its deviation due to device imperfections can be calibrated to a certain degree, so that the proposed device would be fundamentally robust against fabrication errors. A proof-of-concept photonic integrated circuit (PIC) is fabricated on InP by using half-ridge waveguides to successfully demonstrate detection of different SOPs scattered on the Poincaré sphere.
Vectorial laws of refraction and reflection using the cross product and dot product.

PubMed

Tkaczyk, Eric R

2012-03-01

We demonstrate that published vectorial laws of reflection and refraction of light based solely on the cross product do not, in general, uniquely determine the direction of the reflected and refracted waves without additional information. This is because the cross product does not have a unique inverse operation, which is explained in this Letter in linear algebra terms. However, a vector is in fact uniquely determined if both the cross product (vector product) and dot product (scalar product) with a known vector are specified, which can be written as a single equation with a left-invertible matrix. It is thus possible to amend the vectorial laws of reflection and refraction to incorporate both the cross and dot products for a complete specification with unique solution. This enables highly efficient, unambiguous computation of reflected and refracted wave vectors from the incident wave and surface normal. © 2012 Optical Society of America
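The uniqueness claim can be checked numerically. Given a known nonzero vector a, the cross product c = a × x and the dot product d = a · x, the standard triple-product identity a × (a × x) = a(a · x) − x|a|² gives x = (d a + c × a) / |a|². The sketch below uses this closed form, which is an assumption on our part and not necessarily the matrix formulation used in the Letter; the function names are illustrative.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def recover(a, c, d):
    """Return the unique x with a x x = c and a . x = d, for nonzero a."""
    norm_sq = dot(a, a)
    c_cross_a = cross(c, a)
    return [(d * ai + ci) / norm_sq for ai, ci in zip(a, c_cross_a)]


# Round trip: build c and d from a known x, then recover x from them.
a = [1.0, 2.0, 2.0]
x = [3.0, -1.0, 4.0]
x_recovered = recover(a, cross(a, x), dot(a, x))
```

With the cross product alone, any multiple of a could be added to x without changing c, which is why the dot product is needed to fix the component of x along a.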
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dupertuis, M.A.; Proctor, M.; Acklin, B.

Energy balance and reciprocity relations are studied for harmonic inhomogeneous plane waves that are incident upon a stack of continuous absorbing dielectric media that are macroscopically characterized by their electric and magnetic permittivities and their conductivities. New cross terms between parallel electric and parallel magnetic modes are identified in the fully generalized Poynting vector. The symmetry and the relations between the general Fresnel coefficients are investigated in the context of energy balance at the interface. The contributions of the so-called mixed Poynting vector are discussed in detail. In particular a new transfer matrix is introduced for energy fluxes in thin-film optics based on the Poynting and mixed Poynting vectors. Finally, the study of reciprocity relations leads to a generalization of a theorem of reversibility for conducting and dielectric media. 16 refs.
Tunable multi-wavelength fiber lasers based on an Opto-VLSI processor and optical amplifiers.

PubMed

Xiao, Feng; Alameh, Kamal; Lee, Yong Tak

2009-12-07

A multi-wavelength tunable fiber laser based on the use of an Opto-VLSI processor in conjunction with different optical amplifiers is proposed and experimentally demonstrated. The Opto-VLSI processor can simultaneously select any part of the gain spectrum from each optical amplifier into its associated fiber ring, leading to a multiport tunable fiber laser source. We experimentally demonstrate a 3-port tunable fiber laser source, where each output wavelength of each port can independently be tuned within the C-band with a wavelength step of about 0.05 nm. Experimental results demonstrate a laser linewidth as narrow as 0.05 nm and an optical side-mode-suppression-ratio (SMSR) of about 35 dB. The demonstrated three fiber lasers have excellent stability at room temperature and output power uniformity less than 0.5 dB over the whole C-band.
Optical signal processing of spatially distributed sensor data in smart structures

NASA Technical Reports Server (NTRS)

Bennett, K. D.; Claus, R. O.; Murphy, K. A.; Goette, A. M.

1989-01-01

Smart structures which contain dense two- or three-dimensional arrays of attached or embedded sensor elements inherently require signal multiplexing and processing capabilities to permit good spatial data resolution as well as the adequately short calculation times demanded by real time active feedback actuator drive circuitry. This paper reports the implementation of an in-line optical signal processor and its application in a structural sensing system which incorporates multiple discrete optical fiber sensor elements. The signal processor consists of an array of optical fiber couplers having tailored s-parameters and arranged to allow gray code amplitude scaling of sensor inputs. The use of this signal processor in systems designed to indicate the location of distributed strain and damage in composite materials, as well as to quantitatively characterize that damage, is described. Extension of similar signal processing methods to more complicated smart materials and structures applications is discussed.
Numerical implementation of the S-matrix algorithm for modeling of relief diffraction gratings

NASA Astrophysics Data System (ADS)

Yaremchuk, Iryna; Tamulevičius, Tomas; Fitio, Volodymyr; Gražulevičiūte, Ieva; Bobitski, Yaroslav; Tamulevičius, Sigitas

2013-11-01

A new numerical implementation is developed to calculate the diffraction efficiency of relief diffraction gratings. In the new formulation, vectors containing the expansion coefficients of electric and magnetic fields on boundaries of the grating layer are expressed by additional constants. An S-matrix algorithm has been systematically described in detail and adapted to a simple matrix form. This implementation is suitable for the study of optical characteristics of periodic structures by using modern object-oriented programming languages and different standard mathematical software. The modeling program has been developed on the basis of this numerical implementation and tested by comparison with other commercially available programs and experimental data. Numerical examples are given to show the usefulness of the new implementation.
System and method for authentication of goods

DOEpatents

Kaish, Norman; Fraser, Jay; Durst, David I.

1999-01-01

An authentication system comprising a medium having a plurality of elements, the elements being distinctive, detectable and disposed in an irregular pattern or having an intrinsic irregularity. Each element is characterized by a determinable attribute distinct from a two-dimensional coordinate representation of simple optical absorption or simple optical reflection intensity. An attribute and position of the plurality of elements, with respect to a positional reference, are detected. A processor generates an encrypted message including at least a portion of the attribute and position of the plurality of elements. The encrypted message is recorded in physical association with the medium. The elements are preferably dichroic fibers, and the attribute is preferably a polarization or dichroic axis, which may vary over the length of a fiber. The medium may be authenticated against the encrypted message with a statistical tolerance, based on a vector mapping of the elements of the medium, without requiring a complete image of the medium and elements to be recorded.
Polarimetric signature imaging of anisotropic bio-medical tissues

NASA Astrophysics Data System (ADS)

Wu, Stewart H.; Yang, De-Ming; Chiou, Arthur; Nee, Soe-Mie F.; Nee, Tsu-Wei

2010-02-01

Polarimetric imaging of Stokes vector (I, Q, U, V) can provide 4 independent signatures showing the linear and circular polarizations of biological tissues and cells. Using a recently developed Stokes digital imaging system, we measured the Stokes vector images of tissue samples from sections of rat livers containing normal portions and hematomas. The derived Mueller matrix elements can quantitatively provide multi-signature data of the bio-sample. This polarimetric optical technology is a new option of biosensing technology to inspect the structures of tissue samples, particularly for discriminating tumor and non-tumor biopsy. This technology is useful for critical disease discrimination and medical diagnostics applications.
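As a minimal illustration of the Stokes-Mueller formalism this record relies on (not the authors' imaging pipeline), the sketch below applies the textbook Mueller matrix of an ideal horizontal linear polarizer to a Stokes vector (I, Q, U, V). The names `apply_mueller`, `degree_of_polarization`, and the sample values are assumptions chosen for illustration.

```python
# Sketch: a 4x4 Mueller matrix M maps an input Stokes vector
# S_in = (I, Q, U, V) to an output Stokes vector S_out = M * S_in.

def apply_mueller(mueller, stokes):
    """Return the 4-element Stokes vector M * S (plain Python lists)."""
    return [sum(m * s for m, s in zip(row, stokes)) for row in mueller]

def degree_of_polarization(stokes):
    """sqrt(Q^2 + U^2 + V^2) / I for a Stokes vector with I > 0."""
    i, q, u, v = stokes
    return (q * q + u * u + v * v) ** 0.5 / i

# Textbook Mueller matrix of an ideal horizontal linear polarizer.
HORIZONTAL_POLARIZER = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

unpolarized = [1.0, 0.0, 0.0, 0.0]          # illustrative input light
s_out = apply_mueller(HORIZONTAL_POLARIZER, unpolarized)
```

In this example the polarizer halves the intensity and leaves fully horizontally polarized light. In an imaging system the same product would be evaluated per pixel.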
A Parallel Vector Machine for the PM Programming Language

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2016-04-01

PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. 
Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
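The masked vector execution described in this record can be sketched roughly as follows. This is an illustration in Python, not PM code or the PM virtual machine; the helper names (`masked_apply`, `mask_to_locations`, `apply_at_locations`, `vector_if_else`) are hypothetical. It shows an if-then-else applied to a whole vector through a Boolean mask, plus the alternative location-list form of a mask.

```python
# Sketch of masked vector operations (hypothetical helpers, not PM's API).

def masked_apply(op, values, mask):
    """Apply op only in cells where mask is True; other cells are unchanged."""
    return [op(v) if m else v for v, m in zip(values, mask)]

def mask_to_locations(mask):
    """Convert a Boolean mask to a list of active cell indices."""
    return [i for i, m in enumerate(mask) if m]

def apply_at_locations(op, values, locations):
    """Apply op only at the listed cell indices (sparse mask form)."""
    out = list(values)
    for i in locations:
        out[i] = op(out[i])
    return out

def vector_if_else(cond, then_op, else_op, values):
    """Vector form of: if cond(v) then then_op(v) else else_op(v)."""
    mask = [cond(v) for v in values]
    inverse = [not m for m in mask]
    out = masked_apply(then_op, values, mask)
    # Cells handled by the 'then' branch are skipped by the inverse mask.
    return masked_apply(else_op, out, inverse)

data = [-2, 3, -1, 4]
result = vector_if_else(lambda v: v < 0, lambda v: -v, lambda v: v * 10, data)
```

The location-list form touches only the active cells, which is why a switch to it can pay off once few cells remain active.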
Electro-optic voltage sensor with beam splitting

DOEpatents

Woods, Gregory K.; Renak, Todd W.; Davidson, James R.; Crawford, Thomas M.

2002-01-01

The invention is a miniature electro-optic voltage sensor system capable of accurate operation at high voltages without use of the dedicated voltage dividing hardware typically found in the prior art. The invention achieves voltage measurement without significant error contributions from neighboring conductors or environmental perturbations. The invention employs a transmitter, a sensor, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor. Within the sensor the beam undergoes the Pockels electro-optic effect. The electro-optic effect produces a modulation of the beam's polarization, which is in turn converted to a pair of independent conversely-amplitude-modulated signals, from which the voltage of the E-field is determined by the signal processor. The use of converse AM signals enables the signal processor to better distinguish signal from noise. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured.
Optical linear algebra processors - Architectures and algorithms

NASA Technical Reports Server (NTRS)

Casasent, David

1986-01-01

Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gebis, Joseph; Oliker, Leonid; Shalf, John

The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full system simulator, which was carefully calibrated to match the performance on our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication -- achieving 2x-13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power efficient manner.
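For readers unfamiliar with the sparse matrix-vector multiplication kernel named in this record, a minimal scalar sketch is given below. It assumes the common compressed sparse row (CSR) layout, which the abstract does not specify, and the function name `csr_spmv` is illustrative. The indirect read `x[col_idx[k]]` is the irregular memory access that makes the kernel memory-bound.

```python
# Sketch: y = A * x with A stored in CSR form (an assumed layout).

def csr_spmv(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x.

    values  -- nonzero entries, stored row by row
    col_idx -- column index of each nonzero
    row_ptr -- row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]   # gather from x
        y[row] = acc
    return y

# Example matrix:
# [[10,  0,  0],
#  [ 0, 20, 30],
#  [40,  0, 50]]
values = [10.0, 20.0, 30.0, 40.0, 50.0]
col_idx = [0, 1, 2, 0, 2]
row_ptr = [0, 1, 3, 5]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```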
Algorithms and software for solving finite element equations on serial and parallel architectures

NASA Technical Reports Server (NTRS)

Chu, Eleanor; George, Alan

1988-01-01

The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the Computational Structural Mechanics (CSM) testbed. One of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A brief overview of the CSM Testbed software and its usage is presented. An overview of the sparse matrix research for the Testbed currently employed in the CSM Testbed is given. An interface which was designed and implemented as a research tool for installing and appraising new matrix processors in the CSM Testbed is described. The results of numerical experiments performed in solving a set of testbed demonstration problems using the processor SPK and other experimental processors are contained.
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed-forward neural networks, although this method is extendable to recurrent networks.
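The O(log(number of processors)) cost of a global summation can be illustrated with a pairwise (tree) reduction. The sketch below simulates the rounds sequentially in Python and is not the CM-2 implementation; `tree_sum` is a hypothetical name. Each list entry stands for one processor's partial sum, and all additions within a round could run concurrently.

```python
# Sketch: tree reduction of per-processor partial sums.
# Each pass of the while loop is one communication round.

def tree_sum(partials):
    """Return (total, rounds) for a non-empty sequence of partial sums."""
    vals = list(partials)
    rounds = 0
    stride = 1
    while stride < len(vals):
        # Processor i receives from processor i + stride and accumulates.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds

total, rounds = tree_sum(range(1, 9))   # 8 simulated processors
```

With 8 simulated processors the sum completes in 3 rounds, and with 64 it would take 6, which is the logarithmic growth the abstract refers to.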
Potential of minicomputer/array-processor system for nonlinear finite-element analysis

NASA Technical Reports Server (NTRS)

Strohkorb, G. A.; Noor, A. K.

1983-01-01

The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.

Interactive computer modeling of combustion chemistry and coalescence-dispersion modeling of turbulent combustion

NASA Technical Reports Server (NTRS)

Pratt, D. T.

1984-01-01

An interactive computer code for simulation of a high-intensity turbulent combustor as a single point inhomogeneous stirred reactor was developed from an existing batch processing computer code CDPSR. The interactive CDPSR code was used as a guide for interpretation and direction of DOE-sponsored companion experiments utilizing Xenon tracer with optical laser diagnostic techniques to experimentally determine the appropriate mixing frequency, and for validation of CDPSR as a mixing-chemistry model for a laboratory jet-stirred reactor. The coalescence-dispersion model for finite rate mixing was incorporated into an existing interactive code AVCO-MARK I, to enable simulation of a combustor as a modular array of stirred flow and plug flow elements, each having a prescribed finite mixing frequency, or axial distribution of mixing frequency, as appropriate. The speed and reliability of the batch kinetics integrator code CREKID were further increased by rewriting it in vectorized form for execution on a vector or parallel processor, and by incorporating numerical techniques which enhance execution speed by permitting specification of a very low accuracy tolerance.
Matrix multiplication operations using pair-wise load and splat operations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.

Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
Microlens array processor with programmable weight mask and direct optical input

NASA Astrophysics Data System (ADS)

Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen

1999-03-01

We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich-like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 × 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
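The "one output sample per weight mask" principle in this record can be sketched numerically: each output coefficient is the sum of the input image multiplied element-wise by one mask. The sketch below uses ±1 Hadamard masks in natural (Sylvester) order on a power-of-two image. Both choices are assumptions for illustration; an incoherent optical system would need non-negative masks with an offset, and the Walsh transform proper uses sequency ordering. All function names are hypothetical.

```python
# Sketch: 2D Walsh-Hadamard coefficients as mask-weighted sums.

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def weighted_sum(mask, image):
    """Sum of mask * image over all pixels (one 'microlens' output)."""
    return sum(m * p
               for mask_row, img_row in zip(mask, image)
               for m, p in zip(mask_row, img_row))

def walsh_hadamard_2d(image):
    """Return the n x n coefficient array for a square n x n image."""
    n = len(image)
    h = hadamard(n)
    coeffs = []
    for u in range(n):
        row = []
        for v in range(n):
            # Mask for output (u, v): outer product of Hadamard rows u and v.
            mask = [[h[u][r] * h[v][c] for c in range(n)] for r in range(n)]
            row.append(weighted_sum(mask, image))
        coeffs.append(row)
    return coeffs

coeffs = walsh_hadamard_2d([[1, 2], [3, 4]])
```

All mask-weighted sums are independent of one another, which is the parallelism the optical architecture exploits.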
3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

1997-12-31

The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different numbers of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processor in the form of a hybrid microcircuit

NASA Astrophysics Data System (ADS)

Evtikhiev, N. N.; Esepkina, N. A.; Dolgii, V. A.; Lavrov, A. P.; Khotyanov, B. M.; Chernokozhin, V. V.; Shestak, S. A.

1995-10-01

An optoelectronic processor in the form of a hybrid microcircuit is described. An analysis is made of the feasibility of developing a new class of optoelectronic processors which are hybrid microcircuits and can operate both as self-contained specialised computers and also as functional components of computing systems.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Method of implementation of optoelectronic multiparametric signal processing systems based on multivalued-logic principles

NASA Astrophysics Data System (ADS)

Arestova, M. L.; Bykovskii, A. Yu

1995-10-01

An architecture is proposed for a specialised optoelectronic multivalued logic processor based on the Allen-Givone algebra. The processor is intended for multiparametric processing of data arriving from a large number of sensors or for tackling spectral analysis tasks. The processor architecture makes it possible to obtain an approximate general estimate of the state of an object being diagnosed on a p-level scale. Optoelectronic systems are proposed for MAXIMUM, MINIMUM, and LITERAL logic gates, based on optical-frequency encoding of logic levels. Corresponding logic gates form a complete set of logic functions in the Allen-Givone algebra.
A high-speed, large-capacity, 'jukebox' optical disk system

NASA Technical Reports Server (NTRS)

Ammon, G. J.; Calabria, J. A.; Thomas, D. T.

1985-01-01

Two optical disk 'jukebox' mass storage systems which provide access to any data in a store of 10 to the 13th bits (1250G bytes) within six seconds have been developed. The optical disk jukebox system is divided into two units, including a hardware/software controller and a disk drive. The controller provides flexibility and adaptability, through a ROM-based microcode-driven data processor and a ROM-based software-driven control processor. The cartridge storage module contains 125 optical disks housed in protective cartridges. Attention is given to a conceptual view of the disk drive unit, the NASA optical disk system, the NASA database management system configuration, the NASA optical disk system interface, and an open systems interconnect reference model.
Multiple scattered radiation emerging from Rayleigh and continental haze layers. I - Radiance, polarization, and neutral points

NASA Technical Reports Server (NTRS)

Kattawar, G. W.; Plass, G. N.; Hitzfelder, S. J.

1976-01-01

The matrix operator method was used to calculate the polarization of radiation scattered on layers of various optical thicknesses, with results compared for Rayleigh scattering and for scattering from a continental haze. In both cases, there are neutral points arising from the zeros of the polarization of single scattered photons at scattering angles of zero and 180 degrees. The angular position of these Rayleigh-like neutral points (RNP) in the sky shows appreciable variation with the optical thickness of the scattering layer for a Rayleigh phase matrix, but only a small variation for a haze L phase matrix. Another type of neutral point exists for non-Rayleigh phase functions that is associated with the zeros of the polarization for single scattering which occurs between the end points of the curve. A comparison of radiances calculated from the complete theory of radiative transfer using Stokes vectors with those obtained from the scalar theory shows that differences of the order of 23% may be obtained for Rayleigh scattering, while the largest difference found for a haze L phase function was of the order of 0.1%.
Selection of optimum median-filter-based ambiguity removal algorithm parameters for NSCAT. [NASA scatterometer

NASA Technical Reports Server (NTRS)

Shaffer, Scott; Dunbar, R. Scott; Hsiao, S. Vincent; Long, David G.

1989-01-01

The NASA Scatterometer, NSCAT, is an active spaceborne radar designed to measure the normalized radar backscatter coefficient (sigma0) of the ocean surface. These measurements can, in turn, be used to infer the surface vector wind over the ocean using a geophysical model function. Several ambiguous wind vectors result because of the nature of the model function. A median-filter-based ambiguity removal algorithm will be used by the NSCAT ground data processor to select the best wind vector from the set of ambiguous wind vectors. This process is commonly known as dealiasing or ambiguity removal. The baseline NSCAT ambiguity removal algorithm and the method used to select the set of optimum parameter values are described. An extensive simulation of the NSCAT instrument and ground data processor provides a means of testing the resulting tuned algorithm. This simulation generates the ambiguous wind-field vectors expected from the instrument as it orbits over a set of realistic mesoscale wind fields. The ambiguous wind field is then dealiased using the median-based ambiguity removal algorithm. Performance is measured by comparison of the unambiguous wind fields with the true wind fields. Results have shown that the median-filter-based ambiguity removal algorithm satisfies NSCAT mission requirements.
Double Stokes-Mueller polarimetry in KTP (Potassium Titanyl Phosphate) crystal

NASA Astrophysics Data System (ADS)

Shaji, Chitra; S B, Sruthil Lal; Sharan, Alok

2017-04-01

The ultra-structural properties of a material are probed by the Double Stokes-Mueller polarimetry (DSMP) technique. It makes use of higher dimensions of the Stokes vector (9 × 1) and Mueller matrix (4 × 9) to characterize the nonlinear optical properties of a material. Second harmonic generation (SHG) at 532 nm, using a 1064 nm fundamental cw beam from an Nd:YAG laser, in a type II phase matched KTP (Potassium Titanyl Phosphate) crystal is studied using DSMP. The experimental measurements for determining the double Mueller matrix are carried out in the "Polarization In Polarization Out" (PIPO) arrangement. Nine input polarization states are incident on the sample and the linear Stokes vector of the emerging light from the sample is measured. The KTP crystal is oriented such that the SHG signal efficiency at the incident horizontal and vertical polarizations is high as compared to diagonal polarization states. The susceptibility tensor components and the phase difference between them at this orientation are determined from the double Mueller matrix elements. These determined values give information regarding the crystal axis orientations. To our knowledge, this is the first report of the use of the DSMP technique to determine the crystal orientations of a biaxial crystal.
Capabilities of Fully Parallelized MHD Stability Code MARS

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2016-10-01

Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. A parallel version of MARS, named PMARS, has recently been developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
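The inverse vector iteration mentioned in this record can be sketched for a small dense standard eigenvalue problem A x = λ x. This is a textbook serial version under simplifying assumptions, not the MARS or PMARS implementation, which the abstract does not detail. The names `solve` and `inverse_iteration`, the starting vector, and the iteration count are illustrative, and the shift must not equal an eigenvalue exactly.

```python
# Sketch: shifted inverse iteration, x_{k+1} ~ (A - shift*I)^{-1} x_k.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def inverse_iteration(a, shift, iterations=50):
    """Return (eigenvalue, eigenvector) for the eigenvalue nearest shift."""
    n = len(a)
    shifted = [[a[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = [1.0 / (i + 1) for i in range(n)]              # arbitrary start
    for _ in range(iterations):
        y = solve(shifted, x)
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(xi * axi for xi, axi in zip(x, ax))  # Rayleigh quotient
    return eigenvalue, x

A = [[2.0, 1.0], [1.0, 2.0]]                           # eigenvalues 1 and 3
low, _ = inverse_iteration(A, 0.9)
high, _ = inverse_iteration(A, 2.9)
```

The linear solve inside the loop dominates the cost, which is consistent with the abstract's focus on parallelizing both matrix construction and the iteration itself.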
Numerical algorithms for finite element computations on concurrent processors

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1986-01-01

The work of several graduate students which relates to the NASA grant is briefly summarized. One student has worked on a detailed analysis of the so-called ijk forms of Gaussian elimination and Cholesky factorization on concurrent processors. Another student has worked on the vectorization of the incomplete Cholesky conjugate method on the CYBER 205. Two more students implemented various versions of Gaussian elimination and Cholesky factorization on the FLEX/32.
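For reference, one loop ordering of the Cholesky factorization A = L Lᵀ is sketched below. It is a serial, column-by-column version for a small symmetric positive definite matrix and is only one of the several orderings that the "ijk forms" terminology covers; it does not reproduce the students' concurrent implementations. The function name `cholesky` is illustrative.

```python
# Sketch: column-oriented Cholesky factorization of a dense SPD matrix.
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be SPD)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove contributions of earlier columns.
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(diag)
        # Entries below the diagonal in column j are independent of each
        # other, which is what a concurrent version can exploit.
        for i in range(j + 1, n):
            dot = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - dot) / l[j][j]
    return l

L = cholesky([[4.0, 12.0, -16.0],
              [12.0, 37.0, -43.0],
              [-16.0, -43.0, 98.0]])
```

Reordering the three nested loops changes the memory access pattern and the available parallelism but not the result, which is why the different forms are analysed separately for each machine.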
Using algebra for massively parallel processor design and utilization

NASA Technical Reports Server (NTRS)

Campbell, Lowell; Fellows, Michael R.

1990-01-01

This paper summarizes the authors' advances in the design of dense processor networks. Within is reported a collection of recent constructions of dense symmetric networks that provide the largest known values for the number of nodes that can be placed in a network of a given degree and diameter. The constructions are in the range of current potential engineering significance and are based on groups of automorphisms of finite-dimensional vector spaces.
Dense and Sparse Matrix Operations on the Cell Processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel W.; Shalf, John; Oliker, Leonid

2005-05-01

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
Electro-Optical and Optical Components for Processor to Processor Interconnects

DTIC Science & Technology

2013-04-01

Kwiat and others were instrumental in explicitly co-entangling other properties such as momentum (path) [4]. Others such as Barnett and Zeilinger ...19 4. References: 1. D. Bouwmeester, J.W. Pan, K. Mattle, M. Eibl, H. Weinfurter and A. Zeilinger, “Experimental quantum teleportation...Nature, Vol. 390, 11 December 1997, pp. 575-579. 2. Jian-Wei Pan, Dik Bouwmeester, Harald Weinfurter, and Anton Zeilinger, “Experimental Entanglement
Rotations with Rodrigues' Vector

ERIC Educational Resources Information Center

Pina, E.

2011-01-01

The rotational dynamics was studied from the point of view of Rodrigues' vector. This vector is defined here by its connection with other forms of parametrization of the rotation matrix. The rotation matrix was expressed in terms of this vector. The angular velocity was computed using the components of Rodrigues' vector as coordinates. It appears…
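As a small worked example of expressing the rotation matrix through Rodrigues' vector, the sketch below assumes the common convention g = tan(θ/2) n for a rotation by angle θ about the unit axis n, which gives R = [(1 − g·g) I + 2 g gᵀ + 2 [g]×] / (1 + g·g). The truncated abstract does not show the paper's own convention, so the sign and normalisation here are assumptions, and the function name is illustrative.

```python
# Sketch: rotation matrix from a Rodrigues vector g = tan(theta/2) * n.
# Rotations by 180 degrees are not representable (tan diverges).

def rotation_from_rodrigues(g):
    """Return the 3x3 rotation matrix (nested lists) for Rodrigues vector g."""
    gx, gy, gz = g
    g2 = gx * gx + gy * gy + gz * gz
    # Cross-product (skew-symmetric) matrix [g]x.
    cross = [[0.0, -gz, gy],
             [gz, 0.0, -gx],
             [-gy, gx, 0.0]]
    r = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            ident = 1.0 if i == j else 0.0
            r[i][j] = ((1.0 - g2) * ident
                       + 2.0 * g[i] * g[j]
                       + 2.0 * cross[i][j]) / (1.0 + g2)
    return r

# tan(90 deg / 2) = 1, so g = (0, 0, 1) is a quarter turn about z.
R = rotation_from_rodrigues((0.0, 0.0, 1.0))
```

With g = (0, 0, 1) the result maps the x axis onto the y axis, as expected for a 90 degree rotation about z under this convention.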
Orthonormal vector general polynomials derived from the Cartesian gradient of the orthonormal Zernike-based polynomials.

PubMed

Mafusire, Cosmas; Krüger, Tjaart P J

2018-06-01

The concept of orthonormal vector circle polynomials is revisited by deriving a set from the Cartesian gradient of Zernike polynomials in a unit circle using a matrix-based approach. The heart of this model is a closed-form matrix equation of the gradient of Zernike circle polynomials expressed as a linear combination of lower-order Zernike circle polynomials related through a gradient matrix. This is a sparse matrix whose elements are two-dimensional standard basis transverse Euclidean vectors. Using the outer product form of the Cholesky decomposition, the gradient matrix is used to calculate a new matrix, which we used to express the Cartesian gradient of the Zernike circle polynomials as a linear combination of orthonormal vector circle polynomials. Since this new matrix is singular, the orthonormal vector polynomials are recovered by reducing the matrix to its row echelon form using the Gauss-Jordan elimination method. We extend the model to derive orthonormal vector general polynomials, which are orthonormal in a general pupil by performing a similarity transformation on the gradient matrix to give its equivalent in the general pupil. The outer form of the Gram-Schmidt procedure and the Gauss-Jordan elimination method are then applied to the general pupil to generate the orthonormal vector general polynomials from the gradient of the orthonormal Zernike-based polynomials. The performance of the model is demonstrated with a simulated wavefront in a square pupil inscribed in a unit circle.
Reflection Matrix for Optical Resonators in FEL (Free Electron Lasers) Oscillators

DTIC Science & Technology

1988-09-22

is the dominant factor determining the reflection coefficient. The effects of deflecting the light beam enter as small corrections, of first order in...RESONATORS IN FEL OSCILLATORS I. INTRODUCTION 1-7 Free Electron Lasers (FEL) operating as oscillators require the 8-10 trapping of light pulses between...The simplest oscillator configuration is that of an open resonator with two opposed identical mirrors. The radiation vector potential for this
Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

NASA Astrophysics Data System (ADS)

Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

2014-03-01

The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, stems from the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
Energy Dissipation of Rayleigh Waves due to Absorption Along the Path by the Use of Finite Element Method

DTIC Science & Technology

1979-07-31

3 x 3 t Strain vector a ij,j Space derivative of the stress tensor Fi Force vector per unit volume o Density x CHAPTER III F Total force K Stiffness...matrix 6Vector displacements M Mass matrix B Space operating matrix DO Matrix moduli 2 x 3 DZ Operating matrix in Z direction N Matrix of shape...dissipating medium the deformation of a solid is a function of time, temperature and space . Creep phenomenon is a deformation process in which there is

A new approach for implementation of associative memory using volume holographic materials

NASA Astrophysics Data System (ADS)

Habibi, Mohammad; Pashaie, Ramin

2012-02-01

Associative memory, also known as fault tolerant or content-addressable memory, has gained considerable attention in the last few decades. This memory possesses important advantages over the more common random access memories since it provides the capability to correct faults and/or partially missing information in a given input pattern. There is general consensus that optical implementation of connectionist models and parallel processors including associative memory has a better record of success compared to their electronic counterparts. In this article, we describe a novel optical implementation of associative memory which not only has the advantage of all-optical learning and recalling capabilities but can also be realized easily. We present a new approach, inspired by tomographic imaging techniques, for holographic implementation of associative memories. In this approach, a volume holographic material is sandwiched within a matrix of inputs (optical point sources) and outputs (photodetectors). The memory capacity is realized by the spatial modulation of the refractive index of the holographic material. Constructing the spatial distribution of the refractive index from an array of known inputs and outputs is formulated as an inverse problem consisting of a set of linear integral equations.
Field-controllable Spin-Hall Effect of Light in Optical Crystals: A Conoscopic Mueller Matrix Analysis.

PubMed

Samlan, C T; Viswanathan, Nirmal K

2018-01-31

An electric field applied perpendicular to the direction of propagation of a paraxial beam through an optical crystal dynamically modifies the spin-orbit interaction (SOI), leading to the demonstration of a controllable spin-Hall effect of light (SHEL). The electro- and piezo-optic effects of the crystal modify the radially symmetric spatial variation in the fast-axis orientation of the crystal, resulting in a complex pattern with different topologies due to the symmetry-breaking effect of the applied field. This introduces a spatially varying Pancharatnam-Berry type geometric phase onto the paraxial beam of light, leading to the observation of SHEL in addition to the spin-to-vortex conversion. A wave-vector resolved conoscopic Mueller matrix measurement and analysis provides a first glimpse of the SHEL in the biaxial crystal, identified via the appearance of weak circular birefringence. The emergence of field-controllable fast-axis orientation of the crystal and the resulting SHEL provides a new degree of freedom for affecting and controlling the spin and orbital angular momentum of photons to unravel the rich underlying physics of optical crystals and aid in the development of active photonic spin-Hall devices.
What is the longitudinal magneto-optical Kerr effect?

NASA Astrophysics Data System (ADS)

Ander Arregi, Jon; Riego, Patricia; Berger, Andreas

2017-01-01

We explore the commonly used classification scheme for the magneto-optical Kerr effect (MOKE), which essentially utilizes a dual definition based simultaneously on the Cartesian coordinate components of the magnetization vector with respect to the plane of incidence reference frame and specific elements of the reflection matrix, which describes light reflection from a ferromagnetic surface. We find that an unambiguous correspondence between reflection matrix elements and magnetization components is valid only in special cases, while in more general cases, it leads to inconsistencies due to an intermixing of the presumed separate effects of longitudinal, transverse and polar MOKE. As an example, we investigate in this work both theoretically and experimentally a material that possesses anisotropic magneto-optical properties in accordance with its crystal symmetry. The derived equations, which specifically predict a so-far unknown polarization effect for the transverse magnetization component, are confirmed by detailed experiments on epitaxial hcp Co films. The results indicate that magneto-optical anisotropy causes significant deviations from the commonly employed MOKE data interpretation. Our work addresses the associated anomalies, provides a suitable analysis route for reliable MOKE magnetometry procedures, and proposes a revised MOKE terminology scheme.
Parallel and fault-tolerant algorithms for hypercube multiprocessors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aykanat, C.

1988-01-01

Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vector hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System

PubMed Central

Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram

2014-01-01

We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of the 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses the Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) a conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) a MATLAB program supported by the MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on the host CPU and the GPU system is presented for the C and MATLAB implementations. The forward computation uses the finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time achieved for one iteration of the DOT reconstruction for 14610 elements is 0.52 seconds for the C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames achieved is 2 frames per second. PMID:24891848
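For readers unfamiliar with Broyden-type Jacobian updates, the sketch below shows the classic rank-one ("good") Broyden update. It is an assumption that this is the variant meant; the abstract does not specify the exact formula, and the function and variable names are hypothetical.

```python
import numpy as np

def broyden_update(J, dx, dy):
    """Rank-one update so that the returned Jacobian maps the step dx to the observed change dy."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    # J_new = J + (dy - J dx) dx^T / (dx^T dx)
    return J + np.outer(dy - J @ dx, dx) / (dx @ dx)

# Toy example: start from the identity and absorb one observed step
J0 = np.eye(2)
step = np.array([1.0, 0.0])          # change in parameters
change = np.array([2.0, 1.0])        # observed change in measurements
J1 = broyden_update(J0, step, change)
```

The appeal of such an update is cost: it replaces a full Jacobian recomputation with one matrix-vector product and one outer product per iteration.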
Fiber-guided modes conversion using superposed helical gratings

NASA Astrophysics Data System (ADS)

Ma, Yancheng; Fang, Liang; Wu, Guoan

2017-03-01

Optical fibers can support various modal forms, including vector modes, linear polarization (LP) modes, and orbital angular momentum (OAM) modes. The modal correlation among these modes is investigated via the Jones matrix, associated with the polarization and helical phase corresponding to the spin angular momentum (SAM) and OAM of light, respectively. We can generate different modal forms by adopting superposed helical gratings (SHGs) with opposite helix orientations. A detailed analysis and discussion of mode conversion is given for mode coupling in optical fibers with both low and high index contrast. Our study may deepen the understanding of various fiber-guided modes and mode conversion among them via fiber gratings.
Compact self-contained electrical-to-optical converter/transmitter

DOEpatents

Seligmann, Daniel A.; Moss, William C.; Valk, Theodore C.; Conder, Alan D.

1995-01-01

A first optical receiver and a second optical receiver are provided for receiving a calibrate command and a power switching signal, respectively, from a remote processor. A third receiver is provided for receiving an analog electrical signal from a transducer. A calibrator generates a reference signal in response to the calibrate command. A combiner mixes the electrical signal with the reference signal to form a calibrated signal. A converter converts the calibrated signal to an optical signal. A transmitter transmits the optical signal to the remote processor. A primary battery supplies power to the calibrator, the combiner, the converter, and the transmitter. An optically-activated switch supplies power to the calibrator, the combiner, the converter, and the transmitter in response to the power switching signal. An auxiliary battery supplies power continuously to the switch.
A group matrix representation relevant to scales of measurement of clinical disease states via stratified vectors.

PubMed

Sawamura, Jitsuki; Morishita, Shigeru; Ishigooka, Jun

2016-02-09

Previously, we applied basic group theory and related concepts to scales of measurement of clinical disease states and clinical findings (including laboratory data). To gain a more concrete comprehension, we here apply the concept of matrix representation, which was not explicitly exploited in our previous work. Starting with a set of orthonormal vectors, called the basis, an operator Rj (an N-tuple patient disease state at the j-th session) was expressed as a set of stratified vectors representing plural operations on individual components, so as to satisfy the group matrix representation. The stratified vectors containing individual unit operations were combined into one-dimensional square matrices [Rj]s. The [Rj]s meet the matrix representation of a group (ring) as a K-algebra. Using the same-sized matrix of stratified vectors, we can also express changes in the plural set of [Rj]s. The method is demonstrated on simple examples. Despite the incompleteness of our model, the group matrix representation of stratified vectors offers a formal mathematical approach to clinical medicine, aligning it with other branches of natural science.
Realization of preconditioned Lanczos and conjugate gradient algorithms on optical linear algebra processors.

PubMed

Ghosh, A

1988-08-01

Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
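To make the role of preconditioning concrete, here is a minimal digital sketch of a preconditioned conjugate gradient iteration. The diagonal (Jacobi) preconditioner and all names are assumptions chosen for illustration; the paper's optical preconditioning scheme is not described in this record.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for symmetric positive-definite A.

    M_inv_diag holds the inverse of a diagonal preconditioner (assumed Jacobi).
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                 # initial residual
    z = M_inv_diag * r            # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p                # the matrix-vector product an optical processor would supply
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b, 1.0 / np.diag(A))
```

Each iteration needs only one matrix-vector product, which is why the method maps naturally onto matrix-vector hardware.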
A microprocessor-based one dimensional optical data processor for spatial frequency analysis

NASA Technical Reports Server (NTRS)

Collier, R. L.; Ballard, G. S.

1982-01-01

A high degree of accuracy was obtained in measuring the spatial frequency spectrum of known samples using an optical data processor based on a microprocessor, which reliably collected intensity versus angle data. Stray light control, system alignment, and angle measurement problems were addressed and solved. The capabilities of the instrument were extended by the addition of appropriate optics to allow the use of different wavelengths of laser radiation and by increasing the travel limits of the rotating arm to ±160 degrees. The acquisition, storage, and plotting of data by the computer permits the researcher a free hand in data manipulation such as subtracting background scattering from a diffraction pattern. Tests conducted to verify the operation of the processor using a 25 mm diameter pinhole, a 39.37 line pairs per mm series of multiple slits, and a microscope slide coated with 1.091 mm diameter polystyrene latex spheres are described.
A hybrid optic-fiber sensor network with the function of self-diagnosis and self-healing

NASA Astrophysics Data System (ADS)

Xu, Shibo; Liu, Tiegen; Ge, Chunfeng; Chen, Cheng; Zhang, Hongxia

2014-11-01

We develop a hybrid wavelength division multiplexing optical fiber network with distributed fiber-optic sensors and quasi-distributed FBG sensor arrays which detect vibrations, temperatures and strains at the same time. Through the cooperation of software and hardware, the network can automatically locate failure sites (self-diagnosis) and perform protective switching to reestablish sensing service (self-healing). These processes are accomplished by master-slave processors with the help of optical and wireless telemetry signals. All the sensing and optical telemetry signals are transmitted in the same fiber, either the working fiber or the backup fiber. Under normal circumstances, we use a wavelength of 1450 nm as the downstream signal and 1350 nm as the upstream signal to control the network; both signals are sent by a light-emitting node of the corresponding processor. A continuous 1310 nm laser signal is also sent by each node and received by the next node on both the working and backup fibers to monitor their health, but unlike the telemetry signals it carries no message. When the fibers of two sensor units are completely damaged, the master processor loses communication with the node between the damaged units. To handle this case, we install an RF module in each node. Finally, the state of the whole network is transmitted to the host computer by the master processor. The operator can monitor and control the network through a human-machine interface if needed.
Configuration control of seven-degree-of-freedom arms

NASA Technical Reports Server (NTRS)

Seraji, Homayoun (Inventor); Long, Mark K. (Inventor); Lee, Thomas S. (Inventor)

1992-01-01

A seven degree of freedom robot arm with a six degree of freedom end effector is controlled by a processor employing a 6 by 7 Jacobian matrix for defining location and orientation of the end effector in terms of the rotation angles of the joints, a 1 (or more) by 7 Jacobian matrix for defining 1 (or more) user specified kinematic functions constraining location or movement of selected portions of the arm in terms of the joint angles, the processor combining the two Jacobian matrices to produce an augmented 7 (or more) by 7 Jacobian matrix, the processor effecting control by computing in accordance with forward kinematics from the augmented 7 by 7 Jacobian matrix and from the seven joint angles of the arm a set of seven desired joint angles for transmittal to the joint servo loops of the arm. One of the kinematic functions constrains the orientation of the elbow plane of the arm. Another one of the kinematic functions minimizes a sum of gravitational torques on the joints. Still another kinematic function constrains the location of the arm to perform collision avoidance. Generically, one kinematic function minimizes a sum of selected mechanical parameters of at least some of the joints associated with weighting coefficients which may be changed during arm movement. The mechanical parameters may be velocity errors or gravity torques associated with individual joints.
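The idea of stacking the two Jacobians can be sketched as follows. This is a generic differential-kinematics step under stated assumptions, not the patent's control law: the Jacobians are random placeholders, and the function name and the least-squares solve are choices made for illustration.

```python
import numpy as np

def configuration_control_step(q, J_e, J_c, dx_e, dphi):
    """One small joint update from an augmented Jacobian.

    J_e: 6x7 end-effector Jacobian, J_c: rx7 Jacobian of the extra kinematic functions,
    dx_e: desired end-effector displacement (6,), dphi: desired change of the kinematic functions (r,).
    """
    J_aug = np.vstack([J_e, J_c])              # (6 + r) x 7 augmented Jacobian
    target = np.concatenate([dx_e, dphi])
    # Least squares covers both the square case (r = 1) and the over-constrained case (r > 1)
    dq, *_ = np.linalg.lstsq(J_aug, target, rcond=None)
    return q + dq

rng = np.random.default_rng(0)
J_e = rng.standard_normal((6, 7))              # placeholder values, not a real arm model
J_c = rng.standard_normal((1, 7))
q = np.zeros(7)
dx_e = 0.01 * rng.standard_normal(6)
dphi = np.array([0.0])                         # hold the extra kinematic function fixed
q_new = configuration_control_step(q, J_e, J_c, dx_e, dphi)
```

With a single extra kinematic function the augmented matrix is square, so the redundancy of the seven-joint arm is resolved by that one constraint.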
Configuration control of seven degree of freedom arms

NASA Technical Reports Server (NTRS)

Seraji, Homayoun (Inventor)

1995-01-01

A seven-degree-of-freedom robot arm with a six-degree-of-freedom end effector is controlled by a processor employing a 6-by-7 Jacobian matrix for defining location and orientation of the end effector in terms of the rotation angles of the joints, a 1 (or more)-by-7 Jacobian matrix for defining 1 (or more) user-specified kinematic functions constraining location or movement of selected portions of the arm in terms of the joint angles, the processor combining the two Jacobian matrices to produce an augmented 7 (or more)-by-7 Jacobian matrix, the processor effecting control by computing in accordance with forward kinematics from the augmented 7-by-7 Jacobian matrix and from the seven joint angles of the arm a set of seven desired joint angles for transmittal to the joint servo loops of the arms. One of the kinematic functions constrains the orientation of the elbow plane of the arm. Another one of the kinematic functions minimizes a sum of gravitational torques on the joints. Still another one of the kinematic functions constrains the location of the arm to perform collision avoidance. Generically, one of the kinematic functions minimizes a sum of selected mechanical parameters of at least some of the joints associated with weighting coefficients which may be changed during arm movement. The mechanical parameters may be velocity errors or position errors or gravity torques associated with individual joints.
Image Intensifier Modules For Use With Commercially Available Solid State Cameras

NASA Astrophysics Data System (ADS)

Murphy, Howard; Tyler, Al; Lake, Donald W.

1989-04-01

A modular approach to design has contributed greatly to the success of the family of machine vision video equipment produced by EG&G Reticon during the past several years. Internal modularity allows high-performance area (matrix) and line scan cameras to be assembled with two or three electronic subassemblies with very low labor costs, and permits camera control and interface circuitry to be realized by assemblages of various modules suiting the needs of specific applications. Product modularity benefits equipment users in several ways. Modular matrix and line scan cameras are available in identical enclosures (Fig. 1), which allows enclosure components to be purchased in volume for economies of scale and allows field replacement or exchange of cameras within a customer-designed system to be easily accomplished. The cameras are optically aligned (boresighted) at final test; modularity permits optical adjustments to be made with the same precise test equipment for all camera varieties. The modular cameras contain two, or sometimes three, hybrid microelectronic packages (Fig. 2). These rugged and reliable "submodules" perform all of the electronic operations internal to the camera except for the job of image acquisition performed by the monolithic image sensor. Heat produced by electrical power dissipation in the electronic modules is conducted through low resistance paths to the camera case by the metal plates, which results in a thermally efficient and environmentally tolerant camera with low manufacturing costs. A modular approach has also been followed in design of the camera control, video processor, and computer interface accessory called the Formatter (Fig. 3). This unit can be attached directly onto either a line scan or matrix modular camera to form a self-contained unit, or connected via a cable to retain the advantages inherent to a small, lightweight, and rugged image sensing component.
Available modules permit the bus-structured Formatter to be configured as required by a specific camera application. Modular line and matrix scan cameras incorporating sensors with fiber optic faceplates (Fig 4) are also available. These units retain the advantages of interchangeability, simple construction, ruggedness, and optical precision offered by the more common lens input units. Fiber optic faceplate cameras are used for a wide variety of applications. A common usage involves mating of the Reticon-supplied camera to a customer-supplied intensifier tube for low light level and/or short exposure time situations.
Advanced miniature processing handware for ATR applications

NASA Technical Reports Server (NTRS)

Chao, Tien-Hsin (Inventor); Daud, Taher (Inventor); Thakoor, Anikumar (Inventor)

2003-01-01

A Hybrid Optoelectronic Neural Object Recognition System (HONORS) is disclosed, comprising two major building blocks: (1) an advanced grayscale optical correlator (OC) and (2) a massively parallel three-dimensional neural-processor. The optical correlator, with its inherent advantages in parallel processing and shift invariance, is used for target of interest (TOI) detection and segmentation. The three-dimensional neural-processor, with its robust neural learning capability, is used for target classification and identification. The hybrid optoelectronic neural object recognition system, with its powerful combination of optical processing and neural networks, enables real-time, large frame, automatic target recognition (ATR).
Solution of matrix equations using sparse techniques

NASA Technical Reports Server (NTRS)

Baddourah, Majdi

1994-01-01

The solution of large systems of matrix equations is key to the solution of a large number of scientific and engineering problems. This talk describes the sparse matrix solver developed at Langley which can routinely solve in excess of 263,000 equations in 40 seconds on one Cray C-90 processor. It appears that for large scale structural analysis applications, sparse matrix methods have a significant performance advantage over other methods.
Air-Lubricated Thermal Processor For Dry Silver Film

NASA Astrophysics Data System (ADS)

Siryj, B. W.

1980-09-01

Since dry silver film is processed by heat, it may be viewed on a light table only seconds after exposure. On the other hand, wet films require both bulky chemicals and substantial time before an image can be analyzed. Processing of dry silver film, although simple in concept, is not so simple when reduced to practice. The main concern is the effect of film temperature gradients on uniformity of optical film density. RCA has developed two thermal processors, different in implementation but based on the same philosophy. Pressurized air is directed to both sides of the film to support the film and to conduct the heat to the film. Porous graphite is used as the medium through which heat and air are introduced. The initial thermal processor was designed to process 9.5-inch-wide film moving at speeds ranging from 0.0034 to 0.008 inch per second. The processor configuration was curved to match the plane generated by the laser recording beam. The second thermal processor was configured to process 5-inch-wide film moving at a continuously variable rate ranging from 0.15 to 3.5 inches per second. Due to field flattening optics used in this laser recorder, the required film processing area was plane. In addition, this processor was sectioned in the direction of film motion, giving the processor the capability of varying both temperature and effective processing area.
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

NASA Astrophysics Data System (ADS)

Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

2013-03-01

With the development of international wireless communication standards, the computational requirements for baseband signal processors are increasing. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Owing to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA approaches. In addition, computation-intensive functions process large amounts of parallel data, which fosters the development of single instruction multiple data (SIMD) architectures in SDR platforms. A new way must therefore be found to prototype SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in the machine description language LISA. LISA is a language for instruction set architectures that enables rapid modeling at the architectural level. To evaluate our proposed processor, three common baseband functions (FFT, FIR digital filter and matrix multiplication) have been mapped onto the SDR platform. Analytical results showed that the SDR processor achieved a maximum performance boost of 47.1% relative to the competing processor.
Multigrid Equation Solvers for Large Scale Nonlinear Finite Element Simulations

DTIC Science & Technology

1999-01-01

purpose of the second partitioning phase, on each SMP, is to minimize the communication within the SMP; even if a multi-threaded matrix vector product...8.7 Comparison of model with experimental data for send phase of matrix vector product on fine grid...140 8.4 Matrix vector product phase times 145 9.1 Flat and

Spatial Phase Coding for Incoherent Optical Processors

NASA Technical Reports Server (NTRS)

Tigin, D. V.; Lavrentev, A. A.; Gary, C. K.

1994-01-01

In this paper we introduce spatial phase coding of incoherent optical signals for representing signed numbers in optical processors and present an experimental demonstration of this coding technique. If a diffraction grating, such as an acousto-optic cell, modulates a stream of light, the image of the grating can be recovered from the diffracted beam. The position of the grating image, or more precisely its phase, can be used to denote the sign of the number represented by the diffracted light. The intensity of the light represents the magnitude of the number. This technique is more economical than current methods in terms of the number of information channels required to represent a number and the amount of post processing required.
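The coding idea (intensity carries magnitude, grating phase carries sign) can be illustrated numerically. The sketch below is a toy model under assumptions not taken from the paper: the grating image is modeled as a raised cosine sampled over whole periods, and the names `encode`, `decode` and `CARRIER` are hypothetical.

```python
import numpy as np

# Detector coordinate sampled over exactly 8 grating periods (assumed geometry)
x = np.linspace(0.0, 1.0, 256, endpoint=False)
CARRIER = np.cos(2 * np.pi * 8 * x)

def encode(value):
    """Non-negative intensity pattern: magnitude sets the brightness, sign sets the grating phase (0 or pi)."""
    sign = 1.0 if value >= 0 else -1.0
    return abs(value) * (1.0 + sign * CARRIER)

def decode(intensity):
    """Recover the signed value by correlating the detected intensity with the reference grating."""
    return 2.0 * np.mean(intensity * CARRIER)

single = decode(encode(-3.0))
# Incoherent beams add in intensity, so a signed sum falls out of one detector pattern
summed = decode(encode(2.0) + encode(-0.5))
```

In this model a signed number occupies a single intensity channel, which is the economy the abstract points to relative to schemes that use separate channels for positive and negative parts.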
A fully reconfigurable photonic integrated signal processor

NASA Astrophysics Data System (ADS)

Liu, Weilin; Li, Ming; Guzzon, Robert S.; Norberg, Erik J.; Parker, John S.; Lu, Mingzhi; Coldren, Larry A.; Yao, Jianping

2016-03-01

Photonic signal processing has been considered a solution to overcome the inherent electronic speed limitations. Over the past few years, an impressive range of photonic integrated signal processors have been proposed, but they usually offer limited reconfigurability, a feature highly needed for the implementation of large-scale general-purpose photonic signal processors. Here, we report and experimentally demonstrate a fully reconfigurable photonic integrated signal processor based on an InP-InGaAsP material system. The proposed photonic signal processor is capable of performing reconfigurable signal processing functions including temporal integration, temporal differentiation and Hilbert transformation. The reconfigurability is achieved by controlling the injection currents to the active components of the signal processor. Our demonstration suggests great potential for chip-scale fully programmable all-optical signal processing.
An Improved Wavefront Control Algorithm for Large Space Telescopes

NASA Technical Reports Server (NTRS)

Sidick, Erkin; Basinger, Scott A.; Redding, David C.

2008-01-01

Wavefront sensing and control is required throughout the mission lifecycle of large space telescopes such as the James Webb Space Telescope (JWST). When an optic of such a telescope is controlled with both surface-deforming and rigid-body actuators, the sensitivity matrix obtained from the exit pupil wavefront vector divided by the corresponding actuator command value can sometimes become singular due to differences in actuator types and in actuator command values. In this paper, we propose a simple approach for preventing the sensitivity matrix from becoming singular. We also introduce a new "minimum-wavefront and optimal control compensator". It uses an optimal control gain matrix obtained by feeding back the actuator commands along with the measured or estimated wavefront phase information to the estimator, thus eliminating the actuator modes that are not observable in the wavefront sensing process.
Combined fast multipole-QR compression technique for solving electrically small to large structures for broadband applications

NASA Technical Reports Server (NTRS)

Jandhyala, Vikram (Inventor); Chowdhury, Indranil (Inventor)

2011-01-01

An approach that efficiently solves for a desired parameter of a system or device that can include both electrically large fast multipole method (FMM) elements, and electrically small QR elements. The system or device is set up as an oct-tree structure that can include regions of both the FMM type and the QR type. An iterative solver is then used to determine a first matrix vector product for any electrically large elements, and a second matrix vector product for any electrically small elements that are included in the structure. These matrix vector products for the electrically large elements and the electrically small elements are combined, and a net delta for a combination of the matrix vector products is determined. The iteration continues until a net delta is obtained that is within predefined limits. The matrix vector products that were last obtained are used to solve for the desired parameter.
Sentinel-2 Level 2A Prototype Processor: Architecture, Algorithms And First Results

NASA Astrophysics Data System (ADS)

Muller-Wilm, Uwe; Louis, Jerome; Richter, Rudolf; Gascon, Ferran; Niezette, Marc

2013-12-01

Sen2Core is a prototype processor for Sentinel-2 Level 2A product processing and formatting. The processor is developed for and with ESA and performs the tasks of Atmospheric Correction and Scene Classification of Level 1C input data. Level 2A outputs are: Bottom-Of-Atmosphere (BOA) corrected reflectance images, Aerosol Optical Thickness-, Water Vapour-, Scene Classification maps and Quality indicators, including cloud and snow probabilities. The Level 2A Product Formatting performed by the processor follows the specification of the Level 1C User Product.
A New Experiment on Bengali Character Recognition

NASA Astrophysics Data System (ADS)

Barman, Sumana; Bhattacharyya, Debnath; Jeon, Seung-Whan; Kim, Tai-Hoon; Kim, Haeng-Kon

This paper presents a method to use a view-based approach in a Bangla Optical Character Recognition (OCR) system, providing a reduced data set to the ANN classification engine rather than the traditional OCR methods. It describes how Bangla characters are processed, trained and then recognized with the use of a backpropagation artificial neural network. This is the first published account of using a segmentation-free optical character recognition system for Bangla using a view-based approach. The methodology presented here assumes that the OCR pre-processor has presented the input images to the classification engine described here. The size and the font face used to render the characters are also significant in both training and classification. The images are first converted into greyscale and then to binary images; these images are then scaled to fit a pre-determined area with a fixed but significant number of pixels. The feature vectors are then formed by extracting the characteristic points, which in this case are simply a series of 0s and 1s of fixed length. Finally, an artificial neural network is chosen for the training and classification process.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume that the matrix is square and symmetric. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
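To make the partitioning idea concrete, the following is a minimal sketch, assuming the widely used column-net hypergraph model (rows are vertices, each column is a net joining the rows that have a nonzero in it). The abstract does not say which hypergraph model the authors use, and the function name `communication_volume` is hypothetical. Under this model, the number of vector entries that must be exchanged for a row-wise partition equals the sum over columns of (number of parts touching the column minus one), and the matrix does not need to be square or symmetric.

```python
def communication_volume(nonzeros, row_part):
    """Connectivity-1 cost of a row-wise partition (column-net model).

    nonzeros: iterable of (row, col) positions of nonzero entries.
    row_part: list where row_part[i] is the part (processor) owning row i.
    Assumes each x[j] is owned by one of the parts that use it.
    """
    parts_per_column = {}
    for i, j in nonzeros:
        # Collect the set of parts that need x[j] to compute their rows.
        parts_per_column.setdefault(j, set()).add(row_part[i])
    # A column touched by k parts forces k - 1 transfers of x[j].
    return sum(len(parts) - 1 for parts in parts_per_column.values())


if __name__ == "__main__":
    # A rectangular 4 x 3 sparsity pattern (hypothetical example data).
    pattern = [(0, 0), (0, 1), (1, 1), (2, 2), (3, 0), (3, 2)]
    print(communication_volume(pattern, [0, 0, 1, 1]))
    print(communication_volume(pattern, [0, 1, 0, 1]))
```

A hypergraph partitioner searches for the row assignment that minimizes this quantity while keeping the parts balanced; the sketch only evaluates the cost of a given assignment and says nothing about the CUDA implementation itself.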
Burst-mode optical label processor with ultralow power consumption.

PubMed

Ibrahim, Salah; Nakahara, Tatsushi; Ishikawa, Hiroshi; Takahashi, Ryo

2016-04-04

A novel label processor subsystem for 100-Gbps (25-Gbps × 4λs) burst-mode optical packets is developed, in which a highly energy-efficient method is pursued for extracting and interfacing the ultrafast packet-label to a CMOS-based processor where label recognition takes place. The method involves performing serial-to-parallel conversion for the label bits on a bit-by-bit basis by using an optoelectronic converter that is operated with a set of optical triggers generated in a burst-mode manner upon packet arrival. Here we present three key achievements that enabled a significant reduction in the total power consumption and latency of the whole subsystem; 1) based on a novel operation mechanism for providing amplification with bit-level selectivity, an optical trigger pulse generator, that consumes power for a very short duration upon packet arrival, is proposed and experimentally demonstrated, 2) the energy of optical triggers needed by the optoelectronic serial-to-parallel converter is reduced by utilizing a negative-polarity signal while employing an enhanced conversion scheme entitled the discharge-or-hold scheme, 3) the necessary optical trigger energy is further cut down by half by coupling the triggers through the chip's backside, whereas a novel lens-free packaging method is developed to enable a low-cost alignment process that works with simple visual observation.
Improved performance in NASTRAN (R)

NASA Technical Reports Server (NTRS)

Chan, Gordon C.

1989-01-01

Three areas of improvement in COSMIC/NASTRAN, 1989 release, were incorporated recently that make the analysis program run faster on large problems. Actual log files and actual timings on a few test samples that were run on IBM, CDC, VAX, and CRAY computers were compiled. The speed improvement is proportional to the problem size and number of continuation cards. Vectorizing certain operations in BANDIT, makes BANDIT run twice as fast in some large problems using structural elements with many node points. BANDIT is a built-in NASTRAN processor that optimizes the structural matrix bandwidth. The VAX matrix packing routine BLDPK was modified so that it is now packing a column of a matrix 3 to 9 times faster. The denser and bigger the matrix, the greater is the speed improvement. This improvement makes a host of routines and modules that involve matrix operation run significantly faster, and saves disc space for dense matrices. A UNIX version, converted from 1988 COSMIC/NASTRAN, was tested successfully on a Silicon Graphics computer using the UNIX V Operating System, with Berkeley 4.3 Extensions. The Utility Modules INPUTT5 and OUTPUT5 were expanded to handle table data, as well as matrices. Both INPUTT5 and OUTPUT5 are general input/output modules that read and write FORTRAN files with or without format. More user informative messages are echoed from PARAMR, PARAMD, and SCALAR modules to ensure proper data values and data types being handled. Two new Utility Modules, GINOFILE and DATABASE, were written for the 1989 release. Seven rigid elements are added to COSMIC/NASTRAN. They are: CRROD, CRBAR, CRTRPLT, CRBE1, CRBE2, CRBE3, and CRSPLINE.
Compact self-contained electrical-to-optical converter/transmitter

DOEpatents

Seligmann, D.A.; Moss, W.C.; Valk, T.C.; Conder, A.D.

1995-11-21

A first optical receiver and a second optical receiver are provided for receiving a calibrate command and a power switching signal, respectively, from a remote processor. A third receiver is provided for receiving an analog electrical signal from a transducer. A calibrator generates a reference signal in response to the calibrate command. A combiner mixes the electrical signal with the reference signal to form a calibrated signal. A converter converts the calibrated signal to an optical signal. A transmitter transmits the optical signal to the remote processor. A primary battery supplies power to the calibrator, the combiner, the converter, and the transmitter. An optically-activated switch supplies power to the calibrator, the combiner, the converter, and the transmitter in response to the power switching signal. An auxiliary battery supplies power continuously to the switch. 13 figs.
Hypercluster Parallel Processor

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

1992-01-01

Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Electro-optic voltage sensor for sensing voltage in an E-field

DOEpatents

Woods, G.K.; Renak, T.W.

1999-04-06

A miniature electro-optic voltage sensor system capable of accurate operation at high voltages is disclosed. The system employs a transmitter, a sensor disposed adjacent to but out of direct electrical contact with a conductor on which the voltage is to be measured, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor where the beam undergoes the Pockels electro-optic effect. The electro-optic effect causes phase shifting in the beam, which is in turn converted to a pair of independent beams, from which the voltage of a system based on its E-field is determined when the two beams are normalized by the signal processor. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse whose ellipticity varies between -1 and +1 in proportion to voltage) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured. 18 figs.
Electro-optical voltage sensor head

DOEpatents

Woods, Gregory K.

1998-01-01

A miniature electro-optic voltage sensor system capable of accurate operation at high voltages. The system employs a transmitter, a sensor disposed adjacent to but out of direct electrical contact with a conductor on which the voltage is to be measured, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor where the beam undergoes the Pockels electro-optic effect. The electro-optic effect causes phase shifting in the beam, which is in turn converted to a pair of independent beams, from which the voltage of a system based on its E-field is determined when the two beams are normalized by the signal processor. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse whose ellipticity varies between -1 and +1 in proportion to voltage) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured.
Electro-optic voltage sensor for sensing voltage in an E-field

DOEpatents

Woods, Gregory K.; Renak, Todd W.

1999-01-01

A miniature electro-optic voltage sensor system capable of accurate operation at high voltages. The system employs a transmitter, a sensor disposed adjacent to but out of direct electrical contact with a conductor on which the voltage is to be measured, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor where the beam undergoes the Pockels electro-optic effect. The electro-optic effect causes phase shifting in the beam, which is in turn converted to a pair of independent beams, from which the voltage of a system based on its E-field is determined when the two beams are normalized by the signal processor. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse whose ellipticity varies between -1 and +1 in proportion to voltage) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured.
Electro-optical voltage sensor head

DOEpatents

Woods, G.K.

1998-03-24

A miniature electro-optic voltage sensor system capable of accurate operation at high voltages is disclosed. The system employs a transmitter, a sensor disposed adjacent to but out of direct electrical contact with a conductor on which the voltage is to be measured, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor where the beam undergoes the Pockels electro-optic effect. The electro-optic effect causes phase shifting in the beam, which is in turn converted to a pair of independent beams, from which the voltage of a system based on its E-field is determined when the two beams are normalized by the signal processor. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse whose ellipticity varies between -1 and +1 in proportion to voltage) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured. 6 figs.
Hardware realization of an SVM algorithm implemented in FPGAs

NASA Astrophysics Data System (ADS)

Wiśniewski, Remigiusz; Bazydło, Grzegorz; Szcześniak, Paweł

2017-08-01

The paper proposes a technique for hardware realization of space vector modulation (SVM) of state function switching in a matrix converter (MC), oriented toward implementation in a single field programmable gate array (FPGA). In an MC, the SVM method is based on the instantaneous space-vector representation of input currents and output voltages. The traditional computation algorithms usually involve digital signal processors (DSPs), which consume a large number of power transistors (18 transistors and 18 independent PWM outputs) and "non-standard positions of control pulses" during the switching sequence. Recently, hardware implementations have become popular since operations may be executed much faster and more efficiently due to the nature of digital devices (especially concurrency). In the paper, we propose a hardware algorithm for SVM computation. In contrast to the existing techniques, the presented solution applies the COordinate Rotation DIgital Computer (CORDIC) method to solve the trigonometric operations. Furthermore, adequate arithmetic modules (that is, sub-devices) used for intermediate calculations, such as code converters or proper sector selectors (for output voltages and input current), are presented in detail. The proposed technique has been implemented as a design described in the Verilog hardware description language. The preliminary results of logic implementation oriented toward a Xilinx FPGA (in particular, a low-cost device from the Xilinx Artix-7 family was used) are also presented.
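For readers unfamiliar with CORDIC, here is a minimal floating-point sketch of the rotation-mode algorithm for sine and cosine. It is not the authors' Verilog design: the function name `cordic_sin_cos` and the iteration count are assumptions made for illustration, and real hardware would use fixed-point shifts, adds, and a small arctangent lookup table in place of the multiplications by powers of two shown here.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Rotation-mode CORDIC; returns (sin(theta), cos(theta)).

    Valid for |theta| <= pi/2, which lies inside the CORDIC convergence range.
    """
    if abs(theta) > math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")

    # Elementary rotation angles atan(2^-i); a lookup table in hardware.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # starting from 1/gain compensates for the accumulated stretch.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0   # rotate toward zero residual angle
        shift = 2.0 ** -i               # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)
```

The residual angle after n iterations is bounded by atan(2^-(n-1)), so each extra iteration buys roughly one more bit of accuracy, which is why the method maps well onto FPGA logic.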
Design Spectrum Analysis in NASTRAN

NASA Technical Reports Server (NTRS)

Butler, T. G.

1984-01-01

The utility of Design Spectrum Analysis is to give a mode by mode characterization of the behavior of a design under a given loading. The theory of design spectrum is discussed after operations are explained. User instructions are taken up here in three parts: Transient Preface, Maximum Envelope Spectrum, and RMS Average Spectrum followed by a Summary Table. A single DMAP ALTER packet will provide for all parts of the design spectrum operations. The starting point for getting a modal break-down of the response to acceleration loading is the Modal Transient rigid format. After eigenvalue extraction, modal vectors need to be isolated in the full set of physical coordinates (P-sized as opposed to the D-sized vectors in RF 12). After integration for transient response the results are scanned over the solution time interval for the peak values and for the times that they occur. A module called SCAN was written to do this job, that organizes these maxima into a diagonal output matrix. The maximum amplifier in each mode is applied to the eigenvector of each mode which then reveals the maximum displacements, stresses, forces and boundary reactions that the structure will experience for a load history, mode by mode. The standard NASTRAN output processors have been modified for this task. It is required that modes be normalized to mass.
Complete all-optical processing polarization-based binary logic gates and optical processors.

PubMed

Zaghloul, Y A; Zaghloul, A R M

2006-10-16

We present a complete all-optical-processing polarization-based binary-logic system, by which any logic gate or processor can be implemented. Following the new polarization-based logic presented in [Opt. Express 14, 7253 (2006)], we develop a new parallel processing technique that allows for the creation of all-optical-processing gates that produce a unique output either logic 1 or 0 only once in a truth table, and those that do not. This representation allows for the implementation of simple unforced OR, AND, XOR, XNOR, inverter, and more importantly NAND and NOR gates that can be used independently to represent any Boolean expression or function. In addition, the concept of a generalized gate is presented which opens the door for reconfigurable optical processors and programmable optical logic gates. Furthermore, the new design is completely compatible with the old one presented in [Opt. Express 14, 7253 (2006)], and with current semiconductor based devices. The gates can be cascaded, where the information is always on the laser beam. The polarization of the beam, and not its intensity, carries the information. The new methodology allows for the creation of multiple-input-multiple-output processors that implement, by itself, any Boolean function, such as specialized or non-specialized microprocessors. Three all-optical architectures are presented: orthoparallel optical logic architecture for all known and unknown binary gates, singlebranch architecture for only XOR and XNOR gates, and the railroad (RR) architecture for polarization optical processors (POP). All the control inputs are applied simultaneously leading to a single time lag which leads to a very-fast and glitch-immune POP. A simple and easy-to-follow step-by-step algorithm is provided for the POP, and design reduction methodologies are briefly discussed. The algorithm lends itself systematically to software programming and computer-assisted design. 
As examples, designs of all binary gates, multiple-input gates, and sequential and non-sequential Boolean expressions are presented and discussed. The operation of each design is simply understood by a bullet train traveling at the speed of light on a railroad system preconditioned by the crossover states predetermined by the control inputs. The presented designs allow for optical processing of the information eliminating the need to convert it, back and forth, to an electronic signal for processing purposes. All gates with a truth table, including for example Fredkin, Toffoli, testable reversible logic, and threshold logic gates, can be designed and implemented using the railroad architecture. That includes any future gates not known today. Those designs and the quantum gates are not discussed in this paper.
WARP3D-Release 10.8: Dynamic Nonlinear Analysis of Solids using a Preconditioned Conjugate Gradient Software Architecture

NASA Technical Reports Server (NTRS)

Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.

1998-01-01

This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tvergaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmark's Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by-element, blocked architecture. 
This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.
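The element-by-element idea can be illustrated with a minimal sketch, assuming a toy 1-D chain of linear springs fixed at one end and a simple Jacobi (diagonal) preconditioner. This is not WARP3D code, and the report does not state that WARP3D uses this preconditioner; the names `apply_stiffness`, `stiffness_diagonal`, and `pcg` are hypothetical. The point is that the conjugate gradient loop only needs the product K u, which is accumulated from element contributions without ever assembling K.

```python
import math


def apply_stiffness(k, u):
    """Return K u for a spring chain, element by element (K never formed).

    k[e] is the stiffness of element e; element 0 ties dof 0 to a fixed
    support, element e >= 1 joins dof e-1 and dof e.
    """
    y = [0.0] * len(u)
    y[0] += k[0] * u[0]
    for e in range(1, len(u)):
        f = k[e] * (u[e] - u[e - 1])   # element force from relative stretch
        y[e - 1] -= f
        y[e] += f
    return y


def stiffness_diagonal(k):
    """Diagonal of K, also accumulated element by element."""
    d = [0.0] * len(k)
    d[0] += k[0]
    for e in range(1, len(k)):
        d[e - 1] += k[e]
        d[e] += k[e]
    return d


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(apply_a, diag, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for an SPD operator."""
    x = [0.0] * len(b)
    r = list(b)
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pj for zi, pj in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    k = [1.0, 1.0, 1.0, 1.0]
    load = [0.0, 0.0, 0.0, 1.0]        # unit force at the free end
    print(pcg(lambda v: apply_stiffness(k, v), stiffness_diagonal(k), load))
```

Because only element-level data are stored, memory grows with the number of elements rather than with the bandwidth of an assembled matrix, which is the trade-off the abstract describes.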
Time-variant analysis of rotorcraft systems dynamics - An exploitation of vector processors

NASA Technical Reports Server (NTRS)

Amirouche, F. M. L.; Xie, M.; Shareef, N. H.

1993-01-01

In this paper a generalized algorithmic procedure is presented for handling constraints in mechanical transmissions. The latter are treated as multibody systems of interconnected rigid/flexible bodies. The constraint Jacobian matrices are generated automatically and suitably updated in time, depending on the geometrical and kinematical constraint conditions describing the interconnection between shafts or gears. The type of constraints are classified based on the interconnection of the bodies by assuming that one or more points of contact exist between them. The effects due to elastic deformation of the flexible bodies are included by allowing each body element to undergo small deformations. The procedure is based on recursively formulated Kane's dynamical equations of motion and the finite element method, including the concept of geometrical stiffening effects. The method is implemented on an IBM-3090-600j vector processor with pipe-lining capabilities. A significant increase in the speed of execution is achieved by vectorizing the developed code in computationally intensive areas. An example consisting of two meshing disks rotating at high angular velocity is presented. Applications are intended for the study of the dynamic behavior of helicopter transmissions.

Embedded processor extensions for image processing

NASA Astrophysics Data System (ADS)

Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

2008-04-01

The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a twofold gain is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.
The Sequential Implementation of Array Processors when there is Directional Uncertainty

DTIC Science & Technology

1975-08-01

University of Washington kindly supplied office space and computing facilities. The author has benefited greatly from discussions with several other... Q^-1 inverse of Q; L general observation space; R general vector of observation; R_K general observation vector of dimension K... Glossary of Symbols (continued): R_i ith observation; R^m real vector space of dimension m; R(T) autocorrelation
SPAR improved structural-fluid dynamic analysis capability

NASA Technical Reports Server (NTRS)

Pearson, M. L.

1985-01-01

The results of a study whose objective was to improve the operation of the SPAR computer code by improving efficiency, user features, and documentation are presented. Additional capability was added to the SPAR arithmetic utility system, including trigonometric functions, numerical integration, interpolation, and matrix combinations. Improvements were made in the EIG processor. A processor was created to compute and store principal stresses in table-format data sets. An additional capability was developed and incorporated into the plot processor which permits plotting directly from table-format data sets. Documentation of all these features is provided in the form of updates to the SPAR users manual.
AMA- and RWE- Based Adaptive Kalman Filter for Denoising Fiber Optic Gyroscope Drift Signal

PubMed Central

Yang, Gongliu; Liu, Yuanyuan; Li, Ming; Song, Shunguang

2015-01-01

An improved double-factor adaptive Kalman filter called AMA-RWE-DFAKF is proposed to denoise fiber optic gyroscope (FOG) drift signal in both static and dynamic conditions. The first factor is Kalman gain updated by random weighting estimation (RWE) of the covariance matrix of innovation sequence at any time to ensure the lowest noise level of output, but the inertia of KF response increases in dynamic condition. To decrease the inertia, the second factor is the covariance matrix of predicted state vector adjusted by RWE only when discontinuities are detected by adaptive moving average (AMA). The AMA-RWE-DFAKF is applied to denoise FOG static and dynamic signals, and its performance is compared with conventional KF (CKF), RWE-based adaptive KF with gain correction (RWE-AKFG), and AMA- and RWE-based dual mode adaptive KF (AMA-RWE-DMAKF). Results of Allan variance on static signal and root mean square error (RMSE) on dynamic signal show that this proposed algorithm outperforms all the considered methods in denoising FOG signal. PMID:26512665
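As a point of reference for the baseline the abstract calls the conventional KF (CKF), the sketch below shows a scalar Kalman filter with a random-walk state model. It is an assumption-laden illustration, not the AMA-RWE-DFAKF: the function name `kalman_denoise` and the noise variances `q` and `r` are hypothetical, and both adaptive factors described above (RWE-updated gain and AMA-triggered covariance adjustment) are deliberately left out.

```python
def kalman_denoise(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_k = x_{k-1} + w_k, z_k = x_k + v_k.

    q: process noise variance, r: measurement noise variance (assumed known).
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                   # predict: state unchanged, covariance grows
        innovation = z - x          # measurement minus prediction
        gain = p / (p + r)          # Kalman gain
        x = x + gain * innovation   # update the state estimate
        p = (1.0 - gain) * p        # update the error covariance
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    print(kalman_denoise([5.0] * 10, q=1e-4, r=1.0)[-1])
```

With fixed `q` and `r` the gain settles to a small constant, which gives strong smoothing but a sluggish response to sudden changes; that is the inertia the adaptive scheme in the abstract is meant to reduce.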
AMA- and RWE- Based Adaptive Kalman Filter for Denoising Fiber Optic Gyroscope Drift Signal.

PubMed

Yang, Gongliu; Liu, Yuanyuan; Li, Ming; Song, Shunguang

2015-10-23

An improved double-factor adaptive Kalman filter called AMA-RWE-DFAKF is proposed to denoise fiber optic gyroscope (FOG) drift signal in both static and dynamic conditions. The first factor is Kalman gain updated by random weighting estimation (RWE) of the covariance matrix of innovation sequence at any time to ensure the lowest noise level of output, but the inertia of KF response increases in dynamic condition. To decrease the inertia, the second factor is the covariance matrix of predicted state vector adjusted by RWE only when discontinuities are detected by adaptive moving average (AMA). The AMA-RWE-DFAKF is applied to denoise FOG static and dynamic signals, and its performance is compared with conventional KF (CKF), RWE-based adaptive KF with gain correction (RWE-AKFG), and AMA- and RWE-based dual mode adaptive KF (AMA-RWE-DMAKF). Results of Allan variance on static signal and root mean square error (RMSE) on dynamic signal show that this proposed algorithm outperforms all the considered methods in denoising FOG signal.
DOC II 32-bit digital optical computer: optoelectronic hardware and software

NASA Astrophysics Data System (ADS)

Stone, Richard V.; Zeise, Frederick F.; Guilfoyle, Peter S.

1991-12-01

This paper describes current electronic hardware subsystems and software code which support OptiComp's 32-bit general purpose digital optical computer (DOC II). The reader is referred to earlier papers presented in this section for a thorough discussion of theory and application regarding DOC II. The primary optoelectronic subsystems include the drive electronics for the multichannel acousto-optic modulators, the avalanche photodiode amplifier, as well as threshold circuitry, and the memory subsystems. This device utilizes a single optical Boolean vector matrix multiplier and its VME based host controller interface in performing various higher level primitives. OptiComp Corporation wishes to acknowledge the financial support of the Office of Naval Research, the National Aeronautics and Space Administration, the Rome Air Development Center, and the Strategic Defense Initiative Office for the funding of this program under contracts N00014-87-C-0077, N00014-89-C-0266 and N00014-89-C-0225.
Quantitative characterization of the carbon/carbon composites components based on video of polarized light microscope.

PubMed

Li, Yixian; Qi, Lehua; Song, Yongshan; Chao, Xujiang

2017-06-01

The components of carbon/carbon (C/C) composites have a significant influence on the thermal and mechanical properties, so a quantitative characterization of the components is necessary to study the microstructure of C/C composites, and further to improve their macroscopic properties. Because the extinction crosses of the pyrocarbon matrix have significant moving features, polarized light microscope (PLM) video is used to characterize C/C composites quantitatively, since it contains sufficient dynamic and structural information. The optical flow method is then introduced to compute the optical flow field between adjacent frames and to segment the components of C/C composites from the PLM image by image processing. Meanwhile, the matrix with different textures is re-segmented by the length difference of motion vectors, and then the fraction of each component and the extinction angle of the pyrocarbon matrix are calculated directly. Finally, the C/C composites are successfully characterized in terms of carbon fiber, pyrocarbon, and pores by a series of image processing operators based on PLM video, and the errors of component fractions are less than 15%. © 2017 Wiley Periodicals, Inc.
Multiprocessing MCNP on an IBM RS/6000 cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKinney, G.W.; West, J.T.

1993-01-01

The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major beneficiaries of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law, $S(f,P) = 1/(1 - f + f/P)$. However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.
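Amdahl's Law in its standard form, S(f, P) = 1 / ((1 - f) + f/P), is easy to evaluate numerically. The short sketch below (the function name `amdahl_speedup` is an assumption, not something from the report) shows how quickly the serial fraction 1 - f caps the achievable speedup as P grows.

```python
def amdahl_speedup(f, p):
    """Theoretical speedup S(f, P) = 1 / ((1 - f) + f / P).

    f: fraction of the task time that can be multiprocessed, in [0, 1].
    p: number of processors, at least 1.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / ((1.0 - f) + f / p)


if __name__ == "__main__":
    # With 95% of the work parallel, the speedup can never exceed 1 / 0.05 = 20.
    for procs in (1, 2, 8, 64, 1024):
        print(procs, round(amdahl_speedup(0.95, procs), 2))
```

The formula is an upper bound: communication and load imbalance, the "additional terms" the abstract mentions, only lower the measured speedup further.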
Multi-color incomplete Cholesky conjugate gradient methods for vector computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Poole, E.L.

1986-01-01

This research is concerned with the solution on vector computers of linear systems of equations Ax = b, where A is a large, sparse symmetric positive definite matrix with non-zero elements lying only along a few diagonals of the matrix. The system is solved using the incomplete Cholesky conjugate gradient method (ICCG). Multi-color orderings of the unknowns in the linear system are used to obtain p-color matrices for which a no-fill block ICCG method is implemented on the CYBER 205 with O(N/p) length vector operations in both the decomposition of A and, more importantly, in the forward and back solves necessary at each iteration of the method. (N is the number of unknowns and p is a small constant.) A p-colored matrix is a matrix that can be partitioned into a p x p block matrix where the diagonal blocks are diagonal matrices. The matrix is stored by diagonals and matrix multiplication by diagonals is used to carry out the decomposition of A and the forward and back solves. Additionally, if the vectors across adjacent blocks line up, then some of the overhead associated with vector startups can be eliminated in the matrix vector multiplication necessary at each conjugate gradient iteration. Necessary and sufficient conditions are given to determine which multi-color orderings of the unknowns correspond to p-color matrices, and a process is indicated for choosing multi-color orderings.
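The idea of storing a matrix by diagonals and multiplying by diagonals can be sketched as follows. This is a minimal illustration, not the CYBER 205 implementation: the dictionary layout (offset mapped to the list of entries on that diagonal) and the name `matvec_by_diagonals` are assumptions made for this example. Each diagonal contributes one long elementwise multiply-add, which is the kind of operation a vector machine handles well.

```python
def matvec_by_diagonals(n, diagonals, x):
    """Compute y = A x for an n x n matrix stored by diagonals.

    diagonals maps an offset k to the entries A[i][i + k]:
    k = 0 is the main diagonal, k > 0 a superdiagonal, k < 0 a subdiagonal.
    (Hypothetical storage layout chosen for this sketch.)
    """
    y = [0.0] * n
    for offset, values in diagonals.items():
        row_start = max(0, -offset)      # first row touched by this diagonal
        length = n - abs(offset)         # number of entries on this diagonal
        if len(values) != length:
            raise ValueError(f"diagonal {offset} must have {length} entries")
        # One elementwise multiply-add over the whole diagonal.
        for j in range(length):
            i = row_start + j
            y[i] += values[j] * x[i + offset]
    return y
```

For the tridiagonal matrix with 2 on the main diagonal and -1 on both neighbours, `matvec_by_diagonals(3, {0: [2, 2, 2], 1: [-1, -1], -1: [-1, -1]}, [1, 2, 3])` multiplies three short vectors instead of looping over rows.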
Elliptic-symmetry vector optical fields.

PubMed

Pan, Yue; Li, Yongnan; Li, Si-Min; Ren, Zhi-Cheng; Kong, Ling-Jun; Tu, Chenghou; Wang, Hui-Tian

2014-08-11

We present in principle and demonstrate experimentally a new kind of vector field: elliptic-symmetry vector optical fields. This is a significant development in vector fields, as it breaks the cylindrical symmetry and enriches the family of vector fields. Due to the presence of an additional degree of freedom, which is the interval between the foci in the elliptic coordinate system, the elliptic-symmetry vector fields are more flexible than the cylindrical vector fields for controlling the spatial structure of polarization and for engineering the focusing fields. The elliptic-symmetry vector fields can find many specific applications, from optical trapping to optical machining.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Associative properties of a multichannel photon echo and optical memory

NASA Astrophysics Data System (ADS)

Bikbov, I. S.; Zuikov, V. A.; Popov, I. I.; Popova, G. L.; Samartsev, V. V.

1995-10-01

An analysis is made of the results of an investigation of the physical principles underlying the operation of an associative optical memory and of processors utilising the photon (optical) echo phenomenon. The feasibility of constructing such optical memories is considered.
Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205

NASA Technical Reports Server (NTRS)

Bauschlicher, Charles W., Jr.; Partridge, Harry

1987-01-01

Large, randomly sparse matrix vector products are important in a number of applications in computational chemistry, such as matrix diagonalization and the solution of simultaneous equations. Vectorization of this process is considered for the CRAY XMP, CRAY 2, and CYBER 205, using a matrix of dimension of 20,000 with from 1 percent to 6 percent nonzeros. Efficient scatter/gather capabilities add coding flexibility and yield significant improvements in performance. For the CYBER 205, it is shown that minor changes in the IO can reduce the CPU time by a factor of 50. Similar changes in the CRAY codes make a far smaller improvement.
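The role of gather operations in a randomly sparse matrix vector product can be sketched as below. The abstract does not state which storage scheme was used, so compressed sparse row (CSR) storage is an assumption here, as are the names `csr_from_dense` and `csr_matvec`. For each row, the needed entries of x are first gathered through an index list and then combined with the stored nonzeros in a dense dot product.

```python
def csr_from_dense(a):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x using a gather of x followed by a dense dot product per row."""
    y = []
    for r in range(len(row_ptr) - 1):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        gathered = [x[j] for j in col_idx[lo:hi]]   # gather: indexed load from x
        y.append(sum(v * g for v, g in zip(values[lo:hi], gathered)))
    return y
```

On hardware with efficient gather support, the indexed load and the multiply-add can both run as vector operations, which is where the coding flexibility mentioned in the abstract comes from.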
Microscopy imaging system and method employing stimulated raman spectroscopy as a contrast mechanism

DOEpatents

Xie, Xiaoliang Sunney [Lexington, MA; Freudiger, Christian [Boston, MA; Min, Wei [Cambridge, MA

2011-09-27

A microscopy imaging system includes a first light source for providing a first train of pulses at a first center optical frequency ω₁, a second light source for providing a second train of pulses at a second center optical frequency ω₂, a modulator system, an optical detector, and a processor. The modulator system is for modulating a beam property of the second train of pulses at a modulation frequency f of at least 100 kHz. The optical detector is for detecting an integrated intensity of substantially all optical frequency components of the first train of pulses from the common focal volume by blocking the second train of pulses being modulated. The processor is for detecting a modulation, at the modulation frequency f, of the integrated intensity of the optical frequency components of the first train of pulses to provide a pixel of an image for the microscopy imaging system.
Electro-Optic Computing Architectures. Volume I

DTIC Science & Technology

1998-02-01

The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit (OW
Underwater Threat Source Localization: Processing Sensor Network TDOAs with a Terascale Optical Core Device

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Imam, Neena

2007-01-01

Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near-term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront time-difference-of-arrival (TDOA). The corresponding algorithms are implemented on the EnLight processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimized for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight 64a prototype processor as compared to a dual Intel Xeon™ processor.
Multi-gigabit optical interconnects for next-generation on-board digital equipment

NASA Astrophysics Data System (ADS)

Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques

2017-11-01

Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required by next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged into the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10^-14 and optical link budgets in excess of 12 dB were measured, which would make it possible to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Multi-gigabit optical interconnects for next-generation on-board digital equipment

NASA Astrophysics Data System (ADS)

Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques

2004-06-01

Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required by next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged into the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10^-14 and optical link budgets in excess of 12 dB were measured, which would make it possible to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Adaptive packet switch with an optical core (demonstrator)

NASA Astrophysics Data System (ADS)

Abdo, Ahmad; Bishtein, Vadim; Clark, Stewart A.; Dicorato, Pino; Lu, David T.; Paredes, Sofia A.; Taebi, Sareh; Hall, Trevor J.

2004-11-01

A three-stage opto-electronic packet switch architecture is described consisting of a reconfigurable optical centre stage surrounded by two electronic buffering stages partitioned into sectors to ease memory contention. A Flexible Bandwidth Provision (FBP) algorithm, implemented on a soft-core processor, is used to change the configuration of the input sectors and optical centre stage to set up internal paths that will provide variable bandwidth to serve the traffic. The switch is modeled by a bipartite graph built from a service matrix, which is a function of the arriving traffic. The bipartite graph is decomposed by solving an edge-colouring problem and the resulting permutations are used to configure the switch. Simulation results show that this architecture exhibits a dramatic reduction of complexity and increased potential for scalability, at the price of only a modest spatial speed-up k, 1
AOF LTAO mode: reconstruction strategy and first test results

NASA Astrophysics Data System (ADS)

Oberti, Sylvain; Kolb, Johann; Le Louarn, Miska; La Penna, Paolo; Madec, Pierre-Yves; Neichel, Benoit; Sauvage, Jean-François; Fusco, Thierry; Donaldson, Robert; Soenke, Christian; Suárez Valles, Marcos; Arsenault, Robin

2016-07-01

GALACSI is the Adaptive Optics (AO) system serving the instrument MUSE in the framework of the Adaptive Optics Facility (AOF) project. Its Narrow Field Mode (NFM) is a Laser Tomography AO (LTAO) mode delivering high resolution in the visible across a small Field of View (FoV) of 7.5" diameter around the optical axis. From a reconstruction standpoint, GALACSI NFM intends to optimize the correction on axis by estimating the turbulence in volume via a tomographic process, then projecting the turbulence profile onto one single Deformable Mirror (DM) located in the pupil, close to the ground. In this paper, the laser tomographic reconstruction process is described. Several methods (virtual DM, virtual layer projection) are studied, under the constraint of a single matrix vector multiplication. The pseudo-synthetic interaction matrix model and the LTAO reconstructor design are analysed. Moreover, the reconstruction parameter space is explored, in particular the regularization terms. Furthermore, we present here the strategy to define the modal control basis and split the reconstruction between the Low Order (LO) loop and the High Order (HO) loop. Finally, closed loop performance obtained with a 3D turbulence generator will be analysed with respect to the most relevant system parameters to be tuned.
First experience of vectorizing electromagnetic physics models for detector simulation

NASA Astrophysics Data System (ADS)

Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.

2015-12-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.

An FPGA Architecture for Extracting Real-Time Zernike Coefficients from Measured Phase Gradients

NASA Astrophysics Data System (ADS)

Moser, Steven; Lee, Peter; Podoleanu, Adrian

2015-04-01

Zernike modes are commonly used in adaptive optics systems to represent optical wavefronts. However, real-time calculation of Zernike modes is time consuming due to two factors: the large factorial components in the radial polynomials used to define them and the large inverse matrix calculation needed for the linear fit. This paper presents an efficient parallel method for calculating Zernike coefficients from phase gradients produced by a Shack-Hartmann sensor and its real-time implementation using an FPGA by pre-calculation and storage of subsections of the large inverse matrix. The architecture exploits symmetries within the Zernike modes to achieve a significant reduction in memory requirements and a speed-up of 2.9 when compared to published results utilising a 2D-FFT method for a grid size of 8×8. Analysis of processor element internal word length requirements shows that 24-bit precision in precalculated values of the Zernike mode partial derivatives ensures less than 0.5% error per Zernike coefficient and an overall error of <1%. The design has been synthesized on a Xilinx Spartan-6 XC6SLX45 FPGA. The resource utilisation on this device is <3% of slice registers, <15% of slice LUTs, and approximately 48% of available DSP blocks independent of the Shack-Hartmann grid size. Block RAM usage is <16% for Shack-Hartmann grid sizes up to 32×32.
High-Speed Computation of the Kleene Star in Max-Plus Algebraic System Using a Cell Broadband Engine

NASA Astrophysics Data System (ADS)

Goto, Hiroyuki

This research addresses a high-speed computation method for the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement it on a Cell Broadband Engine™ (CBE) processor. Since the resulting matrix gives the longest travel times between two adjacent nodes, it is often utilized in scheduling problem solvers for a class of discrete event systems. This research, in particular, attempts to achieve a speedup by using two approaches: parallelization and SIMDization (Single Instruction, Multiple Data), both of which can be accomplished by a CBE processor. The former refers to a parallel computation using multiple cores, while the latter is a method whereby multiple elements are computed by a single instruction. Using the implementation on a Sony PlayStation 3™ equipped with a CBE processor, we found that the SIMDization is effective regardless of the system's size and the number of processor cores used. We also found that the scalability of using multiple cores is remarkable especially for systems with a large number of nodes. In a numerical experiment where the number of nodes is 2000, we achieved a speedup of 20 times compared with the method without the above techniques.
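The Kleene star computation described in this abstract can be sketched in plain Python as follows. In max-plus algebra, "addition" is max and "multiplication" is +, and for a directed acyclic graph with n nodes the star reduces to the finite sum A* = e ⊕ A ⊕ A² ⊕ ... ⊕ A^(n-1). The convention that `a[i][j]` holds the weight of the arc from node i to node j, and all function names, are assumptions of this sketch; it is a straightforward serial version, not the parallel SIMD implementation the paper reports.

```python
NEG_INF = float("-inf")   # the max-plus "zero": no arc between two nodes


def maxplus_identity(n):
    """Max-plus unit matrix e: 0 on the diagonal, -inf elsewhere."""
    return [[0.0 if i == j else NEG_INF for j in range(n)] for i in range(n)]


def maxplus_matmul(a, b):
    """Max-plus product: (a ⊗ b)[i][j] = max over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[max(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kleene_star(a):
    """A* = e ⊕ A ⊕ ... ⊕ A^(n-1); valid when the graph of a is acyclic."""
    n = len(a)
    result = maxplus_identity(n)
    power = maxplus_identity(n)
    for _ in range(n - 1):
        power = maxplus_matmul(power, a)             # next power of A
        result = [[max(result[i][j], power[i][j])    # ⊕ is elementwise max
                   for j in range(n)] for i in range(n)]
    return result
```

The inner `max(...)` over k is the part that maps naturally onto SIMD lanes, since the same add-then-max is applied to many elements at once.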
Development of a software interface for optical disk archival storage for a new life sciences flight experiments computer

NASA Technical Reports Server (NTRS)

Bartram, Peter N.

1989-01-01

The current Life Sciences Laboratory Equipment (LSLE) microcomputer for life sciences experiment data acquisition is now obsolete. Among the weaknesses of the current microcomputer are small memory size, relatively slow analog data sampling rates, and the lack of a bulk data storage device. While life science investigators normally prefer data to be transmitted to Earth as it is taken, this is not always possible. No down-link exists for experiments performed in the Shuttle middeck region. One important aspect of a replacement microcomputer is provision for in-flight storage of experimental data. The Write Once, Read Many (WORM) optical disk was studied because of its high storage density, data integrity, and the availability of a space-qualified unit. In keeping with the goals for a replacement microcomputer based upon commercially available components and standard interfaces, the system studied includes a Small Computer System Interface (SCSI) for interfacing the WORM drive. The system itself is designed around the STD bus, using readily available boards. Configurations examined were: (1) master processor board and slave processor board with the SCSI interface; (2) master processor with SCSI interface; (3) master processor with SCSI and Direct Memory Access (DMA); (4) master processor controlling a separate STD bus SCSI board; and (5) master processor controlling a separate STD bus SCSI board with DMA.
SIMD Optimization of Linear Expressions for Programmable Graphics Hardware

PubMed Central

Bajaj, Chandrajit; Ihm, Insung; Min, Jungki; Oh, Jinsang

2009-01-01

The increased programmability of graphics hardware allows efficient graphical processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how to fully exploit the SIMD computing capacities offered by modern graphics processors. Linear expressions in the form of ȳ = Ax̄ + b̄, where A is a matrix, and x̄, ȳ and b̄ are vectors, constitute one of the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that enables efficient shader codes to be generated for evaluating linear expressions. It is shown that performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that the presented technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications, including integrating differential equations and solving a sparse linear system of equations using iterative methods. PMID:19946569
Simulating and Detecting Radiation-Induced Errors for Onboard Machine Learning

NASA Technical Reports Server (NTRS)

Wagstaff, Kiri L.; Bornstein, Benjamin; Granat, Robert; Tang, Benyang; Turmon, Michael

2009-01-01

Spacecraft processors and memory are subjected to high radiation doses and therefore employ radiation-hardened components. However, these components are orders of magnitude more expensive than typical desktop components, and they lag years behind in terms of speed and size. We have integrated algorithm-based fault tolerance (ABFT) methods into onboard data analysis algorithms to detect radiation-induced errors, which ultimately may permit the use of spacecraft memory that need not be fully hardened, reducing cost and increasing capability at the same time. We have also developed a lightweight software radiation simulator, BITFLIPS, that permits evaluation of error detection strategies in a controlled fashion, including the specification of the radiation rate and selective exposure of individual data structures. Using BITFLIPS, we evaluated our error detection methods when using a support vector machine to analyze data collected by the Mars Odyssey spacecraft. We found ABFT error detection for matrix multiplication is very successful, while error detection for Gaussian kernel computation still has room for improvement.
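A minimal sketch of algorithm-based fault tolerance for matrix multiplication is shown below. The abstract does not say which ABFT scheme was used, so this example assumes the classic checksum approach: a row of column sums is appended to A, a column of row sums is appended to B, and the product then carries checksums that can be re-verified afterwards. The function names are hypothetical, and integer data is used so that the comparison is exact; floating-point data would need a nonzero tolerance.

```python
def matmul(a, b):
    """Plain dense matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def abft_multiply(a, b):
    """Return the full-checksum product of a and b.

    The result has one extra row (column sums) and one extra column (row sums).
    """
    a_c = a + [[sum(col) for col in zip(*a)]]      # append column-checksum row
    b_r = [row + [sum(row)] for row in b]          # append row-checksum column
    return matmul(a_c, b_r)


def find_errors(c_full, tol=0):
    """Return (bad_rows, bad_cols) whose data no longer match their checksums."""
    m, n = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(m)
                if abs(sum(c_full[i][:n]) - c_full[i][n]) > tol]
    bad_cols = [j for j in range(n)
                if abs(sum(c_full[i][j] for i in range(m)) - c_full[m][j]) > tol]
    return bad_rows, bad_cols
```

A single corrupted element shows up as exactly one inconsistent row and one inconsistent column, which locates it; flipping a bit in `c_full[0][1]` makes `find_errors` return `([0], [1])`.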
Improved full analytical polygon-based method using Fourier analysis of the three-dimensional affine transformation.

PubMed

Pan, Yijie; Wang, Yongtian; Liu, Juan; Li, Xin; Jia, Jia

2014-03-01

Previous research [Appl. Opt. 52, A290 (2013)] has revealed that Fourier analysis of three-dimensional affine transformation theory can be used to improve the computation speed of the traditional polygon-based method. In this paper, we continue our research and propose an improved full analytical polygon-based method developed upon this theory. Vertex vectors of primitive and arbitrary triangles and the pseudo-inverse matrix were used to obtain an affine transformation matrix representing the spatial relationship between the two triangles. With this relationship and the primitive spectrum, we analytically obtained the spectrum of the arbitrary triangle. This algorithm discards low-level angular dependent computations. In order to add diffusive reflection to each arbitrary surface, we also propose a whole matrix computation approach that takes advantage of the affine transformation matrix and uses matrix multiplication to calculate shifting parameters of similar sub-polygons. The proposed method improves hologram computation speed for the conventional full analytical approach. Optical experimental results are demonstrated which prove that the proposed method can effectively reconstruct three-dimensional scenes.
Linear optical response of carbon nanotubes under axial magnetic field

NASA Astrophysics Data System (ADS)

Moradian, Rostam; Chegel, Raad; Behzad, Somayeh

2010-04-01

We considered single walled carbon nanotubes (SWCNTs) as real three dimensional (3D) systems in cylindrical coordinates. The optical matrix elements and linear susceptibility, χ(ω), are calculated in the tight binding approximation in terms of the one-dimensional wave vector, kz, and subband index, l. The optical frequency dependence of the linear susceptibility is investigated in an external axial magnetic field. We found that the axial magnetic field has two effects on the imaginary part of the linear susceptibility spectrum, in agreement with experimental results. The first effect is broadening and the second is splitting. We also found that for all metallic zigzag and armchair SWCNTs, the axial magnetic field leads to the creation of a peak with energy less than 1.5 eV, contrary to what is observed in the absence of a magnetic field.
Closed-form integrator for the quaternion (Euler angle) kinematics equations

NASA Technical Reports Server (NTRS)

Whitmore, Stephen A. (Inventor)

2000-01-01

The invention is embodied in a method of integrating kinematics equations for updating a set of vehicle attitude angles of a vehicle using 3-dimensional angular velocities of the vehicle, which includes computing an integrating factor matrix from quantities corresponding to the 3-dimensional angular velocities, computing a total integrated angular rate from the quantities corresponding to the 3-dimensional angular velocities, computing a state transition matrix as a sum of (a) a first complementary function of the total integrated angular rate and (b) the integrating factor matrix multiplied by a second complementary function of the total integrated angular rate, and updating the set of vehicle attitude angles using the state transition matrix. Preferably, the method further includes computing a quaternion vector from the quantities corresponding to the 3-dimensional angular velocities, in which case the updating of the set of vehicle attitude angles using the state transition matrix is carried out by (a) updating the quaternion vector by multiplying the quaternion vector by the state transition matrix to produce an updated quaternion vector and (b) computing an updated set of vehicle attitude angles from the updated quaternion vector. The first and second trigonometric functions are complementary, such as a sine and a cosine. The quantities corresponding to the 3-dimensional angular velocities include respective averages of the 3-dimensional angular velocities over plural time frames. The updating of the quaternion vector preserves the norm of the vector, whereby the updated set of vehicle attitude angles are virtually error-free.
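The structure described in this abstract, a state transition matrix formed as a cosine term plus a sine term times a rate-dependent matrix, matches the standard textbook closed form for quaternion propagation under a constant angular rate, which the sketch below implements. It is not taken from the patent itself: the scalar-first quaternion convention, the particular 4 x 4 matrix layout, and the function names are assumptions, and the conversion back to attitude angles and the averaging of rates over several time frames are omitted.

```python
import math


def omega_matrix(v):
    """4 x 4 skew matrix built from a 3-vector (scalar-first quaternion convention)."""
    x, y, z = v
    return [
        [0.0, -x, -y, -z],
        [x, 0.0, z, -y],
        [y, -z, 0.0, x],
        [z, y, -x, 0.0],
    ]


def state_transition(rate, dt):
    """Phi = cos(theta/2) I + (sin(theta/2) / theta) Omega(rate * dt).

    theta is the total integrated angle |rate| * dt over the step.
    """
    theta_vec = [w * dt for w in rate]
    theta = math.sqrt(sum(t * t for t in theta_vec))
    c = math.cos(theta / 2.0)
    # Small-angle limit of sin(theta/2)/theta is 1/2.
    s = 0.5 if theta < 1e-12 else math.sin(theta / 2.0) / theta
    om = omega_matrix(theta_vec)
    return [[(c if i == j else 0.0) + s * om[i][j] for j in range(4)]
            for i in range(4)]


def propagate(q, rate, dt):
    """Advance quaternion q by one step of length dt at constant angular rate."""
    phi = state_transition(rate, dt)
    return [sum(phi[i][j] * q[j] for j in range(4)) for i in range(4)]
```

Because this matrix is orthogonal, multiplying by it leaves the quaternion norm unchanged up to rounding, which is the norm-preservation property the abstract highlights.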
Guidance of Autonomous Aerospace Vehicles for Vertical Soft Landing using Nonlinear Control Theory

DTIC Science & Technology

2015-08-11

Measured and Kalman filter Estimate of the Roll Attitude of the Quad...and faster Hartley et al. [2013]. With availability of small, light, high fidelity sensors (Inertial Measurement Units IMU) and processors on board...is a product of inverse of rotation matrix and inertia matrix for the quad frame. Since both the matrix are invertible at all times except when roll
Decoding and optimized implementation of SECDED codes over GF(q)

DOEpatents

Ward, H. Lee; Ganti, Anand; Resnick, David R

2013-10-22

A plurality of columns for a check matrix that implements a distance d linear error correcting code are populated by providing a set of vectors from which to populate the columns, and applying to the set of vectors a filter operation that reduces the set by eliminating therefrom all vectors that would, if used to populate the columns, prevent the check matrix from satisfying a column-wise linear independence requirement associated with check matrices of distance d linear codes. One of the vectors from the reduced set may then be selected to populate one of the columns. The filtering and selecting repeats iteratively until either all of the columns are populated or the number of currently unpopulated columns exceeds the number of vectors in the reduced set. Columns for the check matrix may be processed to reduce the amount of logic needed to implement the check matrix in circuit logic.
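The populate-and-filter loop in this abstract can be illustrated with a simplified sketch. The patent works over GF(q); the example below is restricted to GF(2), where a column is an integer bitmask and a linear combination is an XOR, and it always selects the smallest surviving vector. Both choices, and all names, are assumptions of this sketch rather than the patented procedure. For a distance d code, every set of d - 1 columns must be linearly independent, so after each selection the filter discards every candidate that equals a combination of at most d - 2 already chosen columns.

```python
from itertools import combinations


def xor_all(vectors):
    """XOR (GF(2) sum) of a collection of bitmask vectors."""
    acc = 0
    for v in vectors:
        acc ^= v
    return acc


def build_check_columns(r, n_cols, d):
    """Greedily pick n_cols r-bit columns for a distance-d check matrix over GF(2).

    Returns the list of columns, or None if the reduced set runs out.
    """
    candidates = set(range(1, 2 ** r))            # all nonzero r-bit vectors
    chosen = []
    while len(chosen) < n_cols:
        if n_cols - len(chosen) > len(candidates):
            return None                            # too few vectors remain
        chosen.append(min(candidates))             # select one surviving vector
        # Filter: drop vectors that depend on <= d - 2 chosen columns.
        for k in range(1, d - 1):
            for subset in combinations(chosen, k):
                candidates.discard(xor_all(subset))
    return chosen


def columns_are_valid(columns, d):
    """True if no nonempty subset of at most d - 1 columns XORs to zero."""
    return all(xor_all(s) != 0
               for k in range(1, d)
               for s in combinations(columns, k))
```

With 4 check bits and d = 4 (the SECDED distance), `build_check_columns(4, 8, 4)` returns the eight odd-weight vectors `[1, 2, 4, 7, 8, 11, 13, 14]`, and asking for a ninth column returns `None`.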
Design, decoding and optimized implementation of SECDED codes over GF(q)

DOEpatents

Ward, H Lee; Ganti, Anand; Resnick, David R

2014-06-17

A plurality of columns for a check matrix that implements a distance d linear error correcting code are populated by providing a set of vectors from which to populate the columns, and applying to the set of vectors a filter operation that reduces the set by eliminating therefrom all vectors that would, if used to populate the columns, prevent the check matrix from satisfying a column-wise linear independence requirement associated with check matrices of distance d linear codes. One of the vectors from the reduced set may then be selected to populate one of the columns. The filtering and selecting repeats iteratively until either all of the columns are populated or the number of currently unpopulated columns exceeds the number of vectors in the reduced set. Columns for the check matrix may be processed to reduce the amount of logic needed to implement the check matrix in circuit logic.
Decoding and optimized implementation of SECDED codes over GF(q)

DOEpatents

Ward, H Lee; Ganti, Anand; Resnick, David R

2014-11-18

A plurality of columns for a check matrix that implements a distance d linear error correcting code are populated by providing a set of vectors from which to populate the columns, and applying to the set of vectors a filter operation that reduces the set by eliminating therefrom all vectors that would, if used to populate the columns, prevent the check matrix from satisfying a column-wise linear independence requirement associated with check matrices of distance d linear codes. One of the vectors from the reduced set may then be selected to populate one of the columns. The filtering and selecting repeats iteratively until either all of the columns are populated or the number of currently unpopulated columns exceeds the number of vectors in the reduced set. Columns for the check matrix may be processed to reduce the amount of logic needed to implement the check matrix in circuit logic.
Benefit of the UltraZoom beamforming technology in noise in cochlear implant users.

PubMed

Mosnier, Isabelle; Mathias, Nathalie; Flament, Jonathan; Amar, Dorith; Liagre-Callies, Amelie; Borel, Stephanie; Ambert-Dahan, Emmanuèle; Sterkers, Olivier; Bernardeschi, Daniele

2017-09-01

The objectives of the study were to demonstrate the audiological and subjective benefits of the adaptive UltraZoom beamforming technology available in the Naída CI Q70 sound processor, in cochlear-implanted adults upgraded from a previous generation sound processor. Thirty-four adults aged between 21 and 89 years (mean 53 ± 19) were prospectively included. Nine subjects were unilaterally implanted, 11 bilaterally and 14 were bimodal users. The mean duration of cochlear implant use was 7 years (range 5-15 years). Subjects were tested in quiet with monosyllabic words and in noise with the adaptive French Matrix test in the best-aided conditions. The test setup contained a signal source in front of the subject and three noise sources at +/-90° and 180°. The noise was presented at a fixed level of 65 dB SPL and the level of speech signal was varied to obtain the speech reception threshold (SRT). During the upgrade visit, subjects were tested with the Harmony and with the Naída CI sound processors in omnidirectional microphone configuration. After a take-home phase of 2 months, tests were repeated with the Naída CI processor with and without UltraZoom. Subjective assessment of the sound quality in daily environments was recorded using the APHAB questionnaire. No difference in performance was observed in quiet between the two processors. The Matrix test in noise was possible in the 21 subjects with the better performance. No difference was observed between the two processors for performance in noise when using the omnidirectional microphone. At the follow-up session, the median SRT with the Naída CI processor with UltraZoom was -4 dB compared to -0.45 dB without UltraZoom. The use of UltraZoom improved the median SRT by 3.6 dB (p < 0.0001, Wilcoxon paired test). 
When looking at the APHAB outcome, improvement was observed for speech understanding in noisy environments (p < 0.01) and in aversive situations (p < 0.05) in the group of 21 subjects who were able to perform the Matrix test in noise and for speech understanding in noise (p < 0.05) in the group of 13 subjects with the poorest performance, who were not able to perform the Matrix test in noise. The use of UltraZoom beamforming technology, available on the new sound processor Naída CI, improves speech performance in difficult and realistic noisy conditions when the cochlear implant user needs to focus on the person speaking at the front. Using the APHAB questionnaire, a subjective benefit for listening in background noise was also observed in subjects with good performance as well as in those with poor performance. This study highlighted the importance of upgrading CI recipients to new technology and to include assessment in noise and subjective feedback evaluation as part of the process.
Communication Optimal Parallel Multiplication of Sparse Random Matrices

DTIC Science & Technology

2013-02-21

Definition 2.1), and (2) the algorithm is sparsity-independent, where the computation is statically partitioned to processors independent of the sparsity...structure of the input matrices (see Definition 2.5). The second assumption applies to nearly all existing algorithms for general sparse matrix-matrix...where A and B are n × n ER(d) matrices: Definition 2.1 An ER(d) matrix is an adjacency matrix of an Erdős-Rényi graph with parameters n and d/n. That
Real-time optical signal processors employing optical feedback: amplitude and phase control.

PubMed

Gallagher, N C

1976-04-01

The development of real-time coherent optical signal processors has increased the appeal of optical computing techniques in signal processing applications. A major limitation of these real-time systems is the fact that the optical processing material is generally of a phase-only type. The result is that the spatial filters synthesized with these systems must be either phase-only filters or amplitude-only filters. The main concern of this paper is the application of optical feedback techniques to obtain simultaneous and independent amplitude and phase control of the light passing through the system. It is shown that optical feedback techniques may be employed with phase-only spatial filters to obtain this amplitude and phase control. The feedback system with phase-only filters is compared with other feedback systems that employ combinations of phase-only and amplitude-only filters; it is found that the phase-only system is substantially more flexible than the other two systems investigated.
Technology and design of an active-matrix OLED on crystalline silicon direct-view display for a wristwatch computer

NASA Astrophysics Data System (ADS)

Sanford, James L.; Schlig, Eugene S.; Prache, Olivier; Dove, Derek B.; Ali, Tariq A.; Howard, Webster E.

2002-02-01

The IBM Research Division and eMagin Corp. have jointly developed a low-power VGA direct-view active-matrix OLED display, fabricated on a crystalline silicon CMOS chip. The display is incorporated in IBM prototype wristwatch computers running the Linux operating system. IBM designed the silicon chip and eMagin developed the organic stack and performed the back-end-of-line processing and packaging. Each pixel is driven by a constant current source controlled by a CMOS RAM cell, and the display receives its data from the processor memory bus. This paper describes the OLED technology and packaging, and outlines the design of the pixel and display electronics and the processor interface. Experimental results are presented.
Photographic film image enhancement

NASA Technical Reports Server (NTRS)

Horner, J. L.

1975-01-01

A series of experiments were undertaken to assess the feasibility of defogging color film by the techniques of optical spatial filtering. A coherent optical processor was built using red, blue, and green laser light input and specially designed Fourier transformation lenses. An array of spatial filters was fabricated on black and white emulsion slides using the coherent optical processor. The technique was first applied to laboratory white light fogged film, and the results were successful. However, when the same technique was applied to some original Apollo X radiation fogged color negatives, the results showed no similar restoration. Examples of each experiment are presented and possible reasons for the lack of restoration in the Apollo films are discussed.
ms2: A molecular simulation tool for thermodynamic properties

NASA Astrophysics Data System (ADS)

Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

2011-11-01

This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for fast execution on a broad range of computer architectures, ranging from single-processor PCs through PC clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 have been shown for a large variety of fluids in preceding work. Program summary. Program title: ms2 Catalogue identifier: AEJF_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single-processor machines through PC clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability: Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM: ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
Analysis of structural response data using discrete modal filters. M.S. Thesis

NASA Technical Reports Server (NTRS)

Freudinger, Lawrence C.

1991-01-01

The application of reciprocal modal vectors to the analysis of structural response data is described. Reciprocal modal vectors are constructed using an existing experimental modal model and an existing frequency response matrix of a structure, and can be assembled into a matrix that effectively transforms the data from the physical space to a modal space within a particular frequency range. In other words, the weighting matrix necessary for modal vector orthogonality (typically the mass matrix) is contained within the reciprocal modal matrix. The underlying goal of this work is mostly directed toward observing the modal state responses in the presence of unknown, possibly closed loop forcing functions, thus having an impact on both operating data analysis techniques and independent modal space control techniques. This study investigates the behavior of reciprocal modal vectors as modal filters with respect to certain calculation parameters and their performance with perturbed system frequency response data.
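The modal-filter idea above can be illustrated with a minimal sketch. It assumes a known mode-shape matrix `phi` and builds the reciprocal vectors from its pseudo-inverse, which is a simplification: the thesis constructs them from an experimental modal model and a measured frequency response matrix. The function names and the toy data are hypothetical.

```python
import numpy as np

def reciprocal_modal_vectors(phi):
    """Return psi (sensors x modes) such that psi.T @ phi is the identity.

    Assumption: phi has full column rank (at least as many sensors as modes).
    """
    # The pseudo-inverse is the least-squares left inverse of phi.
    return np.linalg.pinv(phi).T

def modal_filter(psi, response):
    """Map physical responses (sensors x samples) to modal coordinates."""
    return psi.T @ response

# Toy example: 4 sensors, 2 modes, 3 time samples.
phi = np.array([[1.0, 1.0],
                [1.0, -1.0],
                [0.5, 0.8],
                [0.2, -0.3]])
eta_true = np.array([[1.0, 0.0, 2.0],
                     [0.5, 1.5, -1.0]])
x = phi @ eta_true                  # physical response built from known modal states
psi = reciprocal_modal_vectors(phi)
eta = modal_filter(psi, x)          # recovered modal coordinates
```

Because `psi.T @ phi` is the identity here, each row of `eta` isolates one mode regardless of what forcing produced `x`, which is the property the abstract relies on.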
Elements of the quality management in the materials' industry

NASA Astrophysics Data System (ADS)

Ioana, Adrian; Semenescu, Augustin; Costoiu, Mihnea; Marcu, Dragoş

2017-12-01

The criteria function concept consists of transforming the criteria function (CF) into a quality-economical matrix MQE. The levels of prescribing the criteria function were obtained by using a composition algorithm for three vectors: T̄ vector - technical parameters' vector (ti); Ē vector - economical parameters' vector (ej) and P̄ vector - weight vector (p1). For each product or service, the area of the circle represents the value of its sales. The BCG Matrix thus offers a very useful map of the organization's service strengths and weaknesses, at least in terms of current profitability, as well as the likely cash flows.

Scalable non-negative matrix tri-factorization.

PubMed

Čopar, Andrej; Žitnik, Marinka; Zupan, Blaž

2017-01-01

Matrix factorization is a well-established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets. Our focus in this paper is matrix tri-factorization, a popular method that is not limited by the assumption of standard matrix factorization about data residing in one latent space. Matrix tri-factorization solves this by inferring a separate latent space for each dimension in a data matrix, and a latent mapping of interactions between the inferred spaces, making the approach particularly suitable for biomedical data mining. We developed a block-wise approach for latent factor learning in matrix tri-factorization. The approach partitions a data matrix into disjoint submatrices that are treated independently and fed into a parallel factorization system. An appealing property of the proposed approach is its mathematical equivalence with serial matrix tri-factorization. In a study on large biomedical datasets we show that our approach scales well on multi-processor and multi-GPU architectures. On a four-GPU system we demonstrate that our approach can be more than 100 times faster than its single-processor counterpart. A general approach for scaling non-negative matrix tri-factorization is proposed. The approach is especially useful for parallel matrix factorization implemented in a multi-GPU environment. We expect the new approach will be useful in emerging procedures for latent factor analysis, notably for data integration, where many large data matrices need to be collectively factorized.
Analog hardware for learning neural networks

NASA Technical Reports Server (NTRS)

Eberhardt, Silvio P. (Inventor)

1991-01-01

This is a recurrent or feedforward analog neural network processor having a multi-level neuron array and a synaptic matrix for storing weighted analog values of synaptic connection strengths which is characterized by temporarily changing one connection strength at a time to determine its effect on system output relative to the desired target. That connection strength is then adjusted based on the effect, whereby the processor is taught the correct response to training examples connection by connection.
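The connection-by-connection training scheme described above can be sketched in software. The following is only an illustration under stated assumptions: a single tanh layer stands in for the analog neuron array, and the update rule (a finite-difference gradient step with hypothetical `delta` and `rate` parameters) is one plausible reading of "adjusted based on the effect", not the patented circuit.

```python
import numpy as np

def forward(weights, x):
    # Single layer of tanh neurons (assumed stand-in for the analog array).
    return np.tanh(weights @ x)

def error(weights, inputs, targets):
    # Summed squared error of the output relative to the desired targets.
    return sum(np.sum((forward(weights, x) - t) ** 2)
               for x, t in zip(inputs, targets))

def train_by_perturbation(weights, inputs, targets,
                          delta=1e-3, rate=0.05, epochs=200):
    w = weights.copy()
    for _ in range(epochs):
        for idx in np.ndindex(w.shape):
            base = error(w, inputs, targets)
            w[idx] += delta                     # temporarily change one connection
            changed = error(w, inputs, targets)
            w[idx] -= delta                     # restore it
            # Adjust that connection according to its measured effect.
            w[idx] -= rate * (changed - base) / delta
    return w

# Toy task: logical OR with a constant bias input, targets at +/-0.8.
inputs = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0]),
          np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0])]
targets = [np.array([-0.8]), np.array([0.8]), np.array([0.8]), np.array([0.8])]

w0 = np.zeros((1, 3))
initial_error = error(w0, inputs, targets)
w = train_by_perturbation(w0, inputs, targets)
final_error = error(w, inputs, targets)
```

No derivative of the neuron function is ever computed; only output measurements before and after each perturbation are used, which is what makes the approach attractive for analog hardware.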
Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor

DTIC Science & Technology

2010-05-01

Abstract: The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals...Broadband Engine Processor (Cell BE). The process of adapting the serial-based MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and...using Multiple Signal Classification MUSIC algorithm [4] • Computation of Focus matrix • Computation of number of sources • Separation of Signal
[Orthogonal Vector Projection Algorithm for Spectral Unmixing].

PubMed

Song, Mei-ping; Xu, Xing-wei; Chang, Chein-I; An, Ju-bai; Yao, Li

2015-12-01

Spectral unmixing is an important part of hyperspectral technologies and is essential for material quantity analysis in hyperspectral imagery. Most linear unmixing algorithms require matrix multiplication and matrix inversion or the computation of matrix determinants. These are difficult to program and especially hard to realize in hardware. At the same time, the computational cost of the algorithms increases significantly as the number of endmembers grows. Here, based on the traditional Orthogonal Subspace Projection algorithm, a new method called Orthogonal Vector Projection is proposed using the orthogonality principle. It simplifies the process by avoiding matrix multiplication and inversion. It first computes the final orthogonal vector for each endmember spectrum via the Gram-Schmidt process. These orthogonal vectors are then used as projection vectors for the pixel signature. The unconstrained abundance can be obtained directly by projecting the signature onto the projection vectors and computing the ratio of the projected vector length to the orthogonal vector length. Compared to the Orthogonal Subspace Projection and Least Squares Error algorithms, this method does not need matrix inversion, which is computationally expensive and hard to implement in hardware. It completes the orthogonalization process by repeated vector operations alone, which suits both parallel computation and hardware implementation. The soundness of the algorithm is proved through its relationship with the Orthogonal Subspace Projection and Least Squares Error algorithms. Its computational complexity is also compared with that of the other two algorithms and is the lowest of the three. Finally, experimental results on synthetic and real images are provided as further evidence of the effectiveness of the method.
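The procedure described in this abstract can be sketched as follows. This is a minimal illustration, assuming linearly independent endmember spectra stored as rows of an array and a noise-free linear mixture; the function names and the toy spectra are hypothetical, not taken from the paper.

```python
import numpy as np

def orthogonal_vector(endmembers, k):
    """Part of endmember k that is orthogonal to all other endmembers."""
    basis = []
    # Gram-Schmidt over the other endmembers, using only vector operations.
    for i, m in enumerate(endmembers):
        if i == k:
            continue
        v = m.astype(float).copy()
        for b in basis:
            v -= (v @ b) / (b @ b) * b
        basis.append(v)
    # Remove from endmember k everything lying in the span of the others.
    q = endmembers[k].astype(float).copy()
    for b in basis:
        q -= (q @ b) / (b @ b) * b
    return q

def ovp_abundances(endmembers, pixel):
    """Unconstrained abundances: projected length over orthogonal-vector length."""
    result = []
    for k in range(len(endmembers)):
        q = orthogonal_vector(endmembers, k)
        result.append((pixel @ q) / (q @ q))
    return np.array(result)

# Toy example: three endmembers over five bands, known mixture.
endmembers = np.array([[0.1, 0.3, 0.5, 0.7, 0.9],
                       [0.9, 0.6, 0.4, 0.2, 0.1],
                       [0.3, 0.8, 0.3, 0.8, 0.3]])
true_abundances = np.array([0.2, 0.5, 0.3])
pixel = true_abundances @ endmembers
estimated = ovp_abundances(endmembers, pixel)
```

The ratio works because the orthogonal vector `q` for endmember k is blind to every other endmember, so `pixel @ q` picks out only the k-th abundance times `q @ q`. No matrix inverse appears anywhere, in line with the abstract's claim.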
Multibeam single frequency synthetic aperture radar processor for imaging separate range swaths

NASA Technical Reports Server (NTRS)

Jain, A. (Inventor)

1982-01-01

A single-frequency multibeam synthetic aperture radar for large swath imaging is disclosed. Each beam illuminates a separate "footprint" (i.e., range and azimuth interval). The distinct azimuth intervals for the separate beams produce a distinct Doppler frequency spectrum for each beam. After range correlation of raw data, an optical processor develops image data for the different beams by spatially separating the beams to place each beam of different Doppler frequency spectrum in a different location in the frequency plane as well as the imaging plane of the optical processor. Selection of a beam for imaging may be made in the frequency plane by adjusting the position of an aperture, or in the image plane by adjusting the position of a slit. The raw data may also be processed in digital form in an analogous manner.
Reproducibility of Mammography Units, Film Processing and Quality Imaging

NASA Astrophysics Data System (ADS)

Gaona, Enrique

2003-09-01

The purpose of this study was to carry out an exploratory survey of quality control problems in mammography units and film processor units, as a diagnosis of the current situation of mammography facilities. Measurements of reproducibility, optical density, optical difference and gamma index are included. Breast cancer is the most frequently diagnosed cancer and is the second leading cause of cancer death among women in the Mexican Republic. Mammography is a radiographic examination specially designed for detecting breast pathology. We found that the reproducibility problems of the AEC are smaller than the problems of the processor units, because almost all processors fall outside the acceptable variation limits, which can affect mammographic image quality and the dose to the breast. Only four mammography units met the minimum score established by ACR and FDA for the phantom image.
Optical Inference Machines

DTIC Science & Technology

1988-06-27

Optical Artificial Intelligence; Optical inference engines; Optical logic; Optical information processing...common. They arise in areas such as expert systems and other artificial intelligence systems. In recent years, the computer science language PROLOG has...cal processors should in principle be well suited for artificial intelligence applications. In recent years, symbolic logic processing, the
Performance of a Bounce-Averaged Global Model of Super-Thermal Electron Transport in the Earth's Magnetic Field

NASA Technical Reports Server (NTRS)

McGuire, Tim

1998-01-01

In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portability with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was to also have code which would run well on a variety of architectures.
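The link between the reported speedup of about 2.5 and the 60% vectorization estimate can be checked with Amdahl's Law for vector processors. The sketch below assumes the vector unit is so much faster than scalar execution that the speed ratio `r` can be taken as infinite; the abstract does not state the actual ratio, so this is an idealization, and the function names are hypothetical.

```python
def vector_speedup(f, r):
    """Amdahl's Law: f = vectorized fraction of run time, r = vector/scalar speed ratio."""
    return 1.0 / ((1.0 - f) + f / r)

def vectorized_fraction(speedup, r=float("inf")):
    """Invert Amdahl's Law to estimate f from an observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / r)

# Observed speedup of about 2.5, idealized infinitely fast vector unit.
f_estimate = vectorized_fraction(2.5)

# Forward check: a 60% vectorized code cannot exceed this speedup.
limit = vector_speedup(0.6, float("inf"))
```

Under this idealization the estimate comes out at 0.6, matching the abstract. With a finite speed ratio the same observed speedup would imply a somewhat larger vectorized fraction.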
Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis

DTIC Science & Technology

1998-02-01

The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit
Fractal vector optical fields.

PubMed

Pan, Yue; Gao, Xu-Zhen; Cai, Meng-Qiang; Zhang, Guan-Lin; Li, Yongnan; Tu, Chenghou; Wang, Hui-Tian

2016-07-15

We introduce the concept of a fractal, which provides an alternative approach for flexibly engineering the optical fields and their focal fields. We propose, design, and create a new family of optical fields, fractal vector optical fields, which build a bridge between the fractal and vector optical fields. The fractal vector optical fields have polarization states exhibiting fractal geometry, and may also involve the phase and/or amplitude simultaneously. The results reveal that the focal fields exhibit self-similarity, and the hierarchy of the fractal has the "weeding" role. The fractal can be used to engineer the focal field.
Optical Potential Field Mapping System

NASA Technical Reports Server (NTRS)

Reid, Max B. (Inventor)

1996-01-01

The present invention relates to an optical system for creating a potential field map of a bounded two dimensional region containing a goal location and an arbitrary number of obstacles. The potential field mapping system has an imaging device and a processor. Two image writing modes are used by the imaging device, electron deposition and electron depletion. Patterns written in electron deposition mode appear black and expand. Patterns written in electron depletion mode are sharp and appear white. The generated image represents a robot's workspace. The imaging device under processor control then writes a goal location in the work-space using the electron deposition mode. The black image of the goal expands in the workspace. The processor stores the generated images, and uses them to generate a feedback pattern. The feedback pattern is written in the workspace by the imaging device in the electron deposition mode to enhance the expansion of the original goal pattern. After the feedback pattern is written, an obstacle pattern is written by the imaging device in the electron depletion mode to represent the obstacles in the robot's workspace. The processor compares a stored image to a previously stored image to determine a change therebetween. When no change occurs, the processor averages the stored images to produce the potential field map.
An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
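The tree construction described above (sort, then recursively halve along x, y and z in turn, with each node reduced to a monopole) can be sketched serially. This is only an illustration of the data structure: it does not reproduce the SIMD parallel sorts or the Maspar implementation, and the nested-dict layout and function names are hypothetical.

```python
import numpy as np

def build_tree(pos, mass, axis=0):
    """Balanced binary tree: sort along one axis, split in half, cycle the axis.

    Each node stores monopole data only (total mass and center of mass).
    """
    total = mass.sum()
    node = {
        "mass": total,
        "com": (mass[:, None] * pos).sum(axis=0) / total,
        "children": [],
    }
    if len(mass) > 1:
        order = np.argsort(pos[:, axis], kind="stable")
        half = len(mass) // 2
        next_axis = (axis + 1) % 3          # x -> y -> z -> x ...
        lo, hi = order[:half], order[half:]
        node["children"] = [
            build_tree(pos[lo], mass[lo], next_axis),
            build_tree(pos[hi], mass[hi], next_axis),
        ]
    return node

def depth(node):
    """Number of levels below this node."""
    if not node["children"]:
        return 0
    return 1 + max(depth(child) for child in node["children"])

# Toy example: 8 particles give a perfectly balanced tree of depth 3.
rng = np.random.default_rng(0)
positions = rng.random((8, 3))
masses = np.ones(8)
root = build_tree(positions, masses)
```

With a power-of-two particle count every leaf sits at the same depth, so at the lowest level each particle is paired with exactly one other, which is the balance property the abstract mentions.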
Multiprocessing MCNP on an IBM RS/6000 cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKinney, G.W.; West, J.T.

1993-01-01

The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major beneficiaries of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors P and the fraction f of task time that multiprocesses, can be formulated using Amdahl's law: S(f, P) = 1/(1 - f + f/P). However, for most applications, this theoretical limit cannot be achieved because of additional terms (e.g., multitasking overhead, memory overlap, etc.) that are not included in Amdahl's law. Monte Carlo transport is a natural candidate for multiprocessing because the particle tracks are generally independent, and the precision of the result increases as the square root of the number of particles tracked.
Multiprocessing MCNP on an IBM RS/6000 cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKinney, G.W.; West, J.T.

1993-03-01

The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major beneficiaries of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law: S(f, P) = 1/(1 - f + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.
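The closing point, that independent particle histories split naturally across processors and that precision scales with the square root of the number of histories, can be illustrated with a toy tally. This sketch is not MCNP: an exponential random draw stands in for a per-particle score, the "workers" run one after another, and all names are hypothetical.

```python
import numpy as np

def run_batch(n_histories, seed):
    """One worker: follow n independent histories and return its tally sums."""
    rng = np.random.default_rng(seed)
    scores = rng.exponential(scale=1.0, size=n_histories)  # stand-in score per particle
    return n_histories, scores.sum(), (scores ** 2).sum()

def combine(batches):
    """Merge tallies from all workers into a mean and its standard error."""
    n = sum(b[0] for b in batches)
    s1 = sum(b[1] for b in batches)
    s2 = sum(b[2] for b in batches)
    mean = s1 / n
    variance = s2 / n - mean ** 2
    return mean, (variance / n) ** 0.5

def simulate(total_histories, workers):
    per_worker = total_histories // workers
    # Each worker gets its own seed, so the batches are independent.
    return combine([run_batch(per_worker, seed) for seed in range(workers)])

mean_small, err_small = simulate(40_000, 4)
mean_large, err_large = simulate(160_000, 4)
ratio = err_small / err_large
```

Only three numbers per worker need to be communicated, which is why the method parallelizes so easily. Quadrupling the number of histories should roughly halve the standard error, so `ratio` is expected to be close to 2.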
Parallel solution of closely coupled systems

NASA Technical Reports Server (NTRS)

Utku, S.; Salama, M.

1986-01-01

The odd-even permutation and associated unitary transformations for reordering the matrix coefficient A are employed as means of breaking the strong seriality which is characteristic of closely coupled systems. The nested dissection technique is also reviewed, and the equivalence between reordering A and dissecting its network is established. The effect of transforming A with odd-even permutation on its topology and the topology of its Cholesky factors is discussed. This leads to the construction of directed graphs showing the computational steps required for factoring A, their precedence relationships and their sequential and concurrent assignment to the available processors. Expressions for the speed-up and efficiency of using N processors in parallel relative to the sequential use of a single processor are derived from the directed graph. Similar expressions are also derived when the number of available processors is fewer than required.
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Optical Associative Processors For Visual Perception

NASA Astrophysics Data System (ADS)

Casasent, David; Telfer, Brian

1988-05-01

We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.
Vector optical fields with bipolar symmetry of linear polarization.

PubMed

Pan, Yue; Li, Yongnan; Li, Si-Min; Ren, Zhi-Cheng; Si, Yu; Tu, Chenghou; Wang, Hui-Tian

2013-09-15

We focus on a new kind of vector optical field with bipolar symmetry of linear polarization instead of cylindrical and elliptical symmetries, enriching members of family of vector optical fields. We design theoretically and generate experimentally the demanded vector optical fields and then explore some novel tightly focusing properties. The geometric configurations of states of polarization provide additional degrees of freedom assisting in engineering the field distribution at the focus to the specific applications such as lithography, optical trapping, and material processing.
Optical Interconnections for VLSI Computational Systems Using Computer-Generated Holography.

NASA Astrophysics Data System (ADS)

Feldman, Michael Robert

Optical interconnects for VLSI computational systems using computer generated holograms are evaluated in theory and experiment. It is shown that by replacing particular electronic connections with free-space optical communication paths, connection of devices on a single chip or wafer and between chips or modules can be improved. Optical and electrical interconnects are compared in terms of power dissipation, communication bandwidth, and connection density. Conditions are determined for which optical interconnects are advantageous. Based on this analysis, it is shown that by applying computer generated holographic optical interconnects to wafer scale fine grain parallel processing systems, dramatic increases in system performance can be expected. Some new interconnection networks, designed to take full advantage of optical interconnect technology, have been developed. Experimental Computer Generated Holograms (CGH's) have been designed, fabricated and subsequently tested in prototype optical interconnected computational systems. Several new CGH encoding methods have been developed to provide efficient high performance CGH's. One CGH was used to decrease the access time of a 1 kilobit CMOS RAM chip. Another was produced to implement the inter-processor communication paths in a shared memory SIMD parallel processor array.
High-performance parallel processors based on star-coupled wavelength division multiplexing optical interconnects

DOEpatents

Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.

2002-01-01

As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance has been shown to scale to approximately 100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.

Automated target recognition and tracking using an optical pattern recognition neural network

NASA Technical Reports Server (NTRS)

Chao, Tien-Hsin

1991-01-01

The on-going development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that is an integration of an innovative optical parallel processor and a feature extraction based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets in spite of their scales, rotations, perspectives, and various deformations. This fully developed OPRNN system can be effectively utilized for the automated spacecraft recognition and tracking that will lead to success in the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With the inherent advantages of parallel processing capability and shift invariance, multiple objects can be simultaneously recognized and tracked using this multichannel correlator. This target tracking capability can be greatly enhanced by utilizing a powerful feature extraction based neural network training algorithm such as the neocognitron. The OPRNN, currently under investigation at JPL, is constructed with an optical multichannel correlator where holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10^14 analog connections/sec, enabling the OPRNN to outperform its state-of-the-art electronic counterpart by at least two orders of magnitude.
Passive IFF: Autonomous Nonintrusive Rapid Identification of Friendly Assets

NASA Technical Reports Server (NTRS)

Moynihan, Philip; Steenburg, Robert Van; Chao, Tien-Hsin

2004-01-01

A proposed optoelectronic instrument would identify targets rapidly, without need to radiate an interrogating signal, apply identifying marks to the targets, or equip the targets with transponders. The instrument was conceived as an identification, friend or foe (IFF) system in a battlefield setting, where it would be part of a targeting system for weapons, by providing rapid identification for aimed weapons to help in deciding whether and when to trigger them. The instrument could also be adapted to law-enforcement and industrial applications in which it is necessary to rapidly identify objects in view. The instrument would comprise mainly an optical correlator and a neural processor (see figure). The inherent parallel-processing speed and capability of the optical correlator would be exploited to obtain rapid identification of a set of probable targets within a scene of interest and to define regions within the scene for the neural processor to analyze. The neural processor would then concentrate on each region selected by the optical correlator in an effort to identify the target. Depending on whether or not a target was recognized by comparison of its image data with data in an internal database on which the neural processor was trained, the processor would generate an identifying signal (typically, friend or foe). The time taken for this identification process would be less than the time needed by a human or robotic gunner to acquire a view of, and aim at, a target. An optical correlator that has been under development for several years and that has been demonstrated to be capable of tracking a cruise missile might be considered a prototype of the optical correlator in the proposed IFF instrument. This optical correlator features a 512-by-512-pixel input image frame and operates at an input frame rate of 60 Hz.
It includes a spatial light modulator (SLM) for video-to-optical image conversion, a pair of precise lenses to effect Fourier transforms, a filter SLM for digital-to-optical correlation-filter data conversion, and a charge-coupled device (CCD) for detection of correlation peaks. In operation, the input scene grabbed by a video sensor is streamed into the input SLM. Precomputed correlation-filter data files representative of known targets are then downloaded and sequenced into the filter SLM at a rate of 1,000 Hz. When there occurs a match between the input target data and one of the known-target data files, the CCD detects a correlation peak at the location of the target. Distortion-invariant correlation filters from a bank of such filters are then sequenced through the optical correlator for each input frame. The net result is the rapid preliminary recognition of one or a few targets.
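
The correlation-peak detection described above can be illustrated with a minimal Python sketch. It computes the correlation plane directly in the spatial domain instead of with Fourier-transform lenses and SLMs, and it uses plain unnormalized correlation, which a real filter bank would likely refine. The names `correlate2d` and `find_peak` and the toy 6-by-6 scene are assumptions for illustration, not part of the proposed instrument.

```python
def correlate2d(scene, template):
    """Slide the template over the scene and return the correlation plane."""
    sh, sw = len(scene), len(scene[0])
    th, tw = len(template), len(template[0])
    plane = []
    for r in range(sh - th + 1):
        row = []
        for c in range(sw - tw + 1):
            # Sum of element-wise products at this offset
            row.append(sum(scene[r + i][c + j] * template[i][j]
                           for i in range(th) for j in range(tw)))
        plane.append(row)
    return plane


def find_peak(plane):
    """Return (row, col, value) of the largest entry in the correlation plane."""
    value, row, col = max((v, r, c)
                          for r, line in enumerate(plane)
                          for c, v in enumerate(line))
    return row, col, value


# Hypothetical toy data: a 2x2 "target" embedded in an empty 6x6 scene
template = [[1, 2],
            [3, 4]]
scene = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        scene[2 + i][3 + j] = template[i][j]

peak_row, peak_col, peak_value = find_peak(correlate2d(scene, template))
print(peak_row, peak_col, peak_value)
```

In this toy case the peak falls at the offset where the target was placed, which is the role the CCD plays in the optical system.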
Design and Implementation of the PALM-3000 Real-Time Control System

NASA Technical Reports Server (NTRS)

Truong, Tuan N.; Bouchez, Antonin H.; Burruss, Rick S.; Dekany, Richard G.; Guiwits, Stephen R.; Roberts, Jennifer E.; Shelton, Jean C.; Troy, Mitchell

2012-01-01

This paper reflects, from a computational perspective, on the experience gathered in designing and implementing real-time control of the PALM-3000 adaptive optics system currently in operation at the Palomar Observatory. We review the algorithms that serve as functional requirements driving the architecture developed, and describe key design issues and solutions that contributed to the system's low compute latency. Additionally, we describe an implementation of dense matrix-vector multiplication for wavefront reconstruction that exceeds 95% of the maximum sustained achievable bandwidth on an NVIDIA GeForce 8800 GTX GPU.
Solving the corner-turning problem for large interferometers

NASA Astrophysics Data System (ADS)

Lutomirski, Andrew; Tegmark, Max; Sanchez, Nevada J.; Stein, Leo C.; Urry, W. Lynn; Zaldarriaga, Matias

2011-01-01

The so-called corner-turning problem is a major bottleneck for radio telescopes with large numbers of antennas. The problem is essentially that of rapidly transposing a matrix that is too large to store on one single device; in radio interferometry, it occurs because data from each antenna need to be routed to an array of processors each of which will handle a limited portion of the data (say, a frequency range) but requires input from each antenna. We present a low-cost solution allowing the correlator to transpose its data in real time, without contending for bandwidth, via a butterfly network requiring neither additional RAM memory nor expensive general-purpose switching hardware. We discuss possible implementations of this using FPGA, CMOS, analog logic and optical technology, and conclude that the corner-turner cost can be small even for upcoming massive radio arrays.
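
To make the idea of transposing through a butterfly network concrete, here is a small Python simulation. It assumes the standard hypercube-style exchange pattern (at stage k, node p swaps half of its data with node p XOR 2^k), which may differ from the hardware design in the paper. The function name `butterfly_transpose` and the tuple layout are hypothetical.

```python
def butterfly_transpose(rows):
    """Transpose an n x n matrix (n a power of two) in log2(n) exchange stages.

    Node i starts with row i (all samples from "antenna" i) and ends with
    column i (one "frequency channel" from every antenna).
    """
    n = len(rows)
    if n == 0 or n & (n - 1) != 0 or any(len(r) != n for r in rows):
        raise ValueError("expected a square matrix whose size is a power of two")

    # Each item carries (source_row, destination_node, value)
    nodes = [[(i, j, rows[i][j]) for j in range(n)] for i in range(n)]

    bit = 1
    while bit < n:
        next_nodes = [[] for _ in range(n)]
        for p in range(n):
            for item in nodes[p]:
                dest = item[1]
                # Keep the item if this bit of its destination already matches
                # the current node; otherwise pass it to the partner node.
                target = p if (dest & bit) == (p & bit) else p ^ bit
                next_nodes[target].append(item)
        nodes = next_nodes
        bit <<= 1

    # Node j now holds every item destined for j; order them by source row
    return [[value for _, _, value in sorted(nodes[j])] for j in range(n)]


matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
transposed = butterfly_transpose(matrix)
print(transposed)
```

In this sketch every node holds exactly n items at every stage, so no node needs extra buffering beyond its own row, which is in the spirit of the "no additional RAM" claim.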
Small star trackers for modern space vehicles

NASA Astrophysics Data System (ADS)

Kouzmin, Vladimir; Jushkov, Vladimir; Zaikin, Vladimir

2017-11-01

Drawing on many years of experience building spacecraft star trackers with a variety of detectors (from the first star trackers of the 1960s to dozens of versions in the following years), and using technological advances in optics and electronics, NPP "Geofizika-Cosmos" has provided celestial orientation for all the space vehicles created in Russia. It has now developed a series of new star trackers with CCD matrices and special processors that can meet the celestial-orientation needs of modern spacecraft for the next 10-15 years. This article presents the main characteristics and descriptions of some of the star tracker versions. The star trackers offer various levels of technical performance and use either combined (Russian and foreign) procured parts or only national (Russian) procured parts for the main units.
Quantitative analysis of eyes and other optical systems in linear optics.

PubMed

Harris, William F; Evans, Tanya; van Gool, Radboud D

2017-05-01

To show that 14-dimensional spaces of augmented point P and angle Q characteristics, matrices obtained from the ray transference, are suitable for quantitative analysis although only the latter define an inner-product space and only on it can one define distances and angles. The paper examines the nature of the spaces and their relationships to other spaces including symmetric dioptric power space. The paper makes use of linear optics, a three-dimensional generalization of Gaussian optics. Symmetric 2 × 2 dioptric power matrices F define a three-dimensional inner-product space which provides a sound basis for quantitative analysis (calculation of changes, arithmetic means, etc.) of refractive errors and thin systems. For general systems the optical character is defined by the dimensionally-heterogeneous 4 × 4 symplectic matrix S, the transference, or if explicit allowance is made for heterocentricity, the 5 × 5 augmented symplectic matrix T. Ordinary quantitative analysis cannot be performed on them because matrices of neither of these types constitute vector spaces. Suitable transformations have been proposed but because the transforms are dimensionally heterogeneous the spaces are not naturally inner-product spaces. The paper obtains 14-dimensional spaces of augmented point P and angle Q characteristics. The 14-dimensional space defined by the augmented angle characteristics Q is dimensionally homogenous and an inner-product space. A 10-dimensional subspace of the space of augmented point characteristics P is also an inner-product space. The spaces are suitable for quantitative analysis of the optical character of eyes and many other systems. Distances and angles can be defined in the inner-product spaces. The optical systems may have multiple separated astigmatic and decentred refracting elements. © 2017 The Authors Ophthalmic & Physiological Optics © 2017 The College of Optometrists.
Large-scale frequency- and time-domain quantum entanglement over the optical frequency comb (Conference Presentation)

NASA Astrophysics Data System (ADS)

Pfister, Olivier

2017-05-01

When it comes to practical quantum computing, the two main challenges are circumventing decoherence (devastating quantum errors due to interactions with the environmental bath) and achieving scalability (as many qubits as needed for a real-life, game-changing computation). We show that using, in lieu of qubits, the "qumodes" represented by the resonant fields of the quantum optical frequency comb of an optical parametric oscillator allows one to create bona fide, large scale quantum computing processors, pre-entangled in a cluster state. We detail our recent demonstration of 60-qumode entanglement (out of an estimated 3000) and present an extension to combining this frequency-tagged with time-tagged entanglement, in order to generate an arbitrarily large, universal quantum computing processor.
Compact time- and space-integrating SAR processor: design and development status

NASA Astrophysics Data System (ADS)

Haney, Michael W.; Levy, James J.; Christensen, Marc P.; Michael, Robert R., Jr.; Mock, Michael M.

1994-06-01

Progress toward a flight demonstration of the acousto-optic time- and space-integrating real-time SAR image formation processor program is reported. The concept overcomes the size and power consumption limitations of electronic approaches by using compact, rugged, and low-power analog optical signal processing techniques for the most computationally taxing portions of the SAR imaging problem. Flexibility and performance are maintained by the use of digital electronics for the critical low-complexity filter generation and output image processing functions. The results reported include tests of a laboratory version of the concept, a description of the compact optical design that will be implemented, and an overview of the electronic interface and controller modules of the flight-test system.
An Alternative Method for Computing Mean and Covariance Matrix of Some Multivariate Distributions

ERIC Educational Resources Information Center

Radhakrishnan, R.; Choudhury, Askar

2009-01-01

Computing the mean and covariance matrix of some multivariate distributions, in particular the multivariate normal distribution and the Wishart distribution, is considered in this article. It involves a matrix transformation of the normal random vector into a random vector whose components are independent normal random variables, and then integrating…
Polar decomposition for attitude determination from vector observations

NASA Technical Reports Server (NTRS)

Bar-Itzhack, Itzhack Y.

1993-01-01

This work treats the problem of weighted least squares fitting of a 3D Euclidean-coordinate transformation matrix to a set of unit vectors measured in the reference and transformed coordinates. A closed-form analytic solution to the problem is re-derived. The fact that the solution is the closest orthogonal matrix to some matrix defined on the measured vectors and their weights is clearly demonstrated. Several known algorithms for computing the analytic closed-form solution are considered. An algorithm is discussed which is based on the polar decomposition of matrices into the closest unitary matrix to the decomposed matrix and a Hermitian matrix. A somewhat longer improved algorithm is suggested too. A comparison of several algorithms is carried out using simulated data as well as real data from the Upper Atmosphere Research Satellite. The comparison is based on accuracy and time consumption. It is concluded that the algorithms based on polar decomposition yield a simple although somewhat less accurate solution. The precision of the latter algorithms increases with the number of the measured vectors and with the accuracy of their measurement.
Damping of Bogoliubov excitations in optical lattices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsuchiya, Shunji; Department of Physics, Waseda University, 3-4-1 Okubo, Tokyo 169-8555; Griffin, Allan

2004-08-01

Extending recent work to finite temperatures, we calculate the Landau damping of a Bogoliubov excitation in an optical lattice, due to the coupling to a thermal cloud of such excitations. For simplicity, we consider a one-dimensional Bose-Hubbard model and restrict ourselves to the first energy band. For energy conservation to be satisfied, the excitations in the collision processes must exhibit "anomalous dispersion," analogous to phonons in superfluid $^{4}$He. This leads to the disappearance of all damping processes when $U n^{c0} \geq 6J$, where $U$ is the on-site interaction, $J$ is the hopping matrix element, and $n^{c0}(T)$ is the number of condensate atoms at a lattice site. This phenomenon also occurs in two-dimensional and three-dimensional optical lattices. The disappearance of Beliaev damping above a threshold wave vector is noted.
Overview of microoptics: Past, present, and future

NASA Technical Reports Server (NTRS)

Veldkamp, Wilfrid B.

1993-01-01

Through advances in semiconductor miniaturization technology, microrelief patterns, with characteristic dimensions as small as the wavelength of light, can now be mass reproduced to form high-quality and low-cost optical components. In a unique example of technology transfer, from electronics to optics, this capability is allowing optics designers to create innovative optical components that promise to solve key problems in optical sensors, optical communication channels, and optical processors.
Optical stereo video signal processor

NASA Technical Reports Server (NTRS)

Craig, G. D. (Inventor)

1985-01-01

An optical video signal processor is described which produces a two-dimensional cross-correlation in real time of images received by a stereo camera system. The optical image of each camera is projected on respective liquid crystal light valves. The images on the liquid crystal valves modulate light produced by an extended light source. This modulated light output becomes the two-dimensional cross-correlation when focused onto a video detector and is a function of the range of a target with respect to the stereo camera. Alternate embodiments utilize the two-dimensional cross-correlation to determine target movement and target identification.
Negative base encoding in optical linear algebra processors

NASA Technical Reports Server (NTRS)

Perlee, C.; Casasent, D.

1986-01-01

In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
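
The multiplication-by-convolution idea above can be sketched in a few lines of Python for base -2. The digit sequences of the two operands are convolved (a software stand-in for the analog optical convolution), giving mixed-radix digits that may exceed 1; weighting them by powers of -2 recovers the product, sign included. The function names are hypothetical and are not taken from the paper.

```python
def to_negabinary(n):
    """Return the base -2 digits (each 0 or 1) of integer n, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:
            # Python returns a remainder in {0, -1} here; shift it into {0, 1}
            n, r = n + 1, r + 2
        digits.append(r)
    return digits


def convolve(a, b):
    """Discrete convolution of two digit sequences (no carries are propagated)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def from_mixed(digits, base=-2):
    """Evaluate mixed-radix digits (possibly larger than 1) in the given base."""
    return sum(d * base ** k for k, d in enumerate(digits))


a_digits = to_negabinary(3)      # no separate sign bit is needed
b_digits = to_negabinary(-3)     # negative numbers use the same digit set
mixed = convolve(a_digits, b_digits)
product = from_mixed(mixed)
print(a_digits, b_digits, mixed, product)
```

Because the sign is carried by the base itself, positive and negative operands go through the same convolution path, which is one reason a negative base can be attractive compared with sign-magnitude handling.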
Proposed ultralow-energy dual photonic-crystal nanobeam devices for on-chip N x N switching, logic, and wavelength multiplexing.

PubMed

Soref, Richard; Hendrickson, Joshua

2015-12-14

Silicon-on-insulator Mach-Zehnder interferometer structures that utilize a photonic crystal nanobeam waveguide in each of two connecting arms are proposed here as efficient 2 × 2 resonant, wavelength-selective electro-optical routing switches that are readily cascaded into on-chip N × N switching networks. A localized lateral PN junction of length ~2 μm within each of two identical nanobeams is proposed as a means of shifting the transmission resonance by 400 pm within the 1550 nm band. Using a bias swing ΔV = 2.7 V, the 474 attojoules-per-bit switching mechanism is free-carrier sweepout due to PN depletion layer widening. Simulations of the 2 × 2 outputs versus voltage are presented. Dual-nanobeam designs are given for N × N data-routing matrix switches, electrooptical logic unit cells, N × M wavelength selective switches, and vector matrix multipliers. Performance penalties are analyzed for possible fabrication induced errors such as non-ideal 3-dB couplers, differences in optical path lengths, and variations in photonic crystal cavity resonances.
Photographic Film Image Enhancement

DOT National Transportation Integrated Search

1975-01-01

A series of experiments were undertaken to assess the feasibility of defogging color film by the techniques of Optical Spatial Filtering. A coherent optical processor was built using red, blue, and green laser light input and specially designed Fouri...
Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

NASA Astrophysics Data System (ADS)

Olson, Richard F.

2013-05-01

Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Optical cage generated by azimuthal- and radial-variant vector beams.

PubMed

Man, Zhongsheng; Bai, Zhidong; Li, Jinjian; Zhang, Shuoshuo; Li, Xiaoyu; Zhang, Yuquan; Ge, Xiaolu; Fu, Shenggui

2018-05-01

We propose a method to generate an optical cage using azimuthal- and radial-variant vector beams in a high numerical aperture optical system. A new kind of vector beam that has azimuthal- and radial-variant polarization states is proposed and demonstrated theoretically. Then, an integrated analytical model to calculate the electromagnetic field and Poynting vector distributions of the input azimuthal- and radial-variant vector beams is derived and built based on the vector diffraction theory of Richards and Wolf. From calculations, a full polarization-controlled optical cage is obtained by simply tailoring the radial index of the polarization, the uniformity U of which is up to 0.7748, and the cleanness C is zero. Additionally, a perfect optical cage can be achieved with U=1, and C=0 by introducing an amplitude modulation; its magnetic field and energy flow are also demonstrated in detail. Such optical cages may be helpful in applications such as optical trapping and high-resolution imaging.
More About Vector Adaptive/Predictive Coding Of Speech

NASA Technical Reports Server (NTRS)

Jedrey, Thomas C.; Gersho, Allen

1992-01-01

Report presents additional information about digital speech-encoding and -decoding system described in "Vector Adaptive/Predictive Encoding of Speech" (NPO-17230). Summarizes development of vector adaptive/predictive coding (VAPC) system and describes basic functions of algorithm. Describes refinements introduced enabling receiver to cope with errors. VAPC algorithm implemented in integrated-circuit coding/decoding processors (codecs). VAPC and other codecs tested under variety of operating conditions. Tests designed to reveal effects of various background quiet and noisy environments and of poor telephone equipment. VAPC found competitive with and, in some respects, superior to other 4.8-kb/s codecs and other codecs of similar complexity.
A digital video tracking system

NASA Astrophysics Data System (ADS)

Giles, M. K.

1980-01-01

The Real-Time Videotheodolite (RTV) was developed in connection with the requirement to replace film as a recording medium to obtain the real-time location of an object in the field-of-view (FOV) of a long focal length theodolite. Design philosophy called for a system capable of discriminatory judgment in identifying the object to be tracked with 60 independent observations per second, capable of locating the center of mass of the object projection on the image plane within about 2% of the FOV in rapidly changing background/foreground situations, and able to generate a predicted observation angle for the next observation. A description is given of a number of subsystems of the RTV, taking into account the processor configuration, the video processor, the projection processor, the tracker processor, the control processor, and the optics interface and imaging subsystem.

Static assignment of complex stochastic tasks using stochastic majorization

NASA Technical Reports Server (NTRS)

Nicol, David; Simha, Rahul; Towsley, Don

1992-01-01

We consider the problem of statically assigning many tasks to a (smaller) system of homogeneous processors, where a task's structure is modeled as a branching process, and all tasks are assumed to have identical behavior. We show how the theory of majorization can be used to obtain a partial order among possible task assignments. Our results show that if the vector of numbers of tasks assigned to each processor under one mapping is majorized by that of another mapping, then the former mapping is better than the latter with respect to a large number of objective functions. In particular, we show how measurements of finishing time, resource utilization, and reliability are all captured by the theory. We also show how the theory may be applied to the problem of partitioning a pool of processors for distribution among parallelizable tasks.
Accelerated Adaptive MGS Phase Retrieval

NASA Technical Reports Server (NTRS)

Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang

2011-01-01

The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
Intelligent systems technology infrastructure for integrated systems

NASA Technical Reports Server (NTRS)

Lum, Henry

1991-01-01

A system infrastructure must be properly designed and integrated from the conceptual development phase to accommodate evolutionary intelligent technologies. Several technology development activities were identified that may have application to rendezvous and capture systems. Optical correlators in conjunction with fuzzy logic control might be used for the identification, tracking, and capture of either cooperative or non-cooperative targets without the intensive computational requirements associated with vision processing. A hybrid digital/analog system was developed and tested with a robotic arm. An aircraft refueling application demonstration is planned within two years. Initially this demonstration will be ground based with a follow-on air based demonstration. System dependability measurement and modeling techniques are being developed for fault management applications. This involves usage of incremental solution/evaluation techniques and modularized systems to facilitate reuse and to take advantage of natural partitions in system models. Though not yet commercially available and currently subject to accuracy limitations, technology is being developed to perform optical matrix operations to enhance computational speed. Optical terrain recognition using camera image sequencing processed with optical correlators is being developed to determine position and velocity in support of lander guidance. The system is planned for testing in conjunction with Dryden Flight Research Facility. Advanced architecture technology is defining open architecture design constraints, test bed concepts (processors, multiple hardware/software and multi-dimensional user support, knowledge/tool sharing infrastructure), and software engineering interface issues.
CPU architecture for a fast and energy-saving calculation of convolution neural networks

NASA Astrophysics Data System (ADS)

Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

2017-06-01

One of the most difficult problems in the use of artificial neural networks is the computational capacity they require. Although large search engine companies own specially developed hardware to provide the necessary computing power, conventional users are left with the state-of-the-art method of using a graphics processing unit (GPU) as the computational basis. Although these processors are well suited for large matrix computations, they consume a great deal of energy. Therefore, a new processor based on a field-programmable gate array (FPGA) has been developed and optimized for deep learning applications. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper, an organic farming application). Its power consumption is only a fraction of that of a GPU implementation, so it should be well suited for energy-saving applications.
Quantum Support Vector Machine for Big Data Classification

NASA Astrophysics Data System (ADS)

Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

2014-09-01

Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.
Optical triple-in digital logic using nonlinear optical four-wave mixing

NASA Astrophysics Data System (ADS)

Widjaja, Joewono; Tomita, Yasuo

1995-08-01

A new programmable optical processor is proposed for implementing triple-in combinatorial digital logic that uses four-wave mixing. Binary-coded decimal-to-octal decoding is experimentally demonstrated by use of a photorefractive BaTiO3 crystal. The result confirms the feasibility of the proposed system.
Design concepts for an on-board coherent optical image processor

NASA Technical Reports Server (NTRS)

Husain-Abidi, A. S.

1972-01-01

On-board spacecraft image data processing systems for transmitting processed data rather than raw data are discussed. A brief history of the development of the optical data processing techniques is presented along with the conceptual design of a coherent optical system with a noncoherent image input.
Increasing the computational efficient of digital cross correlation by a vectorization method

NASA Astrophysics Data System (ADS)

Chang, Ching-Yuan; Ma, Chien-Ching

2017-08-01

This study presents a vectorization method for use in MATLAB programming aimed at increasing the computational efficiency of digital cross correlation in sound and images, resulting in a speedup of 6.387 and 36.044 times compared with performance values obtained from looped expression. This work bridges the gap between matrix operations and loop iteration, preserving flexibility and efficiency in program testing. This paper uses numerical simulation to verify the speedup of the proposed vectorization method as well as experiments to measure the quantitative transient displacement response subjected to dynamic impact loading. The experiment involved the use of a high speed camera as well as a fiber optic system to measure the transient displacement in a cantilever beam under impact from a steel ball. Experimental measurement data obtained from the two methods are in excellent agreement in both the time and frequency domain, with discrepancies of only 0.68%. Numerical and experiment results demonstrate the efficacy of the proposed vectorization method with regard to computational speed in signal processing and high precision in the correlation algorithm. We also present the source code with which to build MATLAB-executable functions on Windows as well as Linux platforms, and provide a series of examples to demonstrate the application of the proposed vectorization method.
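The idea of replacing loops with matrix operations can be illustrated with a minimal sketch. It is written in Python with NumPy rather than MATLAB, and it is not the authors' code: the function names `xcorr_loop` and `xcorr_vectorized` are hypothetical, and the index-matrix construction is only one plausible way to vectorize a one-dimensional cross correlation.

```python
import numpy as np

def xcorr_loop(x, y):
    """Full cross correlation computed with explicit nested loops."""
    n, m = len(x), len(y)
    out = np.zeros(n + m - 1)
    for lag in range(-(m - 1), n):
        acc = 0.0
        for j in range(m):
            i = lag + j
            if 0 <= i < n:
                acc += x[i] * y[j]
        out[lag + m - 1] = acc
    return out

def xcorr_vectorized(x, y):
    """Same result, using one index matrix and one matrix-vector product."""
    n, m = len(x), len(y)
    # Zero-pad x so that every lag sees a full window of length m.
    padded = np.concatenate([np.zeros(m - 1), x, np.zeros(m - 1)])
    # Row k of idx holds the padded positions used for output sample k.
    idx = np.arange(n + m - 1)[:, None] + np.arange(m)[None, :]
    return padded[idx] @ y

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 0.5])
print(np.allclose(xcorr_loop(x, y), xcorr_vectorized(x, y)))
```

The vectorized form trades memory (an index matrix with one row per lag) for the removal of both loops. Any speedup depends on the signal sizes and the platform, so the figures quoted in the abstract should not be expected from this sketch.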
Implementation and simulations of the sphere solution in FAST

NASA Astrophysics Data System (ADS)

Murgolo, F. P.; Schirone, M. G.; Lattanzi, M.; Bernacca, P. L.

1989-06-01

The details of the implementation of the sphere solution software in the Fundamental Astronomy by Space Techniques (FAST) consortium are described. The simulation results for realistic data sets, both with and without grid-step errors, are given. Expected errors on the astrometric parameters of the primary stars and the precision of the reference great circle zero points are provided as a function of mission duration. The design matrix, the diagrams of the context processor, and the experimental results for the processors are given.
Split Octonion Reformulation for Electromagnetic Chiral Media of Massive Dyons

NASA Astrophysics Data System (ADS)

Chanyal, B. C.

2017-12-01

In an explicit, unified, and covariant formulation of an octonion algebra, we study and generalize the electromagnetic chiral field equations of massive dyons with the split octonionic representation. Starting with the 2×2 Zorn’s vector matrix realization of the split octonion and its dual Euclidean spaces, we represent the unified structure of split octonionic electric and magnetic induction vectors for chiral media. As such, in the present paper, we describe the chiral parameter and pairing constants in terms of the split octonionic matrix representation of the Drude-Born-Fedorov constitutive relations. We have expressed a split octonionic electromagnetic field vector for chiral media, which exhibits the unified field structure of electric and magnetic chiral fields of dyons. The beauty of the split octonionic representation of the Zorn vector matrix realization is that every scalar and vector component has its own meaning in the generalized chiral electromagnetism of dyons. Correspondingly, we obtained the alternative form of the generalized Proca-Maxwell’s equations of massive dyons in chiral media. Furthermore, the continuity equations, Poynting theorem, and wave propagation for generalized electromagnetic fields of chiral media of massive dyons are established by the split octonionic form of Zorn vector matrix algebra.
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.

PubMed

Zhang, Jianguang; Jiang, Jianmin

2018-02-01

While existing logistic regression suffers from overfitting and often fails to consider structural information, we propose a novel matrix-based logistic regression to overcome these weaknesses. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles: enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classifications.
BSR: B-spline atomic R-matrix codes

NASA Astrophysics Data System (ADS)

Zatsarinny, Oleg

2006-02-01

BSR is a general program to calculate atomic continuum processes using the B-spline R-matrix method, including electron-atom and electron-ion scattering, and radiative processes such as bound-bound transitions, photoionization and polarizabilities. The calculations can be performed in LS-coupling or in an intermediate-coupling scheme by including terms of the Breit-Pauli Hamiltonian. New version program summary. Title of program: BSR. Catalogue identifier: ADWY. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWY. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Computers on which the program has been tested: Microway Beowulf cluster; Compaq Beowulf cluster; DEC Alpha workstation; DELL PC. Operating systems under which the new version has been tested: UNIX, Windows XP. Programming language used: FORTRAN 95. Memory required to execute with typical data: Typically 256-512 Mwords. Since all the principal dimensions are allocatable, the available memory defines the maximum complexity of the problem. No. of bits in a word: 8. No. of processors used: 1. Has the code been vectorized or parallelized?: no. No. of lines in distributed program, including test data, etc.: 69 943. No. of bytes in distributed program, including test data, etc.: 746 450. Peripherals used: scratch disk store; permanent disk store. Distribution format: tar.gz. Nature of physical problem: This program uses the R-matrix method to calculate electron-atom and electron-ion collision processes, with options to calculate radiative data, photoionization, etc. The calculations can be performed in LS-coupling or in an intermediate-coupling scheme, with options to include Breit-Pauli terms in the Hamiltonian. Method of solution: The R-matrix method is used [P.G. Burke, K.A. Berrington, Atomic and Molecular Processes: An R-Matrix Approach, IOP Publishing, Bristol, 1993; P.G. Burke, W.D. Robb, Adv. At. Mol. Phys. 11 (1975) 143; K.A. Berrington, W.B. Eissner, P.H. Norrington, Comput. Phys. Comm. 92 (1995) 290].
Statistical analysis and machine learning algorithms for optical biopsy

NASA Astrophysics Data System (ADS)

Wu, Binlin; Liu, Cheng-hui; Boydston-White, Susie; Beckman, Hugh; Sriramoju, Vidyasagar; Sordillo, Laura; Zhang, Chunyuan; Zhang, Lin; Shi, Lingyan; Smith, Jason; Bailin, Jacob; Alfano, Robert R.

2018-02-01

Analyzing spectral or imaging data collected with various optical biopsy methods is often difficult due to the complexity of the biological basis. Robust methods that can utilize the spectral or imaging data and detect the characteristic spectral or spatial signatures of different types of tissue are challenging to develop but highly desired. In this study, we used various machine learning algorithms to analyze a spectral dataset acquired from normal and cancerous human skin tissue samples using resonance Raman spectroscopy with 532 nm excitation. Algorithms including principal component analysis, nonnegative matrix factorization, and an autoencoder artificial neural network are used to reduce the dimension of the dataset and detect features. A support vector machine with a linear kernel is used to classify the normal tissue and cancerous tissue samples. The efficacies of the methods are compared.
Polarization Control with Plasmonic Antenna Tips: A Universal Approach to Optical Nanocrystallography and Vector-Field Imaging

NASA Astrophysics Data System (ADS)

Park, Kyoung-Duck; Raschke, Markus B.

2018-05-01

Controlling the propagation and polarization vectors in linear and nonlinear optical spectroscopy makes it possible to probe the anisotropy of optical responses, providing structural symmetry selective contrast in optical imaging. Here we present a novel tilted antenna-tip approach to control the optical vector-field by breaking the axial symmetry of the nano-probe in tip-enhanced near-field microscopy. This gives rise to a localized plasmonic antenna effect with significantly enhanced optical field vectors with control of both in-plane and out-of-plane components. We use the resulting vector-field specificity in the symmetry selective nonlinear optical response of second-harmonic generation (SHG) for a generalized approach to optical nano-crystallography and -imaging. In tip-enhanced SHG imaging of monolayer MoS2 films and single-crystalline ferroelectric YMnO3, we reveal nano-crystallographic details of domain boundaries and domain topology with enhanced sensitivity and nanoscale spatial resolution. The approach is applicable to any anisotropic linear and nonlinear optical response, and provides for optical nano-crystallographic imaging of molecular or quantum materials.
Single-Sided Noinvasive Inspection of Multielement Sample Using Fan-Beam Multiplexed Compton Scatter Tomography

DTIC Science & Technology

2000-05-01

a vector, ρ represents the set of voxel densities sorted into a vector, and A(ρ) represents a mapping of the voxel densities to...density vector in equation (4) suggests that solving for ρ by direct inversion is not possible, calling for an iterative technique beginning with...the vector of measured spectra, and D is the diagonal matrix of the inverse of the variances. The diagonal matrix provides weighting terms, which
Hand-held spectrophotometer design for textile fabrics

NASA Astrophysics Data System (ADS)

Böcekçi, Veysel Gökhan; Yıldız, Kazım

2017-09-01

In this study, a hand-held spectrophotometer was designed by taking advantage of developments in modern optoelectronic technology. Spectrophotometers are used to determine color information from the optical properties of materials. The device we have implemented is a first prototype that offers a low-cost, portable alternative to a desktop spectrophotometer. The prototype, designed for the textile industry, can detect the color tone of any fabric. It consists of optical sensor, processor, and display stages. Depending on the color presented to the optical sensor, the sensor outputs a frequency specific to that color value. In the Arduino-type processor, the program we have written evaluates the frequency information, determines the color tone as a value between 0 and 255, and displays it on the screen.
Geospace simulations on the Cell BE processor

NASA Astrophysics Data System (ADS)

Germaschewski, K.; Raeder, J.; Larson, D.

2008-12-01

OpenGGCM (Open Geospace General circulation Model) is an established numerical code that simulates the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is limited by computational constraints on grid resolution. We investigate porting of the MHD solver to the Cell BE architecture, a novel inhomogeneous multicore architecture capable of up to 230 GFlops per processor. Realizing this high performance on the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallel approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the vector/SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We obtained excellent performance numbers, a speed-up of a factor of 25 compared to just using the main processor, while still keeping the numerical implementation details of the code maintainable.
Attitude determination using vector observations: A fast optimal matrix algorithm

NASA Technical Reports Server (NTRS)

Markley, F. Landis

1993-01-01

The attitude matrix minimizing Wahba's loss function is computed directly by a method that is competitive with the fastest known algorithm for finding this optimal estimate. The method also provides an estimate of the attitude error covariance matrix. Analysis of the special case of two vector observations identifies those cases for which the TRIAD or algebraic method minimizes Wahba's loss function.
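For readers unfamiliar with the problem, Wahba's loss function can be written out as below. The notation is the conventional one and is an assumption here, since the abstract does not define symbols: $\mathbf{b}_i$ are unit vectors measured in the body frame, $\mathbf{r}_i$ are the corresponding unit vectors in the reference frame, $a_i$ are nonnegative weights, and $A$ is constrained to be a proper orthogonal matrix.

```latex
L(A) = \frac{1}{2} \sum_{i=1}^{n} a_i \left\| \mathbf{b}_i - A\,\mathbf{r}_i \right\|^2,
\qquad A^{T} A = I,\quad \det A = +1 .

% Expanding the norm for unit vectors gives the equivalent trace form
L(A) = \sum_{i=1}^{n} a_i - \operatorname{tr}\!\left( A B^{T} \right),
\qquad B = \sum_{i=1}^{n} a_i\, \mathbf{b}_i\, \mathbf{r}_i^{T} .
```

In the trace form, minimizing $L(A)$ is the same as maximizing $\operatorname{tr}(A B^{T})$ over rotation matrices, which is why matrix-based methods work directly with the 3×3 matrix $B$.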
Huygens' optical vector wave field synthesis via in-plane electric dipole metasurface.

PubMed

Park, Hyeonsoo; Yun, Hansik; Choi, Chulsoo; Hong, Jongwoo; Kim, Hwi; Lee, Byoungho

2018-04-16

We investigate Huygens' optical vector wave field synthesis scheme for electric dipole metasurfaces with the capability of modulating in-plane polarization and complex amplitude and discuss the practical issues involved in realizing multi-modulation metasurfaces. The proposed Huygens' vector wave field synthesis scheme identifies the vector Airy disk as a synthetic unit element and creates a designed vector optical field by integrating polarization-controlled and complex-modulated Airy disks. The metasurface structure for the proposed vector field synthesis is analyzed in terms of the signal-to-noise ratio of the synthesized field distribution. The design of practical metasurface structures with true vector modulation capability is possible through the analysis of the light field modulation characteristics of various complex modulated geometric phase metasurfaces. It is shown that the regularization of meta-atoms is a key factor that needs to be considered in field synthesis, given that it is essential for a wide range of optical field synthetic applications, including holographic displays, microscopy, and optical lithography.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log_2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
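A serial sketch of a diagonally preconditioned conjugate gradient iteration may help make the abstract concrete. It is a textbook formulation in plain Python, not the parallel algorithm of the paper. The function name `pcg_diagonal`, the dense list-of-lists matrix storage, and the stopping tolerance are all assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def pcg_diagonal(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, using the
    inverse of diag(A) as the preconditioner."""
    n = len(b)
    # Allow a few extra iterations beyond n to absorb rounding error.
    max_iter = 2 * n if max_iter is None else max_iter
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg_diagonal(A, b)
print(max(abs(ax - bi) for ax, bi in zip(matvec(A, x), b)))  # residual size
```

The element-wise vector updates and the application of the diagonal preconditioner are independent across components, which is the kind of structure a parallel implementation can exploit. The dot products and the dense matrix-vector product used here are the parts that need reductions.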

Optical recognition of statistical patterns

NASA Astrophysics Data System (ADS)

Lee, S. H.

1981-12-01

Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from large dimensional feature space to small dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a T.V. camera, digitized, and fed into a micro-computer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real time processing.
Optical recognition of statistical patterns

NASA Technical Reports Server (NTRS)

Lee, S. H.

1981-01-01

Optical implementation of the Fukunaga-Koontz transform (FKT) and the Least-Squares Linear Mapping Technique (LSLMT) is described. The FKT is a linear transformation which performs image feature extraction for a two-class image classification problem. The LSLMT performs a transform from large dimensional feature space to small dimensional decision space for separating multiple image classes by maximizing the interclass differences while minimizing the intraclass variations. The FKT and the LSLMT were optically implemented by utilizing a coded phase optical processor. The transform was used for classifying birds and fish. After the F-K basis functions were calculated, those most useful for classification were incorporated into a computer generated hologram. The output of the optical processor, consisting of the squared magnitude of the F-K coefficients, was detected by a T.V. camera, digitized, and fed into a micro-computer for classification. A simple linear classifier based on only two F-K coefficients was able to separate the images into two classes, indicating that the F-K transform had chosen good features. Two advantages of optically implementing the FKT and LSLMT are parallel and real time processing.
The covariance matrix for the solution vector of an equality-constrained least-squares problem

NASA Technical Reports Server (NTRS)

Lawson, C. L.

1976-01-01

Methods are given for computing the covariance matrix for the solution vector of an equality-constrained least squares problem. The methods are matched to the solution algorithms given in the book, 'Solving Least Squares Problems.'
Structured caustic vector vortex optical field: manipulating optical angular momentum flux and polarization rotation.

PubMed

Chen, Rui-Pin; Chen, Zhaozhong; Chew, Khian-Hooi; Li, Pei-Gang; Yu, Zhongliang; Ding, Jianping; He, Sailing

2015-05-29

A caustic vector vortex optical field is experimentally generated and demonstrated by a caustic-based approach. The desired caustic with arbitrary acceleration trajectories, as well as the structured states of polarization (SoP) and vortex orders located in different positions in the field cross-section, is generated by imposing the corresponding spatial phase function in a vector vortex optical field. Our study reveals that different spin and orbital angular momentum flux distributions (including opposite directions) in different positions in the cross-section of a caustic vector vortex optical field can be dynamically managed during propagation by intentionally choosing the initial polarization and vortex topological charges, as a result of the modulation of the caustic phase. We find that the SoP in the field cross-section rotates during propagation due to the existence of the vortex. The unique structured feature of the caustic vector vortex optical field opens the possibility of multi-manipulation of optical angular momentum fluxes and SoP, leading to more complex manipulation of the optical field scenarios. Thus this approach further expands the functionality of an optical system.
Polarization division multiplexing for optical data communications

NASA Astrophysics Data System (ADS)

Ivanovich, Darko; Powell, Samuel B.; Gruev, Viktor; Chamberlain, Roger D.

2018-02-01

Multiple parallel channels are ubiquitous in optical communications, with spatial division multiplexing (separate physical paths) and wavelength division multiplexing (separate optical wavelengths) being the most common forms. Here, we investigate the viability of polarization division multiplexing, the separation of distinct parallel optical communication channels through the polarization properties of light. Two or more linearly polarized optical signals (at different polarization angles) are transmitted through a common medium, filtered using aluminum nanowire optical filters fabricated on-chip, and received using individual silicon photodetectors (one per channel). The entire receiver (including optics) is compatible with standard CMOS fabrication processes. The filter model is based upon an input optical signal formed as the sum of the Stokes vectors for each individual channel, transformed by the Mueller matrix that models the filter proper, resulting in an output optical signal that impinges on each photodiode. The results show that two- and three-channel systems can operate with a fixed-threshold comparator in the receiver circuit, but four-channel systems (and larger) will require channel coding of some form. For example, in the four-channel system, 10 of 16 distinct bit patterns are separable by the receiver. The model supports investigation of the range of variability tolerable in the fabrication of the on-chip polarization filters.
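The filter model described above (sum of Stokes vectors, then a Mueller matrix per filter) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' model: each filter is treated as an ideal linear polarizer, each "on" channel has unit intensity, and the function names and the angles in the usage example are hypothetical.

```python
import math

def stokes_linear(intensity, angle_deg):
    """Stokes vector of fully linearly polarized light at the given angle."""
    t = math.radians(2.0 * angle_deg)
    return [intensity, intensity * math.cos(t), intensity * math.sin(t), 0.0]

def polarizer_mueller(angle_deg):
    """Mueller matrix of an ideal linear polarizer (an assumption; real
    nanowire filters have finite extinction ratios)."""
    t = math.radians(2.0 * angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[0.5 * v for v in row] for row in (
        [1.0, c, s, 0.0],
        [c, c * c, c * s, 0.0],
        [s, c * s, s * s, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    )]

def detector_intensities(bits, channel_angles, filter_angles):
    """Intensity reaching each photodiode for one transmitted bit pattern."""
    total = [0.0, 0.0, 0.0, 0.0]
    for bit, angle in zip(bits, channel_angles):
        if bit:
            s_vec = stokes_linear(1.0, angle)
            total = [a + b for a, b in zip(total, s_vec)]
    out = []
    for angle in filter_angles:
        m = polarizer_mueller(angle)
        # A photodiode measures the first Stokes component after the filter.
        out.append(sum(m[0][k] * total[k] for k in range(4)))
    return out

# Two channels at 0 and 90 degrees, with matching filters.
for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(bits, detector_intensities(bits, (0.0, 90.0), (0.0, 90.0)))
```

With orthogonal angles each detector sees only its own channel. Adding more channels at intermediate angles makes the detector outputs overlap, which is consistent with the abstract's finding that larger systems need channel coding.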
Polarimetric signatures of a canopy of dielectric cylinders based on first and second order vector radiative transfer theory

NASA Technical Reports Server (NTRS)

Tsang, Leung; Chan, Chi Hou; Kong, Jin AU; Joseph, James

1992-01-01

Complete polarimetric signatures of a canopy of dielectric cylinders overlying a homogeneous half space are studied with the first and second order solutions of the vector radiative transfer theory. The vector radiative transfer equations contain a general nondiagonal extinction matrix and a phase matrix. The energy conservation issue is addressed by calculating the elements of the extinction matrix and the elements of the phase matrix in a manner that is consistent with energy conservation. Two methods are used. In the first method, the surface fields and the internal fields of the dielectric cylinder are calculated by using the fields of an infinite cylinder. The phase matrix is calculated and the extinction matrix is calculated by summing the absorption and scattering to ensure energy conservation. In the second method, the method of moments is used to calculate the elements of the extinction and phase matrices. The Mueller matrices based on the first order and second order multiple scattering solutions of the vector radiative transfer equation are calculated. Results from the two methods are compared. The vector radiative transfer equations, combined with the solution based on method of moments, obey both energy conservation and reciprocity. The polarimetric signatures, copolarized and depolarized return, degree of polarization, and phase differences are studied as a function of the orientation, sizes, and dielectric properties of the cylinders. It is shown that second order scattering is generally important for vegetation canopy at C band and can be important at L band for some cases.
Fuzzy logic particle tracking velocimetry

NASA Technical Reports Server (NTRS)

Wernet, Mark P.

1993-01-01

Fuzzy logic has proven to be a simple and robust method for process control. Instead of requiring a complex model of the system, a user-defined rule base is used to control the process. In this paper the principles of fuzzy logic control are applied to Particle Tracking Velocimetry (PTV). Two frames of digitally recorded, single exposure particle imagery are used as input. The fuzzy processor uses the local particle displacement information to determine the correct particle tracks. Fuzzy PTV is an improvement over traditional PTV techniques, which typically require a sequence (greater than 2) of image frames for accurately tracking particles. The fuzzy processor executes in software on a PC without the use of specialized array or fuzzy logic processors. A pair of sample input images with roughly 300 particle images each results in more than 200 velocity vectors in under 8 seconds of processing time.
Multiple degree of freedom optical pattern recognition

NASA Technical Reports Server (NTRS)

Casasent, D.

1987-01-01

Three general optical approaches to multiple degree of freedom object pattern recognition (where no stable object rest position exists) are advanced. These techniques include: feature extraction, correlation, and artificial intelligence. The details of the various processors are advanced together with initial results.
Heading-vector navigation based on head-direction cells and path integration.

PubMed

Kubie, John L; Fenton, André A

2009-05-01

Insect navigation is guided by heading vectors that are computed by path integration. Mammalian navigation models, on the other hand, are typically based on map-like place representations provided by hippocampal place cells. Such models compute optimal routes as a continuous series of locations that connect the current location to a goal. We propose a "heading-vector" model in which head-direction cells or their derivatives serve both as key elements in constructing the optimal route and as the straight-line guidance during route execution. The model is based on a memory structure termed the "shortcut matrix," which is constructed during the initial exploration of an environment when a set of shortcut vectors between sequential pairs of visited waypoint locations is stored. A mechanism is proposed for calculating and storing these vectors that relies on a hypothesized cell type termed an "accumulating head-direction cell." Following exploration, shortcut vectors connecting all pairs of waypoint locations are computed by vector arithmetic and stored in the shortcut matrix. On re-entry, when local view or place representations query the shortcut matrix with a current waypoint and goal, a shortcut trajectory is retrieved. Since the trajectory direction is in head-direction compass coordinates, navigation is accomplished by tracking the firing of head-direction cells that are tuned to the heading angle. Section 1 of the manuscript describes the properties of accumulating head-direction cells. It then shows how accumulating head-direction cells can store local vectors and perform vector arithmetic to perform path-integration-based homing. Section 2 describes the construction and use of the shortcut matrix for computing direct paths between any pair of locations that have been registered in the shortcut matrix. In the discussion, we analyze the advantages of heading-based navigation over map-based navigation. 
Finally, we survey behavioral evidence that nonhippocampal, heading-based navigation is used in small mammals and humans. Copyright 2008 Wiley-Liss, Inc.
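The vector arithmetic behind the shortcut matrix can be sketched as follows. This is a purely geometric illustration, not the neural mechanism proposed in the paper: displacements between sequential waypoints are assumed to be known as (dx, dy) pairs in compass coordinates, and the function names are hypothetical.

```python
import math

def build_shortcut_matrix(steps):
    """steps[k] is the (dx, dy) displacement from waypoint k to waypoint k+1,
    as recorded during exploration. Returns a matrix whose entry [i][j] is
    the straight-line vector from waypoint i to waypoint j."""
    positions = [(0.0, 0.0)]
    for dx, dy in steps:
        px, py = positions[-1]
        positions.append((px + dx, py + dy))   # path integration
    n = len(positions)
    return [[(positions[j][0] - positions[i][0],
              positions[j][1] - positions[i][1]) for j in range(n)]
            for i in range(n)]

def heading_and_distance(shortcut, current, goal):
    """Query the matrix with a current waypoint and a goal waypoint."""
    dx, dy = shortcut[current][goal]
    heading = math.degrees(math.atan2(dy, dx)) % 360.0
    return heading, math.hypot(dx, dy)

# Explore three sides of a unit square, then ask for the direct way home.
shortcut = build_shortcut_matrix([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)])
print(heading_and_distance(shortcut, 3, 0))
```

Only the sequential vectors are measured directly. Every other entry is derived by summing them, which corresponds to the step in which shortcut vectors between all pairs of waypoints are computed by vector arithmetic after exploration.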
Exploring and Making Sense of Large Graphs

DTIC Science & Technology

2015-08-01

and bold) are n × n ; vectors (lower-case bold) are n × 1 column vectors, and scalars (in lower-case plain font) typically correspond to strength of...graph is often denoted as |V| or n . Edges or Links: A finite set E of lines between objects in a graph. The edges represent relationships between the...Adjacency matrix of a simple, unweighted and undirected graph. Adjacency matrix: The adjacency matrix of a graph G is an n × n matrix A, whose element aij
Dataflow Integration and Simulation Techniques for DSP System Design Tools

DTIC Science & Technology

2007-01-01

Lebak, M. Richards , and D. Campbell, “VSIPL: An object-based open standard API for vector, signal, and image processing,” in Proceedings of the...Inc., document Version 0.98a. [56] P. Marwedel and G. Goossens , Eds., Code Generation for Embedded Processors. Kluwer Academic Publishers, 1995. [57
Does the Intel Xeon Phi processor fit HEP workloads?

NASA Astrophysics Data System (ADS)

Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

2014-06-01

This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis a vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.
A Mueller matrix model of Haidinger's brushes.

PubMed

Misson, Gary P

2003-09-01

Stokes vectors and Mueller matrices are used to model the polarisation properties (birefringence, dichroism and depolarisation) of any optical system, in particular the human eye. An explanation of the form and behaviour of the entoptic phenomenon of Haidinger's brushes is derived that complements and expands upon a previous study. The relationship between the appearance of Haidinger's brushes and intrinsic ocular retardation is quantified and the model allows prediction of the effect of any retarder of any orientation placed between a source of polarised light and the eye. The simple relationship of minimum contrast of Haidinger's brushes to the cosine of total retardation is derived.
Linear scaling computation of the Fock matrix. VI. Data parallel computation of the exchange-correlation matrix

NASA Astrophysics Data System (ADS)

Gan, Chee Kwan; Challacombe, Matt

2003-05-01

Recently, early onset linear scaling computation of the exchange-correlation matrix has been achieved using hierarchical cubature [J. Chem. Phys. 113, 10037 (2000)]. Hierarchical cubature differs from other methods in that the integration grid is adaptive and purely Cartesian, which allows for a straightforward domain decomposition in parallel computations; the volume enclosing the entire grid may be simply divided into a number of nonoverlapping boxes. In our data parallel approach, each box requires only a fraction of the total density to perform the necessary numerical integrations due to the finite extent of Gaussian-orbital basis sets. This inherent data locality may be exploited to reduce communications between processors as well as to avoid memory and copy overheads associated with data replication. Although the hierarchical cubature grid is Cartesian, naive boxing leads to irregular work loads due to strong spatial variations of the grid and the electron density. In this paper we describe equal time partitioning, which employs time measurement of the smallest sub-volumes (corresponding to the primitive cubature rule) to load balance grid-work for the next self-consistent-field iteration. After start-up from a heuristic center of mass partitioning, equal time partitioning exploits smooth variation of the density and grid between iterations to achieve load balance. With the 3-21G basis set and a medium quality grid, equal time partitioning applied to taxol (62 heavy atoms) attained a speedup of 61 out of 64 processors, while for a 110 molecule water cluster at standard density it achieved a speedup of 113 out of 128. The efficiency of equal time partitioning applied to hierarchical cubature improves as the grid work per processor increases. With a fine grid and the 6-311G(df,p) basis set, calculations on the 26 atom molecule α-pinene achieved a parallel efficiency better than 99% with 64 processors. 
For more coarse grained calculations, superlinear speedups are found to result from reduced computational complexity associated with data parallelism.
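
The load-balancing idea in this abstract (time the smallest sub-volumes in one iteration, then repartition so each processor gets about the same total time in the next) can be illustrated with a minimal sketch. The sketch assumes the sub-volumes are already arranged in a one-dimensional order and that chunks must be contiguous; the function name `equal_time_partition` and this 1D simplification are illustrative assumptions, not the authors' spatial partitioning scheme.

```python
from itertools import accumulate


def equal_time_partition(times, n_parts):
    """Split ordered per-box timings into contiguous chunks of roughly equal total time.

    times   -- measured cost of each smallest sub-volume from the previous iteration
    n_parts -- number of processors
    Returns a list of (start, stop) index ranges, one per processor.
    """
    if not times:
        return [(0, 0)] * n_parts
    prefix = list(accumulate(times))
    total = prefix[-1]
    bounds = [0]
    k = 1
    for i, t in enumerate(prefix):
        # Close chunk k once the cumulative time reaches k/n_parts of the total.
        while k < n_parts and t >= k * total / n_parts:
            bounds.append(i + 1)
            k += 1
    bounds.append(len(times))
    return [(bounds[j], bounds[j + 1]) for j in range(n_parts)]
```

A single very expensive box can leave a neighbouring chunk empty in this sketch; a real scheme would subdivide such a box further.
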
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nagasaka, Y; Matsuoka, S; Azad, A

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Different from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms show significant speedups over libraries in the majority of the cases, while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We wrap up the in-depth evaluation results and provide a recipe for choosing the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
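
The role of the hash table, and why skipping the per-row sort saves work, can be seen in a minimal serial sketch of row-wise SpGEMM. It assumes each matrix is stored as a list of rows, each row a list of `(column, value)` pairs; the name `spgemm_hash`, the `sort_output` flag, and the use of a Python dict as the hash table are illustrative assumptions and do not reproduce the authors' KNL-optimized implementation.

```python
def spgemm_hash(a_rows, b_rows, sort_output=False):
    """Compute C = A * B row by row, accumulating each output row in a hash table.

    a_rows, b_rows -- sparse matrices as lists of rows of (column, value) pairs
    sort_output    -- if True, sort each output row by column index (extra work)
    """
    c_rows = []
    for a_row in a_rows:
        acc = {}  # hash table: output column index -> accumulated value
        for k, a_val in a_row:
            # Row k of B is scaled by A[i, k] and merged into the accumulator.
            for j, b_val in b_rows[k]:
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        row = list(acc.items())
        if sort_output:
            row.sort()  # only needed when callers require sorted column indices
        c_rows.append(row)
    return c_rows
```

With `sort_output=False` the entries of each row come out in hash-table insertion order, which is the unsorted case the abstract identifies as faster.
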
Dominant phonon wave vectors and strain-induced splitting of the 2D Raman mode of graphene

NASA Astrophysics Data System (ADS)

Narula, Rohit; Bonini, Nicola; Marzari, Nicola; Reich, Stephanie

2012-03-01

The dominant phonon wave vectors q* probed by the 2D Raman mode of pristine and uniaxially strained graphene are determined via a combination of ab initio calculations and a full two-dimensional integration of the transition matrix. We show that q* are highly anisotropic and rotate about K with the polarizer and analyzer condition relative to the lattice. The corresponding phonon-mediated electronic transitions show a finite component along K-Γ that sensitively determines q*. We invalidate the notion of “inner” and “outer” processes. The characteristic splitting of the 2D mode of graphene under uniaxial tensile strain and given polarizer and analyzer setting is correctly predicted only if the strain-induced distortion and red-shift of the in-plane transverse optical (iTO) phonon dispersion as well as the changes in the electronic band structure are taken into account.
Recommended coordinate systems for thin spherocylindrical lenses.

PubMed

Deal, F C; Toop, J

1993-05-01

Because the set of thin spherocylindrical lenses forms a vector space, any such lens can be expressed in terms of its cartesian coordinates with respect to whatever set of basis lenses we may choose. Two types of cartesian coordinate systems have become prominent, those having coordinates associated with the lens power matrix and those having coordinates associated with the Humphrey Vision Analyzer. This paper emphasizes the value of a particular cartesian coordinate system of the latter type, and the cylindrical coordinate system related to it, by showing how it can simplify the trigonometry of adding lenses and how it preserves symmetry in depicting the sets of all spherical lenses, all Jackson crossed-cylinders, and all cylindrical lenses. It also discusses appropriate coordinates for keeping statistics on lenses and shows that an easy extension of the lens vector space to include general optical systems is not possible.
Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors

NASA Astrophysics Data System (ADS)

Yi, Hongsuk

2014-03-01

Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potentials for covalently bonded solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism. We combine tightly coupled thread-level and task-level parallelism with 512-bit vector registers. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to that on the x86 CPU architecture.
FPGA-based coprocessor for matrix algorithms implementation

NASA Astrophysics Data System (ADS)

Amira, Abbes; Bensaali, Faycal

2003-03-01

Matrix algorithms are important in many types of applications including image and signal processing. These areas require enormous computing power. A close examination of the algorithms used in these, and related, applications reveals that many of the fundamental actions involve matrix operations such as matrix multiplication, which has complexity O(N^3) on a sequential computer and O(N^3/p) on a parallel system with p processors. This paper presents an investigation into the design and implementation of different matrix algorithms such as matrix operations, matrix transforms and matrix decompositions using an FPGA based environment. Solutions for the problem of processing large matrices have been proposed. The proposed system architectures are scalable, modular and require less area and time complexity with reduced latency when compared with existing structures.
Vector assembly of colloids on monolayer substrates

NASA Astrophysics Data System (ADS)

Jiang, Lingxiang; Yang, Shenyu; Tsang, Boyce; Tu, Mei; Granick, Steve

2017-06-01

The key to spontaneous and directed assembly is to encode the desired assembly information to building blocks in a programmable and efficient way. In computer graphics, raster graphics encodes images on a single-pixel level, conferring fine details at the expense of large file sizes, whereas vector graphics encrypts shape information into vectors that allow small file sizes and operational transformations. Here, we adapt this raster/vector concept to a 2D colloidal system and realize `vector assembly' by manipulating particles on a colloidal monolayer substrate with optical tweezers. In contrast to raster assembly that assigns optical tweezers to each particle, vector assembly requires a minimal number of optical tweezers that allow operations like chain elongation and shortening. This vector approach enables simple uniform particles to form a vast collection of colloidal arenes and colloidenes, the spontaneous dissociation of which is achieved with precision and stage-by-stage complexity by simply removing the optical tweezers.

Parallel halftoning technique using dot diffusion optimization

NASA Astrophysics Data System (ADS)

Molina-Garcia, Javier; Ponomaryov, Volodymyr I.; Reyes-Reyes, Rogelio; Cruz-Ramos, Clara

2017-05-01

In this paper, a novel approach for halftone images is proposed and implemented for images that are obtained by the Dot Diffusion (DD) method. The designed technique is based on an optimization of the so-called class matrix used in the DD algorithm, and it consists of generating new versions of the class matrix that have no baron or near-baron, in order to minimize inconsistencies during the distribution of the error. The proposed class matrices have different properties, and each is designed for one of two applications: applications where inverse halftoning is necessary, and applications where it is not required. The proposed method has been implemented on a GPU (NVIDIA GeForce GTX 750 Ti) and on multicore processors (AMD FX(tm)-6300 Six-Core Processor and Intel Core i5-4200U), using CUDA and OpenCV on a PC running Linux. Experimental results have shown that the novel framework generates good-quality halftone images and inverse halftone images. The simulation results using parallel architectures have demonstrated the efficiency of the novel technique when it is implemented in real-time processing.
Rotman Lens Sidewall Design and Optimization with Hybrid Hardware/Software Based Programming

DTIC Science & Technology

2015-01-09

conventional MoM and stored in memory. The components of Zfar are computed as needed through a fast matrix vector multiplication (MVM), which...V vector. Iterative methods, e.g. BiCGSTAB, are employed for solving the linear equation. The matrix-vector multiplications (MVMs), which dominate...most of the computation in the solving phase, consists of calculating near and far MVMs. The far MVM comprises aggregation, translation, and
The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation

NASA Astrophysics Data System (ADS)

Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru

1999-09-01

We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to accommodate vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 10^5 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 10^7 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.
Optical microwave filter based on spectral slicing by use of arrayed waveguide gratings.

PubMed

Pastor, Daniel; Ortega, Beatriz; Capmany, José; Sales, Salvador; Martinez, Alfonso; Muñoz, Pascual

2003-10-01

We have experimentally demonstrated a new optical signal processor based on the use of arrayed waveguide gratings. The structure exploits the concept of spectral slicing combined with the use of an optical dispersive medium. The approach presents increased flexibility from previous slicing-based structures in terms of tunability, reconfiguration, and apodization of the samples or coefficients of the transversal optical filter.
Transistor Laser Optical NOR Gate for High Speed Optical Logic Processors

DTIC Science & Technology

2017-03-20

proposes an optical bistable latch can be built with two universal photonic NOR gate circuits, which are implemented by the three-port tunneling ... Tunneling Junction Transistor Laser (TJ-TL); Optical NOR Gate. Introduction To fulfill the future national security and intelligence needs in this...two-terminal diode lasers. Three-Port Transistor Laser – an Integration of Quantum-Wells into Heterojunction Bipolar Transistor Different than
A MIMO-Inspired Rapidly Switchable Photonic Interconnect Architecture (Postprint)

DTIC Science & Technology

2009-07-01

capabilities of future systems. High-speed optical processing has been looked to as a means for eliminating this interconnect bottleneck. Presented...here are the results of a study for a novel optical (integrated photonic) processor which would allow for a high-speed, secure means for arbitrarily...regarded as a Multiple Input Multiple Output (MIMO) architecture. 15. SUBJECT TERMS Free-space optical interconnects, Optical Phased Arrays, High-Speed
Software for System for Controlling a Magnetically Levitated Rotor

NASA Technical Reports Server (NTRS)

Morrison, Carlos R. (Inventor)

2004-01-01

In a rotor assembly having a rotor supported for rotation by magnetic bearings, a processor controlled by software or firmware controls the generation of force vectors that position the rotor relative to its bearings in a 'bounce' mode in which the rotor axis is displaced from the principal axis defined between the bearings and a 'tilt' mode in which the rotor axis is tilted or inclined relative to the principal axis. Waveform driven perturbations are introduced to generate force vectors that excite the rotor in either the 'bounce' or 'tilt' modes.
System for Controlling a Magnetically Levitated Rotor

NASA Technical Reports Server (NTRS)

Morrison, Carlos R. (Inventor)

2006-01-01

In a rotor assembly having a rotor supported for rotation by magnetic bearings, a processor controlled by software or firmware controls the generation of force vectors that position the rotor relative to its bearings in a "bounce" mode in which the rotor axis is displaced from the principal axis defined between the bearings and a "tilt" mode in which the rotor axis is tilted or inclined relative to the principal axis. Waveform driven perturbations are introduced to generate force vectors that excite the rotor in either the "bounce" or "tilt" modes.
MOSAIC - A space-multiplexing technique for optical processing of large images

NASA Technical Reports Server (NTRS)

Athale, Ravindra A.; Astor, Michael E.; Yu, Jeffrey

1993-01-01

A technique for Fourier processing of images larger than the space-bandwidth products of conventional or smart spatial light modulators and two-dimensional detector arrays is described. The technique involves a spatial combination of subimages displayed on individual spatial light modulators to form a phase-coherent image, which is subsequently processed with Fourier optical techniques. Because of the technique's similarity with the mosaic technique used in art, the processor used is termed an optical MOSAIC processor. The phase accuracy requirements of this system were studied by computer simulation. It was found that phase errors of less than lambda/8 did not degrade the performance of the system and that the system was relatively insensitive to amplitude nonuniformities. Several schemes for implementing the subimage combination are described. Initial experimental results demonstrating the validity of the mosaic concept are also presented.
Hypergraph-Based Combinatorial Optimization of Matrix-Vector Multiplication

ERIC Educational Resources Information Center

Wolf, Michael Maclean

2009-01-01

Combinatorial scientific computing plays an important enabling role in computational science, particularly in high performance scientific computing. In this thesis, we will describe our work on optimizing matrix-vector multiplication using combinatorial techniques. Our research has focused on two different problems in combinatorial scientific…
Effect of processor temperature on film dosimetry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srivastava, Shiv P.; Das, Indra J., E-mail: idas@iupui.edu

2012-07-01

Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d_max, 10 × 10 cm^2, 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4-40.6 °C (85-105 °F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.
Bit-parallel arithmetic in a massively-parallel associative processor

NASA Technical Reports Server (NTRS)

Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

1992-01-01

A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Distributed Matrix Completion: Application to Cooperative Positioning in Noisy Environments

DTIC Science & Technology

2013-12-11

positioning, and a gossip version of low-rank approximation were developed. A convex relaxation for positioning in the presence of noise was shown to...of a large data matrix through gossip algorithms. A new algorithm is proposed that amounts to iteratively multiplying a vector by independent random...sparsification of the original matrix and averaging the resulting normalized vectors. This can be viewed as a generalization of gossip algorithms for
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
A Perron-Frobenius theory for block matrices associated to a multiplex network

NASA Astrophysics Data System (ADS)

Romance, Miguel; Solá, Luis; Flores, Julio; García, Esther; García del Amo, Alejandro; Criado, Regino

2015-03-01

The uniqueness of the Perron vector of a nonnegative block matrix associated to a multiplex network is discussed. The conclusions come from the relationships between the irreducibility of some nonnegative block matrix associated to a multiplex network and the irreducibility of the corresponding matrices to each layer as well as the irreducibility of the adjacency matrix of the projection network. In addition, the computation of that Perron vector in terms of the Perron vectors of the blocks is also addressed. Finally, we present the precise relations that allow one to express the Perron eigenvector of the multiplex network in terms of the Perron eigenvectors of its layers.
Separable decompositions of bipartite mixed states

NASA Astrophysics Data System (ADS)

Li, Jun-Li; Qiao, Cong-Feng

2018-04-01

We present a practical scheme for the decomposition of a bipartite mixed state into a sum of direct products of local density matrices, using the technique developed in Li and Qiao (Sci. Rep. 8:1442, 2018). In the scheme, the correlation matrix which characterizes the bipartite entanglement is first decomposed into two matrices composed of the Bloch vectors of local states. Then, we show that the symmetries of Bloch vectors are consistent with that of the correlation matrix, and the magnitudes of the local Bloch vectors are lower bounded by the correlation matrix. Concrete examples for the separable decompositions of bipartite mixed states are presented for illustration.
Errors induced by the neglect of polarization in radiance calculations for Rayleigh-scattering atmospheres

NASA Technical Reports Server (NTRS)

Mishchenko, M. I.; Lacis, A. A.; Travis, L. D.

1994-01-01

Although neglecting polarization and replacing the rigorous vector radiative transfer equation by its approximate scalar counterpart has no physical background, it is a widely used simplification when the incident light is unpolarized and only the intensity of the reflected light is to be computed. We employ accurate vector and scalar multiple-scattering calculations to perform a systematic study of the errors induced by the neglect of polarization in radiance calculations for a homogeneous, plane-parallel Rayleigh-scattering atmosphere (with and without depolarization) above a Lambertian surface. Specifically, we calculate percent errors in the reflected intensity for various directions of light incidence and reflection, optical thicknesses of the atmosphere, single-scattering albedos, depolarization factors, and surface albedos. The numerical data displayed can be used to decide whether or not the scalar approximation may be employed depending on the parameters of the problem. We show that the errors decrease with increasing depolarization factor and/or increasing surface albedo. For conservative or nearly conservative scattering and small surface albedos, the errors are maximum at optical thicknesses of about 1. The calculated errors may be too large for some practical applications, and, therefore, rigorous vector calculations should be employed whenever possible. However, if approximate scalar calculations are used, we recommend avoiding geometries involving phase angles equal or close to 0 deg and 90 deg, where the errors are especially significant. We propose a theoretical explanation of the large vector/scalar differences in the case of Rayleigh scattering. According to this explanation, the differences are caused by the particular structure of the Rayleigh scattering matrix and come from lower-order (except first-order) light scattering paths involving right scattering angles and right-angle rotations of the scattering plane.
Experiments applications guide: Advanced Communications Technology Satellite (ACTS)

NASA Technical Reports Server (NTRS)

1988-01-01

This applications guide first surveys the capabilities of the Advanced Communication Technology Satellite (ACTS) system (both the flight and ground segments). This overview is followed by a description of the baseband processor (BBP) and microwave switch matrix (MSM) operating modes. Terminals operating with the baseband processor are referred to as low burst rate (LBR); and those operating with the microwave switch matrix, as high burst rate (HBR). Three very small-aperture terminals (VSATs), LBR-1, LBR-2, and HBR, are described for various ACTS operating modes. Also described is the NASA Lewis link evaluation terminal. A section on ACTS experiment opportunities introduces a wide spectrum of network control, telecommunications, system, and scientific experiments. The performance of the VSATs is discussed in detail. This guide is intended as a catalyst to encourage participation by the telecommunications, business, and science communities in a broad spectrum of experiments.
Experimental demonstration of selective quantum process tomography on an NMR quantum information processor

NASA Astrophysics Data System (ADS)

Gaikwad, Akshay; Rehal, Diksha; Singh, Amandeep; Arvind; Dorai, Kavita

2018-02-01

We present the NMR implementation of a scheme for selective and efficient quantum process tomography without ancilla. We generalize this scheme such that it can be implemented efficiently using only a set of measurements involving product operators. The method allows us to estimate any element of the quantum process matrix to a desired precision, provided a set of quantum states can be prepared efficiently. Our modified technique requires fewer experimental resources as compared to the standard implementation of selective and efficient quantum process tomography, as it exploits the special nature of NMR measurements to allow us to compute specific elements of the process matrix by a restrictive set of subsystem measurements. To demonstrate the efficacy of our scheme, we experimentally tomograph the processes corresponding to "no operation," a controlled-NOT (CNOT), and a controlled-Hadamard gate on a two-qubit NMR quantum information processor, with high fidelities.
Pre-coding assisted generation of a frequency quadrupled optical vector D-band millimeter wave with one Mach-Zehnder modulator.

PubMed

Zhou, Wen; Li, Xinying; Yu, Jianjun

2017-10-30

We propose QPSK millimeter-wave (mm-wave) vector signal generation for D-band based on balanced precoding-assisted photonic frequency quadrupling technology employing a single intensity modulator without an optical filter. The intensity MZM is driven by a balanced pre-coding 37-GHz QPSK RF signal. The modulated optical subcarriers are directly sent into the single ended photodiode to generate 148-GHz QPSK vector signal. We experimentally demonstrate 1-Gbaud 148-GHz QPSK mm-wave vector signal generation, and investigate the bit-error-rate (BER) performance of the vector signals at 148-GHz. The experimental results show that the BER value can be achieved as low as 1.448 × 10^-3 when the optical power into photodiode is 8.8 dBm. To the best of our knowledge, it is the first time to realize the frequency-quadrupling vector mm-wave signal generation at D-band based on only one MZM without an optical filter.

Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

PubMed

Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S

2014-03-11

Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations that are occurring on the accelerator and/or the host. For data-transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
FFT Computation with Systolic Arrays, A New Architecture

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The use of the Cooley-Tukey algorithm for computing the 1-D FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
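
For readers unfamiliar with the decimation-in-frequency form of the Cooley-Tukey algorithm mentioned in this abstract, the following is a minimal recursive software sketch. It shows only the arithmetic (butterflies followed by two half-size transforms) for power-of-two lengths in floating point; it does not model the systolic array, the word-serial data flow, or the fixed-point processor described in the record, and the function name `fft_dif` is an illustrative assumption.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly stage: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    top = [x[i] + x[i + half] for i in range(half)]
    bot = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
           for i in range(half)]
    out = [0] * n
    out[0::2] = fft_dif(top)  # X[0], X[2], X[4], ...
    out[1::2] = fft_dif(bot)  # X[1], X[3], X[5], ...
    return out
```
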
Cryptanalysis and security enhancement of optical cryptography based on computational ghost imaging

NASA Astrophysics Data System (ADS)

Yuan, Sheng; Yao, Jianbin; Liu, Xuemei; Zhou, Xin; Li, Zhongyang

2016-04-01

Optical cryptography based on computational ghost imaging (CGI) has attracted much attention from researchers because it encrypts plaintext into a random intensity vector rather than a complex-valued function. This promising feature of CGI-based cryptography reduces the amount of data to be transmitted and stored and therefore brings convenience in practice. However, we find that this cryptography is vulnerable to chosen-plaintext attack because of the linear relationship between the input and output of the encryption system, and three feasible strategies are proposed to break it in this paper. Even though a large number of plaintexts need to be chosen in these attack methods, this cryptography still carries security risks. To avoid these attacks, a security enhancement method utilizing an invertible matrix modulation is further discussed and the feasibility is verified by numerical simulations.
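
Why a linear input-output relationship invites a chosen-plaintext attack can be seen in a generic sketch. It assumes the encryption is modeled as ciphertext = key matrix times plaintext, with one key row per illumination pattern; the functions `encrypt` and `recover_key` and the unit-impulse probing strategy are illustrative assumptions and are not claimed to be any of the three strategies proposed in the paper.

```python
def encrypt(key, plaintext):
    """Linear model: one measured intensity per key pattern (one per row of key)."""
    return [sum(k * p for k, p in zip(row, plaintext)) for row in key]


def recover_key(oracle, n_pixels):
    """Chosen-plaintext attack on a linear system.

    oracle   -- function that encrypts a plaintext chosen by the attacker
    n_pixels -- plaintext length; one probe is needed per pixel
    Probing with unit impulses reveals the key matrix one column at a time.
    """
    columns = []
    for j in range(n_pixels):
        impulse = [0.0] * n_pixels
        impulse[j] = 1.0
        columns.append(oracle(impulse))  # ciphertext equals column j of the key
    # Transpose the collected columns back into key rows.
    return [list(row) for row in zip(*columns)]
```

The number of probes grows with the plaintext size, which matches the abstract's remark that a large number of chosen plaintexts is needed.
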
SHIP, a novel factor to ameliorate extracellular matrix accumulation via suppressing PI3K/Akt/CTGF signaling in diabetic kidney disease.

PubMed

Li, Fan; Li, Lisha; Cheng, Meijuan; Wang, Xiumin; Hao, Jun; Liu, Shuxia; Duan, Huijun

2017-01-22

Tubular interstitial extracellular matrix accumulation, which plays a key role in the pathogenesis and progression of diabetic kidney disease (DKD), is believed to be mediated by activation of PI3K/Akt signal pathway. However, it is still not clear whether SH2 domain-containing inositol 5'-phosphatase (SHIP), known as a negative regulator of PI3K/Akt pathway is also involved in extracellular matrix metabolism of diabetic kidney. In the present study, decreased SHIP and increased phospho-Akt (Ser 473, Thr 308) were found in renal tubular cells of diabetic mice accompanied by overexpression of connective tissue growth factor (CTGF) and extracellular matrix deposition versus normal mice. Again, high glucose attenuated SHIP expression in a time-dependent manner, concomitant with activation of PI3K/Akt signaling and extracellular matrix production in human renal proximal tubular epithelial cells (HK2) cultured in vitro, which was significantly prevented by transfection of M90-SHIP vector. Furthermore, in vivo delivery of rAd-INPP5D vector (SHIP expression vector) via intraperitoneal injection in diabetic mice increased SHIP expression by 3.36 times followed by 65.26%, 70.38% and 46.71% decreases of phospho-Akt (Ser 473), phospho-Akt (Thr 308) and CTGF expression versus diabetic mice receiving rAd-EGFP vector. Meanwhile, increased renal extracellular matrix accumulation of diabetic mice was also inhibited with intraperitoneal injection of rAd-INPP5D vector. These above data suggested that overexpression of SHIP might be a potent method to lessen renal extracellular matrix accumulation via inactivation of PI3K/Akt pathway and suppression of CTGF expression in DKD.
Optical links in handheld multimedia devices

NASA Astrophysics Data System (ADS)

van Geffen, S.; Duis, J.; Miller, R.

2008-04-01

Ever emerging applications in handheld multimedia devices such as mobile phones, laptop computers, portable video games and digital cameras requiring increased screen resolutions are driving higher aggregate bitrates between host processor and display(s) enabling services such as mobile video conferencing, video on demand and TV broadcasting. Larger displays and smaller phones require complex mechanical 3D hinge configurations striving to combine maximum functionality with compact building volumes. Conventional galvanic interconnections such as Micro-Coax and FPC carrying parallel digital data between host processor and display module may produce Electromagnetic Interference (EMI) and bandwidth limitations caused by small cable size and tight cable bends. To reduce the number of signals through a hinge, the mobile phone industry, organized in the MIPI (Mobile Industry Processor Interface) alliance, is currently defining an electrical interface transmitting serialized digital data at speeds >1Gbps. This interface allows for electrical or optical interconnects. Above 1Gbps optical links may offer a cost effective alternative because of their flexibility, increased bandwidth and immunity to EMI. This paper describes the development of optical links for handheld communication devices. A cable assembly based on a special Plastic Optical Fiber (POF) selected for its mechanical durability is terminated with a small form factor molded lens assembly which interfaces between an 850nm VCSEL transmitter and a receiving device on the printed circuit board of the display module. A statistical approach based on a Lean Design For Six Sigma (LDFSS) roadmap for new product development tries to find an optimum link definition which will be robust and low cost meeting the power consumption requirements appropriate for battery operated systems.
On the use of finite difference matrix-vector products in Newton-Krylov solvers for implicit climate dynamics with spectral elements

DOE PAGES

Woodward, Carol S.; Gardner, David J.; Evans, Katherine J.

2015-01-01

Efficient solutions of global climate models require effectively handling disparate length and time scales. Implicit solution approaches allow time integration of the physical system with a step size governed by accuracy of the processes of interest rather than by stability of the fastest time scales present. Implicit approaches, however, require the solution of nonlinear systems within each time step. Usually, Newton's method is applied to solve these systems. Each iteration of Newton's method, in turn, requires the solution of a linear model of the nonlinear system. This model employs the Jacobian of the problem-defining nonlinear residual, but this Jacobian can be costly to form. If a Krylov linear solver is used for the solution of the linear system, the action of the Jacobian matrix on a given vector is required. In the case of spectral element methods, the Jacobian is not calculated but only implemented through matrix-vector products. The matrix-vector multiply can also be approximated by a finite difference approximation which may introduce inaccuracy in the overall nonlinear solver. In this paper, we review the advantages and disadvantages of finite difference approximations of these matrix-vector products for climate dynamics within the spectral element shallow water dynamical core of the Community Atmosphere Model.
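The finite difference approximation of a Jacobian-vector product mentioned above can be sketched in a few lines. This is a minimal illustration, not the solver used in the paper: the residual `residual` is a hypothetical toy function, and the step-size rule in `fd_jacvec` is one common heuristic chosen here as an assumption.

```python
import numpy as np

def residual(u):
    """Hypothetical nonlinear residual F(u), standing in for a model residual."""
    return u**3 + 2.0 * u - 1.0

def fd_jacvec(F, u, v, eps=1e-7):
    """Approximate J(u) @ v by the forward difference (F(u + h v) - F(u)) / h."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(v)
    h = eps * (1.0 + np.linalg.norm(u)) / norm_v   # assumed step-size heuristic
    return (F(u + h * v) - F(u)) / h

u = np.array([0.5, -1.0, 2.0])
v = np.array([1.0, 0.0, -1.0])
approx = fd_jacvec(residual, u, v)
exact = (3.0 * u**2 + 2.0) * v    # the Jacobian of this toy residual is diagonal
```

Each product costs one extra residual evaluation and never forms the Jacobian. The difference between `approx` and `exact` is the kind of inaccuracy the abstract refers to: a step that is too large adds truncation error, and one that is too small amplifies rounding error.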
Interdisciplinary education in optics and photonics based on microcontrollers

NASA Astrophysics Data System (ADS)

Dreßler, Paul; Wielage, Heinz-Hermann; Haiss, Ulrich; Vauderwange, Oliver; Curticapean, Dan

2014-07-01

Not only is the number of new devices constantly increasing, but so are their application complexity and power. Most of their applications are in optics, photonics, acoustic and mobile devices. Working speed and functionality are achieved in most media devices by strategic use of new-generation digital signal processors and microcontrollers. Considering these premises of media development dynamics, the authors present how to integrate microcontrollers and digital signal processors into the curricula of media technology lectures by using adequate content. This also includes interdisciplinary content that consists of using the acquired knowledge in media software. These entries offer a deeper understanding of photonics, acoustics and media engineering.
Implementation of a High-Speed FPGA and DSP Based FFT Processor for Improving Strain Demodulation Performance in a Fiber-Optic-Based Sensing System

NASA Technical Reports Server (NTRS)

Farley, Douglas L.

2005-01-01

NASA's Aviation Safety and Security Program is pursuing research in on-board Structural Health Management (SHM) technologies for purposes of reducing or eliminating aircraft accidents due to system and component failures. Under this program, NASA Langley Research Center (LaRC) is developing a strain-based structural health-monitoring concept that incorporates a fiber optic-based measuring system for acquiring strain values. This fiber optic-based measuring system provides for the distribution of thousands of strain sensors embedded in a network of fiber optic cables. The resolution of strain value at each discrete sensor point requires a computationally demanding data reduction software process that, when hosted on a conventional processor, is not suitable for near real-time measurement. This report describes the development and integration of an alternative computing environment using dedicated computing hardware for performing the data reduction. Performance comparison between the existing and the hardware-based system is presented.
EGR distribution and fluctuation probe based on CO2 measurements

DOEpatents

Parks, II, James E; Partridge, Jr., William P; Yoo, Ji Hyung

2015-04-07

A diagnostic system having a single-port EGR probe and a method for using the same. The system includes a light source, an EGR probe, a detector and a processor. The light source may provide a combined light beam composed of light from a mid-infrared signal source and a mid-infrared reference source. The signal source may be centered at 4.2 µm and the reference source may be centered at 3.8 µm. The EGR probe may be a single-port probe with internal optics and a sampling chamber with two flow cells arranged along the light path in series. The optics may include a lens for focusing the light beam and a mirror for reflecting the light beam received from a pitch optical cable to a catch optical cable. The signal and reference sources are modulated at different frequencies, thereby allowing them to be separated and the signal normalized by the processor.
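Separating two sources that share one detector by their modulation frequencies can be illustrated with a lock-in style demodulation. The sketch below is an assumption about how such a separation could be done in software, not the method claimed in the patent. The sampling rate, modulation frequencies and amplitudes are all hypothetical.

```python
import numpy as np

sample_rate = 100_000.0                       # Hz, hypothetical detector sampling rate
f_signal, f_reference = 1_000.0, 1_500.0      # Hz, hypothetical modulation frequencies
t = np.arange(int(sample_rate)) / sample_rate  # one second of samples

# Hypothetical amplitudes reaching the detector after the sampling chamber.
true_signal_amp, true_reference_amp = 0.3, 0.8
detector = (true_signal_amp * np.cos(2 * np.pi * f_signal * t)
            + true_reference_amp * np.cos(2 * np.pi * f_reference * t))

def demodulate(trace, freq, times):
    """Estimate the amplitude at one modulation frequency by correlating with a cosine."""
    return 2.0 * np.mean(trace * np.cos(2 * np.pi * freq * times))

signal_amp = demodulate(detector, f_signal, t)
reference_amp = demodulate(detector, f_reference, t)
normalized = signal_amp / reference_amp       # signal normalized by the reference channel
```

Because the record spans a whole number of periods of both modulations, the two channels are orthogonal and each amplitude is recovered without crosstalk.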
Compact time- and space-integrating SAR processor: performance analysis

NASA Astrophysics Data System (ADS)

Haney, Michael W.; Levy, James J.; Michael, Robert R., Jr.; Christensen, Marc P.

1995-06-01

Progress made during the previous 12 months toward the fabrication and test of a flight demonstration prototype of the acousto-optic time- and space-integrating real-time SAR image formation processor is reported. Compact, rugged, and low-power analog optical signal processing techniques are used for the most computationally taxing portions of the SAR imaging problem to overcome the size and power consumption limitations of electronic approaches. Flexibility and performance are maintained by the use of digital electronics for the critical low-complexity filter generation and output image processing functions. The results reported for this year include tests of a laboratory version of the RAPID SAR concept on phase history data generated from real SAR high-resolution imagery; a description of the new compact 2D acousto-optic scanner that has a 2D space bandwidth product approaching 10^6 spots, specified and procured for NEOS Technologies during the last year; and a design and layout of the optical module portion of the flight-worthy prototype.
Photonics for aerospace sensors

NASA Astrophysics Data System (ADS)

Pellegrino, John; Adler, Eric D.; Filipov, Andree N.; Harrison, Lorna J.; van der Gracht, Joseph; Smith, Dale J.; Tayag, Tristan J.; Viveiros, Edward A.

1992-11-01

The maturation in the state-of-the-art of optical components is enabling increased applications for the technology. Most notable is the ever-expanding market for fiber optic data and communications links, familiar in both commercial and military markets. The inherent properties of optics and photonics, however, have suggested that components and processors may be designed that offer advantages over more commonly considered digital approaches for a variety of airborne sensor and signal processing applications. Various academic, industrial, and governmental research groups have been actively investigating and exploiting these properties of high bandwidth, large degree of parallelism in computation (e.g., processing in parallel over a two-dimensional field), and interconnectivity, and have succeeded in advancing the technology to the stage of systems demonstration. Such advantages as computational throughput and low operating power consumption are highly attractive for many computationally intensive problems. This review covers the key devices necessary for optical signal and image processors, some of the system application demonstration programs currently in progress, and active research directions for the implementation of next-generation architectures.
Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications

PubMed Central

Chaibub Neto, Elias

2015-01-01

In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson’s sample correlation coefficient, and compare its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and numbers of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling. PMID:26125965
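The multinomial-weight formulation can be sketched as follows for Pearson's correlation. This is a NumPy illustration of the idea under stated assumptions, not the authors' R code. The function names `bootstrap_weights` and `weighted_corr` and the simulated data are hypothetical.

```python
import numpy as np

def bootstrap_weights(n, n_boot, rng):
    """Each row holds multinomial counts divided by n: the weights of one replicate."""
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    return counts / n                                   # shape (n_boot, n)

def weighted_corr(W, x, y):
    """All bootstrap replications of Pearson's r from weighted sample moments."""
    mx, my = W @ x, W @ y                               # first moments
    mxx, myy, mxy = W @ (x * x), W @ (y * y), W @ (x * y)   # second moments
    return (mxy - mx * my) / np.sqrt((mxx - mx**2) * (myy - my**2))

rng = np.random.default_rng(1)
n, n_boot = 50, 1000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)                        # simulated correlated data

W = bootstrap_weights(n, n_boot, rng)
replications = weighted_corr(W, x, y)
```

No resampled copy of the data is ever built. The five matrix-vector products replace a loop over replicates, at the price of storing the `n_boot` by `n` weight matrix, which is consistent with the slowdown the abstract reports for large samples.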
Real-time implementing wavefront reconstruction for adaptive optics

NASA Astrophysics Data System (ADS)

Wang, Caixia; Li, Mei; Wang, Chunhong; Zhou, Luchun; Jiang, Wenhan

2004-12-01

The capability of real-time wavefront reconstruction is important for an adaptive optics (AO) system. The system bandwidth and the real-time processing ability of the wavefront processor are mainly affected by the speed of calculation. The system requires a sufficient number of subapertures and a high sampling frequency to compensate for atmospheric turbulence, and the number of reconstruction operations increases accordingly. Since the performance of an AO system improves as calculation latency decreases, it is necessary to study how to increase the speed of wavefront reconstruction. There are two methods to improve the real-time performance of the reconstruction. One is to transform the wavefront reconstruction matrix, for example by wavelet or FFT. The other is to enhance the performance of the processing element. Analysis shows that the former method cuts latency at the cost of reconstruction precision. In this article, the latter method is adopted. Based on the characteristics of the wavefront reconstruction algorithm, a systolic array implemented on an FPGA is designed to perform real-time wavefront reconstruction. The system delay is reduced greatly by the use of pipelining and parallel processing. The minimum latency of reconstruction is the reconstruction calculation of one subaperture.
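The closing latency claim follows from how a matrix-vector product can be accumulated. The sketch below is a software analogue under the assumption that reconstruction is a product of a reconstruction matrix with the slope vector. It is not the FPGA design from the paper. The matrix `R`, the sizes and the function name are hypothetical.

```python
import numpy as np

def reconstruct_streaming(R, slope_stream):
    """Accumulate commands = R @ slopes one subaperture measurement at a time."""
    commands = np.zeros(R.shape[0])
    for k, slope_k in enumerate(slope_stream):
        commands += R[:, k] * slope_k     # multiply-accumulate for measurement k
    return commands

rng = np.random.default_rng(2)
n_actuators, n_slopes = 8, 24             # hypothetical system size
R = rng.normal(size=(n_actuators, n_slopes))
slopes = rng.normal(size=n_slopes)

streamed = reconstruct_streaming(R, iter(slopes))
batch = R @ slopes                        # reference: wait for all slopes, then multiply
```

Each column update can start as soon as its slope has been read out, so after the last subaperture arrives only one column update remains. In hardware the per-actuator accumulations would run in parallel rather than in a Python loop.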
Development of software for the MSFC solar vector magnetograph

NASA Technical Reports Server (NTRS)

Kineke, Jack

1996-01-01

The Marshall Space Flight Center Solar Vector Magnetograph is a special purpose telescope used to measure the vector magnetic field in active areas on the surface of the sun. This instrument measures the linear and circular polarization intensities (the Stokes vectors Q, U and V) produced by the Zeeman effect on a specific spectral line due to the solar magnetic field from which the longitudinal and transverse components of the magnetic field may be determined. Beginning in 1990 as a Summer Faculty Fellow in project JOVE and continuing under NASA Grant NAG8-1042, the author has been developing computer software to perform these computations, first using a DEC MicroVAX system equipped with a high speed array processor, and more recently using a DEC AXP/OSF system. This summer's work is a continuation of this development.
A vector scanning processing technique for pulsed laser velocimetry

NASA Technical Reports Server (NTRS)

Wernet, Mark P.; Edwards, Robert V.

1989-01-01

Pulsed-laser-sheet velocimetry yields two-dimensional velocity vectors across an extended planar region of a flow. Current processing techniques offer high-precision (1-percent) velocity estimates, but can require hours of processing time on specialized array processors. Sometimes, however, a less accurate (about 5 percent) data-reduction technique which also gives unambiguous velocity vector information is acceptable. Here, a direct space-domain processing technique is described and shown to be far superior to previous methods in achieving these objectives. It uses a novel data coding and reduction technique and has no 180-deg directional ambiguity. A complex convection vortex flow was recorded and completely processed in under 2 min on an 80386-based PC, producing a two-dimensional velocity-vector map of the flowfield. Pulsed-laser velocimetry data can thus be reduced quickly and reasonably accurately, without specialized array processing hardware.
Research on the application of a decoupling algorithm for structure analysis

NASA Technical Reports Server (NTRS)

Denman, E. D.

1980-01-01

The mathematical theory for decoupling mth-order matrix differential equations is presented. It is shown that the decoupling procedure can be developed from the algebraic theory of matrix polynomials. The role of eigenprojectors and latent projectors in the decoupling process is discussed and the mathematical relationships between eigenvalues, eigenvectors, latent roots, and latent vectors are developed. It is shown that the eigenvectors of the companion form of a matrix contain the latent vectors as a subset. The spectral decomposition of a matrix and the application to differential equations are given.
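The relationship between companion-form eigenvectors and latent vectors can be checked numerically for a second-order case. The sketch assumes a monic matrix polynomial A(λ) = λ²I + λA1 + A0 and the block companion matrix C = [[0, I], [-A0, -A1]]. The coefficient matrices are hypothetical.

```python
import numpy as np

# Hypothetical coefficients of A(lam) = lam**2 * I + lam * A1 + A0.
A1 = np.array([[3.0, 1.0], [0.0, 4.0]])
A0 = np.array([[2.0, 0.0], [1.0, 3.0]])
n = A0.shape[0]

# Block companion form: C @ [x; lam*x] = lam * [x; lam*x] whenever A(lam) @ x = 0.
C = np.block([[np.zeros((n, n)), np.eye(n)],
              [-A0, -A1]])
latent_roots, eigvecs = np.linalg.eig(C)

residuals = []
block_errors = []
for lam, z in zip(latent_roots, eigvecs.T):
    x = z[:n]                                         # top block: candidate latent vector
    A_lam = lam**2 * np.eye(n) + lam * A1 + A0
    residuals.append(np.linalg.norm(A_lam @ x))       # should vanish
    block_errors.append(np.linalg.norm(z[n:] - lam * x))  # bottom block equals lam * x
```

Every eigenvector of C has the form [x; λx], so its leading n entries give a latent vector for the latent root λ.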
Optoelectronic Technology Consortium: Precompetitive Consortium for Optoelectronic Interconnect Technology

DTIC Science & Technology

1992-09-01

demonstrating the producibility of optoelectronic components for high-density/high-data-rate processors and accelerating the insertion of this technology...technology development stage, OETC will advance the development of optical components, produce links for a multiboard processor testbed demonstration, and...components that are affordable, initially at <$100 per line, and reliable, with a line BER <10^-15 and MTTF >10^6 hours. Under the OETC program, Honeywell will
Subpicosecond Optical Digital Computation Using Conjugate Parametric Generators

DTIC Science & Technology

1989-03-31

Using Phase Conjugate Parametric Generators ..... 12. PERSONAL AUTHOR(S) Alfano, Robert; Eichmann, George; Dorsinville, Roger; Li, Yao 13a. TYPE OF...conjugation-based optical residue arithmetic processor," Y. Li, G. Eichmann, R. Dorsinville, and R. R. Alfano, Opt. Lett. 13, (1988). [2] "Parallel ultrafast...optical digital and symbolic computation via optical phase conjugation," Y. Li, G. Eichmann, R. Dorsinville, Appl. Opt. 27, 2025 (1988). [3
Improved Magnetic STAR Methods for Real-Time, Point-by-Point Localization of Unexploded Ordnance and Buried Mines

DTIC Science & Technology

2008-09-01

of magnetic UXO. The prototype STAR Sensor comprises: a) A cubic array of eight fluxgate magnetometers. b) A 24-channel data acquisition/signal...array (shaded boxes) of eight low noise Triaxial Fluxgate Magnetometers (TFM) develops 24 channels of vector B-field data. Processor hardware
Scalable Vector Media-processors for Embedded Systems

DTIC Science & Technology

2002-05-01

Set Architecture for Multimedia “When you do the common things in life in an uncommon way, you will command the attention of the world.” George ...Bibliography [ABHS89] M. August, G. Brost, C. Hsiung, and C. Schiffleger. Cray X-MP: The Birth of a Supercomputer. IEEE Computer, 22(1):45–52, January

Implementation and analysis of a Navier-Stokes algorithm on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1988-01-01

The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

1999-01-01

This paper reviews three recent works on numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit (PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of an ODE in the form of a Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead to an acceleration factor of around 50 on parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. The trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
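The second approach, treating the implicit midpoint rule over many timesteps as one system solved by fixed-point iteration, can be sketched on a toy problem. The sketch assumes a harmonic oscillator as the Hamiltonian system and a constant trajectory as the initial guess. The function names, step size and iteration count are hypothetical, and the code is not taken from Saha et al.

```python
import numpy as np

def f(z):
    """Harmonic oscillator, rows are states (q, p): dq/dt = p, dp/dt = -q."""
    return np.stack([z[:, 1], -z[:, 0]], axis=1)

def midpoint_all_steps(z0, h, n_steps, n_iter=30):
    """Fixed-point iteration on every implicit midpoint step at once."""
    Z = np.tile(z0, (n_steps + 1, 1))              # initial guess: constant trajectory
    for _ in range(n_iter):
        F = f(0.5 * (Z[:-1] + Z[1:]))              # all midpoint evaluations, vectorized
        Z = np.vstack([z0, z0 + h * np.cumsum(F, axis=0)])
    return Z

z0 = np.array([1.0, 0.0])
h, n_steps = 0.01, 100
Z = midpoint_all_steps(z0, h, n_steps)

# Sequential reference: for this linear problem each midpoint step is a 2x2 solve.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
step = np.linalg.solve(np.eye(2) - 0.5 * h * A, np.eye(2) + 0.5 * h * A)
reference = [z0]
for _ in range(n_steps):
    reference.append(step @ reference[-1])
reference = np.array(reference)
```

Inside one sweep the force evaluations at different timesteps are independent, which is where the parallelism comes from. The cost is that several sweeps are needed before the trajectory agrees with the sequential result.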
Noniterative MAP reconstruction using sparse matrix representations.

PubMed

Cao, Guangzhi; Bouman, Charles A; Webb, Kevin J

2009-09-01

We present a method for noniterative maximum a posteriori (MAP) tomographic reconstruction which is based on the use of sparse matrix representations. Our approach is to precompute and store the inverse matrix required for MAP reconstruction. This approach has generally not been used in the past because the inverse matrix is typically large and fully populated (i.e., not sparse). In order to overcome this problem, we introduce two new ideas. The first idea is a novel theory for the lossy source coding of matrix transformations which we refer to as matrix source coding. This theory is based on a distortion metric that reflects the distortions produced in the final matrix-vector product, rather than the distortions in the coded matrix itself. The resulting algorithms are shown to require orthonormal transformations of both the measurement data and the matrix rows and columns before quantization and coding. The second idea is a method for efficiently storing and computing the required orthonormal transformations, which we call a sparse-matrix transform (SMT). The SMT is a generalization of the classical FFT in that it uses butterflies to compute an orthonormal transform; but unlike an FFT, the SMT uses the butterflies in an irregular pattern, and is numerically designed to best approximate the desired transforms. We demonstrate the potential of the noniterative MAP reconstruction with examples from optical tomography. The method requires offline computation to encode the inverse transform. However, once these offline computations are completed, the noniterative MAP algorithm is shown to reduce both storage and computation by well over two orders of magnitude, as compared to linear iterative reconstruction methods.
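An orthonormal transform built from butterflies can be illustrated by applying a sequence of planar (Givens) rotations to a vector. The sketch shows only how such a transform is applied and inverted. The index pairs and angles are hypothetical, and the paper's numerical design procedure for choosing them is not reproduced.

```python
import numpy as np

def apply_butterflies(x, butterflies):
    """Apply planar rotations (i, j, theta) in order; each touches two coordinates."""
    y = np.array(x, dtype=float)
    for i, j, theta in butterflies:
        c, s = np.cos(theta), np.sin(theta)
        yi, yj = y[i], y[j]
        y[i] = c * yi - s * yj
        y[j] = s * yi + c * yj
    return y

def invert_butterflies(butterflies):
    """The inverse transform: the same rotations in reverse order with negated angles."""
    return [(i, j, -theta) for i, j, theta in reversed(butterflies)]

# Hypothetical irregular butterfly pattern on a length-6 vector.
butterflies = [(0, 3, 0.7), (1, 2, -1.1), (0, 5, 0.2), (2, 4, 1.9), (3, 5, -0.4)]
x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, -1.5])

y = apply_butterflies(x, butterflies)
x_back = apply_butterflies(y, invert_butterflies(butterflies))
```

A transform with K butterflies costs a number of operations proportional to K and stores only K index pairs and angles, compared with N² entries for a dense N by N orthonormal matrix.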
Manipulation of dielectric Rayleigh particles using highly focused elliptically polarized vector fields.

PubMed

Gu, Bing; Xu, Danfeng; Rui, Guanghao; Lian, Meng; Cui, Yiping; Zhan, Qiwen

2015-09-20

Generation of vectorial optical fields with arbitrary polarization distribution is of great interest in areas where exotic optical fields are desired. In this work, we experimentally demonstrate the versatile generation of linearly polarized vector fields, elliptically polarized vector fields, and circularly polarized vortex beams through introducing attenuators in a common-path interferometer. By means of Richards-Wolf vectorial diffraction method, the characteristics of the highly focused elliptically polarized vector fields are studied. The optical force and torque on a dielectric Rayleigh particle produced by these tightly focused vector fields are calculated and exploited for the stable trapping of dielectric Rayleigh particles. It is shown that the additional degree of freedom provided by the elliptically polarized vector field allows one to control the spatial structure of polarization, to engineer the focusing field, and to tailor the optical force and torque on a dielectric Rayleigh particle.
Adaptive track scheduling to optimize concurrency and vectorization in GeantV

DOE PAGES

Apostolakis, J.; Bandieramonte, M.; Bitzes, G.; ...

2015-05-22

The GeantV project is focused on the R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of applications. In our approach, vectors of tracks belonging to multiple events and matching different locality criteria must be gathered and dispatched to algorithms having vector signatures. While the transport propagates tracks and changes their individual states, data locality becomes harder to maintain. The scheduling policy has to be changed to maintain efficient vectors while keeping an optimal level of concurrency. The model has complex dynamics requiring tuning the thresholds to switch between the normal regime and special modes, i.e. prioritizing events to allow flushing memory, adding new events in the transport pipeline to boost locality, dynamically adjusting the particle vector size or switching between vector to single track mode when vectorization causes only overhead. Lastly, this work requires a comprehensive study for optimizing these parameters to make the behaviour of the scheduler self-adapting, presenting here its initial results.
Scalable ion-photon quantum interface based on integrated diffractive mirrors

NASA Astrophysics Data System (ADS)

Ghadimi, Moji; Blūms, Valdis; Norton, Benjamin G.; Fisher, Paul M.; Connell, Steven C.; Amini, Jason M.; Volin, Curtis; Hayden, Harley; Pai, Chien-Shing; Kielpinski, David; Lobino, Mirko; Streed, Erik W.

2017-12-01

Quantum networking links quantum processors through remote entanglement for distributed quantum information processing and secure long-range communication. Trapped ions are a leading quantum information processing platform, having demonstrated universal small-scale processors and roadmaps for large-scale implementation. Overall rates of ion-photon entanglement generation, essential for remote trapped ion entanglement, are limited by coupling efficiency into single mode fibers and scaling to many ions. Here, we show a microfabricated trap with integrated diffractive mirrors that couples 4.1(6)% of the fluorescence from a 174Yb+ ion into a single mode fiber, nearly triple the demonstrated bulk optics efficiency. The integrated optic collects 5.8(8)% of the π transition fluorescence, images the ion with sub-wavelength resolution, and couples 71(5)% of the collected light into the fiber. Our technology is suitable for entangling multiple ions in parallel and overcomes mode quality limitations of existing integrated optical interconnects.
Benchmark tests on the digital equipment corporation Alpha AXP 21164-based AlphaServer 8400, including a comparison of optimized vector and superscalar processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wasserman, H.J.

1996-02-01

The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP), a maximum of two floating point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we will compare single processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they also span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have, for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architecture. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.
Sen2Cor for Sentinel-2

NASA Astrophysics Data System (ADS)

Main-Knorn, Magdalena; Pflug, Bringfried; Louis, Jerome; Debaecker, Vincent; Müller-Wilm, Uwe; Gascon, Ferran

2017-10-01

In the frame of the Copernicus programme, ESA has developed and launched the Sentinel-2 optical imaging mission that delivers optical data products designed to feed downstream services mainly related to land monitoring, emergency management and security. The Sentinel-2 mission is a constellation of two polar orbiting satellites, Sentinel-2A and Sentinel-2B, each equipped with an optical imaging sensor, the MSI (Multi-Spectral Instrument). Sentinel-2A was launched on June 23rd, 2015 and Sentinel-2B followed on March 7th, 2017. With the beginning of the operational phase, the constellation of both satellites enables image acquisition over the same area every 5 days or less. To use the unique potential of the Sentinel-2 data for land applications and ensure the highest quality of scientific exploitation, accurate correction of satellite images for atmospheric effects is required. Therefore the atmospheric correction processor Sen2Cor was developed by Telespazio VEGA Deutschland GmbH on behalf of ESA. Sen2Cor is a Level-2A processor whose main purpose is to correct single-date Sentinel-2 Level-1C Top-Of-Atmosphere (TOA) products for the effects of the atmosphere in order to deliver a Level-2A Bottom-Of-Atmosphere (BOA) reflectance product. Additional outputs are an Aerosol Optical Thickness (AOT) map, a Water Vapour (WV) map and a Scene Classification (SCL) map with Quality Indicators for cloud and snow probabilities. Telespazio France and DLR have teamed up in order to provide the calibration and validation of the Sen2Cor processor. Here we provide an overview of the Sentinel-2 data, processor and products. The paper presents some processing examples of Sen2Cor applied to Sentinel-2 data, and provides up-to-date information about the Sen2Cor release status and recent validation results at the time of the SPIE Remote Sensing 2017.
Topological nature of nonlinear optical effects in solids.

PubMed

Morimoto, Takahiro; Nagaosa, Naoto

2016-05-01

There are a variety of nonlinear optical effects including higher harmonic generations, photovoltaic effects, and nonlinear Kerr rotations. They are realized by strong light irradiation to materials that results in nonlinear polarizations in the electric field. These are of great importance in studying the physics of excited states of the system as well as for applications to optical devices and solar cells. Nonlinear properties of materials are usually described by nonlinear susceptibilities, which have complex expressions including many matrix elements and energy denominators. On the other hand, a nonequilibrium steady state under an electric field periodic in time has a concise description in terms of the Floquet bands of electrons dressed by photons. We show theoretically, using the Floquet formalism, that various nonlinear optical effects, such as the shift current in noncentrosymmetric materials, photovoltaic Hall response, and photo-induced change of order parameters under the continuous irradiation of monochromatic light, can be described in a unified fashion by topological quantities involving the Berry connection and Berry curvature. We found that vector fields defined with the Berry connections in the space of momentum and/or parameters govern the nonlinear responses. This topological view offers a route to designing nonlinear optical materials.
Investigations on birefringence effects in polymer optical fiber Bragg gratings

NASA Astrophysics Data System (ADS)

Hu, X.; Sáez-Rodríguez, D.; Bang, O.; Webb, D. J.; Caucheteur, C.

2014-05-01

Step-index polymer optical fiber Bragg gratings (POFBGs) and microstructured polymer optical fiber Bragg gratings (mPOFBGs) present several attractive features, especially for sensing purposes. In comparison to FBGs written in silica fibers, they are more sensitive to temperature and pressure because of the larger thermo-optic coefficient and smaller Young's modulus of polymer materials. (m)POFBGs are most often photowritten in poly(methylmethacrylate) (PMMA) materials using a continuous-wave 325 nm HeCd laser. For the first time to the best of our knowledge, we study photoinduced birefringence effects in (m)POFBGs. To achieve this, highly reflective gratings were inscribed with the phase mask technique. They were then monitored in transmission with polarized light. For this, (m)POF sections a few cm in length containing the gratings were glued to angled silica fibers. Polarization dependent loss (PDL) and differential group delay (DGD) were computed from the Jones matrix eigenanalysis using an optical vector analyser. Maximum values exceeding several dB and a few picoseconds were obtained for the PDL and DGD, respectively. The response to lateral force was finally investigated. As it induces birefringence in addition to the photo-induced one, an increase of the PDL and DGD values was observed.
Topological nature of nonlinear optical effects in solids

PubMed Central

Morimoto, Takahiro; Nagaosa, Naoto

2016-01-01

There are a variety of nonlinear optical effects including higher harmonic generations, photovoltaic effects, and nonlinear Kerr rotations. They are realized by strong light irradiation to materials that results in nonlinear polarizations in the electric field. These are of great importance in studying the physics of excited states of the system as well as for applications to optical devices and solar cells. Nonlinear properties of materials are usually described by nonlinear susceptibilities, which have complex expressions including many matrix elements and energy denominators. On the other hand, a nonequilibrium steady state under an electric field periodic in time has a concise description in terms of the Floquet bands of electrons dressed by photons. We show theoretically, using the Floquet formalism, that various nonlinear optical effects, such as the shift current in noncentrosymmetric materials, photovoltaic Hall response, and photo-induced change of order parameters under the continuous irradiation of monochromatic light, can be described in a unified fashion by topological quantities involving the Berry connection and Berry curvature. We found that vector fields defined with the Berry connections in the space of momentum and/or parameters govern the nonlinear responses. This topological view offers a route to designing nonlinear optical materials. PMID:27386523
Visualization of x-ray computer tomography using computer-generated holography

NASA Astrophysics Data System (ADS)

Daibo, Masahiro; Tayama, Norio

1998-09-01

A theory is proposed for converting x-ray projection data directly into a hologram by combining computed tomography (CT) with the computer-generated hologram (CGH). The purpose of this study is to provide the theory for an all-electronic, high-speed, see-through 3D visualization system for application to medical diagnosis and non-destructive testing. First, the CT is expressed using the pseudo-inverse matrix obtained by singular value decomposition, and the CGH is expressed in matrix form. Next, the 'projection to hologram conversion' (PTHC) matrix is calculated by multiplying the phase matrix of the CGH by the pseudo-inverse matrix of the CT. Finally, the projection vector is converted directly to the hologram vector by multiplying the PTHC matrix by the projection vector. Incorporating holographic analog computation into CT reconstruction makes it possible to reduce the amount of calculation drastically. We demonstrate a CT cross section reconstructed in 3D space by a He-Ne laser from real x-ray projection data acquired by x-ray television equipment, using our direct conversion technique.
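The matrix pipeline described in this abstract can be written compactly. The symbols are assumptions chosen for illustration, since the abstract names none: $A$ is a real CT system matrix mapping the cross-section vector $f$ to the projection vector $p$, $\Phi$ is the CGH phase matrix, and $h$ is the hologram vector.

```latex
\begin{aligned}
p &= A f, \qquad A = U \Sigma V^{\mathsf{T}}
  && \text{(singular value decomposition)} \\
A^{+} &= V \Sigma^{+} U^{\mathsf{T}}
  && \text{(pseudo-inverse; } \Sigma^{+} \text{ inverts the nonzero singular values)} \\
T_{\mathrm{PTHC}} &= \Phi A^{+}
  && \text{(projection-to-hologram conversion matrix)} \\
h &= T_{\mathrm{PTHC}}\, p = \Phi \left( A^{+} p \right)
  && \text{(hologram vector from the projection vector)}
\end{aligned}
```

Under these assumptions $T_{\mathrm{PTHC}}$ does not depend on the measured data, so it can be formed once, after which each projection vector costs a single matrix-vector product.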
Parallel Semi-Implicit Spectral Element Atmospheric Model

NASA Astrophysics Data System (ADS)

Fournier, A.; Thomas, S.; Loft, R.

2001-05-01

The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1°). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicolson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures: derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. 
An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GFLOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
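The idea of computing derivatives as small 8x8 matrix-vector products can be sketched as follows. This is a minimal illustration, not SEAM code: it builds a Lagrange differentiation matrix on eight Chebyshev-type nodes, chosen here as a stand-in because they have a closed form, whereas the abstract refers to Gauss(-Lobatto) points and Legendre cardinal interpolators. All function names are hypothetical.

```python
import math


def diff_matrix(nodes):
    """Lagrange differentiation matrix D on the given distinct nodes.

    matvec(D, f)[i] is the exact derivative at nodes[i] of the polynomial
    of degree < len(nodes) that interpolates the values f.
    """
    n = len(nodes)
    # Barycentric weights w_j = 1 / prod_{k != j} (x_j - x_k).
    w = [1.0 / math.prod(nodes[j] - nodes[k] for k in range(n) if k != j)
         for j in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i][j] = (w[j] / w[i]) / (nodes[i] - nodes[j])
        # Each row sums to zero: the derivative of a constant vanishes.
        D[i][i] = -sum(D[i][j] for j in range(n) if j != i)
    return D


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Eight nodes on [-1, 1] (stand-in for the element's collocation points).
N = 8
nodes = [-math.cos(math.pi * j / (N - 1)) for j in range(N)]
D = diff_matrix(nodes)

# Differentiate f(x) = x^3 with one 8x8 matrix-vector product.
f = [x ** 3 for x in nodes]
df = matvec(D, f)
max_err = max(abs(d - 3 * x ** 2) for d, x in zip(df, nodes))
```

Because the matrix is only 8x8, it and the element's data fit comfortably in cache, which is the property the abstract highlights.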
Parallel ICA and its hardware implementation in hyperspectral image analysis

NASA Astrophysics Data System (ADS)

Du, Hongtao; Qi, Hairong; Peterson, Gregory D.

2004-04-01

Advances in hyperspectral images have dramatically boosted remote sensing applications by providing abundant information using hundreds of contiguous spectral bands. However, the high volume of information also results in excessive computation burden. Since most materials have specific characteristics only at certain bands, much of this information is redundant. This property of hyperspectral images has motivated many researchers to study various dimensionality reduction algorithms, including Projection Pursuit (PP), Principal Component Analysis (PCA), wavelet transform, and Independent Component Analysis (ICA), where ICA is one of the most popular techniques. It searches for a linear or nonlinear transformation which minimizes the statistical dependence between spectral bands. Through this process, ICA can eliminate superfluous information while retaining practical information, given only the observations of hyperspectral images. One hurdle of applying ICA in hyperspectral image (HSI) analysis, however, is its long computation time, especially for high volume hyperspectral data sets. Even the most efficient method, FastICA, is a very time-consuming process. In this paper, we present a parallel ICA (pICA) algorithm derived from FastICA. During the unmixing process, pICA divides the estimation of the weight matrix into sub-processes which can be conducted in parallel on multiple processors. The decorrelation process is decomposed into the internal decorrelation and the external decorrelation, which perform weight vector decorrelations within individual processors and between cooperative processors, respectively. In order to further improve the performance of pICA, we seek hardware solutions in the implementation of pICA. To date, there have been very few hardware designs for ICA-related processes because of the complicated and iterative computation. This paper discusses the capacity limitations of FPGA implementations for pICA in HSI analysis. 
An Application-Specific Integrated Circuit (ASIC) synthesis is designed for pICA-based dimensionality reduction in HSI analysis. The pICA design is implemented using standard-height cells and targets the TSMC 0.18 micron process. During the synthesis procedure, three ICA-related reconfigurable components are developed for reuse and retargeting. Preliminary results show that the standard-height cell based ASIC synthesis provides an effective solution for pICA and ICA-related processes in HSI analysis.
Refractive index inversion based on Mueller matrix method

NASA Astrophysics Data System (ADS)

Fan, Huaxi; Wu, Wenyuan; Huang, Yanhua; Li, Zhaozhao

2016-03-01

Based on the Stokes vector and Jones vector, the correlation between Mueller matrix elements and refractive index was studied and the result simplified, and an expression for refractive index inversion was derived via the Mueller matrix. The Mueller matrix elements at different incident angles are simulated using the expression for specular reflection in order to analyze the influence of the angle of incidence and the refractive index on them; this is verified by measuring the Mueller matrix elements of a polished metal surface. The research shows that, under specular reflection, the result of the Mueller matrix inversion is consistent with experiment and can be used as a refractive index inversion method, providing a new approach for target detection and recognition technology.
Improved multi-stage neonatal seizure detection using a heuristic classifier and a data-driven post-processor.

PubMed

Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S

2016-09-01

After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023 h, 3493 seizures) recorded in two different university hospitals are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and good detection rate of 88% on the entire test databases. The post-processor effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable to other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Signal processing applications of massively parallel charge domain computing devices

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)

1999-01-01

The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.
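A Fourier transform can be obtained from two matrix-vector products built from a Hartley matrix and a row-permuted copy of it. The sketch below shows one standard way to do this for a real input vector; whether it matches the specific operators of the patent is not stated in the abstract, and the function names are hypothetical. Here plain Python lists stand in for the two CCD/CID arrays.

```python
import math


def cas(t):
    """Hartley kernel: cas(t) = cos(t) + sin(t)."""
    return math.cos(t) + math.sin(t)


def hartley_matrix(n):
    """n x n discrete Hartley transform matrix H[k][m] = cas(2*pi*k*m/n)."""
    return [[cas(2 * math.pi * k * m / n) for m in range(n)] for k in range(n)]


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dft_via_hartley(x):
    """Real and imaginary parts of the DFT of a real vector x.

    P is H with its rows permuted by k -> (n - k) mod n, so that
    (H + P) / 2 is the cosine matrix and (P - H) / 2 is minus the sine matrix.
    Each operator is applied with a single matrix-vector product.
    """
    n = len(x)
    H = hartley_matrix(n)
    P = [H[(-k) % n] for k in range(n)]
    op_real = [[(H[k][m] + P[k][m]) / 2 for m in range(n)] for k in range(n)]
    op_imag = [[(P[k][m] - H[k][m]) / 2 for m in range(n)] for k in range(n)]
    return matvec(op_real, x), matvec(op_imag, x)


re_part, im_part = dft_via_hartley([1.0, 2.0, 0.0, -1.0, 0.5])
```

The two operators are independent of each other, so the two products can run at the same time, which is the property the abstract relies on to obtain both parts in the time of a single MVM.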
Liquid lens: advances in adaptive optics

NASA Astrophysics Data System (ADS)

Casey, Shawn Patrick

2010-12-01

'Liquid lens' technologies promise significant advancements in machine vision and optical communications systems. Adaptations for machine vision, human vision correction, and optical communications are used to exemplify the versatile nature of this technology. Utilization of liquid lens elements allows the cost effective implementation of optical velocity measurement. The project consists of a custom image processor, camera, and interface. The images are passed into customized pattern recognition and optical character recognition algorithms. A single camera would be used for both speed detection and object recognition.
The characteristics of grating structure in magnetic field measurements based on polarization properties of fiber gratings

NASA Astrophysics Data System (ADS)

Su, Yang; Peng, Hui; Feng, Kui; Li, Yu-quan

2009-11-01

In this paper the characteristics of grating structure in magnetic field measurements based on differential group delay of fiber gratings are analyzed. Theoretical simulations are performed using the coupled-mode theory and transfer matrix method. The effects of grating parameters of uniform Bragg grating on measurement range and sensitivity are analyzed. The impacts of chirped, phase-shifted and apodized gratings on DGD peak values are also monitored. FBG transmission spectra and DGD spectra are recorded by means of an optical vector analyzer (OVA). Both the simulations and experiments demonstrate that the phase-shifted gratings can clearly improve the sensitivity.
Subbarrier absorption in a stationary superlattice

NASA Technical Reports Server (NTRS)

Arutyunyan, G. M.; Nerkararyan, K. V.

1984-01-01

The calculation of the interband absorption coefficient was carried out in the classical case, when the frequency of light was assumed to bind two miniband subbarrier states of different bands. The influence of two dimensional Mott excitons on this absorption was studied and a comparison was made with the experiment. All of these considerations were done taking into account the photon wave vector (the phase spatial heterogeneity). The basic traits of the energy spectra of superlattice semiconductors, their kinetic and optical properties, and possible means of electromagnetic wave intensification were examined. By the density matrix method, a theory of electrical and electromagnetic properties of superlattices was suggested.

Vector optical activity in the Weyl semimetal TaAs

DOE PAGES

Norman, M. R.

2015-12-15

Here, it is shown that the Weyl semimetal TaAs can have a significant polar vector contribution to its optical activity. This is quantified by ab initio calculations of the resonant x-ray diffraction at the Ta L1 edge. For the Bragg vector (400), this polar vector contribution to the circular intensity differential between left and right polarized x-rays is predicted to be comparable to that arising from linear dichroism. The implications of this result for optical effects predicted for topological Weyl semimetals are discussed.
Multichannel signal enhancement

DOEpatents

Lewis, Paul S.

1990-01-01

A mixed adaptive filter is formulated for the signal processing problem where desired a priori signal information is not available. The formulation generates a least squares problem which enables the filter output to be calculated directly from an input data matrix. In one embodiment, a folded processor array enables bidirectional data flow to solve the recursive problem by back substitution without global communications. In another embodiment, a balanced processor array solves the recursive problem by forward elimination through the array. In a particular application to magnetoencephalography, the mixed adaptive filter enables an evoked response to an auditory stimulus to be identified from only a single trial.
Communication requirements of sparse Cholesky factorization with nested dissection ordering

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an n x n sparse symmetric positive definite matrix representing an n-vertex regular two-dimensional grid graph using n^alpha processors, with alpha not greater than 1, is shown to be O(n^(1 + alpha/2)). It is O(n) when n^alpha processors, with alpha not smaller than 1, are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.
SPAR data set contents. [finite element structural analysis system]

NASA Technical Reports Server (NTRS)

Cunningham, S. W.

1981-01-01

The contents of the stored data sets of the SPAR (space processing applications rocket) finite element structural analysis system are documented. The data generated by each of the system's processors are stored in a data file organized as a library. Each data set, containing a two-dimensional table or matrix, is identified by a four-word name listed in a table of contents. The creating SPAR processor, number of rows and columns, and definitions of each of the data items are listed for each data set. An example SPAR problem using these data sets is also presented.
Efficacy of Code Optimization on Cache-based Processors

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. 
It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
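The effect of stride on cache behavior described in this abstract can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not taken from the report: it models a direct-mapped cache with hypothetical parameters (8 words per line, 64 sets) and merely counts which lines and sets a strided loop touches.

```python
def cache_lines_touched(n_accesses, stride, line_words=8):
    """Distinct cache lines read by n_accesses loads at a fixed word stride."""
    return len({(i * stride) // line_words for i in range(n_accesses)})


def cache_sets_used(n_accesses, stride, line_words=8, n_sets=64):
    """Distinct sets of a direct-mapped cache hit by the same access pattern.

    A line with index L maps to set L % n_sets. When few sets are used,
    successive accesses evict each other even though the cache is mostly empty.
    """
    return len({((i * stride) // line_words) % n_sets
                for i in range(n_accesses)})


unit = cache_lines_touched(64, stride=1)       # every word of each line is used
zero = cache_lines_touched(64, stride=0)       # the same line is reused
wide = cache_lines_touched(64, stride=8)       # one new line per access
pathological = cache_sets_used(64, stride=512)  # power-of-two stride
padded = cache_sets_used(64, stride=520)        # stride padded by one line
```

In this model a stride of 512 words maps every access to the same set, while padding the stride to 520 words spreads the same 64 accesses over 64 different sets, which is why non-power-of-two strides are usually preferred.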
Effective correlator for RadioAstron project

NASA Astrophysics Data System (ADS)

Sergeev, Sergey

This paper presents the implementation of a software FX correlator for Very Long Baseline Interferometry, adapted for the RadioAstron project. The software correlator is implemented for heterogeneous computing systems using graphics accelerators. It is shown that graphics hardware is highly efficient for the interferometry task. The host processor of the heterogeneous computing system forms the data flow for the graphics accelerators, the number of which corresponds to the number of frequency channels; for the RadioAstron project there are seven such channels. Each accelerator computes the correlation matrix for all baselines for a single frequency channel. The initial data are converted to floating-point format, corrected with the corresponding delay function, and the entire correlation matrix is computed simultaneously. The correlation matrix is calculated using the sliding Fourier transform. Because the problem matches the architecture of graphics accelerators well, a single Kepler-platform processor achieved performance on this task corresponding to that of a four-node computing cluster of Intel platforms. The task scales successfully not only to a large number of graphics accelerators but also to a large number of nodes with multiple accelerators.
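The two stages that give an FX correlator its name can be sketched in a few lines: a Fourier transform per station (F), followed by a cross-multiplication for every pair of stations (X). This is a minimal CPU illustration with hypothetical function names. It omits the delay correction, the sliding Fourier transform, time averaging and the GPU mapping mentioned in the abstract.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of one block of samples."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def fx_correlate(station_samples):
    """Correlation matrix for one block of samples.

    station_samples: list of equal-length sample lists, one per station.
    Returns {(i, j): cross-spectrum} for all i <= j, where the entry for
    (i, j) holds S_i[k] * conj(S_j[k]) for every frequency bin k.
    """
    spectra = [dft(s) for s in station_samples]          # F step
    n_st = len(spectra)
    return {(i, j): [a * b.conjugate()
                     for a, b in zip(spectra[i], spectra[j])]
            for i in range(n_st) for j in range(i, n_st)}  # X step


# Station 1 sees the same impulse as station 0, one sample later.
vis = fx_correlate([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
```

In this toy input the one-sample delay appears as a phase that grows linearly with frequency bin in `vis[(0, 1)]`, while the auto-correlations `vis[(0, 0)]` and `vis[(1, 1)]` stay real.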
System-on-chip architecture and validation for real-time transceiver optimization: APC implementation on FPGA

NASA Astrophysics Data System (ADS)

Suarez, Hernan; Zhang, Yan R.

2015-05-01

New radar applications need to perform complex algorithms and process large quantities of data to generate useful information for users. This situation has motivated the search for better processing solutions that include low-power, high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementations of adaptive pulse compression for real-time transceiver optimization are presented; they are based on a System-on-Chip architecture for Xilinx devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up compute-intensive tasks such as matrix multiplication and matrix inversion, which are essential units for solving the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance AXI buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Reconstruction of vector physical fields by optical tomography

NASA Astrophysics Data System (ADS)

Kulchin, Yurii N.; Vitrik, O. B.; Kamenev, O. T.; Kirichenko, O. V.; Petrov, Yu S.

1995-10-01

Reconstruction of vector physical fields by optical tomography, with the aid of a system of fibre-optic measuring lines, is considered. The reported experimental results are used to reconstruct the distribution of the square of the gradient of transverse displacements of a flat membrane.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

DOE PAGES

Vincenti, H.; Lobet, M.; Lehe, R.; ...

2016-09-19

In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute node that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). 
Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries: OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g., Intel, GNU or Cray) and provide proper alignment flags and compiler alignment directives (more details in README file).
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vincenti, H.; Lobet, M.; Lehe, R.

In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute node that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). 
Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries: OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g., Intel, GNU or Cray) and provide proper alignment flags and compiler alignment directives (more details in README file).
System and method employing a minimum distance and a load feature database to identify electric load types of different electric loads

DOEpatents

Lu, Bin; Yang, Yi; Sharma, Santosh K; Zambare, Prachi; Madane, Mayura A

2014-12-23

A method identifies electric load types of a plurality of different electric loads. The method includes providing a load feature database of a plurality of different electric load types, each of the different electric load types including a first load feature vector having at least four different load features; sensing a voltage signal and a current signal for each of the different electric loads; determining a second load feature vector comprising at least four different load features from the sensed voltage signal and the sensed current signal for a corresponding one of the different electric loads; and identifying by a processor one of the different electric load types by determining a minimum distance of the second load feature vector to the first load feature vector of the different electric load types of the load feature database.
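The minimum-distance identification described above can be sketched as follows. The load types, the four feature values and the use of Euclidean distance are all assumptions made for illustration; the abstract does not specify which features or which distance measure the patent uses.

```python
import numpy as np

# Hypothetical database: one reference feature vector (four features) per load type.
LOAD_FEATURE_DB = {
    "resistive heater": np.array([1.00, 0.02, 1.41, 0.01]),
    "motor": np.array([0.80, 0.10, 1.60, 0.60]),
    "switched-mode supply": np.array([0.60, 0.90, 3.00, 0.20]),
}

def identify_load_type(features, database=LOAD_FEATURE_DB):
    """Return (load type, distance) of the closest reference vector."""
    features = np.asarray(features, dtype=float)
    # Euclidean distance is an assumption; any metric could be substituted.
    distances = {
        name: float(np.linalg.norm(features - reference))
        for name, reference in database.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]
```

A measured vector such as `[0.82, 0.12, 1.55, 0.58]` would be assigned to the hypothetical "motor" entry because that reference vector is nearest.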
System and method employing a self-organizing map load feature database to identify electric load types of different electric loads

DOEpatents

Lu, Bin; Harley, Ronald G.; Du, Liang; Yang, Yi; Sharma, Santosh K.; Zambare, Prachi; Madane, Mayura A.

2014-06-17

A method identifies electric load types of a plurality of different electric loads. The method includes providing a self-organizing map load feature database of a plurality of different electric load types and a plurality of neurons, each of the load types corresponding to a number of the neurons; employing a weight vector for each of the neurons; sensing a voltage signal and a current signal for each of the loads; determining a load feature vector including at least four different load features from the sensed voltage signal and the sensed current signal for a corresponding one of the loads; and identifying by a processor one of the load types by relating the load feature vector to the neurons of the database by identifying the weight vector of one of the neurons corresponding to the one of the load types that is a minimal distance to the load feature vector.
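With a self-organizing map, several neurons can represent the same load type, and identification reduces to finding the neuron whose weight vector is closest to the measured feature vector. The sketch below assumes an already trained map; the weights, labels and Euclidean distance are hypothetical and only illustrate the lookup step, not the training.

```python
import numpy as np

# Hypothetical trained map: one weight vector per neuron, one label per neuron.
NEURON_WEIGHTS = np.array([
    [1.00, 0.02, 1.41, 0.01],
    [0.95, 0.05, 1.45, 0.05],
    [0.80, 0.10, 1.60, 0.60],
    [0.75, 0.15, 1.70, 0.55],
])
NEURON_LABELS = ["heater", "heater", "motor", "motor"]

def best_matching_load_type(features, neuron_weights, neuron_labels):
    """Return (load type, neuron index) of the best-matching neuron."""
    features = np.asarray(features, dtype=float)
    # Distance from the feature vector to every neuron's weight vector.
    distances = np.linalg.norm(neuron_weights - features, axis=1)
    winner = int(np.argmin(distances))
    return neuron_labels[winner], winner
```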
Checking the Goldbach conjecture up to 4 · 10^11

NASA Astrophysics Data System (ADS)

Sinisalo, Matti K.

1993-10-01

One of the most studied problems in additive number theory, Goldbach's conjecture, states that every even integer greater than or equal to 4 can be expressed as a sum of two primes. This paper reports checking the conjecture up to 4 · 10^11 on an IBM 3083 mainframe with a vector processor.
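A small-scale version of such a check can be sketched with a sieve of Eratosthenes. This is not the method used in the paper, which the abstract does not describe; the function names are illustrative and the approach is only practical for limits far below 4 · 10^11.

```python
def prime_sieve(n):
    """Return a bytearray where entry k is 1 if k is prime (0 <= k <= n)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Clear all multiples of p starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return is_prime

def smallest_goldbach_prime(n, is_prime):
    """Smallest prime p with n - p also prime, or None if there is none."""
    for p in range(2, n // 2 + 1):
        if is_prime[p] and is_prime[n - p]:
            return p
    return None

def verify_goldbach(limit):
    """Return the first even n in [4, limit] with no decomposition, else None."""
    is_prime = prime_sieve(limit)
    for n in range(4, limit + 1, 2):
        if smallest_goldbach_prime(n, is_prime) is None:
            return n
    return None
```

Calling `verify_goldbach(10000)` returns `None`, meaning every even number from 4 to 10000 has at least one decomposition into two primes.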
Sparse matrix-vector multiplication on network-on-chip

NASA Astrophysics Data System (ADS)

Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.

2010-12-01

In this paper, we present an idea for performing matrix-vector multiplication using a Network-on-Chip (NoC) architecture. In traditional IC design, on-chip communications have been designed with dedicated point-to-point interconnections. Therefore, regular local data transfer is the major concept of many parallel implementations. However, when dealing with the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equations, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with an arbitrary structure of the data transfers, i.e., with the irregular structure of the sparse matrices. So far, we have implemented the proposed SMVM-NoC architecture in sizes 4×4 and 5×5 in IEEE 754 single-precision floating point on an FPGA.
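The irregular access pattern mentioned above is visible in a plain software version of SMVM. The sketch below uses the standard compressed sparse row (CSR) layout as an assumption; it says nothing about how the paper maps the computation onto the NoC. Note that the reads of `x` follow the column indices of the nonzeros, so they depend entirely on the sparsity structure.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    data    : nonzero values, row by row
    indices : column index of each nonzero
    indptr  : indptr[r]..indptr[r+1] delimits the nonzeros of row r
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=np.float32)   # single precision, as in the abstract
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            # x is read at a data-dependent position: the irregular transfer.
            y[row] += data[k] * x[indices[k]]
    return y

# Example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
data = np.array([4, 1, 3, 2, 5], dtype=np.float32)
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
x = np.array([1, 2, 3], dtype=np.float32)
y = csr_matvec(data, indices, indptr, x)
```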
Research and simulation of the decoupling transformation in AC motor vector control

NASA Astrophysics Data System (ADS)

He, Jiaojiao; Zhao, Zhongjie; Liu, Ken; Zhang, Yongping; Yao, Tuozhong

2018-04-01

The permanent magnet synchronous motor (PMSM) is a nonlinear, strongly coupled, multivariable system, and a decoupling transformation can resolve its coupling problem. This paper presents a mathematical model of the PMSM and introduces the coordinate transformations used in PMSM vector control. Using matrix theory, the inductance matrix is diagonalized through a modal-matrix transformation between coordinate systems, which makes the coupled variables independent and separates the excitation current from the torque current. The coordinate transformation matrix is derived, providing an approach to the coupling problem of AC motors. Finally, a simulation model of PMSM vector control is built in the Matlab/Simulink environment by combining the PMSM model with the coordinate conversion modules; each part of the model is introduced and the simulation results are analyzed.
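The abstract does not give the transformation matrices it derives. As an illustration of the idea, the sketch below assumes the standard amplitude-invariant Clarke transform (three-phase to stationary αβ) followed by the Park transform (αβ to rotating dq). For balanced sinusoidal currents, the rotating-frame components become constants, which is what allows the flux-producing (d) and torque-producing (q) currents to be controlled separately.

```python
import math

def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transform: abc -> (alpha, beta)."""
    alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    beta = (i_b - i_c) / math.sqrt(3.0)
    return alpha, beta

def park(alpha, beta, theta):
    """Park transform: rotate (alpha, beta) by the electrical angle theta."""
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q

def abc_to_dq(i_a, i_b, i_c, theta):
    """Convenience wrapper: abc -> dq in one call."""
    return park(*clarke(i_a, i_b, i_c), theta)
```

With balanced currents of amplitude I whose phase-a peak is aligned with the angle θ, `abc_to_dq` returns d ≈ I and q ≈ 0 at every θ, so the time-varying three-phase quantities reduce to two constant values.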
Nonlinear force dependence on optically bound micro-particle arrays in the evanescent fields of fundamental and higher order microfibre modes

PubMed Central

Maimaiti, Aili; Holzmann, Daniela; Truong, Viet Giang; Ritsch, Helmut; Nic Chormaic, Síle

2016-01-01

Particles trapped in the evanescent field of an ultrathin optical fibre interact over very long distances via multiple scattering of the fibre-guided fields. In ultrathin fibres that support higher order modes, these interactions are stronger and exhibit qualitatively new behaviour due to the coupling of different fibre modes, which have different propagation wave-vectors, by the particles. Here, we study one dimensional longitudinal optical binding interactions of chains of 3 μm polystyrene spheres under the influence of the evanescent fields of a two-mode microfibre. The observation of long-range interactions, self-ordering and speed variation of particle chains reveals strong optical binding effects between the particles that can be modelled well by a tritter scattering-matrix approach. The optical forces, optical binding interactions and the velocity of bounded particle chains are calculated using this method. Results show good agreement with finite element numerical simulations. Experimental data and theoretical analysis show that higher order modes in a microfibre offer a promising method to not only obtain stable, multiple particle trapping or faster particle propulsion speeds, but that they also allow for better control over each individual trapped object in particle ensembles near the microfibre surface. PMID:27451935
Computer simulations and real-time control of ELT AO systems using graphical processing units

NASA Astrophysics Data System (ADS)

Wang, Lianqi; Ellerbroek, Brent

2012-07-01

The adaptive optics (AO) simulations at the Thirty Meter Telescope (TMT) have been carried out using the efficient, C-based multi-threaded adaptive optics simulator (MAOS, http://github.com/lianqiw/maos). By porting time-critical parts of MAOS to graphical processing units (GPU) using NVIDIA CUDA technology, we achieved a 10-fold speedup for each GTX 580 GPU used compared to a modern quad-core CPU. Each time step of a full-scale end-to-end simulation for the TMT narrow field infrared AO system (NFIRAOS) takes only 0.11 second on a desktop with two GTX 580s. We also demonstrate that the TMT minimum variance reconstructor can be assembled in matrix vector multiply (MVM) format in 8 seconds with 8 GTX 580 GPUs, meeting the TMT requirement for updating the reconstructor. Analysis shows that it is also possible to apply the MVM using 8 GTX 580s within the required latency.
Estimation of the chemical rank for the three-way data: a principal norm vector orthogonal projection approach.

PubMed

Hong-Ping, Xie; Jian-Hui, Jiang; Guo-Li, Shen; Ru-Qin, Yu

2002-01-01

A new approach for estimating the chemical rank of a three-way array, called the principal norm vector orthogonal projection method, has been proposed. The method is based on the fact that the chemical rank of the three-way data array is equal to the rank of the column space of the matrix unfolded along the spectral or chromatographic mode. The vector with maximum Frobenius norm is selected among all the column vectors of the unfolded matrix as the principal norm vector (PNV). The column vectors are then transformed with an orthogonal projection matrix formed from the PNV. The mathematical rank of the column space of the residual matrix thus obtained should decrease by one. Such orthogonal projection is carried out repeatedly until the contribution of the chemical species to the signal data is entirely removed. At this point the decrease in the mathematical rank equals the chemical rank, and the remaining residual subspace is due entirely to the noise contribution. The chemical rank can be estimated easily by using an F-test. The method has been applied successfully to a simulated HPLC-DAD type three-way data array and to two real excitation-emission fluorescence data sets of amino acid mixtures and dye mixtures. The simulation with a relatively high level of added noise shows that the method is robust against heteroscedastic noise. The proposed algorithm is simple and easy to program, with a light computational burden.
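The projection step can be sketched on a noise-free matrix as below. The function name and the test matrix are assumptions, and the F-test used in the paper to decide when only noise remains is not implemented; the sketch only shows that each projection removes exactly one dimension from the column space.

```python
import numpy as np

def pnv_deflate(X):
    """One principal-norm-vector step: project out the largest-norm column."""
    norms = np.linalg.norm(X, axis=0)        # norm of every column vector
    v = X[:, np.argmax(norms)]               # principal norm vector (PNV)
    # Orthogonal projector onto the complement of v.
    P = np.eye(X.shape[0]) - np.outer(v, v) / (v @ v)
    return P @ X

# Hypothetical noise-free "unfolded" matrix of rank 3.
rng = np.random.default_rng(0)
X = rng.random((20, 3)) @ rng.random((3, 15))

ranks = [np.linalg.matrix_rank(X, tol=1e-8)]
residual = X
for _ in range(3):
    residual = pnv_deflate(residual)
    ranks.append(np.linalg.matrix_rank(residual, tol=1e-8))
```

Here `ranks` records the numerical rank before and after each projection. With real, noisy data the residual never reaches exactly zero, which is why a statistical stopping test is needed.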
Reverse ray tracing for transformation optics.

PubMed

Hu, Chia-Yu; Lin, Chun-Hung

2015-06-29

Ray tracing is an important technique for predicting optical system performance. In the field of transformation optics, the Hamiltonian equations of motion for ray tracing are well known. The numerical solutions to the Hamiltonian equations of motion are affected by the complexities of the inhomogeneous and anisotropic indices of the optical device. To our knowledge, no previous work has been conducted on ray tracing for transformation optics with extreme inhomogeneity and anisotropy. In this study, we present the use of 3D reverse ray tracing in transformation optics. The reverse ray tracing is derived from Fermat's principle based on a sweeping method instead of finding the full solution to ordinary differential equations. The sweeping method is employed to obtain the eikonal function. The wave vectors are then obtained from the gradient of that eikonal function map in the transformed space to acquire the illuminance. Because only the rays at the points of interest have to be traced, reverse ray tracing provides an efficient approach to investigating the illuminance of a system. This approach is useful in any form of transformation optics where the material property tensor is a symmetric positive definite matrix. The performance and analysis of three transformation optics devices with inhomogeneous and anisotropic indices are explored. The ray trajectories and illuminances in these demonstration cases are successfully solved by the proposed reverse ray tracing method.
Effect of Thin Cirrus Clouds on Dust Optical Depth Retrievals From MODIS Observations

NASA Technical Reports Server (NTRS)

Feng, Qian; Hsu, N. Christina; Yang, Ping; Tsay, Si-Chee

2011-01-01

The effect of thin cirrus clouds in retrieving the dust optical depth from MODIS observations is investigated by using a simplified aerosol retrieval algorithm based on the principles of the Deep Blue aerosol property retrieval method. Specifically, the errors of the retrieved dust optical depth due to thin cirrus contamination are quantified through the comparison of two retrievals, one assuming dust-only atmospheres and the other assuming overlapping mineral dust and thin cirrus clouds. To account for the effect of the polarization state of the radiation field on radiance simulation, a vector radiative transfer model is used to generate the lookup tables. In the forward radiative transfer simulations involved in generating the lookup tables, the Rayleigh scattering by atmospheric gaseous molecules and the reflection of the surface, assumed to be Lambertian, are fully taken into account. Additionally, the spheroid model is utilized to account for the nonsphericity of dust particles in computing their optical properties. For simplicity, the single-scattering albedo, scattering phase matrix, and optical depth are specified a priori for thin cirrus clouds assumed to consist of droxtal ice crystals. The present results indicate that the errors in the retrieved dust optical depths due to the contamination of thin cirrus clouds depend on the scattering angle, underlying surface reflectance, and dust optical depth. Under heavily dusty conditions, the absolute errors are comparable to the prescribed optical depths of thin cirrus clouds.
