Note: This page contains sample records for the topic parallel processor array from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results.
Last update: August 15, 2014.
1

Digital Parallel Processor Array for Optimum Path Planning  

NASA Technical Reports Server (NTRS)

The invention computes the optimum path across a terrain or topology represented by an array of parallel processor cells interconnected between neighboring cells by links extending along different directions to the neighboring cells. Such an array is preferably implemented as a high-speed integrated circuit. The computation of the optimum path is accomplished by, in each cell, receiving stimulus signals from neighboring cells along corresponding directions, determining and storing the identity of the direction along which the first stimulus signal is received, and broadcasting a subsequent stimulus signal to the neighboring cells after a predetermined delay time, whereby stimulus signals propagate throughout the array from a starting one of the cells. After propagation of the stimulus signal throughout the array, a master processor traces back from a selected destination cell to the starting cell along an optimum path of the cells in accordance with the identity of the directions stored in each of the cells.

Kemeny, Sabrina E. (Inventor); Fossum, Eric R. (Inventor); Nixon, Robert H. (Inventor)

1996-01-01
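
The propagate-then-trace-back scheme in this abstract can be sketched in software as an event simulation. This is a minimal sketch under stated assumptions, not the patented circuit: the function names `propagate` and `trace_back`, the four-neighbor grid, and the integer delay table are all hypothetical choices made for illustration.

```python
import heapq

# Four link directions and their (row, col) offsets (assumed 4-neighbor grid).
DIRS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def propagate(delays, start):
    """Simulate stimulus propagation from `start`.

    delays[r][c] is the time a cell waits before rebroadcasting.
    Returns, for every cell, the direction from which its FIRST
    stimulus arrived ("" for the starting cell).
    """
    rows, cols = len(delays), len(delays[0])
    first_dir = {}
    events = [(0, start, "")]  # (arrival time, cell, direction toward sender)
    while events:
        t, cell, back = heapq.heappop(events)
        if cell in first_dir:
            continue  # later stimuli are ignored; only the first one is stored
        first_dir[cell] = back
        r, c = cell
        t_out = t + delays[r][c]  # broadcast after this cell's delay
        for name, (dr, dc) in DIRS.items():
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in first_dir:
                heapq.heappush(events, (t_out, (nr, nc), OPPOSITE[name]))
    return first_dir

def trace_back(first_dir, start, dest):
    """Follow the stored directions from `dest` back to `start`."""
    path = [dest]
    cell = dest
    while cell != start:
        dr, dc = DIRS[first_dir[cell]]
        cell = (cell[0] + dr, cell[1] + dc)
        path.append(cell)
    path.reverse()
    return path

# Usage: a high-delay middle column forces the path around it.
terrain = [[1, 9, 1],
           [1, 9, 1],
           [1, 1, 1]]
stored = propagate(terrain, (0, 0))
route = trace_back(stored, (0, 0), (0, 2))
```

Larger delays play the role of costlier terrain, so the first stimulus to reach a cell has travelled the cheapest route, and the stored directions form a tree rooted at the starting cell.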

2

Multiprocessors and array processors  

SciTech Connect

This book presents the papers given at a conference on supercomputers and array processors. Topics considered at the conference included pipelining, parallel processing, vector processing, computer architecture, Cray computers, algorithms, memory devices, computerized simulation, MIMD computers, SIMD processors, array processors in digital image processing applications, and graphic visualization using multiprocessors.

Karplus, W.J.

1987-01-01

3

Parallel processing in a host plus multiple array processor system for radar  

NASA Technical Reports Server (NTRS)

Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.

Barkan, B. Z.

1983-01-01

4

VLSI Array processors  

Microsoft Academic Search

High speed signal processing depends critically on parallel processor technology. In most applications, general-purpose parallel computers cannot offer satisfactory real-time processing speed due to severe system overhead. Therefore, for real-time digital signal processing (DSP) systems, special-purpose array processors have become the only appealing alternative. In designing or using such array processors, most signal processing algorithms share the critical attributes of

S. Kung

1985-01-01

5

Multiprocessors and array processors  

SciTech Connect

This book contains the papers presented at the conference on the subject of parallel processing and supercomputers. They include: Using the distributed array of processors (DAP) in electronic computer-aided design and The architecture of the Convex C240.

Karplus, W.J. (Univ. of California, Los Angeles, CA (US))

1989-01-01

6

Computational Cost of Image Registration with a Parallel Binary Array Processor  

Microsoft Academic Search

The application of a simulated binary array processor (BAP) to the rapid analysis of a sequence of images has been studied. Several algorithms have been developed which may be implemented on many existing parallel processing machines. The characteristic operations of a BAP are discussed and analyzed. A set of preprocessing algorithms are described which are designed to register two images

A. P. Reeves; A. Rostampour

1982-01-01

7

Peripheral array processors  

SciTech Connect

This book contains papers presented at a conference on peripheral array processors. Topics include the following: Interactive Array Processing; Historical Aspects of Array Processors; Computer Architecture; Vibration, Noise and Array Processors; Array Processors for Microcomputers; Missile Simulation with Array Processors; and, Real Time Signal Processing with Array Processors.

Karplus, W.J.

1984-01-01

8

Massively Parallel Processor Array for Mid/Back-end Ultrasound Signal Processing  

Microsoft Academic Search

Ultrasound remains a popular imaging modality due to its mobility and cost-effectiveness. As general purpose computing and DSPs are entering an era of multi-core architectures, the potential for parallel performance gains is significant when used properly. This work explores the possibility of using a massively parallel processor array to meet real-time throughputs for mid-/back-end ultrasound processing. A many-core

Dean N. Truong; Bevan M. Baas

2010-01-01

9

Seasat synthetic-aperture radar data reduction using parallel programmable array processors  

NASA Technical Reports Server (NTRS)

This paper presents a digital processing system that produces the Seasat synthetic-aperture radar (SAR) imagery. The system consists of a SEL 32/77 host minicomputer and three AP-120B array processors. The partitioning of the SAR processing functions and the design of software modules are described. The rationale for selecting the parallel array processor architecture and the methodology for developing the parallel processing scheme on this system are also described. This system attains a Seasat SAR data reduction speed of 2.5 h per 25-m resolution 4-look and 100 km x 100 km image frame. A preliminary performance evaluation of this parallel processing system and potential future applications for remote sensing data reduction are described.

Wu, C.; Barkan, B.; Karplus, W. J.; Caswell, D.

1982-01-01

10

The massively parallel processor  

NASA Technical Reports Server (NTRS)

Future sensor systems will utilize massively parallel computing systems for rapid analysis of two-dimensional data. The Goddard Space Flight Center has an ongoing program to develop these systems. A single-instruction multiple data computer known as the Massively Parallel Processor (MPP) is being fabricated for NASA by the Goodyear Aerospace Corporation. This processor contains 16,384 processing elements arranged in a 128 x 128 array. The MPP will be capable of adding more than 6 billion 8-bit numbers per second. Multiplication of eight-bit numbers can occur at a rate of 2 billion per second. Delivery of the MPP to Goddard Space Flight Center is scheduled for 1983.

Schaefer, D. H.; Fischer, J. R.; Wallgren, K. R.

1980-01-01

11

Massively parallel processor computer  

NASA Technical Reports Server (NTRS)

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

Fung, L. W. (inventor)

1983-01-01
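
The bit-slice operation this abstract describes, where every processing element works on one bit of its own data stream under a single instruction stream, can be mimicked in software. The following is a minimal sketch under stated assumptions: nested Python lists stand in for the processing-element array, and the helper names (`to_bit_planes`, `bit_serial_add`, `shift_east`) are hypothetical, not taken from the patent.

```python
def to_bit_planes(values, nbits):
    """Split a 2-D grid of non-negative ints into nbits bit planes, LSB first."""
    return [[[(v >> b) & 1 for v in row] for row in values] for b in range(nbits)]

def from_bit_planes(planes):
    """Reassemble integers from LSB-first bit planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[b][r][c] << b for b in range(len(planes)))
             for c in range(cols)] for r in range(rows)]

def bit_serial_add(a_planes, b_planes):
    """Add two operands one bit plane at a time.

    Each loop iteration is one 'instruction' applied to every grid
    position at once: a full-adder step on the current bit slice.
    """
    rows, cols = len(a_planes[0]), len(a_planes[0][0])
    carry = [[0] * cols for _ in range(rows)]
    out = []
    for a, b in zip(a_planes, b_planes):
        s = [[a[r][c] ^ b[r][c] ^ carry[r][c] for c in range(cols)]
             for r in range(rows)]
        carry = [[(a[r][c] & b[r][c]) | (carry[r][c] & (a[r][c] ^ b[r][c]))
                  for c in range(cols)] for r in range(rows)]
        out.append(s)
    out.append(carry)  # final carry becomes the most significant plane
    return out

def shift_east(plane, fill=0):
    """Spatial translation: every element takes the bit of its west neighbor."""
    return [[fill] + row[:-1] for row in plane]

# Usage: add two 2 x 2 grids of 8-bit values in eight bit-slice steps.
a = [[3, 200], [255, 7]]
b = [[5, 100], [1, 8]]
total = from_bit_planes(bit_serial_add(to_bit_planes(a, 8), to_bit_planes(b, 8)))
```

An n-bit addition takes n steps regardless of how many elements the array holds, which is why such a design trades per-element speed for very wide parallelism.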

12

Computational cost of image registration with a parallel binary array processor.  

PubMed

The application of a simulated binary array processor (BAP) to the rapid analysis of a sequence of images has been studied. Several algorithms have been developed which may be implemented on many existing parallel processing machines. The characteristic operations of a BAP are discussed and analyzed. A set of preprocessing algorithms are described which are designed to register two images of TV-type video data in real time. These algorithms illustrate the potential uses of a BAP and their cost is analyzed in detail. The results of applying these algorithms to FLIR data and to noisy optical data are given. An analysis of these algorithms illustrates the importance of an efficient global feature extraction hardware for image understanding applications. PMID:21869063

Reeves, A P; Rostampour, A

1982-04-01

13

Array processor architecture connection network  

NASA Technical Reports Server (NTRS)

A connection network is disclosed for use between a parallel array of processors and a parallel array of memory modules for establishing non-conflicting data communications paths between requested memory modules and requesting processors. The connection network includes a plurality of switching elements interposed between the processor array and the memory modules array in an Omega networking architecture. Each switching element includes a first and a second processor side port, a first and a second memory module side port, and control logic circuitry for providing data connections between the first and second processor ports and the first and second memory module ports. The control logic circuitry includes strobe logic for examining data arriving at the first and the second processor ports to indicate when the data arriving is requesting data from a requesting processor to a requested memory module. Further, connection circuitry is associated with the strobe logic for examining requesting data arriving at the first and the second processor ports for providing a data connection therefrom to the first and the second memory module ports in response thereto when the data connection so provided does not conflict with a pre-established data connection currently in use.

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1982-01-01
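
Conflict-free path setup through an Omega network can be illustrated with the textbook destination-tag routing scheme. This is a sketch under stated assumptions rather than the patented control logic: the functions `route` and `grant` are hypothetical, requests are served in list order, and a request is simply refused if any link it needs is already in use.

```python
def route(src, dst, n):
    """Return the links (stage, position) used from processor src to memory dst.

    The network has 2**n ports and n stages. Each stage applies a perfect
    shuffle, then a 2x2 switch selects its upper or lower output according
    to one bit of the destination address (most significant bit first).
    """
    size = 1 << n
    pos = src
    links = []
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & (size - 1)  # perfect shuffle
        bit = (dst >> (n - 1 - stage)) & 1                   # destination tag bit
        pos = (pos & ~1) | bit                               # switch output port
        links.append((stage, pos))
    return links

def grant(requests, n):
    """Grant each (src, dst) request unless it conflicts with an earlier grant."""
    in_use = set()
    granted = []
    for src, dst in requests:
        links = route(src, dst, n)
        if in_use.isdisjoint(links):
            in_use.update(links)
            granted.append((src, dst))
    return granted

# Usage: in an 8-port network, processors 0 and 4 share a first-stage switch.
accepted = grant([(0, 0), (4, 1)], 3)
```

In this example both requests need the upper output of the same first-stage switch, so only the first is granted; the second would have to wait until the established connection is released.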

14

Spaceborne Processor Array  

NASA Technical Reports Server (NTRS)

A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor-memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.

Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

2008-01-01

15

Array processors in chemistry  

SciTech Connect

The field of attached scientific processors (''array processors'') is surveyed, and an attempt is made to indicate their present and possible future use in computational chemistry. The current commercial products from Floating Point Systems, Inc., Datawest Corporation, and CSP, Inc. are discussed.

Ostlund, N.S.

1980-01-01

16

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation  

Microsoft Academic Search

The future of high-performance computing is likely to rely on the ability to efficiently exploit huge amounts of parallelism. One way of taking advantage of this parallelism is to formulate problems as

David Barrie Thomas; Lee Howes; Wayne Luk

2009-01-01

17

Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined Bus System  

Microsoft Academic Search

We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems are able to support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and, thus, is able to support parallel implementations of divide-and-conquer computations like Strassen's algorithm. The main

Keqin Li; Yi Pan; Si-qing Zheng

1998-01-01
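
Strassen's algorithm, which the abstract names as a divide-and-conquer computation suited to independent subsystems, is sketched below in sequential form. This is a minimal sketch assuming square matrices whose size is a power of two, stored as nested lists; it does not model the LARPBS bus itself. The seven recursive products are mutually independent, which is the property that would let each be assigned to its own subsystem.

```python
def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(M):
    """Return the four quadrants of a square matrix of even size."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) with 7 recursive products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # The seven products below do not depend on one another.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom

# Usage: multiply two 2 x 2 matrices.
product = strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```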

18

Parallel processor engine model program  

NASA Technical Reports Server (NTRS)

The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

Mclaughlin, P.

1984-01-01

19

A four-processor building block for SIMD processor arrays  

NASA Astrophysics Data System (ADS)

A four-processor chip, for use in processor arrays for image computations, is described. The large degree of data parallelism available in image computations allows dense array implementations where all processors operate under the control of a single instruction stream. An instruction decoder shared by the four processors on the chip minimizes the pin count allocated for global control of the processors. The chip incorporates an interface to an external static RAM for memory expansion without glue chips. The full-custom 2-micron CMOS chip contains 56,669 transistors and runs instructions at 10 MHz; 512 16-b processors and 4 Mbyte of distributed external memory fit on two industry-standard cards to yield 5 x 10^9 instructions per second peak throughput. As image I/O can overlap perfectly with pixel computation, an array containing 128 of these chips can provide more than 600 16-b operations per pixel on 512 x 512 images at 30 Hz.

Fisher, Allan L.; Rockoff, Todd E.; Highnam, Peter T.

1990-04-01
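
The throughput figures quoted in this abstract can be checked with a short calculation. The sketch below assumes each processor completes one instruction per 10 MHz clock cycle, which is an interpretation of "runs instructions at 10 MHz"; the function names are hypothetical.

```python
PROCESSORS_PER_CHIP = 4
CLOCK_HZ = 10e6  # assumed: one instruction per processor per clock cycle

def peak_ips(chips):
    """Peak instructions per second for an array of `chips` four-processor chips."""
    return chips * PROCESSORS_PER_CHIP * CLOCK_HZ

def ops_per_pixel(chips, width, height, frames_per_second):
    """Instruction budget available for each pixel at the given frame rate."""
    pixels_per_second = width * height * frames_per_second
    return peak_ips(chips) / pixels_per_second

# Usage: 128 chips (512 processors) on 512 x 512 images at 30 Hz.
budget = ops_per_pixel(128, 512, 512, 30)
```

With 128 chips the array holds 512 processors, giving 5.12 x 10^9 instructions per second; dividing by 512 x 512 x 30 pixels per second leaves roughly 651 operations per pixel, consistent with the "more than 600" stated above.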

20

Rectangular Array Of Digital Processors For Planning Paths  

NASA Technical Reports Server (NTRS)

Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmit to other neighboring processors, and process repeats until signals propagated across entire field.

Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

1993-01-01

21

Parallel catastrophe modelling on a cell processor  

Microsoft Academic Search

In this paper we study the potential performance improvements for catastrophe modelling systems that can be achieved through parallelization on a Cell Processor. We studied and parallelized a critical section of catastrophe modelling, the so-called

Frank K. H. A. Dehne; Glenn Hickey; Andrew Rau-Chaplin; Mark Byrne

2009-01-01

22

Stochastic scheduling of parallel processors  

SciTech Connect

Selected topics of interest from the area of parallel processing systems are investigated. Problems concern specifically an optimal scheduling of jobs subject to a dependency structure, an analysis of the performance of a heuristic assignment schedule in a multiserver system of many competing queues, and the optimal service rate control of a parallel processing system. In general, multi-tasking leads to a stochastic scheduling problem in which n jobs subject to precedence constraints are to be processed on m processors. Of particular interest are intree forms of the precedence constraints and i.i.d. job processing times. Using an optimal stochastic control formulation, it is shown, under some conditions on the distributions, that HLF (Highest Levels First) policies and HLF combined with LERPT (Longest Expected Remaining Processing Time) within each level minimize expected makespan for nonpreemptive and preemptive scheduling, respectively, when m = 2. The relative performance of HLF heuristics is investigated for a model in which the job execution times are i.i.d. with an exponential distribution. Many situations in resource sharing environments can be modeled as a multi-server system of many competing queues.

Ko, S.J.

1985-01-01
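
A nonpreemptive HLF policy on an intree can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the report's formulation: the intree is given as a hypothetical `parent` map (the root maps to `None`), a job's level is its distance from the root, and processing times come from a `draw` callable that defaults to exponential samples with rate 1.

```python
import heapq
import random

def levels(parent):
    """Level of each job = number of edges on its path to the root."""
    def level(job):
        depth = 0
        while parent[job] is not None:
            job = parent[job]
            depth += 1
        return depth
    return {job: level(job) for job in parent}

def hlf_makespan(parent, m, draw=None):
    """Simulate nonpreemptive Highest-Levels-First on m processors.

    A job becomes ready once all of its children (predecessors) finish.
    Whenever a processor is free, the ready job with the highest level starts.
    """
    if draw is None:
        draw = lambda: random.expovariate(1.0)  # assumed i.i.d. exponential times
    lvl = levels(parent)
    pending = {job: 0 for job in parent}
    for job, par in parent.items():
        if par is not None:
            pending[par] += 1
    ready = [(-lvl[job], job) for job in parent if pending[job] == 0]
    heapq.heapify(ready)
    running = []  # heap of (finish time, job)
    now, done = 0.0, 0
    while done < len(parent):
        while ready and len(running) < m:
            _, job = heapq.heappop(ready)
            heapq.heappush(running, (now + draw(), job))
        now, job = heapq.heappop(running)
        done += 1
        par = parent[job]
        if par is not None:
            pending[par] -= 1
            if pending[par] == 0:
                heapq.heappush(ready, (-lvl[par], par))
    return now

# Usage: a balanced intree of seven jobs on two processors.
tree = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
sample = hlf_makespan(tree, 2)
```

Averaging `hlf_makespan` over many runs gives an estimate of the expected makespan that the abstract's optimality result concerns; passing `draw=lambda: 1.0` turns the simulation into a deterministic unit-time schedule for checking.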

23

The Circulating Processor Model of Parallel Systems  

Microsoft Academic Search

This paper introduces the circulating processor model for parallel computer systems. Models of parallel systems tend to be computationally complex due to synchronization constraints such as task forking and joining. However, product form queuing network models remain computationally efficient as the size of the system grows by calculating only the mean performance metrics of the system. The circulating processor model

Amy W. Apon; Lawrence W. Dowdy

1997-01-01

24

Image processing using one-dimensional processor arrays  

Microsoft Academic Search

The first half of this paper presents the design rationale for CNAPS, a specialized one-dimensional (1-D) processor array developed by Adaptive Solutions Inc. In this context, we discuss the problem of Amdahl's law which severely constrains special-purpose architectures. We also discuss specific architectural decisions such as the kind of parallelism, the computational precision of the processors, on-chip versus off-chip processor

Dan W. Hammerstrom; Daniel P. Lulich

1996-01-01
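
Amdahl's law, which the abstract cites as a constraint on special-purpose architectures, is commonly written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can be accelerated and n is the acceleration factor. The sketch below uses that standard form; the function name and the sample values are illustrative assumptions, not figures from the paper.

```python
def amdahl_speedup(parallel_fraction, n):
    """Overall speedup when a fraction of the work is sped up n times."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)

# Usage: even with 90% of the work accelerated, speedup stays below 10.
modest = amdahl_speedup(0.9, 10)
near_limit = amdahl_speedup(0.9, 10**9)
```

The part of the workload that the specialized array cannot accelerate bounds the total gain, no matter how many processors are added.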

25

Peripheral array processors; Proceedings of the conference, Boston, MA, October 11, 12, 1984  

SciTech Connect

Various papers on peripheral array processors are presented. The topics discussed include: the changing role of peripheral array processors; recent developments in the CSPI MAP series of array processors; parallel processing approach selected by Floating Point Systems for providing a new generation of cost effective array processors and scientific computers; the Numerix MARS-432 array processor; and broader horizons for the System 10 Plus. Also examined are: the SPS-1000 - a data flow multiprocessor for real-time processing; interactive array processing; vibration, noise, and array processor; ZIP 3216: an array processor for microcomputer-based systems; a real-time hardware-in-the-loop missile simulation using the DPS-2400 as an executive controller; attached processor which improves SPAS simulation time; real-time signal processing with the SIGNATRON S320A/FPS 5000; and a two-board array processor for the VMEbus.

Karplus, W.J.

1984-01-01

26

Configurable Soft Processor Arrays Using the OpenFire Processor  

Microsoft Academic Search

Single-chip multiprocessor systems, while requiring significantly less design effort than custom hardware solutions, fall behind custom RTL in performance. In an effort to decrease this performance gap, the individual processors in an array can be tailored to their specific application. In this paper we present the OpenFire, a Xilinx MicroBlaze-compatible processor designed for configurable array research. A sample application is

Stephen Craven; Cameron Patterson; Peter Athanas

2005-01-01

27

Multiple-Fold Clustered Processor Mesh Array.  

National Technical Information Service (NTIS)

The multiple-fold clustered processor mesh array is a triangular organization of clustered processing elements. This multiple-fold array maintains functional equivalence to the nearest neighbor mesh computer with uni-directional interprocessor communicati...

G. G. Pechanek; S. Vassiliadis; J. G. Delgado

1993-01-01

28

Scheduling stochastic jobs on parallel processors  

SciTech Connect

The main objective is to develop some realistic models and corresponding strategies for dispatching stochastic jobs to parallel processors. Basically, the problem scenario under consideration consists of a set of parallel processors and a collection of jobs that need one-step operation. Specifically addressed are the following problems: (1) The identical machine problem, to determine the optimal strategy of dispatching a single priority class of jobs on m-identical machines to maximize some expected reward functions. (2) The two-heterogeneous processor problem, to investigate the problem of non-preemptive dispatching of n-priority classes of jobs to two processors differing in speeds to minimize the expected flowtime of each job class. (3) The multi-heterogeneous processor problem, to study a more general version of the last problem: find the optimal nonpreemptive dispatching strategy for two-priority job classes on m heterogeneous processors to minimize the expected flowtime of each job class. The above problems are formulated mathematically and the corresponding optimal dispatching strategy for each problem is derived.

Xu, H.

1987-01-01

29

Parallel processor programs in the Federal Government  

NASA Technical Reports Server (NTRS)

In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

1985-01-01

30

Computing Data Cubes Using Massively Parallel Processors  

Microsoft Academic Search

To better support decision making, it was proposed to extend SQL to include data cube operations. Computation of a data cube requires computing a number of interrelated group-bys, which is a rather expensive operation when databases are large. In this paper, we propose to couple a relational database management system with massively parallel processors (MPP) to facilitate on-line a

Hongjun Lu; Xiaohui Huang; Zhixian Li
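
The "interrelated group-bys" of a data cube can be illustrated with a small in-memory example. This is a minimal sketch under stated assumptions and not the paper's MPP method: the measure is a sum, each coarser group-by is derived from the smallest already-computed group-by with one more dimension, and the names `group_by` and `data_cube` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def group_by(rows, dims, keep):
    """Aggregate `rows` (keyed by tuples over `dims`) down to the `keep` dimensions."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in rows.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def data_cube(facts, dims):
    """Compute all 2**len(dims) group-bys, reusing finer results for coarser ones."""
    dims = tuple(dims)
    base = defaultdict(int)
    for key, measure in facts:
        base[tuple(key)] += measure
    cube = {dims: dict(base)}
    for k in range(len(dims) - 1, -1, -1):
        for subset in combinations(dims, k):
            # Candidate parents: computed group-bys with exactly one extra dimension.
            parents = [p for p in cube
                       if len(p) == k + 1 and set(subset) <= set(p)]
            parent = min(parents, key=lambda p: len(cube[p]))
            cube[subset] = group_by(cube[parent], parent, subset)
    return cube

# Usage: sales summed by (region, product), by each dimension alone, and overall.
facts = [(("east", "pen"), 1), (("east", "ink"), 2), (("west", "pen"), 3)]
cube = data_cube(facts, ("region", "product"))
```

Deriving each group-by from a smaller parent rather than from the raw facts is what makes the group-bys interrelated, and the independent subsets at each level are natural units of work to spread across processors.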

31

Using a CSP Based Programming Model for Reconfigurable Processor Arrays  

Microsoft Academic Search

The growing trend towards adoption of flexible and heterogeneous, parallel computing architectures has increased the challenges faced by the programming community. We propose a method to program an emerging class of reconfigurable processor arrays by using the CSP based programming model of occam-pi. The paper describes the extension of an existing compiler platform to target such architectures. To evaluate the

Zain-ul-Abdin; Bertil Svensson

2008-01-01

32

Parallel encrypted array multipliers  

SciTech Connect

An algorithm for direct two's-complement and sign-magnitude parallel multiplication is described. The partial product matrix representing the multiplication is converted to an equivalent matrix by encryption. Its reduction, producing the final result, needs no specialized adders and can be added with any parallel array addition technique. It contains no negative terms and no extra ''correction'' rows; in addition, it produces the multiplication with fewer than the minimal number of rows required for a direct multiplication process.

Vassiliadis, S.; Putrino, M.; Schwarz, E.M.

1988-07-01

33

Systematic Methodology of Mapping Signal Processing Algorithms into Arrays of Processors  

Microsoft Academic Search

Nowadays high-speed signal processing has become the only alternative in modern communication systems, given the rapidly growing microelectronics technology. This high-speed, real-time signal processing depends critically both on parallel algorithms and on parallel processor technology. Special-purpose array processor structures will become a real possibility for high-speed signal processing in the next few years.

J. Tasič; U. Burnik

34

Intermediate-level computer-vision-processing algorithm development for the content-addressable-array parallel processor. Quarterly status report No. 3 for period ending 29 November 1986  

SciTech Connect

During this quarter a set of seven benchmark problems was developed and analyzed for the IUA. These included Hough Transform, Convex Hull, Voronoi Diagram, Minimal Spanning Tree, Visibility of Vertices in a projected 3-dimensional model, subgraph isomorphism, and the minimum-cost path between points in a weighted graph. These problems are commonly considered intermediate-level processing in many vision research groups. Parallel implementations of UMass intermediate-level processing algorithms, such as Boldt's line merging and Anandan's motion analysis, continued to develop. A commercial processor, the TMS320C25, was chosen as the Intermediate Communications and Associative Processor (ICAP) processing element. The TMS320C25 has the advantages that it is a five-million instruction per second signal-processing unit with a fast multiplier and software support for fast floating-point operations. It also has a built-in 5 Mb/s serial port that will interface well with the intermediate-level communications network. Also being explored is a set of group-theoretic network topologies with respect to the communication needs of intermediate-level processing. This has required the analysis of the classes of communication needed in each of the algorithms implemented.

Not Available

1986-12-15

35

Phased array antenna beamforming using optical processor  

NASA Technical Reports Server (NTRS)

The feasibility of optical processor based beamforming for microwave array antennas is investigated. The primary focus is on systems utilizing the 20/30 GHz communications band and a transmit configuration exclusively to serve this band. A mathematical model is developed for computation of candidate design configurations. The model is capable of determination of the necessary design parameters required for spatial aspects of the microwave 'footprint' (beam) formation. Computed example beams transmitted from geosynchronous orbit are presented to demonstrate network capabilities. The effect of the processor on the output microwave signal to noise quality at the antenna interface is also considered.

Anderson, L. P.; Boldissar, F.; Chang, D. C. D.

1991-01-01

36

Vector Processing Performance on the Delft Parallel Processor  

National Technical Information Service (NTIS)

The applicability and performance of vector processing on the Delft Parallel Processor (DPP) are considered. With respect to the performance, primarily the effects of the parallel implementation of algorithms and of the architecture of the DPP are analyze...

J. H. M. Andriessen; S. W. Brok; F. J. Pasveer

1988-01-01

37

Intelligent spatial ecosystem modeling using parallel processors  

SciTech Connect

Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change.

Maxwell, T.; Costanza, R. (Maryland International Inst. for Ecological Economics, Solomons (United States))

1993-05-01

38

Optimized Data-Reuse in Processor Arrays  

Microsoft Academic Search

We present a method for co-partitioning affine indexed algorithms resulting in a processor array with an optimized data-reuse. Through this method, a memory hierarchy with an optimized data transfer is derived which allows a significant reduction of the power consumption caused by memory accesses. Apart from former design flows which begin with a space-time transformation, we start with the co-partitioning

Sebastian Siegel; Renate Merker

2004-01-01

39

Scan line graphics generation on the massively parallel processor  

NASA Technical Reports Server (NTRS)

Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.

Dorband, John E.

1988-01-01

40

Chemical network problems solved on NASA/Goddard's massively parallel processor computer  

NASA Technical Reports Server (NTRS)

The single instruction stream, multiple data stream Massively Parallel Processor (MPP) unit consists of 16,384 bit serial arithmetic processors configured as a 128 x 128 array whose speed can exceed that of current supercomputers (Cyber 205). The applicability of the MPP for solving reaction network problems is presented and discussed, including the mapping of the calculation to the architecture, and CPU timing comparisons.

Cho, Seog Y.; Carmichael, Gregory R.

1987-01-01

41

QPACE -- a QCD parallel computer based on Cell processors  

Microsoft Academic Search

QPACE is a novel parallel computer which has been developed to be primarily used for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor that is used in the Playstation 3. The QPACE nodes are interconnected by a custom, application optimized 3-dimensional torus network implemented on an FPGA.

H. Baier; H. Boettiger; M. Drochner; N. Eicker; U. Fischer; Z. Fodor; A. Frommer; C. Gomez; G. Goldrian; S. Heybrock; D. Hierl; M. Hüsken; T. Huth; B. Krill; J. Lauritsen; T. Lippert; T. Maurer; B. Mendl; N. Meyer; A. Nobile; I. Ouda; M. Pivanti; D. Pleiter; M. Ries; A. Schäfer; H. Schick; F. Schifano; H. Simma; S. Solbrig; T. Streuer; K.-H. Sulanke; R. Tripiccione; J.-S. Vogt; T. Wettig; F. Winter

2009-01-01

42

A Task Scheduling Algorithm of Single Processor Parallel Test System  

Microsoft Academic Search

The purpose of this paper is to implement parallel test in the single processor auto test system and to improve the test efficiency with a lower test cost. The main factor that impacts the test efficiency of test system is the performance of the parallel task scheduling algorithm. This paper puts forward a heuristic parallel task scheduling algorithm: scheduling-Q which

Jiajing Zhuo; Chen Meng; Minghu Zou

2007-01-01

43

Computing the Hough transform on a scan line array processor  

SciTech Connect

This paper describes a parallel algorithm for a line-finding Hough transform that runs on a linearly connected, SIMD vector of processors. The authors show that a high-precision transform, usually considered to be an expensive global operation, can be performed efficiently, in two to three times real time, with only local communication on a long vector. The algorithm also illustrates a decomposition principle that has wide application in algorithm design for large linear arrays. They include a review of straight-line Hough transform implementations.

Fisher, A.L.; Highnam, P.T.

1989-03-01
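
The vote-accumulation step of a straight-line Hough transform, the operation named in the record above, can be sketched as follows. This is a minimal serial sketch in Python, not the authors' SIMD algorithm for a linear processor array. The function name `hough_lines`, the parameter `n_theta`, and the normal-form parameterization rho = x*cos(theta) + y*sin(theta) are assumptions made for illustration.

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Accumulate votes in (theta, rho) space using rho = x*cos(theta) + y*sin(theta)."""
    # Largest possible |rho| for a point inside a width x height image.
    max_rho = int(math.ceil(math.hypot(width, height)))
    # One row per quantized angle, one column per integer rho in [-max_rho, max_rho].
    acc = [[0] * (2 * max_rho + 1) for _ in range(n_theta)]
    for x, y in points:
        for ti in range(n_theta):
            theta = math.pi * ti / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[ti][int(round(rho)) + max_rho] += 1
    return acc, max_rho

# Usage: ten edge points on the vertical line x = 5 in a 10 x 10 image.
points = [(5, y) for y in range(10)]
acc, max_rho = hough_lines(points, 10, 10)
```

Each edge point votes once per angle, so a cell that collects a vote from every point marks a line through all of them. A parallel version would have to distribute this accumulator across processors, which this sketch does not attempt.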

44

Peripheral array processors. Proceedings of the conference on peripheral array processors  

SciTech Connect

The following topics were dealt with: peripheral array processors; signal processing; data flow systems; multiprocessor systems; and applications. 14 papers were presented, all of which are published in full in the present proceedings. Abstracts of individual papers can be found under the relevant classification codes in this or other issues.

Karplus, W.J.

1982-01-01

45

High Performance Execution Engines for Instruction Level Parallel Processors.  

National Technical Information Service (NTIS)

The authors present high performance execution units suitable to instruction level pipelined parallel processors. First, the authors investigate arithmetic logic unit interlocks for fixed point instruction processing. The authors show that interlocked fix...

J. E. Phillips

1997-01-01

46

Breadboard Signal Processor for Arraying DSN Antennas  

NASA Technical Reports Server (NTRS)

A recently developed breadboard version of an advanced signal processor for arraying many antennas in NASA's Deep Space Network (DSN) can accept inputs in a 500-MHz-wide frequency band from six antennas. The next breadboard version is expected to accept inputs from 16 antennas, and a following developed version is expected to be designed according to an architecture that will be scalable to accept inputs from as many as 400 antennas. These and similar signal processors could also be used for combining multiple wide-band signals in non-DSN applications, including very-long-baseline interferometry and telecommunications. This signal processor performs functions of a wide-band FX correlator and a beam-forming signal combiner. [The term "FX" signifies that the digital samples of two given signals are fast Fourier transformed (F), then the fast Fourier transforms of the two signals are multiplied (X) prior to accumulation.] In this processor, the signals from the various antennas are broken up into channels in the frequency domain (see figure). In each frequency channel, the data from each antenna are correlated against the data from each other antenna; this is done for all antenna baselines (that is, for all antenna pairs). The results of the correlations are used to obtain calibration data to align the antenna signals in both phase and delay. Data from the various antenna frequency channels are also combined and calibration corrections are applied. The frequency-domain data thus combined are then synthesized back to the time domain for passing on to a telemetry receiver.

Jongeling, Andre; Sigman, Elliott; Chandra, Kumar; Trinh, Joseph; Soriano, Melissa; Navarro, Robert; Rogstad, Stephen; Goodhart, Charles; Proctor, Robert; Jourdan, Michael; Rayhrer, Benno

2008-01-01
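
The "FX" order of operations described in the record above (transform first, then multiply and accumulate per frequency channel) can be sketched for a single antenna pair as follows. This is a minimal Python sketch, not the DSN processor's implementation. The names `dft` and `fx_correlate`, the naive O(n^2) transform standing in for an FFT, and the use of the complex conjugate in the multiply step are assumptions.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, a stand-in for an FFT (assumption)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fx_correlate(a, b, block):
    """Correlate two sample streams channel by channel, one block at a time."""
    acc = [0j] * block
    for start in range(0, len(a) - block + 1, block):
        fa = dft(a[start:start + block])   # F step for antenna a
        fb = dft(b[start:start + block])   # F step for antenna b
        for k in range(block):
            # X step: multiply one spectrum by the conjugate of the other
            # (conjugation is an assumption), then accumulate.
            acc[k] += fa[k] * fb[k].conjugate()
    return acc

# Usage: a tone in channel 2 of an 8-point block, and a copy delayed by one sample.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
delayed = [math.cos(2 * math.pi * 2 * (t - 1) / 8) for t in range(16)]
visibility = fx_correlate(tone, delayed, block=8)
phase_ch2 = cmath.phase(visibility[2])  # relative phase in channel 2, in radians
```

The per-channel phase of the accumulated product is the kind of quantity from which phase and delay calibration could be derived. Repeating the call for every antenna pair would cover all baselines.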

47

Chromosome image segmentation on PAL parallel image processor  

NASA Astrophysics Data System (ADS)

Chromosome image segmentation is an important step toward automatic karyotyping that involves visualization and interpretation of chromosomes. In this paper, we analyze the characteristics of chromosome images that can be effectively used for segmenting chromosomes and can be efficiently extracted on the Lockheed-Martin PAL parallel image processor. We design and implement a parallel algorithm that uses local features to split touching chromosomes.

Shi, Hongchi; Gader, Paul D.; Li, Hongzheng

1997-09-01

48

Processor Self-Scheduling for Multiple-Nested Parallel Loops  

Microsoft Academic Search

Processor self-scheduling is a useful scheme in a multiprocessor system if the execution time of each iteration in a parallel loop is not known in advance and varies substantially, or if there are multiple nestings in parallel loops which makes static scheduling difficult and inefficient. By using efficient synchronization primitives, the operating system is not needed for loop scheduling. The

Peiyi Tang; Pen-chung Yew

1986-01-01
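
The basic idea of self-scheduling named in the record above, in which each processor claims its next loop iteration itself instead of receiving a static assignment, can be sketched for a single, non-nested loop. This is a minimal Python sketch under stated assumptions, not the authors' scheme. The function `self_schedule` is hypothetical, threads stand in for processors, and a lock around a shared counter stands in for whatever synchronization primitive the paper uses.

```python
import threading

def self_schedule(num_iterations, num_workers, body):
    """Run body(i) for every i in range(num_iterations); workers claim indices themselves."""
    next_index = 0
    lock = threading.Lock()  # stand-in for an atomic fetch-and-add (assumption)
    claimed = [[] for _ in range(num_workers)]

    def worker(wid):
        nonlocal next_index
        while True:
            with lock:               # atomically claim the next unclaimed iteration
                i = next_index
                next_index += 1
            if i >= num_iterations:  # nothing left to claim
                return
            body(i)
            claimed[wid].append(i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed  # which iterations each worker ended up executing
```

Because a worker asks for more work only when it finishes an iteration, iterations with widely varying execution times balance themselves without a central scheduler. The multiply nested case addressed by the paper needs additional bookkeeping that is not shown here.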

49

Digital image processing software system using an array processor  

SciTech Connect

A versatile array processor-based system for general-purpose image processing was developed. At the heart of this system is an extensive, flexible software package that incorporates the array processor for effective interactive image processing. The software system is described in detail, and its application to a diverse set of applications at LLNL is briefly discussed. 4 figures, 1 table.

Sherwood, R.J.; Portnoff, M.R.; Journeay, C.H.; Twogood, R.E.

1981-03-10

50

New multilevel parallelism management for multimedia processors  

NASA Astrophysics Data System (ADS)

This paper presents a new parallelism manager for multimedia multiprocessors. An analysis of recent multimedia applications shows that the available parallelism moves from the data-level to the control-level. New architectures are required to be able to extract this kind of dynamic parallelism. Our proposed parallelism management describes the parallelism with a topological description of the task dependence graph. It can represent varied and complex parallelism patterns. This parallelism description is separated from the program code to allow the task manager to decode it in parallel with the task execution. The task manager is based on a queue bank that stores the task graph. Control commands are inserted in the task dependence graph to allow a dynamic modification of this graph, depending on the processed data. Simulations on classical multiprocessing benchmarks show that in the case of simple parallelism, performance is similar to that of classical systems. However, performance on complex applications improves by up to 12%. Multimedia applications have also been simulated. The results show that our task manager can efficiently handle complex dynamic parallelism structures.

Verians, Xavier; Legat, Jean-Didier; Macq, Benoit M.; Quisquater, Jean-Jacques

1998-12-01

51

Massively Parallel MRI Detector Arrays  

PubMed Central

Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays.

Keil, Boris; Wald, Lawrence L

2013-01-01

52

Optoelectronic Array Processors with Applications in Machine Intelligence and Database Management.  

NASA Astrophysics Data System (ADS)

Many computational problems in machine intelligence and database management can be solved using simple array manipulations which are similar to those of linear algebra. The regularity of these operations allows efficient parallel algorithms to be executed on array processors, thereby satisfying demands for increased throughput. A progression of increasingly complex parallel array architectures is presented. Algorithms which exploit the properties of these architectures are presented for various applications. Both conventional and novel neural learning algorithms are mapped onto these architectures. Mathematical reductions of fuzzy inference mechanisms to simple vector operations are presented which allow extremely efficient parallel computations on array architectures. Algorithms which use outer product operations are presented for constraint satisfaction problems. Intelligent secondary storage interfaces based on the array architectures are shown to provide data reduction for relational database applications by executing a portion of the database query directly at the interface. Several optoelectronic array processor designs are described which allow efficient implementations of the array architectures. By introducing optical interconnections, area and delay penalties associated with very-large-scale-integration (VLSI) electronic systems are diminished, since the optoelectronic layout topologies are determined by various optical interconnection strategies rather than by planar wiring requirements. Several simple optical systems are presented and experimentally demonstrated. In particular, the optical transpose interconnection system is shown to support several array architectures. The advantages of optically interconnected array processors are shown to be significant for large array dimensions due to the fundamental incompatibility between these arrays and the planar nature of VLSI systems.
Optoelectronic technologies which support optically interconnected array processors are analyzed according to their effects on system performance. Optimal operational configurations of PLZT and multiple-quantum-well modulators are derived. Multiple-quantum-well modulators are shown to have limited fan-out capabilities due to saturation effects. Future research directions for optical interconnections and optoelectronic array processors are also postulated.

Marsden, Gary Colt

53

Parallel processor-based raster graphics system architecture  

DOEpatents

An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.

Littlefield, Richard J. (Seattle, WA)

1990-01-01

54

Global synchronization of parallel processors using clock pulse width modulation  

DOEpatents

A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

2013-04-02

55

Discounts for dynamic programming with applications in VLSI processor arrays  

SciTech Connect

This dissertation introduces a method for transforming certain dynamic programming problems into ones that require less space and time to solve under the logarithmic cost criterion, an appropriate complexity measure for flexible word-length machines. The mapping is based on discounts that change the costs but not the identities of optimal policies. Under the proper circumstances, the structure present in the original problem is preserved in the image so that the functional equations of dynamic programming still apply. Practical value of the theory is illustrated by demonstrating that a previously published VLSI processor array can be made asymptotically smaller and faster. The second half of this work addresses issues that arise in parallel sequence comparison. The paradigm here is deoxyribonucleic acid (DNA), which may be considered a string over a four-character alphabet. It is shown how a number of popular sequence matching algorithms can be mapped onto linear arrays of processors. One of these, the Princeton Nucleic Acid Comparator (P-NAC), has been fabricated, tested, and found to work perfectly. Its efficient implementation is due entirely to an application of discounts; benchmark results prove that it is several hundred times faster than a minicomputer.

Lopresti, D.P.

1987-01-01

56

DFT algorithms for bit-serial GaAs array processor architectures  

NASA Technical Reports Server (NTRS)

Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.

Mcmillan, Gary B.

1988-01-01

57

Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism  

Microsoft Academic Search

The technology to implement a single-chip node composed of 4 high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple ALUs to exploit both instruction-level and inter-thread parallelism, by using compile time and runtime scheduling. The compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting ...

Stephen W. Keckler; William J. Dally

1992-01-01

58

MASA: a multithreaded processor architecture for parallel symbolic computing  

Microsoft Academic Search

MASA is a “first cut” at a processor architecture intended as a building block for a multiprocessor that can execute parallel Lisp programs efficiently. MASA features a tagged architecture, multiple contexts, fast trap handling, and a synchronization bit in every memory word. MASA's principal novelty is its use of multiple contexts both to support multithreaded execution—interleaved execution from separate instruction

Robert H. Halstead Jr.; Tetsuya Fujita

1988-01-01

59

Real-time communications scheduling for massively parallel processors  

Microsoft Academic Search

Can general purpose commercial massively parallel processors (MPPs) be used for computationally intensive real-time applications that have traditionally required a custom arrangement of special-purpose computers and mainframes? If so, then the enormous lifecycle costs of many systems needed by, for instance, the Government could potentially be reduced. The components would be commercially available and continuing technological advances could more easily

Richard Games; Arkady Kanevsky; Peter C. Krupp; Leonard Monk

1995-01-01

60

Dynamic overset grid communication on distributed memory parallel processors  

NASA Technical Reports Server (NTRS)

A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

1993-01-01

61

Array processors with pipelined optical busses  

Microsoft Academic Search

A synchronous multiprocessor architecture based on pipelined optical bus interconnections is presented. The processors are placed in a square grid and are interconnected to one another through horizontal and vertical optical buses. This architecture has an effective diameter as small as two owing to its orthogonal bus connections, and it allows all processors to have simultaneous access to the buses

Zicheng Guo; Rami G. Melhem; Richard W. Hall; Donald M. Chiarulli; Steven P. Levitan

1990-01-01

62

A parallel particle-in-cell model for the massively parallel processor  

NASA Technical Reports Server (NTRS)

The availability of the nearest-neighbor communication-incorporating Massively Parallel Processor has prompted the development of a two-dimensional, particle-in-cell algorithm which loads particles in a cell randomly onto a row of processors, filling only half of them with particles. Due to the simplification of communications among processors achieved in a row by the vacant processors and the random-particle sequence, the algorithm efficiently sorts particles and performs gather/scatter procedures for collecting charge density according to their cells. The algorithm calculates electric fields at the cells by FFT.

Lin, C. S.; Thring, A. L.; Koga, J.; Seiler, E. J.

1990-01-01

63

Real-time trajectory optimization on parallel processors  

NASA Technical Reports Server (NTRS)

A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems: the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32 nodes instead of 1 node to solve a 64-stage Goddard problem.

Psiaki, Mark L.

1993-01-01
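
The speed-up figure quoted in the record above can be turned into a parallel efficiency. The abstract does not quote this value; it follows from the usual definition of efficiency as speed-up divided by processor count. The symbols E, S, and p are notation introduced here.

```latex
E = \frac{S}{p} = \frac{7.2}{32} = 0.225
```

Under that definition, the 64-stage Goddard run used roughly 22.5% of the ideal capacity of the 32 nodes.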

64

A high performance distributed-parallel-processor architecture for 3D IIR digital filters  

Microsoft Academic Search

Real-time spatio-temporal VLSI 3D IIR digital filters may be used for imaging or beamforming applications employing 3D input signals from synchronously-sampled multi-sensor arrays. Such filters have high computational complexity and often require arithmetic throughputs of hundreds of millions of floating point operations per second, especially in the case of potential radio frequency beamforming applications. A novel high-throughput distributed parallel processor

Arjuna Madanayake; Leonard T. Bruton

2005-01-01

65

Error analysis of high data rate, optical parallel processors.  

PubMed

Optical parallel processors have the potential for aiding the transfer of information over networks. The systems implications for a baseline architecture employing spatial light modulators, lenses, and charge-coupled devices are examined. Specifically, because many applications have stringent requirements on errors, this study concentrates on categorizing the potential error sources, both random and systematic, and presents the results of an error analysis for a pixel-to-pixel mapping system as a notional example. PMID:18357233

Jackson, D J; Juncosa, M L

2001-05-10

66

On Preemptive Scheduling of Unrelated Parallel Processors by Linear Programming  

Microsoft Academic Search

It is shown that certain problems of optimal preemptive scheduling of unrelated parallel processors can be formulated and solved as linear programming problems. As a by-product of the linear programming formulations of these problems, upper bounds are obtained on the number of preemptions required for optimal schedules. In particular it is shown that no more than O(m^2) preemptions are

Eugene L. Lawler; Jacques Labetoulle

1978-01-01
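
One problem of the kind named in the record above, minimizing the makespan of a preemptive schedule on unrelated processors, is commonly written as the linear program below. This is the standard textbook form, given as a sketch; the truncated abstract does not confirm that it matches the paper's notation. The symbols are assumptions introduced here: t_{ij} is the total time job j spends on processor i, p_{ij} is the time processor i would need to complete job j alone, C_max is the schedule length, m is the number of processors, and n is the number of jobs.

```latex
\begin{aligned}
\text{minimize}\quad & C_{\max} \\
\text{subject to}\quad
 & \sum_{i=1}^{m} \frac{t_{ij}}{p_{ij}} = 1, && j = 1,\dots,n \\
 & \sum_{j=1}^{n} t_{ij} \le C_{\max}, && i = 1,\dots,m \\
 & \sum_{i=1}^{m} t_{ij} \le C_{\max}, && j = 1,\dots,n \\
 & t_{ij} \ge 0
\end{aligned}
```

The first constraint says every job is fully processed. The second bounds each processor's total busy time. The third bounds each job's total processing time, since a job cannot run on two processors at once.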

67

Potential of minicomputer/array-processor system for nonlinear finite-element analysis  

NASA Technical Reports Server (NTRS)

The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.

Strohkorb, G. A.; Noor, A. K.

1983-01-01

68

Real-time simulation of MHD/steam power plants by digital parallel processors  

NASA Astrophysics Data System (ADS)

Attention is given to a large FORTRAN coded program which simulates the dynamic response of the MHD/steam plant on either a SEL 32/55 or VAX 11/780 computer. The code realizes a detailed first-principle model of the plant. Quite recently, in addition to the VAX 11/780, an AD-10 has been installed for usage as a real-time simulation facility. The parallel processor AD-10 is capable of simulating the MHD/steam plant at several times real-time rates. This is desirable in order to develop rapidly a large data base of varied plant operating conditions. The combined-cycle MHD/steam plant model is discussed, taking into account a number of disadvantages. The disadvantages can be overcome with the aid of an array processor used as an adjunct to the unit processor. The conversion of some computations for real-time simulation is considered.

Johnson, R. M.; Rudberg, D. A.

69

Method and structure for skewed block-cyclic distribution of lower-dimensional data arrays in higher-dimensional processor grids  

DOEpatents

A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.

Chatterjee, Siddhartha (Yorktown Heights, NY); Gunnels, John A. (Brewster, NY)

2011-11-08

70

Architecture and Evaluation of an Asynchronous Array of Simple Processors  

Microsoft Academic Search

This paper presents the architecture of an Asynchronous Array of simple Processors (AsAP), and evaluates its key architectural features as well as its performance and energy efficiency. The AsAP processor calculates DSP applications with high energy efficiency, is capable of high performance, is easily scalable, and is well-suited to future fabrication technologies. It is composed of a 2-D

Zhiyi Yu; Michael J. Meeuwsen; Ryan W. Apperson; Omar Sattari; Michael A. Lai; Jeremy W. Webb; Eric W. Work; Tinoosh Mohsenin; Bevan M. Baas

2008-01-01

71

Hardware reconfiguration for fault-tolerant processor arrays  

SciTech Connect

In large VLSI/WSI arrays, improved reliability and yield can be obtained through reconfiguration techniques. In fault tolerance design, redundancy is used to offset faults when they occur in the arrays. Since redundant components are themselves susceptible to faults, their number must be a minimum. This also implies that an efficient reconfiguration scheme is preferred, i.e., one that can use as many spare components as possible so that unnecessary waste of spares is reduced. In this thesis, hardware reconfiguration for fault-tolerant processor arrays is discussed. First, a taxonomy for reconfiguration techniques is introduced, and several schemes are surveyed and classified. This taxonomy can be used to introduce, explain, compare, study, and classify new reconfiguration schemes. Next, an extension to reconfiguration technique is presented. Two special cases of the scheme are simulated and their results compared and studied. Finally, a new approach to hardware reconfiguration, called FUSS (Full Use of Suitable Spares), is proposed for VLSI/WSI fault-tolerant processor arrays. FUSS uses an indicator vector, the surplus vector, to guide the replacement of faulty processors within an array. Analytical study of the general FUSS algorithm shows that a linear relationship between the array size and the area of interconnect is required for the reconfiguration to be 100% successful. In an instance of FUSS, called simple FUSS, reconfiguration is done by simply shifting up or down faulty processors along their corresponding columns according to the surplus vector's entries. The surplus vector is progressively updated after each column is reconfigured. The reconfiguration is successful when the surplus vector becomes the null vector. Simulations show that when the number of faulty processors is equal to that of spare processors, simple FUSS can achieve a probability of survival as high as 99%

Chean, M.

1989-01-01

72

Optimal mapping of irregular finite element domains to parallel processors  

NASA Technical Reports Server (NTRS)

A mapping of the solution domain of n finite elements into N subdomains, which may be processed in parallel by N processors, is optimal if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.

Flower, J.; Otto, S.; Salama, M.

1987-01-01
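
The simulated annealing approach named in the record above can be illustrated on a simplified version of the mapping problem. This is a minimal Python sketch under stated assumptions, not the authors' algorithm. The function `anneal_partition`, the geometric cooling schedule, and the cost function (largest minus smallest processor load) are hypothetical choices. Element adjacency and inter-processor communication are ignored.

```python
import math
import random

def anneal_partition(weights, num_procs, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Assign weighted elements to processors, reducing load imbalance by annealing."""
    rng = random.Random(seed)
    assign = [rng.randrange(num_procs) for _ in weights]
    loads = [0.0] * num_procs
    for w, p in zip(weights, assign):
        loads[p] += w

    def cost(ls):
        return max(ls) - min(ls)  # load imbalance (assumed cost function)

    current = cost(loads)
    best_assign, best_cost = assign[:], current
    for step in range(steps):
        # Geometric cooling from t_start down toward t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)
        e = rng.randrange(len(weights))
        new_p = rng.randrange(num_procs)
        old_p = assign[e]
        if new_p == old_p:
            continue
        # Tentatively move element e to processor new_p.
        loads[old_p] -= weights[e]
        loads[new_p] += weights[e]
        candidate = cost(loads)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            assign[e] = new_p
            current = candidate
            if current < best_cost:
                best_assign, best_cost = assign[:], current
        else:
            # Rejected: undo the tentative move.
            loads[old_p] += weights[e]
            loads[new_p] -= weights[e]
    return best_assign, best_cost

# Usage: eight elements with unequal work, mapped onto two processors.
assignment, imbalance = anneal_partition([5, 3, 8, 2, 7, 4, 6, 1], 2)
```

Accepting some moves that worsen the balance while the temperature is high is what lets the search escape poor local arrangements. As the temperature falls, the search behaves more and more like greedy descent.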

73

Virtualization within a Parallel Array of Homogeneous Processing Units  

Microsoft Academic Search

Our work aims at adapting the concept of virtualization, which is known from the context of operating systems, for concurrent hardware design. By contrast, the proposed concept applies virtualization not to processors or applications but to smaller processing units within a parallel array of homogeneous instances and individual tasks. Thereby, virtualization during runtime enables fault tolerance without the need for

Marc Stöttinger; Alexander Biedermann; Sorin Alexander Huss

2010-01-01

74

Parallel processors and nonlinear structural dynamics algorithms and software  

NASA Technical Reports Server (NTRS)

Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs.

Belytschko, Ted

1990-01-01
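
The blocking idea in the record above (subdivide the elements into blocks, then hand each block to an available processor as a task) can be sketched as follows. This is only an illustration of the task structure, not the authors' program: `block_internal_force` is a hypothetical stand-in for the element internal-force kernel, and the one-dimensional "strain times stiffness" force law is an assumption made for brevity. Python threads are used here to show the scheduling pattern rather than to obtain real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def make_blocks(num_elements, block_size):
    """Split element indices 0..num_elements-1 into consecutive blocks."""
    return [range(start, min(start + block_size, num_elements))
            for start in range(0, num_elements, block_size)]


def block_internal_force(strains, stiffness, block):
    """Hypothetical kernel: compute the force of every element in one block.

    In a vectorized code this loop would be a single vector operation
    whose length is the block size.
    """
    return [(e, stiffness * strains[e]) for e in block]


def internal_forces(strains, stiffness, block_size, workers):
    """Evaluate all blocks as independent tasks and gather the results."""
    forces = [0.0] * len(strains)
    blocks = make_blocks(len(strains), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = pool.map(
            lambda b: block_internal_force(strains, stiffness, b), blocks)
        for result in tasks:
            for e, f in result:
                forces[e] = f
    return forces
```

The block size plays the role of the vector length mentioned in the abstract; choosing it trades vector efficiency against the number of tasks available to keep processors busy.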

75

Particle simulation of plasmas on the massively parallel processor  

NASA Technical Reports Server (NTRS)

Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

Gledhill, I. M. A.; Storey, L. R. O.

1987-01-01
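
The Eulerian storage scheme described in the record above (a particle's data migrates to the local memory of whichever grid cell it currently occupies) can be sketched roughly as below. The dictionary-of-lists layout, the field names `x` and `y`, and the function name `migrate` are assumptions for illustration; the abstract does not describe the actual data structures used on the massively parallel processor.

```python
def migrate(cells, nx, ny):
    """Re-home every particle in the cell that contains its position.

    cells maps (ix, iy) to a list of particles; each particle is a dict
    with float coordinates "x" and "y". Periodic boundary conditions
    wrap positions back onto the nx-by-ny grid.
    """
    new_cells = {}
    for particles in cells.values():
        for p in particles:
            x = p["x"] % nx
            y = p["y"] % ny
            # The extra modulo guards against x rounding up to exactly nx.
            key = (int(x) % nx, int(y) % ny)
            new_cells.setdefault(key, []).append({**p, "x": x, "y": y})
    return new_cells
```

For example, on a 128 x 128 grid a particle that has drifted to x = 128.5, y = -0.25 is wrapped to x = 0.5, y = 127.75 and stored under cell (0, 127).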

76

A taxonomy of reconfiguration techniques for fault-tolerant processor arrays

SciTech Connect

The authors overview, characterize, and classify some typical reconfiguration schemes in light of a proposed taxonomy. This taxonomy can be used as a guide for future research in design and analysis of reconfiguration schemes. Studying how to evaluate fault-tolerant arrays and how to exploit application characteristics to achieve dependable computing are important complementary directions of research towards reliable processor-array design. A related research problem is that of functional reconfiguration, that is, learning how to configure the topology of a parallel system to implement a different function or run a different application. Important directions of research include how to apply or extend processor-array reconfiguration algorithms to other topologies and how to marry functional and fault-tolerance reconfiguration requirements and solutions. The Diogenes approach discussed in this article is a case where this goal is naturally achieved.

Chean, M. (Shell Development Co., Houston, TX (USA)); Fortes, J.A.B. (Purdue Univ., Lafayette, IN (USA))

1990-01-01

77

Digital signal processor and programming system for parallel signal processing  

SciTech Connect

This thesis describes an integrated assault upon the problem of designing high-throughput, low-cost digital signal-processing systems. The dual prongs of this assault consist of: (1) the design of a digital signal processor (DSP) which efficiently executes signal-processing algorithms in either a uniprocessor or multiprocessor configuration, (2) the PaLS programming system which accepts an arbitrary algorithm, partitions it across a group of DSPs, synthesizes an optimal communication link topology for the DSPs, and schedules the partitioned algorithm upon the DSPs. The results of applying a new quasi-dynamic analysis technique to a set of high-level signal-processing algorithms were used to determine the uniprocessor features of the DSP design. For multiprocessing applications, the DSP contains an interprocessor communications port (IPC) which supports simple, flexible, dataflow communications while allowing the total communication bandwidth to be incrementally allocated to achieve the best link utilization. The net result is a DSP with a simple architecture that is easy to program for both uniprocessor and multi-processor modes of operation. The PaLS programming system simplifies the task of parallelizing an algorithm for execution upon a multiprocessor built with the DSP.

Van den Bout, D.E.

1987-01-01

78

An informal introduction to program transformation and parallel processors  

SciTech Connect

In the summer of 1992, I had the opportunity to participate in a Faculty Research Program at Argonne National Laboratory. I worked under Dr. Jim Boyle on a project transforming code written in pure functional Lisp to Fortran code to run on distributed-memory parallel processors. To perform this project, I had to learn three things: the transformation system, the basics of distributed-memory parallel machines, and the Lisp programming language. Each of these topics in computer science was unfamiliar to me as a mathematician, but I found that they (especially parallel processing) are greatly impacting many fields of mathematics and science. Since most mathematicians have some exposure to computers, but certainly are not computer scientists, I felt it was appropriate to write a paper summarizing my introduction to these areas and how they can fit together. This paper is not meant to be a full explanation of the topics, but an informal introduction for the "mathematical layman." I place myself in that category as well, as my previous use of computers was as a classroom demonstration tool.

Hopkins, K.W. [Southwest Baptist Univ., Bolivar, MO (United States)]

1994-08-01

79

The performance realities of massively parallel processors: A case study  

SciTech Connect

This paper presents the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2 computer, and vector or concurrent-vector processing, as implemented in the Cray Research Inc. Y-MP/8. The comparison is based primarily upon three application codes that represent Los Alamos production computing. Tests were run by porting optimized CM Fortran codes to the Y-MP, so that the same level of optimization was obtained on both machines. The results for fully-configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64k CM-2 for all three codes. A simple model that accounts for the relative characteristic computational speeds of the two machines, and reduction in overall CM-2 performance due to communication or SIMD conditional execution, is included. The model predicts the performance of two codes well, but fails for the third code, because the proportion of communications in this code is very high. Other factors, such as memory bandwidth and compiler effects, are also discussed. Finally, the paper attempts to show the equivalence of the CM-2 and Y-MP programming models, and also comments on selected future massively parallel processor designs.

Lubeck, O.M.; Simmons, M.L.; Wasserman, H.J.

1992-07-01

80

The performance realities of massively parallel processors: A case study  

SciTech Connect

This paper presents the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2 computer, and vector or concurrent-vector processing, as implemented in the Cray Research Inc. Y-MP/8. The comparison is based primarily upon three application codes that represent Los Alamos production computing. Tests were run by porting optimized CM Fortran codes to the Y-MP, so that the same level of optimization was obtained on both machines. The results for fully-configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64k CM-2 for all three codes. A simple model that accounts for the relative characteristic computational speeds of the two machines, and reduction in overall CM-2 performance due to communication or SIMD conditional execution, is included. The model predicts the performance of two codes well, but fails for the third code, because the proportion of communications in this code is very high. Other factors, such as memory bandwidth and compiler effects, are also discussed. Finally, the paper attempts to show the equivalence of the CM-2 and Y-MP programming models, and also comments on selected future massively parallel processor designs.

Lubeck, O.M.; Simmons, M.L.; Wasserman, H.J.

1992-01-01

81

Mobile and replicated alignment of arrays in data-parallel programs  

Microsoft Academic Search

When a data-parallel language like Fortran 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that

J. R. Gilbert; S. Chatterjee; Robert Schreiber

1993-01-01

82

Optimal evaluation of array expressions on massively parallel machines  

SciTech Connect

The authors investigate the problem of evaluating Fortran 90 style array expressions on massively parallel distributed-memory machines. On such machines, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of expression evaluation. The choice of where to perform the operation then affects the cost. They present algorithms based on dynamic programming to solve this problem efficiently for a wide variety of interconnection schemes, including multidimensional grids and rings, hypercubes, and fat-trees. They also consider expressions containing operations that change the shape of the arrays, and show that their approach extends naturally to handle this case.

Chatterjee, S.; Gilbert, J.; Schreiber, R.; Teng, S.H.

1992-01-01
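
The dynamic-programming idea in the record above (choose where each operation is performed so that the total realignment cost is minimized) can be illustrated with a deliberately simplified model. The assumptions here are mine, not the authors': each array sits at a single integer offset on a ring of `n` positions, moving an operand costs the ring distance between offsets, and an expression is a nested tuple of the hypothetical forms `("leaf", offset)` and `("op", left, right)`.

```python
def ring_distance(a, b, n):
    """Shortest number of steps between offsets a and b on a ring of n."""
    d = abs(a - b) % n
    return min(d, n - d)


def min_alignment_cost(expr, n):
    """Minimum total movement needed to evaluate expr on a ring of size n."""

    def costs(node):
        # costs(node)[p] = cheapest way to have node's value at offset p.
        if node[0] == "leaf":
            return [ring_distance(node[1], p, n) for p in range(n)]
        left = costs(node[1])
        right = costs(node[2])

        def best(child, p):
            # Evaluate the child anywhere (q), then move its result to p.
            return min(child[q] + ring_distance(q, p, n) for q in range(n))

        return [best(left, p) + best(right, p) for p in range(n)]

    return min(costs(expr))
```

With arrays A, B, and C at offsets 0, 2, and 7 on a ring of 8, evaluating (A + B) + C costs 3 under this model: B moves two steps to meet A, and C moves one step across the wrap-around. The table-per-node structure is what lets the same approach extend to other distance functions, such as those of grids or hypercubes.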

83

Solution of large linear systems of equations on the massively parallel processor  

NASA Technical Reports Server (NTRS)

The Massively Parallel Processor (MPP) was designed as a special machine for specific applications in image processing. As a parallel machine with a large number of processors that can be reconfigured in different combinations, it is also applicable to other problems that require a large number of processors. The solution of linear systems of equations on the MPP is investigated. The solution times achieved are compared to those obtained with a serial machine and the performance of the MPP is discussed.

Ida, Nathan; Udawatta, Kapila

1987-01-01

84

Massively parallel processor networks with optical express channels  

DOEpatents

An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

1999-08-24

85

Massively parallel processor networks with optical express channels  

DOEpatents

An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

Deri, Robert J. (Pleasanton, CA); Brooks, III, Eugene D. (Livermore, CA); Haigh, Ronald E. (Tracy, CA); DeGroot, Anthony J. (Castro Valley, CA)

1999-01-01

86

Derivative constraints for broad-band element space antenna array processors  

Microsoft Academic Search

In this paper a class of linear constraints, also termed as derivative constraints, which is applicable to broad-band element space antenna array processors, is presented. The performance characteristics of the optimum processor with derivative constraints are demonstrated by computer studies involving two types of array geometries, namely linear and circular arrays. As a consequence of derivative constraints, the beam width

Meng Er; A. Cantoni

1983-01-01

87

Practical Parallelism using Transputer Arrays  

Microsoft Academic Search

This paper explores methods for extracting parallelism from a wide variety of numerical applications. We investigate communications overheads and load-balancing for networks of transputers. After a discussion of some practical strategies for constructing occam programs, two case studies are analysed in detail.

David J. Pritchard; C. R. Askew; D. B. Carpenter; Ian Glendinning; Anthony J. G. Hey; Denis A. Nicole

1987-01-01

88

A simulation tool of parallel architectures for digital image processing applications based on DLX processors  

Microsoft Academic Search

We present a simulation tool of parallel architectures for digital image processing applications, which simulates different architectures based on RISC DLX processors and an interconnection network based on wormhole routing. Each node is provided with a computational DLX processor and another of similar features, devoted to controlling the communication network

V. Valero; F. Cuartero; A. Garrido; F. Quiles

1995-01-01

89

Serial multiplier arrays for parallel computation  

NASA Technical Reports Server (NTRS)

Arrays of systolic serial-parallel multiplier elements are proposed as an alternative to conventional SIMD mesh serial adder arrays for applications that are multiplication intensive and require few stored operands. The design and operation of a number of multiplier and array configurations featuring locality of connection, modularity, and regularity of structure are discussed. A design methodology combining top-down and bottom-up techniques is described to facilitate development of custom high-performance CMOS multiplier element arrays as well as rapid synthesis of simulation models and semicustom prototype CMOS components. Finally, a differential version of NORA dynamic circuits requiring a single-phase uncomplemented clock signal is introduced for this application.

Winters, Kel

1990-01-01

90

System Design of Vector I - A High-Speed Data-Array Processor.  

National Technical Information Service (NTIS)

A description is given of the design and construction of a low cost, high performance data-array processor. This processor and its associated core memory operate under the control of and in conjunction with an XDS Sigma 2 Computer. The processor will perf...

H. H. Loomis

1970-01-01

91

Use of application characteristics and limited preemption for run-to-completion parallel processor scheduling policies  

Microsoft Academic Search

The performance potential of run-to-completion (RTC) parallel processor scheduling policies is investigated by examining whether (1) application execution rate characteristics such as average parallelism (avg) and processor working set (PWS) and/or (2) limited preemption can be used to improve the performance of these policies. We address the first question by comparing policies (previous as well as new) that differ only

Su-Hui Chiang; Rajesh K. Mansharamani; Mary K. Vernon

1994-01-01

92

Use of Application Characteristics and Limited Preemption for Run-to-Completion Parallel Processor Scheduling Policies  

Microsoft Academic Search

The performance potential of run-to-completion (RTC) parallel processor scheduling policies is investigated by examining whether (1) application execution rate characteristics such as average parallelism (avg) and processor working set (pws) and/or (2) limited preemption can be used to improve the performance of these policies. We address the first question by comparing policies (previous as well as new) that differ only

Su-Hui Chiang; Rajesh K. Mansharamani; Mary K. Vernon

1994-01-01

93

SAR Processing on the Array Processor NMX-432 (SAR-Signalbehandling pa Arrayprocessorn NMX-432),  

National Technical Information Service (NTIS)

The report describes the Synthetic Aperture Radar (SAR) processing on the array processor NMX-432 from Numerix Corporation. The SAR processing system consists of the array processor and the host computer MicroVAX II with two 300 Mbyte disks and one 6250 b...

B. Brusmark; A. Gustavsson; A. Nelander

1988-01-01

94

PDF's, confidence regions, and relevant statistics for a class of sample covariance-based array processors  

Microsoft Academic Search

We add to the many results on sample covariance matrix (SCM) dependent array processors by (i) weakening the traditional assumption of Gaussian data and (ii) providing for a class of array processors additional performance measures that are of value in practice. The data matrix is assumed drawn from a class of multivariate elliptically contoured (MEC) distributions. The performance measures include

Christ D. Richmond

1996-01-01

95

Distributed Ray Tracing Using an SIMD (Single Instruction Multiple Data) Processor Array.  

National Technical Information Service (NTIS)

Recent work on distributed ray tracing algorithms using the Distributed Array Processor (DAP) will be described. It is shown that ray tracing algorithms are ideally suited to SIMD processor arrays because, once a set of rays has been cast from the viewpoi...

N. S. Williams; B. F. Buxton; H. Buxton

1990-01-01

96

Periodic Application of Concurrent Error Detection in Processor Array Architectures. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.

Chen, Paul Peichuan

1993-01-01

97

Track recognition in 4 μs by a systolic trigger processor using a parallel Hough transform

SciTech Connect

A parallel Hough transform processor has been developed that identifies circular particle tracks in a 2D projection of the OPAL jet chamber. The high-speed requirements imposed by the 8 bunch crossing mode of LEP could be fulfilled by computing the starting angle and the radius of curvature for each well defined track in less than 4 μs. The system consists of a Hough transform processor that determines well defined tracks, and a Euler processor that counts their number by applying the Euler relation to the thresholded result of the Hough transform. A prototype of a systolic processor has been built that handles one sector of the jet chamber. It consists of 35 × 32 processing elements that were loaded into 21 programmable gate arrays (XILINX). This processor runs at a clock rate of 40 MHz. It has been tested offline with about 1,000 original OPAL events. No deviations from the off-line simulation have been found. A trigger efficiency of 93% has been obtained. The prototype together with the associated drift time measurement unit has been installed at the OPAL detector at LEP and 100k events have been sampled to evaluate the system under detector conditions.

Klefenz, F.; Noffz, K.H.; Conen, W.; Zoz, R.; Kugel, A. (Univ. Mannheim (Germany). Lehrstuhl fuer Informatik V); Maenner, R. (Univ. Mannheim (Germany). Lehrstuhl fuer Informatik V Univ. Heidelberg (Germany). Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen)

1993-08-01
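
A software sketch of the kind of Hough transform named in the record above may help make the "starting angle and radius of curvature" parameterization concrete. It is not the systolic hardware design. The assumptions are mine: tracks are circles through the origin, each hit lies ahead of the track's starting direction, and curvature (the reciprocal of the radius) is binned instead of the radius itself. For a hit (x, y) and a trial starting angle phi, the circle through the origin tangent to phi has curvature 2 (y cos phi - x sin phi) / (x^2 + y^2). The function names and bin counts are illustrative.

```python
import math


def hough_circles_through_origin(hits, n_phi=64, n_kappa=50, kappa_max=0.05):
    """Vote in (starting angle, curvature) space and return the peak.

    Returns (phi, kappa, votes) for the most populated accumulator bin.
    """
    width = 2.0 * kappa_max / n_kappa
    votes = {}
    for x, y in hits:
        r2 = x * x + y * y
        if r2 == 0.0:
            continue  # a hit at the origin constrains nothing
        for i in range(n_phi):
            phi = i * 2.0 * math.pi / n_phi
            # Assumption: hits lie in the forward half-plane of phi.
            if x * math.cos(phi) + y * math.sin(phi) <= 0.0:
                continue
            kappa = 2.0 * (y * math.cos(phi) - x * math.sin(phi)) / r2
            j = math.floor((kappa + kappa_max) / width)
            if 0 <= j < n_kappa:
                votes[(i, j)] = votes.get((i, j), 0) + 1
    (i, j), count = max(votes.items(), key=lambda item: item[1])
    return i * 2.0 * math.pi / n_phi, -kappa_max + (j + 0.5) * width, count


def track_hits(phi0, kappa0, arc_lengths):
    """Points on a circle leaving the origin at angle phi0, curvature kappa0."""
    return [((math.sin(phi0 + kappa0 * s) - math.sin(phi0)) / kappa0,
             (math.cos(phi0) - math.cos(phi0 + kappa0 * s)) / kappa0)
            for s in arc_lengths]
```

Every hit votes once per angle bin, and all of these votes are independent, which is the property that makes the transform suitable for an array of simple processing elements.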

98

Time and Parallel Processor Bounds for Fortran-Like Loops  

Microsoft Academic Search

The main goal of this paper is to show that a large number of processors can be used effectively to speed up simple Fortran-like loops consisting of assignment statements. A practical method is given by which one can check whether or not a statement is dependent upon another. The dependence structure of the whole loop may be of different types.

Utpal Banerjee; Shyh-ching Chen; David J. Kuck; Ross A. Towle

1979-01-01
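
The abstract above does not spell out its dependence check, so the sketch below substitutes the classic GCD test as a stand-in for illustration. It asks whether a write to `A[a*i + b]` and a read of `A[c*j + d]` can ever touch the same element for some integers i and j. Loop bounds are ignored, so a `True` result only means that a dependence is possible, while `False` proves the two statements are independent.

```python
import math


def gcd_test(a, b, c, d):
    """Return True if a*i + b == c*j + d has an integer solution (i, j)."""
    g = math.gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return b == d
    return (d - b) % g == 0
```

For instance, `A[2*i]` and `A[2*j + 1]` can never overlap (even versus odd subscripts), so `gcd_test(2, 0, 2, 1)` is `False` and the two statements could run in parallel.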

99

Implementation and Performance of a Binary Lattice Gas Algorithm on Parallel Processor Systems  

NASA Astrophysics Data System (ADS)

We study the performance of a lattice gas binary algorithm on a "real arithmetic" machine, a 32 processor INTEL iPSC hypercube. The implementation is based on so-called multi-spin coding techniques. From the measured performance we extrapolate to larger and more powerful parallel systems. Comparisons are made with "bit" machines, such as the parallel Connection Machine.

Hayot, F.; Mandal, M.; Sadayappan, P.

1989-02-01
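
Multi-spin coding, mentioned in the record above, packs one bit per lattice site into a machine word so that a single bitwise instruction updates many sites at once. The sketch below applies the idea to the head-on collision rule of the HPP lattice gas, which is chosen here only as a familiar example; the abstract does not say which collision rules the authors' binary lattice gas uses. The four arguments are words holding the north, east, south, and west occupation bits of 64 sites, and the function names are hypothetical.

```python
MASK = (1 << 64) - 1  # 64 lattice sites per word


def hpp_collide_words(n, e, s, w):
    """Apply the HPP head-on collision rule to 64 sites in parallel."""
    head_on_ew = e & w & ~n & ~s   # east-west pair, nothing else present
    head_on_ns = n & s & ~e & ~w   # north-south pair, nothing else present
    flip = (head_on_ew | head_on_ns) & MASK
    # At colliding sites every direction bit toggles, rotating the pair.
    return n ^ flip, e ^ flip, s ^ flip, w ^ flip


def hpp_collide_site(n, e, s, w):
    """Reference rule for a single site, used to check the packed version."""
    if e and w and not n and not s:
        return 1, 0, 1, 0
    if n and s and not e and not w:
        return 0, 1, 0, 1
    return n, e, s, w
```

The packed version performs a fixed handful of logical operations per word regardless of how many sites collide, which is why the technique suits both word-oriented machines and bit-serial ones.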

100

Effects of error sources on the parallelism of an optical matrix-vector processor  

NASA Technical Reports Server (NTRS)

The error sources in a high accuracy optical matrix-vector processor are analyzed by numerical simulation in terms of their effects on the parallelism and speed of the processor. These effects are detailed for radices -2, -4 and -8. Radix -4 is shown to provide maximum parallel processing capabilities under the effects of the system's error sources. Processing speed is shown to be a function of matrix partitioning and the number of parallel processing channels. Consequently, radix -4 operation provides a higher processing speed than radix -2 and -8 for most matrix-vector multiplications when error source effects are considered.

Perlee, Caroline J.; Casasent, David P.

1990-01-01

101

Using algebra for massively parallel processor design and utilization  

NASA Technical Reports Server (NTRS)

This paper summarizes the author's advances in the design of dense processor networks. Within is reported a collection of recent constructions of dense symmetric networks that provide the largest known values for the number of nodes that can be placed in a network of a given degree and diameter. The constructions are in the range of current potential engineering significance and are based on groups of automorphisms of finite-dimensional vector spaces.

Campbell, Lowell; Fellows, Michael R.

1990-01-01

102

Locally connected processor arrays for matrix multiplication and linear transforms  

Microsoft Academic Search

Cellular Neural Networks is a multiprocessor computing architecture where the processors are only directly connected to nearby processors. This results in a trade-off between the number of connections between processors and the number of steps needed to perform global computation. We consider such a locally connected computing architecture and present some preliminary analysis on this trade

Chai Wah Wu

2011-01-01

103

Parallel processors and nonlinear structural dynamics algorithms and software  

NASA Technical Reports Server (NTRS)

A nonlinear structural dynamics program with an element library that exploits parallel processing is under development. The aim is to exploit scheduling-allocation so that parallel processing and vectorization can effectively be treated in a general purpose program. As a byproduct an automatic scheme for assigning time steps was devised. A rudimentary form of the program is complete and has been tested; it shows substantial advantage can be taken of parallelism. In addition, a stability proof for the subcycling algorithm has been developed.

Belytschko, T.

1986-01-01

104

Simultaneous iterations algorithm for general eigenvalue problems on parallel processors  

NASA Technical Reports Server (NTRS)

The method of simultaneous iteration with shift is extended to extraction of m-eigenpairs of a general eigenvalue problem of large order n in a parallel processing environment. The algorithm combines the power method and the Jacobi technique, and reduces to performing four basic operations. Parallel implementation of the algorithm is discussed in detail. The analysis accounts for computation and communication costs, and utilizes a parallel processing architecture of the ensemble type. Expressions for the computational efficiency and speedup are defined as a function of the problem and hardware parameters. Selected representative problems exhibit efficiencies ranging from 60 to 98 percent.

Utku, S.; Chang, Y.; Salama, M.; Rapp, D.

1986-01-01

105

Evaluation of fault-tolerant parallel-processor architectures over long space missions  

NASA Technical Reports Server (NTRS)

The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.

Johnson, Sally C.

1989-01-01

106

A parallel workload model and its implications for processor allocation  

Microsoft Academic Search

We develop a workload model based on the observed behavior of parallel computers at the San Diego Supercomputer Center and the Cornell Theory Center. This model gives us insight into the performance of strategies for scheduling moldable jobs on space-sharing parallel computers. We find that Adaptive Static Partitioning (ASP), which has been reported to work well for other workloads, does

Allen B. Downey

1998-01-01

107

Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors  

NASA Technical Reports Server (NTRS)

In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.

Fijany, Amir (inventor); Bejczy, Antal K. (inventor)

1994-01-01

108

Parallel processors and nonlinear structural dynamics algorithms and software  

NASA Technical Reports Server (NTRS)

A nonlinear structural dynamics finite element program was developed to run on a shared memory multiprocessor with pipeline processors. The program, WHAMS, was used as a framework for this work. The program employs explicit time integration and has the capability to handle both the nonlinear material behavior and large displacement response of 3-D structures. The elasto-plastic material model uses an isotropic strain hardening law which is input as a piecewise linear function. Geometric nonlinearities are handled by a corotational formulation in which a coordinate system is embedded at the integration point of each element. Currently, the program has an element library consisting of a beam element based on Euler-Bernoulli theory and triangular and quadrilateral plate elements based on Mindlin theory.

Belytschko, Ted

1989-01-01

109

Embedded processor for array of hydrophone sensors to construct real time images for AUV using FPGA  

Microsoft Academic Search

Implementation of embedded systems-on-chip on modern field programmable gate arrays (FPGAs) chip is doable due to its large density. Architecture of multilevel computing focusing on its embedded processor is suggested in our project. The architecture design of embedded processor presents the challenges and opportunities that stem from the task coarse granularity and the large number of input and output for

Muataz H. Salih; M. R. Arshad

2009-01-01

110

Application Scheduling and Processor Allocation in Multiprogrammed Parallel Processing Systems  

Microsoft Academic Search

When large-scale multiprocessors for parallel processing are subjected to heavy diverse workloads of applications, it will be necessary to schedule them in a multiprogrammed fashion in order to use the system resources effectively and keep response times low. Information about the characteristics of individual applications can be used to achieve effective scheduling. We identify some useful parameters for characterizing applications. Prior work on multiprocessor

Kenneth C. Sevcik

1994-01-01

111

Design of an optical content-addressable parallel processor for expert systems  

NASA Astrophysics Data System (ADS)

The slow execution speed of current rule-based systems (RBS's) has restricted their application areas. To improve the speed of RBS's, researchers have proposed various electronic multiprocessor systems as well as optical systems. However, the electronic systems still suffer in performance from the large amount of required time-consuming pattern-matching and comparison operations at the core of RBS's, and optical systems do not fully exploit the available parallelism in RBS's. We propose an optical content-addressable parallel processor for expert systems. The processor executes the three basic RBS operations, match, select, and act, in a highly parallel fashion. Additionally, it extracts and exploits all possible parallelism in an RBS. Distinctive features of the proposed system include (1) the representation of data (knowledge) and control information so as to exploit the parallelism of optics in the three RBS units; (2) capability of processing general-domain knowledge expressed in terms of variables, numbers, symbols, and comparison operators such as greater than and less than; (3) the parallel optical match unit, which performs the two-dimensional optical pattern matching and comparison operations; (4) a novel conflict-resolution algorithm to resolve conflicts in a single step within the optical select unit. The three units and the general-knowledge representation scheme are designed to make the optical content-addressable parallel processor for expert systems suitable for any high-speed general-purpose RBS.

Louri, Ahmed; Na, Jongwhoa

1995-08-01

112

Modular high-temperature gas-cooled reactor simulation using parallel processors  

SciTech Connect

The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user ("operator") interaction. The benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses. 3 refs., 3 figs.

Ball, S.J.; Conklin, J.C.

1989-01-01

113

A GaAs vector processor based on parallel RISC microprocessors  

NASA Astrophysics Data System (ADS)

A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.

Misko, Tim A.; Rasset, Terry L.

114

An Asynchronous Distributed Approach for the Simulation of Behavior-Level Models on Parallel Processors  

Microsoft Academic Search

Traditional approaches to the distributed simulation of digital designs are limited in that they are inefficient and prone to deadlock for systems with feedback loops. This paper proposes an asynchronous distributed algorithm to the simulation and verification of behavior-level models and describes its implementation on an actual loosely-coupled parallel processor. The approach is relatively efficient for realistic digital designs and

Sumit Ghosh; Meng-lin Yu

1995-01-01

115

Dynamically Managing the Communication-Parallelism Trade-Off in Clustered Processors.  

National Technical Information Service (NTIS)

In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, the configuration option for an interval is run to determine the optimal configuration, which i...

D. Albonesi; R. Balasubramonian; S. Dwarkadas

2005-01-01

116

Estimating Water Flow Through a Hillslope Using the Massively Parallel Processor.  

National Technical Information Service (NTIS)

A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all model...

J. E. Devaney; P. J. Camillo; R. J. Gurney

1988-01-01

117

Real-time tracking with a 3D-Flow processor array  

SciTech Connect

To date, real-time track finding has been performed with CAM (Content Addressable Memories) or with fast coincidence logic, because the processing scheme was thought to have much slower performance. Advances in technology, together with a new architectural approach, make it feasible to also explore the computing technique for real-time track finding. Compared with the CAM approach, this gives the advantage of implementing algorithms that can find more parameters, such as the sagitta, curvature, pt, etc. The report describes real-time track finding using a new computing technique based on the 3D-Flow array processor system. This system consists of a fixed interconnection architecture scheme, allowing flexible algorithm implementation on a scalable platform. The 3D-Flow parallel processing system for track finding is scalable in size and performance by increasing the number of processors, the processor speed, or the number of pipelined stages. The present article describes the conceptual idea and the design stage of the project.

Crosetto, D.

1993-06-01

118

Preliminary study on the potential usefulness of array processor techniques for structural synthesis  

NASA Technical Reports Server (NTRS)

The effects of the use of array processor techniques within the structural analyzer program, SPAR, are simulated in order to evaluate the potential analysis speedups which may result. In particular, the connection of a Floating Point System AP120 processor to the PRIME computer is discussed. Measurements of execution, input/output, and data transfer times are given. Using these data, estimates are made as to the relative speedups that can be achieved in a more complete implementation on an array processor maxi-mini computer system.

Feeser, L. J.

1980-01-01

119

Parallel processors and nonlinear structural dynamics algorithms and software  

NASA Technical Reports Server (NTRS)

The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the CONNECTION Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the CONNECTION Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the CONNECTION Machine is capable of outperforming the CRAY XMP/14.

Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.

1989-01-01

120

Dynamic nuclear power plant simulations using single-board peripheral array processors  

SciTech Connect

The application of a host/coprocessor system, consisting of an IBM PC/AT and a Marinco peripheral array processor, to the simulation of nuclear power plant transients is described. The simulation algorithm employs a multistep, multirate integration method. Rapidly varying transients were implemented on the array processor, while the slow components of the system dynamics were handled by the host computer. The generation of nonlinear functions of dependent variables using table lookup techniques was also implemented on the array processor. A simplified model of a pressurized water reactor was employed as a benchmark. Flow rate and power transients using a variety of integration formulas are presented along with data showing the speedup achieved through the use of the peripheral processor.

Yeh, H.C.; Kastenberg, W.E. (California Univ., Los Angeles, Mechanical, Aerospace, and Nuclear Engineering Dept. (US)); Karplus, W.J. (California Univ., Los Angeles, Computer Science Dept. (US))

1989-11-01
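
Illustrative note on the record above: the abstract describes splitting a simulation so that rapidly varying components run on the array processor while slow components stay on the host. The sketch below shows only the multirate idea, using forward Euler instead of the multistep formulas the authors mention; the two toy equations, the coefficient 50, and every name in the code are hypothetical and are not taken from the paper.

```python
def multirate_euler(x0, y0, t_end, slow_steps, substeps):
    """Integrate a toy fast/slow pair with two step sizes (forward Euler).

    Hypothetical model, chosen only for illustration:
        dx/dt = -50 * (x - y)   fast component (small step)
        dy/dt = -y              slow component (large step)
    """
    big_h = t_end / slow_steps      # step used for the slow component
    small_h = big_h / substeps      # step used for the fast component
    x, y = x0, y0
    for _ in range(slow_steps):
        # Fast component: several small steps with the slow variable frozen.
        for _ in range(substeps):
            x += small_h * (-50.0 * (x - y))
        # Slow component: one large step.
        y += big_h * (-y)
    return x, y


x_end, y_end = multirate_euler(1.0, 1.0, 1.0, slow_steps=100, substeps=10)
print(x_end, y_end)
```

In this arrangement the inner loop plays the role of the work offloaded to the coprocessor, and the outer loop plays the role of the host; how the original code exchanged data between the two is not stated in the abstract.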

121

Deadlock prevention in processor self-scheduling for parallel nested loops  

SciTech Connect

Processor self-scheduling is an effective distributed dynamic scheduling scheme for parallel nested loops in multiprocessor systems. A parallel nested loop structure can be considered as a task system. A task is an iteration (or several iterations) of an innermost loop body and the execution order of tasks is determined by a precedence relation. Data dependences among statements are enforced either by the precedence relation or by explicit synchronization. Since self-scheduling is non-preemptive and uses busy-waiting as the basic technique for task synchronization, deadlocks may possibly arise. This paper identifies conditions that allow deadlock-free processor self-scheduling. It uses control tokens and data tokens to model the precedence relation and data dependences. Both control tokens and data tokens, together with processors, are regarded as three kinds of resources needed by tasks. Based on this resource model, we study three possible self-scheduling schemes with different degrees of parallelism, and propose to use appropriate scheduling priority for allocating processors to prevent deadlocks. 12 refs., 7 figs.

Tang, Peiyi; Yew, Pen-Chung; Fang, Zhixi; Zhu, Chuan-Qi

1987-01-01
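
Illustrative note on the record above: in processor self-scheduling, each processor claims its next loop iteration from a shared counter instead of receiving a fixed assignment. The sketch below shows that basic mechanism for a single loop with independent iterations, so the precedence relations, tokens, and deadlock conditions studied in the paper do not arise. A Python lock stands in for an atomic fetch-and-add, and all names are hypothetical.

```python
import threading


def self_schedule(num_iterations, num_workers, body):
    """Run body(i) for i in range(num_iterations); workers claim indices dynamically."""
    next_index = 0
    lock = threading.Lock()                      # stands in for atomic fetch-and-add
    claimed = [[] for _ in range(num_workers)]   # which worker ran which iteration

    def worker(worker_id):
        nonlocal next_index
        while True:
            with lock:                           # claim the next unclaimed iteration
                i = next_index
                next_index += 1
            if i >= num_iterations:
                return                           # no work left
            body(i)
            claimed[worker_id].append(i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


results = [0] * 20


def square(i):
    results[i] = i * i


claimed = self_schedule(20, 4, square)
print(results)
```

Which worker executes which iteration varies from run to run; only the set of executed iterations is fixed.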

122

Implementation of FFT and pulse compression routines on the SPT frequency domain array processor  

NASA Astrophysics Data System (ADS)

The Frequency Domain Array Processor (FDAP) is a VME compatible circuit board built by Signal Processing Technologies (SPT). The FDAP can process integer data arrays containing up to 8192 (32 bit) complex words or 1684 (16 bit) real words. It is capable of 400 Million Operations Per Second (MOPS) with a maximum Input/Output (I/O) rate of four billion bits per second. It also has a double buffered memory architecture permitting I/O transfers to occur in parallel with data processing. The FDAP can be hosted by an IBM PC/AT-compatible computer using a bus adaptor interface available from BIT3 Computer Corp. The FDAP board is based upon SPT's DASP/PAC chip set. This chip set and the various system architectures which can be built around it are reviewed. The FDAP board and its associated development system are also reviewed. The ease of implementation of typical radar signal processing functions on the FDAP board is then examined. Fast Fourier Transform and pulse compression routines are implemented via a supplied user interface as well as a high level language (C). The results are examined and comments on the FDAP and its associated system development tools are made.

Behroozi, V.; Damini, A.

1990-09-01

123

Array distribution in data-parallel programs  

NASA Technical Reports Server (NTRS)

We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.

Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

1994-01-01
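
Illustrative note on the record above: the abstract concerns choosing distribution parameters for arrays in a distributed-memory program. As a minimal sketch of what such parameters control, the code below computes which processor owns each element of a one-dimensional array under block and block-cyclic mappings. These two mappings are an assumption borrowed from common data-parallel Fortran practice; the abstract does not list the distributions the authors consider, and the function names are hypothetical.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)          # ceil(n / p) elements per processor
    return i // block


def cyclic_owner(i, p, k=1):
    """Owner of element i when chunks of k elements are dealt round-robin to p processors."""
    return (i // k) % p


n, p = 10, 4
block_map = [block_owner(i, n, p) for i in range(n)]
cyclic_map = [cyclic_owner(i, p) for i in range(n)]
print(block_map)    # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
print(cyclic_map)   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

A block mapping keeps neighbouring elements together, which tends to reduce communication for stencil-like code, while a cyclic mapping spreads work more evenly when the cost per element varies.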

124

CMOS design of focal plane programmable array processors  

Microsoft Academic Search

While digital processors can solve problems in most application areas, in some fields their capabilities are very limited. A typical example is vision. Simple animals outperform super-computers in the realization of basic vision tasks. The limitations of conventional digital systems in this field can be overcome following a fundamentally different approach based on architectures closer to nature solutions.

Ángel Rodríguez-Vázquez; Servando Espejo-Meana; Rafael Domínguez-Castro; Ricardo Carmona-Galán; Gustavo Liñán

2001-01-01

125

Real-time simulation of MHD/steam power plants by digital parallel processors  

Microsoft Academic Search

Attention is given to a large FORTRAN coded program which simulates the dynamic response of the MHD/steam plant on either a SEL 32/55 or VAX 11/780 computer. The code realizes a detailed first-principle model of the plant. Quite recently, in addition to the VAX 11/780, an AD-10 has been installed for usage as a real-time simulation facility. The parallel processor

R. M. Johnson; D. A. Rudberg

1981-01-01

126

Improved hierarchical production planning model for the multiproduct parallel-processor environment  

Microsoft Academic Search

A hierarchical production planning model is proposed for the multiproduct parallel-processor environment with sequence-dependent setups that are prevalent in the process industry. Firms in the process industry account for nearly 50% of US manufacturers (Novitsky 1983). The hierarchical production planning model comprises a forecasting front-end module, aggregate, and disaggregate production planning modules, and a low-level scheduling module. A simple forecasting

Leong

1987-01-01

127

Construction of a parallel processor for simulating manipulators and other mechanical systems  

NASA Technical Reports Server (NTRS)

This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

Hannauer, George

1991-01-01

128

Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis  

NASA Technical Reports Server (NTRS)

During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.

Gibson, Garth Alan

1990-01-01
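
Illustrative note on the record above: the abstract states that codes as simple as parity can correct a single, self-identifying disk failure. The sketch below shows why, using byte-wise XOR parity over one stripe. The stripe layout, block contents, and function names are hypothetical and are not taken from the thesis.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for one stripe: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(stripe, failed_index):
    """Rebuild the block at failed_index from the surviving blocks of the stripe.

    The failure is 'self-identifying': the caller already knows which disk failed.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


data = [b"disk", b"arra", b"ys!!"]          # three data disks, one block each
stripe = data + [make_parity(data)]         # fourth disk holds the parity
recovered = reconstruct(stripe, 1)          # pretend disk 1 failed
print(recovered)                            # b'arra'
```

Because XOR is its own inverse, the same routine rebuilds a lost parity block as well as a lost data block. Two simultaneous failures in one stripe cannot be recovered by this scheme.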

129

An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications  

SciTech Connect

Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed parallelism), has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.

Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.; Catalyurek, Umit V.; Kurc, Tahsin; Sadayappan, Ponnuswamy; Saltz, Joel H.

2009-08-01

130

Parallel algorithms for arbitrary dimensional Euclidean distance transforms with applications on arrays with reconfigurable optical buses.  

PubMed

In this paper, we present algorithms for computing the Euclidean distance transform (EDT) of a binary image on the array with reconfigurable optical buses (AROB). First, we develop a parallel algorithm termed as Algorithm Expander which can be implemented in O(1) time on an AROB with N x N^delta processors, where delta = 1/k, k is a constant and a positive integer. Algorithm Expander is designed to compute a higher dimensional EDT based on the computed lower dimensional EDT. It functions as a general EDT expander for us to expand EDT from a lower dimension to a higher dimension. We then develop parallel algorithms for the two-dimensional (2-D)_EDT of a binary image array of size N x N in O(1) time on an AROB with N x N x N^delta processors and for the three-dimensional (3-D)_EDT of a binary image of size N x N x N in O(1) time on an AROB with N x N x N x N^delta processors. To the best of our knowledge, all results derived above are the best O(1) time algorithms known. We then extend it to compute the nD_EDT of a binary image of size N^n in O(n) time on an AROB with N^(n+delta) processors. We also apply our parallel EDT algorithms to build Voronoi diagram and Voronoi polyhedra (polygons), to find all maximal empty spheres and the largest empty sphere, and to compute the medial axis transform. All of these applications can be solved in the same time complexity on an AROB with the same number of processors as needed for solving the EDT problems in the same dimensions. PMID:15369089

Wang, Yuh-Rau; Horng, Shi-Jinn

2004-02-01
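
Illustrative note on the record above: to make concrete what a Euclidean distance transform computes, the sketch below gives a brute-force sequential reference for the 2-D case. It is not the AROB algorithm from the paper. It assumes the convention that each pixel receives the squared distance to the nearest pixel of value 1 and that the image contains at least one such pixel; the function name is hypothetical.

```python
def squared_edt(image):
    """Squared Euclidean distance from every pixel to the nearest 1-valued pixel.

    Brute force: compares each pixel with every feature pixel.
    Assumes the image contains at least one 1-valued pixel.
    """
    rows, cols = len(image), len(image[0])
    features = [(r, c) for r in range(rows) for c in range(cols) if image[r][c]]
    return [
        [min((r - fr) ** 2 + (c - fc) ** 2 for fr, fc in features) for c in range(cols)]
        for r in range(rows)
    ]


image = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
edt = squared_edt(image)
print(edt)   # [[2, 1, 2], [1, 0, 1], [2, 1, 2]]
```

Taking the square root of each entry gives the distance itself. Keeping squared values avoids floating-point work and is a common choice in EDT implementations.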

131

An optical adaptive processor for null steering in phased array antennas  

NASA Astrophysics Data System (ADS)

This paper reports on the work in progress at Lockheed Electronics Company in the area of acoustooptic processors for adaptive antenna arrays. The work encompasses both theoretical and hardware implementation of such processors. For demonstration of the basic concept, an optical/electronic brassboard has been built and tested. The system, consisting of a 2-element array with one user and one jamming signal, has been shown to achieve almost 25 dB of nulls in less than 3 microseconds for signals in the L-band.

van Saders, J. G.; Syed, V. H.

1987-01-01

132

Application of the MPP (Massively Parallel Processor) to the Interactive Manipulation of Stereo Images of Digital Terrain Models.  

National Technical Information Service (NTIS)

Massively Parallel Processor algorithms were developed for the interactive manipulation of flat shaded digital terrain models defined over grids. The emphasis is on real time manipulation of stereo images. Standard graphics transformations are applied to ...

S. Pol; D. McAllister; E. Davis

1987-01-01

133

Coherent beam combination of two-dimensional high power fiber amplifier array using stochastic parallel gradient descent algorithm  

Microsoft Academic Search

We demonstrate coherent beam combination of two-dimensional high power fiber amplifier array using stochastic parallel gradient descent (SPGD) algorithm. Four polarization-maintained fiber amplifiers are tiled side by side into a 2×2 laser array with a fill factor of 54% in the near-field. Phase control on the fiber amplifiers is performed by running SPGD algorithm on a digital signal processor with

Pu Zhou; Zejin Liu; Xiaolin Wang; Yanxing Ma; Haotong Ma; Xiaojun Xu

2009-01-01
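
Illustrative note on the record above: SPGD perturbs all control channels at once with small random offsets, measures how a single performance metric changes, and moves every channel in proportion to that change. The sketch below applies the two-sided form of the update to four simulated beams. The metric (normalized on-axis intensity of unit-amplitude beams), the gain, the perturbation size, and the initial phase errors are all hypothetical choices made for illustration; none are taken from the experiment.

```python
import cmath
import random


def combined_metric(phases):
    """Normalized on-axis intensity of unit-amplitude beams (1.0 when all are in phase)."""
    field = sum(cmath.exp(1j * p) for p in phases)
    return abs(field) ** 2 / len(phases) ** 2


def spgd(piston_errors, gain=10.0, sigma=0.1, iterations=2000, seed=0):
    """Two-sided SPGD: returns control phases that maximize combined_metric."""
    rng = random.Random(seed)
    n = len(piston_errors)
    control = [0.0] * n
    for _ in range(iterations):
        # Perturb every channel simultaneously by +sigma or -sigma.
        delta = [sigma * rng.choice((-1.0, 1.0)) for _ in range(n)]
        j_plus = combined_metric(
            [e + u + d for e, u, d in zip(piston_errors, control, delta)])
        j_minus = combined_metric(
            [e + u - d for e, u, d in zip(piston_errors, control, delta)])
        # Move each channel along its own perturbation, scaled by the metric change.
        control = [u + gain * (j_plus - j_minus) * d for u, d in zip(control, delta)]
    return control


errors = [0.0, 1.0, 2.5, -2.0]      # hypothetical initial phase errors, in radians
control = spgd(errors)
before = combined_metric(errors)
after = combined_metric([e + u for e, u in zip(errors, control)])
print(before, after)
```

Only the scalar metric is measured, never the individual phases, which is why the method suits a hardware loop with a single detector.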

134

P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation  

Microsoft Academic Search

Multi-core processors are commonly available now, but most traditional computer architectural simulators still use single-thread execution. In this paper we use parallel discrete event simulation (PDES) to speedup a cycle-accurate event-driven many-core processor simulator. Evaluation against the sequential version shows that the parallelized one achieves an average speedup of 10.9x (up to 13.6x) running SPLASH-2 kernel on a 16-core host

Huiwei Lv; Yuan Cheng; Lu Bai; Mingyu Chen; Dongrui Fan; Ninghui Sun

2010-01-01

135

Parallelization of the Ensemble Empirical Model Decomposition (PEEMD) Method on Multi- and Many-core Processors  

NASA Astrophysics Data System (ADS)

Authors and affiliations: Cheung, S. (1); Shen, B.-W. (2); Mehrotra, P. (1); Li, J.-L. F. (3). (1) NASA Ames Research Center; (2) UMCP/ESSIC; (3) CalTech/JPL. The trend in high performance computing systems is towards clusters of multi-core nodes; from an 8 cores/node Intel Xeon Harpertown processor in 2008 to the latest Intel Xeon Ivy Bridge processor with 24 cores/node. In addition hardware vendors are developing many core coprocessors, such as NVIDIA's General Purpose Graphics Processing Unit (GPGPU) and Intel's Xeon Phi, in order to get around the constraints of power and frequency. The hybrid nature of such systems presents a major challenge for software developers, in achieving the desired performance. Applications need to be constructed with multiple levels of parallelization along with hybrid communication regimes in order to exploit the power of such systems. The Ensemble Empirical Model Decomposition (EEMD) method has been applied to signal processing on nonlinear and non-stationary data. Due to the ensemble nature of the algorithm and the geographical decomposition of the problem, we have developed a parallel version of the EEMD method with 4-level parallelization, from the grid decomposition level, to time-series level and to the ensemble level using MPI and OpenMP. The parallel EEMD (PEEMD) is being used to analyze Hurricane Sandy (2012) for better understanding of the multiple scale processes that may have impacted Sandy's movement, intensification and formation. In this presentation, we summarize our experiences with the implementation of the PEEMD focusing on the programmability and usability of different processors and accelerators for multiscale analysis for Hurricane Sandy.

Cheung, S.; Shen, B.; Li, J. F.; Mehrotra, P.

2013-12-01

136

Software development on the High-Speed Systolic Array Processor (HISSAP): Lessons learned. Final report, Mar 88-Mar 91  

SciTech Connect

This report documents the lessons learned in programming the Naval Ocean System Center's (NOSC's) High-Speed Systolic Array Processor (HISSAP) testbed. The procedures used for code generation, along with the programming utilities provided in the software development environment, are discussed with regard to their impact on the efficient implementation of algorithms on a parallel processing system such as HISSAP. This information is intended for considerations pertaining to software-development environments in future Navy parallel processing systems. Many of HISSAP's software-development utilities played key roles in the implementation of two computationally intensive algorithms: the Multiple-Signal Classification algorithm (MUSIC) and a four-channel, narrowband, finite-impulse response (FIR) filter. The introduction of utilities not included with the HISSAP tools would undoubtedly have increased the speed and efficiency of software development.

Tirpak, F.M.

1991-06-01

137

Fast structural design and analysis via hybrid domain decomposition on massively parallel processors  

NASA Technical Reports Server (NTRS)

A hybrid domain decomposition framework for static, transient and eigen finite element analyses of structural mechanics problems is presented. Its basic ingredients include physical substructuring and/or automatic mesh partitioning, mapping algorithms, 'gluing' approximations for fast design modifications and evaluations, and fast direct and preconditioned iterative solvers for local and interface subproblems. The overall methodology is illustrated with the structural design of a solar viewing payload that is scheduled to fly in March 1993. This payload has been entirely designed and validated by a group of undergraduate students at the University of Colorado using the proposed hybrid domain decomposition approach on a massively parallel processor. Performance results are reported on the CRAY Y-MP/8 and the iPSC-860/64 Touchstone systems, which represent both extreme parallel architectures. The hybrid domain decomposition methodology is shown to outperform leading solution algorithms and to exhibit an excellent parallel scalability.

Farhat, Charbel

1993-01-01

138

A unified approach to VLSI layout automation and algorithm mapping on processor arrays  

NASA Technical Reports Server (NTRS)

Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.

Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.

1993-01-01

139

From Bit Level Systolic Arrays to HDTV Processor Chips  

Microsoft Academic Search

The paper presents the work initially carried out by Queen's University and RSRE (now Qinetiq) in the development of advanced architectures and microchips based on systolic array architectures. The paper outlines how this has led to the development of highly complex designs for high definition TV and highlights work both on advanced signal processing architectures and tool flows for

John V. McCanny; Roger F. Woods; John G. McWhirter

2006-01-01

140

An Investigation into Reliability, Availability, and Serviceability (RAS) Features for Massively Parallel Processor Systems  

SciTech Connect

A study has been completed into the RAS features necessary for Massively Parallel Processor (MPP) systems. As part of this research, a use case model was built of how RAS features would be employed in an operational MPP system. Use cases are an effective way to specify requirements so that all involved parties can easily understand them. This technique is in contrast to laundry lists of requirements that are subject to misunderstanding as they are without context. As documented in the use case model, the study included a look at incorporating system software and end-user applications, as well as hardware, into the RAS system.

KELLY, SUZANNE M.; OGDEN, JEFFREY BRANDON

2002-10-01

141

Solving linear programs under uncertainty, using decomposition, importance sampling and parallel processors. Progress report  

SciTech Connect

Planning under uncertainty is a fundamental problem of decision science whose solution could advance man's ability to plan, schedule, design, and control complex situations. The goal is to develop efficient methods for solving an important class of planning problems, namely linear programs whose parameters (coefficients, right-hand sides) are not known with certainty. The research concentrated on the theoretical tasks of decomposition and importance sampling techniques, on implementation and software development issues, and on applications. Research is continuing on the use of parallel processors for solving stochastic programs.

Dantzig, G.B.; Glynn, P.; Infanger, G.

1994-03-01

142

Block iterative restoration of astronomical images with the massively parallel processor  

NASA Technical Reports Server (NTRS)

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.

Heap, Sara R.; Lindler, Don J.

1987-01-01
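
Illustrative note on the record above: the figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below reproduces the size of the direct system and the time for one pass over all blocks. The overlap estimate assumes each 121-unknown block corresponds to an 11 x 11 pixel patch, and the cost ratio assumes cubic-cost dense solves; both are assumptions made here and are not stated in the abstract.

```python
image_side = 500
unknowns = image_side * image_side            # one unknown per pixel: 250,000

block_unknowns = 121                          # 11 * 11 (patch shape is an assumption)
num_blocks = 4900                             # 70 * 70
seconds_per_block = 0.06                      # quoted solve time on the MPP

# Time to solve every block system once.
seconds_per_pass = num_blocks * seconds_per_block

# Blocks together hold more unknowns than there are pixels,
# which suggests that neighbouring blocks overlap.
coverage = num_blocks * block_unknowns / unknowns

# Rough cost ratio, assuming dense solves that scale as n**3.
cost_ratio = unknowns ** 3 / (num_blocks * block_unknowns ** 3)

print(unknowns)                       # 250000
print(round(seconds_per_pass))        # 294 seconds, just under 5 minutes
print(round(coverage, 2))             # 2.37
print(f"{cost_ratio:.1e}")            # about 1.8e+06
```

Under these assumptions one pass over all blocks takes about five minutes, and the block approach needs roughly a millionth of the arithmetic of a single direct solve per pass. The number of passes required is not given in the abstract.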

143

Stochastic simulation of charged particle transport on the massively parallel processor  

NASA Technical Reports Server (NTRS)

Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively Parallel Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These simulations have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.

Earl, James A.

1988-01-01

144

Estimating water flow through a hillslope using the massively parallel processor  

NASA Technical Reports Server (NTRS)

A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.

Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

1988-01-01

145

A 1,000 Frames/s Programmable Vision Chip with Variable Resolution and Row-Pixel-Mixed Parallel Image Processors.  

PubMed

A programmable vision chip with variable resolution and row-pixel-mixed parallel image processors is presented. The chip consists of a CMOS sensor array, with row-parallel 6-bit Algorithmic ADCs, row-parallel gray-scale image processors, pixel-parallel SIMD Processing Element (PE) array, and instruction controller. The resolution of the image in the chip is variable: high resolution for a focused area and low resolution for general view. It implements gray-scale and binary mathematical morphology algorithms in series to carry out low-level and mid-level image processing and sends out features of the image for various applications. It can perform image processing at over 1,000 frames/s (fps). A prototype chip with 64 × 64 pixels resolution and 6-bit gray-scale image is fabricated in 0.18 μm Standard CMOS process. The area size of chip is 1.5 mm × 3.5 mm. Each pixel size is 9.5 μm × 9.5 μm and each processing element size is 23 μm × 29 μm. The experiment results demonstrate that the chip can perform low-level and mid-level image processing and it can be applied in the real-time vision applications, such as high speed target tracking. PMID:22454565

Lin, Qingyu; Miao, Wei; Zhang, Wancheng; Fu, Qiuyu; Wu, Nanjian

2009-01-01

146

High Linearity Voltage Response Parallel-Array Cell  

NASA Astrophysics Data System (ADS)

We studied in detail a cell consisting of two parallel SQUID arrays or two parallel superconducting interference filters (SQIFs) connected differentially with the goal of achieving highly linear voltage response to magnetic signal. In these different cell designs, we accounted for realistic values of coupling inductances in contrast to limiting case of vanishing inductances considered earlier. We found that a cell based on regular parallel SQUID arrays produces higher linearity as compared to the cell based on SQIFs. This high-linearity cell can be used for realizing Superconducting Quantum Arrays (SQA) capable of providing a broadband, highly-linear magnetic field-to-voltage transfer function and high dynamic range.

Kornev, V.; Kolotinskiy, N.; Skripka, V.; Sharafiev, A.; Soloviev, I.; Mukhanov, O.

2014-05-01

147

Parallel collective resonances in arrays of gold nanorods.  

PubMed

In this work we discuss the excitation of parallel collective resonances in arrays of gold nanoparticles. Parallel collective resonances result from the coupling of the nanoparticles' localized surface plasmons with diffraction orders traveling in the direction parallel to the polarization vector. While they provide field enhancement and delocalization like the standard collective resonances, our results suggest that parallel resonances could exhibit greater tolerance to index asymmetry in the environment surrounding the arrays. The near- and far-field properties of these resonances are analyzed, both experimentally and numerically. PMID:24645987

Vitrey, Alan; Aigouy, Lionel; Prieto, Patricia; García-Martín, José Miguel; González, María U

2014-04-01

148

Multi-Processor Molecular Dynamics Using the Brenner Potential: Parallelization of an Implicit Multi-Body Potential

Microsoft Academic Search

We present computational aspects of Molecular Dynamics calculations of thermal properties of diamond using the Brenner potential. Parallelization was essential in order to carry out these calculations on samples of suitable sizes. Our implementation uses MPI on a multi-processor machine such as the IBM SP2. Three aspects of parallelization of the Brenner potential are discussed in depth. These are its

Irina Rosenblum; Joan Adler; Simon Brandon

1999-01-01

149

Animated computer graphics models of space and earth sciences data generated via the massively parallel processor  

NASA Technical Reports Server (NTRS)

A capability was developed for rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP), employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

1987-01-01

150

Hierarchically tiled arrays for parallelism and locality  

Microsoft Academic Search

Parallel programming is facilitated by constructs which, unlike the widely used SPMD paradigm, provide programmers with a global view of the code and data structures. These constructs could be compiler directives containing information about data and task distribution, language extensions specifically designed for parallel computation, or classes that encapsulate parallelism. In this paper, we describe a class developed at

Jia Guo; Ganesh Bikshandi; Daniel Hoeflinger; Gheorghe Almási; Basilio B. Fraguela; María Jesús Garzarán; David A. Padua; Christoph Von Praun

2006-01-01

151

Evaluation of soft-core processors on a Xilinx Virtex-5 field programmable gate array.  

SciTech Connect

Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable field programmable gate array (FPGA)-based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA-based soft-core processors for use in future NBA systems: the MicroBlaze (uB), the open-source Leon3, and the licensed Leon3. Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration.

Learn, Mark Walter

2011-04-01

152

Analysis of array errors and a short-time processor in airborne phased array radars  

Microsoft Academic Search

Array errors are inherent in a realistic phased array radar system. The influence of array errors on the clutter degrees of freedom and the clutter subspace in an airborne phased array radar is analyzed. Based on the presented theoretic results, a method of short-time processing followed by coherent integration is proposed for clutter suppression in airborne phased array radars. It

Qing-Guang Liu; Ying-Ning Peng

1996-01-01

153

Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis  

NASA Technical Reports Server (NTRS)

In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, there are few evaluation efforts addressing this issue.

Dimpsey, Robert Tod

1992-01-01

154

Computing effective properties of random heterogeneous materials on heterogeneous parallel processors  

NASA Astrophysics Data System (ADS)

In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel version of the application shows near-linear speed-up when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.

Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

2012-11-01

155

The application of small peripheral array processors to the modeling of distributed parameter systems  

SciTech Connect

In the modeling of distributed parameter systems, it is sometimes necessary to utilize system observations to identify or estimate the boundary conditions of the field. For example, in dealing with water resources systems, it may be desired to identify and characterize sources of water pollution on the basis of measurements of pollution concentrations downstream from suspected sources. A computational technique based on pattern recognition has been found to be relatively effective in this application. However, the algorithms involved are highly computation-intensive and therefore tend to place a considerable burden on most conventional digital computer facilities. Array processors, though primarily designed for signal processing, are playing an increasingly important role in the modeling and simulation of physical systems. The purpose of this paper is to describe the application of a relatively small and inexpensive peripheral array processor to this problem and to evaluate its performance.

Karplus, W.J.; Shibata, Y.

1986-06-01

156

On-board landmark navigation and attitude reference parallel processor system  

NASA Technical Reports Server (NTRS)

An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.

Gilbert, L. E.; Mahajan, D. T.

1978-01-01
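
To make the idea behind the sequential similarity detection algorithm (SSDA) in the record above concrete, here is a minimal Python sketch of template matching by summed absolute differences with early abandonment. It is a simplified illustration under stated assumptions, not the flight implementation: the threshold is taken to be the best error found so far, the mean-value normalization mentioned in the abstract is omitted, and the function name `ssda_match` is hypothetical.

```python
def ssda_match(image, template):
    """Return ((row, col), error) of the window that best matches the template.

    The error of each candidate window is the sum of absolute differences.
    Accumulation stops early once the running error reaches the best error
    found so far (assumed threshold rule), which is the time-saving idea.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_err = None, float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            err = 0
            abandoned = False
            for i in range(th):
                for j in range(tw):
                    err += abs(image[r + i][c + j] - template[i][j])
                if err >= best_err:      # this window cannot beat the best one
                    abandoned = True
                    break
            if not abandoned:
                best_pos, best_err = (r, c), err
    return best_pos, best_err
```

Because each candidate window is evaluated independently, the outer loops are the natural place to split the search across several processors, with only the current best error needing to be shared.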

157

Precompensation for mutual coupling between array elements in parallel excitation  

PubMed Central

Parallel transmission or excitation has been suggested to perform multi-dimensional spatial selective excitation to shorten the pulse width using a coil array and the sensitivity information. The mutual coupling between array elements has been a critical technical issue in RF array designs, which can cause artifacts on the excitation profile, leading to degraded excitation performance and image quality. In this work, a precompensation method is proposed to address the mutual coupling effect in parallel transmission by introducing the mutual coupling coefficient matrix into the RF pulse design procedure of the parallel transmission. 90° RF pulses have been designed using both the original transmit SENSE method and the proposed precompensation method for RF arrays with non-negligible mutual coupling, and their excitation profiles are generated by simulating the Bloch equation. The results show that the mutual coupling effect can be effectively compensated by using the proposed method, yielding enhanced tolerance to insufficient mutual decoupling of RF arrays in parallel excitation and ultimately providing improved performance and accuracy of parallel excitation.

Pang, Yong; Zhang, Xiaoliang

2011-01-01

158

Optoelectronic implementation of a 256-channel sonar adaptive-array processor.  

PubMed

We present an optoelectronic implementation of an adaptive-array processor that is capable of performing beam forming and jammer nulling in signals of wide fractional bandwidth that are detected by an array of arbitrary topology. The optical system makes use of a two-dimensional scrolling spatial light modulator to represent an array of input signals in 256 tapped delay lines, two acousto-optic modulators for modulating the feedback error signal, and a photorefractive crystal for representing the adaptive weights as holographic gratings. Gradient-descent learning is used to dynamically adapt the holographic weights to optimally form multiple beams and to null out multiple interference sources, either in the near field or in the far field. Space-integration followed by differential heterodyne detection is used for generating the system's output. The processor is analyzed to show the effects of exponential weight decay on the optimum solution and on the convergence conditions. Several experimental results are presented that validate the system's capacity for broadband beam forming and jammer nulling for linear and circular arrays. PMID:15617279

Silveira, Paulo E X; Pati, Gour S; Wagner, Kelvin H

2004-12-10
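
The effect of weight decay on the optimum solution, noted in the abstract above, can be illustrated with a generic digital analogue. The following Python sketch is a "leaky" LMS gradient-descent update; it is not a model of the optical system, and the real-valued signals, the step size `mu`, and the decay constant `leak` are assumptions chosen for the example.

```python
def leaky_lms(samples, n_taps, mu=0.1, leak=0.0):
    """Adapt weights by gradient descent with an optional weight-decay term.

    samples: iterable of (x, d) pairs, x a list of n_taps inputs, d the
    desired output. Update rule: w <- (1 - mu*leak) * w + mu * e * x.
    """
    w = [0.0] * n_taps
    for x, d in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))   # current output
        e = d - y                                  # feedback error signal
        w = [(1.0 - mu * leak) * wi + mu * e * xi for wi, xi in zip(w, x)]
    return w
```

For a single tap with constant input 1 and desired output 2, the update without decay converges to w = 2, whereas with `leak = 0.25` it settles at w = 2 / (1 + 0.25) = 1.6: the decay biases the steady-state weights away from the error-minimizing solution.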

159

Parallel arrays of Josephson junctions for submillimeter local oscillators  

NASA Technical Reports Server (NTRS)

In this paper we discuss the influence of the DC biasing circuit on the operation of parallel-biased quasioptical Josephson junction oscillator arrays. Because of the nonuniform distribution of the DC biasing current along the length of the bias lines, there is a nonuniform distribution of magnetic flux in the superconducting loops connecting every two junctions of the array. These DC self-field effects determine the state of the array. We present analysis and time-domain numerical simulations of these states for four biasing configurations. We find conditions for the in-phase states with maximum power output. We compare arrays with small and large inductances and determine the low-inductance limit for nearly in-phase array operation. We show how arrays can be steered in the H-plane using an externally applied DC magnetic field.

Pance, Aleksandar; Wengler, Michael J.

1992-01-01

160

Comparison between electrical and free space optical interconnects for fine grain processor arrays based on interconnect density capabilities.  

PubMed

Optically interconnected processor arrays are compared to conventional fully electronic processor arrays in terms of interconnect density capabilities. A complexity model is introduced that allows the calculation of the array area growth rate as an asymptotic function of the number of processing elements in the array. Lower bounds on the area growth rate of electrically interconnected processor arrays are compared to upper bounds for free-space optically interconnected circuits that employ computer-generated holograms. Results indicate that for connection networks such as the hypercube, perfect shuffle, and crossbar networks, which have a high minimum bisection width (a measure of the global nature of an interconnect topology) and contain some degree of spatial invariance, optically interconnected circuit area growth rates are below lower bounds on VLSI circuit growth rates. PMID:20555784

Feldman, M R; Guest, C C; Drabik, T J; Esener, S C

1989-09-15

161

Thread-parallel MPEG2, MPEG4 and H.264 video encoders for SoC multi-processor architectures  

Microsoft Academic Search

This study utilizes thread-level parallel techniques to significantly reduce the dynamic instruction count performance metric of the MPEG-2, MPEG-4 and H.264 video encoders. Such solutions are particularly applicable in portable devices, as workload distribution among a number of parallel-executing processors decreases the individual processing requirements and allows for real-time video encoding. Due to the use of multiple processing

Tom R. Jacobs; Vassilios A. Chouliaras; David J. Mulvaney

2006-01-01

162

Experimental Study of Six Different Implementations of Parallel Matrix Multiplication on Heterogeneous Computational Clusters of Multicore Processors  

Microsoft Academic Search

Two strategies of distribution of computations can be used to implement parallel solvers for dense linear algebra problems for Heterogeneous Computational Clusters of Multicore Processors (HCoMs). These strategies are called Heterogeneous Process Distribution Strategy (HPS) and Heterogeneous Data Distribution Strategy (HDS). They are not novel and have been researched thoroughly. However, the advent of multicores necessitates enhancements to

Pedro Alonso-Jordá; Ravi Reddy Manumachu; Alexey L. Lastovetsky

2010-01-01

163

Experimental Study of Six Different Parallel Matrix-Matrix Multiplication Applications for Heterogeneous Computational Clusters of Multicore Processors  

Microsoft Academic Search

In this document, we describe two strategies of distribution of computations that can be used to implement parallel solvers for dense linear algebra problems for Heterogeneous Computational Clusters of Multicore Processors (HCoMs). These strategies are called Heterogeneous Process Distribution Strategy (HPS) and Heterogeneous Data Distribution Strategy (HDS). They are not novel and have already been researched thoroughly. However, the advent

Pedro Alonso; Ravi Reddy; Alexey Lastovetsky

2009-01-01

164

Technology Development and Circuit Design for a Parallel Laser Programmable Floating Point Application Specific Processor.  

National Technical Information Service (NTIS)

The laser programmable floating point application specific processor (LPASP) is a new approach to the rapid development of custom VLSI chips. The LPASP is a generic application specific processor that can be programmed to perform a specific function. The effo...

M. W. Scriber

1989-01-01

165

Parallel Data Cube Construction Based on an Extendible Multidimensional Array  

Microsoft Academic Search

The pre-computation of data cubes is critical for improving the response time of OLAP (On-Line Analytical Processing) systems. In order to meet the need for improved performance created by growing data sizes, parallel solutions for data cube construction are becoming increasingly important. This paper presents two parallel methods for data cube construction based on an extendible multidimensional array, which is dynamically

Dong Jin; Tatsuo Tsuji

2011-01-01

166

An Analog Processor for Image Compression  

NASA Technical Reports Server (NTRS)

This paper describes a novel analog Vector Array Processor (VAP) that was designed for use in real-time and ultra-low-power image compression applications. This custom CMOS processor is based architecturally on the Vector Quantization (VQ) algorithm in image coding, and the hardware implementation fully exploits the inherent parallelism of the VQ algorithm.

Tawel, R.

1992-01-01

167

A Generic Network Interface Architecture for a Networked Processor Array (NePA)  

Microsoft Academic Search

Recently, the Network-on-Chip (NoC) technique has been proposed as a promising solution for on-chip interconnection networks. However, the differing interface specifications of integrated components make adopting NoC techniques considerably difficult. In this paper, we present a generic architecture for a network interface (NI) and associated wrappers for a networked processor array (NoC-based multiprocessor SoC) in order to allow systematic design

Seung Eun Lee; Jun Ho Bahn; Yoon Seok Yang; Nader Bagherzadeh

2008-01-01

168

Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor  

NASA Astrophysics Data System (ADS)

This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way of processing the repeated arithmetic operation types in multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed architecture can consequently realize efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the frequently used JPEG image-compression application show that the required number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The resulting performance in Mpixel/mm² is better by factors of 3.3 and 4.4 than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.

Kumaki, Takeshi; Ishizaki, Masakatsu; Koide, Tetsushi; Mattausch, Hans Jürgen; Kuroda, Yasuto; Gyohten, Takayuki; Noda, Hideyuki; Dosaka, Katsumi; Arimoto, Kazutami; Saito, Kazunori

169

Recursive array layouts and fast parallel matrix multiplication  

Microsoft Academic Search

Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper in-

Siddhartha Chatterjee; Alvin R. Lebeck; Praveen K. Patnala; Mithuna Thottethodi

1999-01-01
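
The abstract above is truncated before it names the layouts it studies. As one well-known example of a recursive alternative to row-major or column-major storage, the following Python sketch computes a Morton (Z-order) index by interleaving the bits of the row and column. Using Morton order here is an assumption made for illustration, and `morton_index` is a hypothetical helper name.

```python
def morton_index(row, col, bits=16):
    """Map (row, col) to a position in Z-order by interleaving coordinate bits.

    Column bits occupy the even bit positions of the result and row bits the
    odd positions, so every aligned 2^k x 2^k tile is stored contiguously.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)
        index |= ((row >> b) & 1) << (2 * b + 1)
    return index
```

With this mapping, the four elements of each aligned 2 × 2 tile occupy four consecutive positions, the four tiles of each aligned 4 × 4 block occupy sixteen consecutive positions, and so on recursively.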

170

Parallel Data Mining using the Array Package for Java  

Microsoft Academic Search

This paper discusses several techniques used in developing a parallel, production quality data mining application in Java. Three sequential versions of the data mining application were developed: A sequential Fortran 90 version used as a performance reference, a plain Java implementation that only uses the primitive array structures from the language, and a baseline Java implementation that uses an

J. E. Moreira; S. P. Midkiff; M. Gupta; R. Lawrence

1998-01-01

171

Feasibility study for the implementation of NASTRAN on the ILLIAC 4 parallel processor  

NASA Technical Reports Server (NTRS)

The ILLIAC IV, a fourth generation multiprocessor using parallel processing hardware concepts, is operational at Moffett Field, California. Its capability to excel at matrix manipulation makes the ILLIAC well suited for performing structural analyses using the finite element displacement method. The feasibility of modifying the NASTRAN (NASA structural analysis) computer program to make effective use of the ILLIAC IV was investigated. The characteristics of the ILLIAC and of the ARPANET, a telecommunications network which spans the continent and makes the ILLIAC accessible to nearly all major industrial centers in the United States, are summarized. Two distinct approaches are studied: retaining NASTRAN as it now operates on many of the host computers of the ARPANET to process the input and output while using the ILLIAC only for the major computational tasks, and installing NASTRAN to operate entirely in the ILLIAC environment. Though both alternatives offer similar and significant increases in computational speed over modern third generation processors, the full installation of NASTRAN on the ILLIAC is recommended. Specifications are presented for performing that task, with corresponding manpower estimates and schedules.

Field, E. I.

1975-01-01

172

Mobile and replicated alignment of arrays in data-parallel programs  

NASA Technical Reports Server (NTRS)

When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

1993-01-01
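
The two-stage mapping described in the abstract above (alignment to a template, then distribution of the template over processors) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the authors' algorithm: alignment is restricted to a simple shift, distribution is a plain block distribution, and the statement being analyzed, A[i] = A[i] + B[i+1], is chosen only for illustration.

```python
def owner(index, offset, block_size):
    """Processor that owns an array element under the two-stage mapping."""
    template_cell = index + offset        # stage 1: alignment (a shift, assumed)
    return template_cell // block_size    # stage 2: block distribution (assumed)


def residual_communication(n, offset_a, offset_b, block_size):
    """Count iterations of A[i] = A[i] + B[i+1] whose operands are on different processors."""
    return sum(
        owner(i, offset_a, block_size) != owner(i + 1, offset_b, block_size)
        for i in range(n - 1)
    )
```

For 16 elements in blocks of 4, aligning both arrays identically (`offset_a = offset_b = 0`) leaves three iterations that cross a block boundary, while shifting B by one template cell (`offset_b = -1`) places every B[i+1] with A[i] and removes the residual communication for this statement.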

173

A Parallel and Concurrent Implementation of Lin-Kernighan Heuristic (LKH-2) for Solving Traveling Salesman Problem for Multi-Core Processors using SPC3 Programming Model  

Microsoft Academic Search

With the arrival of multi-cores, every processor now has built-in parallel computational power, which can be fully utilized only if the program in execution is written accordingly. This study is part of ongoing research on designing a new parallel programming model for multi-core processors. In this paper we have presented a combined parallel and concurrent implementation

2011-01-01

174

A Parallel and Concurrent Implementation of Lin Kernighan Heuristic (LKH-2) for Solving Traveling Salesman Problem for Multi-Core Processors using SPC 3 Programming Model  

Microsoft Academic Search

With the arrival of multi-cores, every processor now has built-in parallel computational power, which can be fully utilized only if the program in execution is written accordingly. This study is part of ongoing research on designing a new parallel programming model for multi-core processors. In this paper we have presented a combined parallel and concurrent implementation

Muhammad Ali Ismail; Shahid H. Mirza; Talat Altaf

2011-01-01

175

Parallel nanoimaging and nanolithography using a heated microcantilever array  

NASA Astrophysics Data System (ADS)

We report parallel topographic imaging and nanolithography using heated microcantilever arrays integrated into a commercial atomic force microscope (AFM). The array has five AFM cantilevers, each of which has an internal resistive heater. The temperatures of the cantilever heaters can be monitored and controlled independently and in parallel. We perform parallel AFM imaging of a region of size 550 µm × 90 µm, where the cantilever heat flow signals provide a measure of the nanometer-scale substrate topography. At a cantilever scan speed of 1134 µm s⁻¹, we acquire a 3.1 million-pixel image in 62 s with noise-limited vertical resolution of 0.6 nm and pixels of size 351 nm × 45 nm. At a scan speed of 4030 µm s⁻¹ we acquire a 26.4 million-pixel image in 124 s with vertical resolution of 5.4 nm and pixels of size 44 nm × 43 nm. Finally, we demonstrate parallel nanolithography with the cantilever array, including iterations of measure-write-measure nanofabrication, with each cantilever operating independently.

Somnath, Suhas; Kim, Hoe Joon; Hu, Huan; King, William P.

2014-01-01

176

Parallel nanoimaging and nanolithography using a heated microcantilever array.  

PubMed

We report parallel topographic imaging and nanolithography using heated microcantilever arrays integrated into a commercial atomic force microscope (AFM). The array has five AFM cantilevers, each of which has an internal resistive heater. The temperatures of the cantilever heaters can be monitored and controlled independently and in parallel. We perform parallel AFM imaging of a region of size 550 µm × 90 µm, where the cantilever heat flow signals provide a measure of the nanometer-scale substrate topography. At a cantilever scan speed of 1134 µm s⁻¹, we acquire a 3.1 million-pixel image in 62 s with noise-limited vertical resolution of 0.6 nm and pixels of size 351 nm × 45 nm. At a scan speed of 4030 µm s⁻¹ we acquire a 26.4 million-pixel image in 124 s with vertical resolution of 5.4 nm and pixels of size 44 nm × 43 nm. Finally, we demonstrate parallel nanolithography with the cantilever array, including iterations of measure-write-measure nanofabrication, with each cantilever operating independently. PMID:24334342

Somnath, Suhas; Kim, Hoe Joon; Hu, Huan; King, William P

2014-01-10

177

Parallel computation of optimized arrays for 2-D electrical imaging surveys  

NASA Astrophysics Data System (ADS)

Modern automatic multi-electrode survey instruments have made it possible to use non-traditional arrays to maximize the subsurface resolution from electrical imaging surveys. Previous studies have shown that one of the best methods for generating optimized arrays is to select the set of array configurations that maximizes the model resolution for a homogeneous earth model. The Sherman-Morrison Rank-1 update is used to calculate the change in the model resolution when a new array is added to a selected set of array configurations. This method had the disadvantage that it required several hours of computer time even for short 2-D survey lines. The algorithm was modified to calculate the change in the model resolution rather than the entire resolution matrix. This reduces the computer time and memory required as well as the computational round-off errors. The matrix-vector multiplications for a single add-on array were replaced with matrix-matrix multiplications for 28 add-on arrays to further reduce the computer time. The temporary variables were stored in the double-precision Single Instruction Multiple Data (SIMD) registers within the CPU to minimize computer memory access. A further reduction in the computer time is achieved by using the computer graphics card Graphics Processor Unit (GPU) as a highly parallel mathematical coprocessor. This makes it possible to carry out the calculations for 512 add-on arrays in parallel using the GPU. The changes reduce the computer time by more than two orders of magnitude. The algorithm used to generate an optimized data set adds a specified number of new array configurations after each iteration to the existing set. The resolution of the optimized data set can be increased by adding a smaller number of new array configurations after each iteration. 
Although this increases the computer time required to generate an optimized data set with the same number of data points, the new fast numerical routines have made this practical on commonly available microcomputers.

Loke, M. H.; Wilkinson, P. B.; Chambers, J. E.

2010-12-01
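The Sherman-Morrison rank-1 update named in the abstract above can be illustrated with a minimal sketch. It shows only the generic identity (A + u v^T)^-1 = A^-1 - (A^-1 u v^T A^-1) / (1 + v^T A^-1 u); the function name `sherman_morrison` and the small test matrices are assumptions made for illustration, and this is not the authors' resolution-matrix routine.

```python
def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T), given a_inv = A^-1 (list of rows)."""
    n = len(a_inv)
    # A^-1 u, a column vector
    au = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    # v^T A^-1, a row vector
    va = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    # Scalar denominator 1 + v^T A^-1 u
    denom = 1.0 + sum(v[k] * au[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("the rank-1 update makes the matrix singular")
    # Subtract the rank-1 correction from the old inverse
    return [[a_inv[i][j] - au[i] * va[j] / denom for j in range(n)]
            for i in range(n)]


# Hypothetical example: A = diag(2, 4), u = v = (1, 1), so A + u v^T = [[3, 1], [1, 5]]
updated = sherman_morrison([[0.5, 0.0], [0.0, 0.25]], [1.0, 1.0], [1.0, 1.0])
```

The update costs O(n^2) operations per added vector instead of the O(n^3) of a fresh inversion, which is the general reason such updates are attractive when many candidate configurations must be tested.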

178

XETAL-II: A 107 GOPS, 600mW Massively-Parallel Processor for Video Scene Analysis  

Microsoft Academic Search

Xetal-II is a SIMD processor with 320 processing elements delivering a peak performance of 107 GOPS on 16b data while dissipating 600mW. A 10Mb on-chip memory can store up to 4 VGA frames allowing efficient implementation of frame-iterative algorithms. A massively parallel interconnect provides an internal bandwidth of more than 1.3Tb/s to sustain the peak performance. The 74 mm² IC is fabricated

A. Abbo; R. Kleihorst; V. Choudhary; L. Sevat; P. Wielage; S. Mouy; M. Heijligers

2007-01-01

179

Xetal-II: A 107 GOPS, 600 mW Massively Parallel Processor for Video Scene Analysis  

Microsoft Academic Search

Xetal-II is a single-instruction multiple-data (SIMD) processor with 320 processing elements. It delivers a peak performance of 107 GOPS on 16-bit data while dissipating 600 mW. A 10 Mbit on-chip memory is provided which can store up to four VGA frames, allowing efficient implementation of frame-iterative algorithms. A massively parallel interconnect provides an internal bandwidth of more than 1.3 Tbit/s

Anteneh A. Abbo; Richard P. Kleihorst; Vishal Choudhary; Leo Sevat; Paul Wielage; Sebastien Mouy; Bart Vermeulen; Marc Heijligers

2008-01-01

180

Computers: Massively parallel processors. (Latest citations from INSPEC the database for Physics, Electronics, and Computing). Published Search  

SciTech Connect

The bibliography contains citations concerning a concept in computers called Massively Parallel Processing. The processing power of a computer may be increased by using numerous processors in parallel and feeding data through a number of different computational paths at the same time. The citations explore these computers and their practical uses, and include case studies, specific problems solved, theory, and future possibilities and needs. Applications of neural network modeling, pattern recognition, image processing, local area routing, and genetic sequence comparison are discussed. (Contains 250 citations and includes a subject term index and title list.)

Not Available

1993-10-01

181

General linear codes for fault-tolerant matrix operations on processor arrays  

NASA Technical Reports Server (NTRS)

Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this work, a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.

Nair, V. S. S.; Abraham, J. A.

1988-01-01
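To make the idea of checksum-protected matrix operations concrete, here is a minimal sketch of the simple row/column checksum scheme for matrix multiplication. It is not the general linear codes identified in the record above; the function names and the tolerance `tol` are assumptions for illustration. The tolerance is where the roundoff concern in the abstract shows up: too tight and numerical error looks like a fault, too loose and real faults slip through.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to every row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def checksums_consistent(c_full, tol=1e-9):
    """Verify that the last row and column equal the sums of the data part."""
    data = [row[:-1] for row in c_full[:-1]]
    rows_ok = all(abs(sum(r) - full[-1]) <= tol
                  for r, full in zip(data, c_full[:-1]))
    cols_ok = all(abs(sum(col) - chk) <= tol
                  for col, chk in zip(zip(*data), c_full[-1][:-1]))
    return rows_ok and cols_ok


# Hypothetical 2 x 2 example: the product of the encoded operands carries
# both checksums, so a single corrupted element becomes detectable.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
ok_before = checksums_consistent(c_full)
c_full[0][1] += 1.0          # inject a fault into one element
ok_after = checksums_consistent(c_full)
```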

182

Optical fiber interconnection for the scalable parallel computing system  

Microsoft Academic Search

In this paper, we discuss the optical fiber interconnection technologies applied in the two types of parallel processing systems: 1) a backplane interconnection in a parallel processor array system and 2) a computing cluster network. We have set up a parallel processor array system using optical fiber to make point-to-point interconnection between processor elements and are developing a low-cost virtual

Ge Zhou; Yimo Zhang; Wei Liu

2000-01-01

183

Investigations on the Usefulness of the Massively Parallel Processor for Study of Electronic Properties of Atomic and Condensed Matter Systems. Final Report, August 15, 1986-August 14, 1987.  

National Technical Information Service (NTIS)

The usefulness of the Massively Parallel Processor (MPP) for investigation of electronic structures and hyperfine properties of atomic and condensed matter systems was explored. The major effort was directed towards the preparation of algorithms for paral...

T. P. Das

1988-01-01

184

Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor  

NASA Technical Reports Server (NTRS)

Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.

Moore, J. Strother

1992-01-01
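The four-processor Oral Messages setting described above can be sketched in software for the single-round case, OM(1): one commander sends a value to three lieutenants, each lieutenant relays what it received to the other two, and each decides by majority. This is only an illustrative model with binary values; the names `om1` and `relay` are hypothetical, and it has no connection to the verified hardware design or its formal description language.

```python
from collections import Counter


def om1(commander_sends, relay):
    """Simulate OM(1) with one commander and len(commander_sends) lieutenants.

    commander_sends[i] is the value (0 or 1) the commander sends to lieutenant i.
    relay(i, j, v) is the value lieutenant i reports to lieutenant j after
    receiving v; a loyal lieutenant returns v unchanged.
    Returns the list of decisions, one per lieutenant.
    """
    n = len(commander_sends)
    decisions = []
    for j in range(n):
        # Lieutenant j's own copy plus what the other lieutenants relayed
        votes = [commander_sends[j]]
        votes += [relay(i, j, commander_sends[i]) for i in range(n) if i != j]
        decisions.append(Counter(votes).most_common(1)[0][0])
    return decisions


def honest(i, j, v):
    """Relay behaviour of a loyal lieutenant."""
    return v


def lieutenant_2_lies(i, j, v):
    """Relay behaviour when lieutenant 2 is faulty and always reports 0."""
    return 0 if i == 2 else v


# Loyal commander, one faulty lieutenant: loyal lieutenants 0 and 1 still decide 1.
case_faulty_lieutenant = om1([1, 1, 1], lieutenant_2_lies)
# Faulty commander sending mixed values: all loyal lieutenants still agree.
case_faulty_commander = om1([1, 0, 1], honest)
```

With three binary votes per lieutenant a strict majority always exists, which is why the sketch restricts values to 0 and 1.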

185

Stream Processors  

Microsoft Academic Search

Stream processors, like other multi-core architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and

Mattan Erez; William J. Dally

2009-01-01

186

Novel broadband reconfigurable optical add-drop multiplexer employing custom fiber arrays and Opto-VLSI processors  

Microsoft Academic Search

A reconfigurable optical add/drop multiplexer (ROADM) structure based on using a custom-made fiber array and an Opto-VLSI processor is proposed and demonstrated. The fiber array consists of N pairs of angled fibers corresponding to N channels, each of which can independently perform add, drop, and thru functions through a reconfigurable Opto-VLSI beam steerer. Experimental results show that the ROADM structure

Feng Xiao; Budi Juswardy; Kamal Alameh; Yong Tak Lee

2008-01-01

187

ALICE a multi-processor reduction machine for the parallel evaluation of applicative languages  

Microsoft Academic Search

The functional or applicative languages have long been regarded as suitable vehicles for overcoming many of the problems involved in the production and maintenance of correct and reliable software. However, their inherent inefficiencies when run on conventional von Neumann style machines have prevented their widespread acceptance. With the declining cost of hardware and the increasing feasibility of multi-processor architectures this

John Darlington; Mike Reeve

1981-01-01

188

A precision chirp scaling SAR processor extension to sub-aperture implementation on massively parallel supercomputers  

Microsoft Academic Search

A new concept in SAR raw data focusing algorithms is discussed. The so-called “chirp scaling” (CS) technique has allowed the complete elimination of the interpolation step required in conventional ω-K wave domain processing algorithms. This enables high-performance implementation of aberrationless 2D processors, both for SAR focusing and for analogous problems (seismic wave migration, tomography, etc.). Furthermore

Fabrizio Impagnatiello

1995-01-01

189

On Parallelization of High-Speed Processors for Elliptic Curve Cryptography  

Microsoft Academic Search

This paper discusses parallelization of elliptic curve cryptography hardware accelerators using elliptic curves over binary fields F_{2^m}. Elliptic curve point multiplication, which is the operation used in every elliptic curve cryptosystem, is hierarchical in nature, and parallelism can be utilized in different hierarchy levels as shown in many publications. However, a comprehensive analysis on the effects of parallelization has not

Kimmo U. Järvinen; Jorma Skyttä

2008-01-01

190

RADCAP: an operational parallel processing facility  

Microsoft Academic Search

An overview is presented of RADCAP, the operational associative array processor (AP) facility installed at Rome Air Development Center (RADC). Basically, this facility consists of a Goodyear Aerospace STARAN associative array (parallel) processor and various peripheral devices, all interfaced with a Honeywell Information Systems (HIS) 645 sequential computer, which runs under the Multics timeshared operating system. The RADCAP hardware and

James D. Feldman; Louis C. Fulmer

1974-01-01

191

A digital magnetic resonance imaging spectrometer using digital signal processor and field programmable gate array.  

PubMed

A digital spectrometer for low-field magnetic resonance imaging is described. A digital signal processor (DSP) is utilized as the pulse programmer on which a pulse sequence is executed as a subroutine. Field programmable gate array (FPGA) devices that are logically mapped into the external addressing space of the DSP work as auxiliary controllers of gradient control, radio frequency (rf) generation, and rf receiving separately. The pulse programmer triggers an event by setting the 32-bit control register of the corresponding FPGA, and then the FPGA automatically carries out the event function according to preset configurations in cooperation with other devices; accordingly, event control of the spectrometer is flexible and efficient. Digital techniques are in widespread use: gradient control is implemented in real-time by an FPGA; rf source is constructed using direct digital synthesis technique, and rf receiver is constructed using digital quadrature detection technique. Well-designed performance is achieved, including 1 μs time resolution of the gradient waveform, 1 μs time resolution of the soft pulse, and 2 MHz signal receiving bandwidth. Both rf synthesis and rf digitization operate at the same 60 MHz clock; therefore, the frequency range of transmitting and receiving is from DC to ~27 MHz. A majority of pulse sequences have been developed, and the imaging performance of the spectrometer has been validated through a large number of experiments. Furthermore, the spectrometer is also suitable for relaxation measurement in nuclear magnetic resonance field. PMID:23742570

Liang, Xiao; Binghe, Sun; Yueping, Ma; Ruyan, Zhao

2013-05-01

192

A digital magnetic resonance imaging spectrometer using digital signal processor and field programmable gate array  

NASA Astrophysics Data System (ADS)

A digital spectrometer for low-field magnetic resonance imaging is described. A digital signal processor (DSP) is utilized as the pulse programmer on which a pulse sequence is executed as a subroutine. Field programmable gate array (FPGA) devices that are logically mapped into the external addressing space of the DSP work as auxiliary controllers of gradient control, radio frequency (rf) generation, and rf receiving separately. The pulse programmer triggers an event by setting the 32-bit control register of the corresponding FPGA, and then the FPGA automatically carries out the event function according to preset configurations in cooperation with other devices; accordingly, event control of the spectrometer is flexible and efficient. Digital techniques are in widespread use: gradient control is implemented in real-time by an FPGA; rf source is constructed using direct digital synthesis technique, and rf receiver is constructed using digital quadrature detection technique. Well-designed performance is achieved, including 1 μs time resolution of the gradient waveform, 1 μs time resolution of the soft pulse, and 2 MHz signal receiving bandwidth. Both rf synthesis and rf digitization operate at the same 60 MHz clock; therefore, the frequency range of transmitting and receiving is from DC to ~27 MHz. A majority of pulse sequences have been developed, and the imaging performance of the spectrometer has been validated through a large number of experiments. Furthermore, the spectrometer is also suitable for relaxation measurement in nuclear magnetic resonance field.

Liang, Xiao; Binghe, Sun; Yueping, Ma; Ruyan, Zhao

2013-05-01

193

Systolic-array optimizing compiler  

Microsoft Academic Search

The WARP machine is a linear array of ten programmable processors and is capable of executing 100 million floating-point operations per second (100 MFLOPS). The individual processors, or cells, derive their performance from a wide instruction set and a high degree of internal pipelining and parallelism. Can an array of high-performance cells be programmed to cooperate at a fine grain

M. S. Lam; M. S. L

1987-01-01

194

Exploiting Fine-grain Thread Level Parallelism on the MIT Multi-ALU Processor  

Microsoft Academic Search

Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level with a grain-size of a single instruction or by partitioning applications into coarse threads with grain-sizes of thousands of instructions. Fine-grain threads fill the parallelism gap between

Stephen W. Keckler; William J. Dally; Daniel Maskit; Nicholas P. Carter; Andrew Chang; Whay Sing Lee

1998-01-01

195

Microfluidic trap array for massively parallel imaging of Drosophila embryos.  

PubMed

Here we describe a protocol for the fabrication and use of a microfluidic device to rapidly orient >700 Drosophila embryos in parallel for end-on imaging. The protocol describes master microfabrication (~1 d), polydimethylsiloxane molding (few hours), system setup and device operation (few minutes) and imaging (depending on application). Our microfluidics-based approach described here is one of the first to facilitate rapid orientation for end-on imaging, and it is a major breakthrough for quantitative studies on Drosophila embryogenesis. The operating principle of the embryo trap is based on passive hydrodynamics, and it does not require direct manipulation of embryos by the user; biologists following the protocol should be able to repeat these procedures. The compact design and fabrication materials used allow the device to be used with traditional microscopy setups and do not require specialized fixtures. Furthermore, with slight modification, this array can be applied to the handling of other model organisms and oblong objects. PMID:23493069

Levario, Thomas J; Zhan, Mei; Lim, Bomyi; Shvartsman, Stanislav Y; Lu, Hang

2013-04-01

196

Numerical methods for matrix computations using arrays of processors. Final report, 15 August 1983-15 October 1986  

SciTech Connect

The basic objective of this project was to consider a large class of matrix computations with particular emphasis on algorithms that can be implemented on arrays of processors. In particular, methods useful for sparse matrix computations were investigated. These computations arise in a variety of applications such as the solution of partial differential equations by multigrid methods and in the fitting of geodetic data. Some of the methods developed have already found their use on some of the newly developed architectures.

Golub, G.H.

1987-04-30

197

Combinatorial arrays and parallel screening for positive electrode discovery  

NASA Astrophysics Data System (ADS)

Combinatorial techniques have been applied to the preparation and screening of positive electrode candidates for lithium batteries. This work describes the automated parallel synthesis of 64-electrode arrays using a Packard Multiprobe II liquid handling system. A cell was constructed with a single lithium reference-counter electrode and 64 three-millimeter-diameter working electrodes containing LixMn2O4 active material, PVdF-HFP binder and carbon black as a conducting additive. Eight duplicate electrodes, each of eight respective compositions, were deposited on the array and the mass fraction of carbon was varied in steps from 1 to 25%. The results showed a rapid increase in capacity at the percolation limit of 3% for most cells. Some groups of nominally identical cells showed random variations in capacity, especially at low carbon loadings. The overall result is a demonstration of the advantages of the combinatorial concept, namely time savings and improved statistical significance of the results compared with one-off experiments.

Spong, A. D.; Vitins, G.; Guerin, S.; Hayden, B. E.; Russell, A. E.; Owen, John R.

198

High-performance computational chemistry: Hartree-Fock electronic structure calculations on massively parallel processors.  

SciTech Connect

The parallel performance of the NWChem version 1.2α parallel direct-SCF code has been characterized on five massively parallel supercomputers (IBM SP, Kendall Square KSR-2, CRAY T3D and T3E, and Intel Touchstone DELTA) using single-point energy calculations on seven molecules of varying size (up to 389 atoms) and composition (first-row atoms, halogens, and transition metals). The authors compare the performance using both replicated-data and distributed-data algorithms and the original McMurchie-Davidson and recently incorporated TEXAS integrals packages.

Tilson, J. L.; Minkoff, M.; Wagner, A. F.; Shepard, R.; Sutton, P.; Harrison, R. J.; Kendall, R. A.; Wong, A. T.; PNNL

1999-01-01

199

A longitudinal multi-bunch feedback system using parallel digital signal processors  

SciTech Connect

A programmable longitudinal feedback system based on four AT&T 1610 digital signal processors has been developed as a component of the PEP-II R&D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacing of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunches at the 250 MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies correction signals back on the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented.

Sapozhnikov, L.; Fox, J.D.; Olsen, J.J.; Oxoby, G.; Linscott, I. [Stanford Linear Accelerator Center, Menlo Park, CA (United States); Drago, A.; Serio, M. [Istituto Nazionale di Fisica Nucleare, Frascati (Italy). Lab. Nazionale di Frascati

1993-12-01

200

Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor.  

National Technical Information Service (NTIS)

The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals impinging on an antenna array. The algorithm is serial based, mathematically intensive, and requires substantial computi...

M. B. Longbrake; M. M. Jamali; P. E. Buxa; T. E. Schmuland

2010-01-01

201

Programming a Hillslope Water Movement Model on the MPP (Massively Parallel Processor).  

National Technical Information Service (NTIS)

A physically based numerical model was developed of heat and moisture flow within a hillslope on a parallel architecture computer, as a precursor to a model of a complete catchment. Moisture flow within a catchment includes evaporation, overland flow, flo...

J. E. DeVaney; A. R. Irving; P. J. Camillo; R. J. Gurney

1987-01-01

202

O(1) time algorithms for computing histogram and Hough transform on a cross-bridge reconfigurable array of processors  

SciTech Connect

Instead of using the base-2 number system, we use a base-m number system to represent the numbers used in the proposed algorithms. Such a strategy can be used to design an O(T) time, T = (log_m N) + 1, prefix sum algorithm for an N-bit binary sequence on a cross-bridge reconfigurable array of processors using N processors, where the data bus is m bits wide. Then, this basic operation can be used to compute the histogram of an n x n image with G gray-level values in constant time using G x n x n processors, and compute the Hough transform of an image with N edge pixels and an n x n parameter space in constant time using n x n x N processors, respectively. This result is better than the previously known results proposed in the literature. Also, the execution time of the proposed algorithms is tunable by the bus bandwidth. 43 refs.

Kao, T.; Horng, S.; Wang, Y. [Natl Taiwan Inst. of Technology, Taipei, Taiwan (China)]

1995-04-01
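The link between a binary prefix sum and a histogram in the abstract above can be illustrated with a sequential reference sketch. It assumes one plausible reading: for each gray level, form an indicator bit per pixel, and the final prefix sum of that binary sequence is the histogram count. The function names are hypothetical, and the code runs in ordinary sequential time; it does not model the reconfigurable bus or the constant-time behaviour claimed for the processor array.

```python
from itertools import accumulate


def prefix_sums(bits):
    """Inclusive prefix sums of a binary sequence."""
    return list(accumulate(bits))


def histogram(image, levels):
    """Histogram of an image (list of rows) with gray levels 0..levels-1."""
    pixels = [p for row in image for p in row]
    hist = []
    for g in range(levels):
        # One indicator bit per pixel: 1 where the pixel has gray level g
        indicator = [1 if p == g else 0 for p in pixels]
        sums = prefix_sums(indicator)
        # The last prefix sum is the number of pixels at level g
        hist.append(sums[-1] if sums else 0)
    return hist


# Hypothetical 2 x 2 image with 4 gray levels
example_hist = histogram([[0, 1], [1, 3]], 4)
```

Each gray level's prefix sum is independent of the others, which is consistent with the abstract's use of G x n x n processors, one n x n group per gray level.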

203

Low-power, real-time digital video stabilization using the HyperX parallel processor  

NASA Astrophysics Data System (ADS)

Coherent Logix has implemented a digital video stabilization algorithm for use in soldier systems and small unmanned air / ground vehicles that focuses on significantly reducing the size, weight, and power as compared to current implementations. The stabilization application was implemented on the HyperX architecture using a dataflow programming methodology and the ANSI C programming language. The initial implementation is capable of stabilizing an 800 x 600, 30 fps, full color video stream with a 53 ms frame latency using a single 100 DSP core HyperX hx3100™ processor running at less than 3 W power draw. By comparison, an Intel Core2 Duo processor running the same base algorithm on a 320 x 240, 15 fps stream consumes on the order of 18 W. The HyperX implementation is an overall 100x improvement in performance (processing bandwidth increase times power improvement) over the GPP-based platform. In addition, the implementation only requires a minimal number of components to interface directly to the imaging sensor and helmet mounted display, or the same computing architecture can be used to generate software defined radio waveforms for communications links. In this application, the global motion due to the camera is measured using a feature-based algorithm (11 x 11 Difference of Gaussian filter and Features from Accelerated Segment Test) and model fitting (Random Sample Consensus). Features are matched in consecutive frames and a control system determines the affine transform to apply to the captured frame that will remove or dampen the camera / platform motion on a frame-by-frame basis.

Hunt, Martin A.; Tong, Lin; Bindloss, Keith; Zhong, Shang; Lim, Steve; Schmid, Benjamin J.; Tidwell, J. D.; Willson, Paul D.

2011-05-01

204

Support and optimization for parallel sparse programs with array intrinsics of Fortran 90  

Microsoft Academic Search

Fortran 90 provides a rich set of array intrinsic functions that are useful for representing array expressions and data parallel programming. However, the application of these intrinsic functions to sparse data sets in distributed memory environments is currently not supported by vendors of Fortran 90 and HPF compilers. Our recent research work has been aimed at providing parallel processing support

Rong-guey Chang; Tyng-ruey Chuang; Jenq Kuen Lee

2004-01-01

205

Micro fluidic biosensor array for parallelized cell adhesion analysis during pathogenic infection  

Microsoft Academic Search

In this contribution we present a novel disposable micro fluidic biosensor array for parallelized monitoring of cell adhesion during pathogenic infection. The biosensor array consists of 4 bioreactor chips and a flow distribution network. Thus, 4 biological experiments can be run in parallel. Each bioreactor chip contains 4 quartz crystal resonators (QCRs) with randomly spread cells on top which are

T. Jacobs; G. Cama; M. Hartmann; T. Kahne; S. Hirsch; M. Naumann; P. Hauptmann

2008-01-01

206

Evaluation of the Leon3 soft-core processor within a Xilinx radiation-hardened field-programmable gate array.  

SciTech Connect

The purpose of this document is to summarize the work done to evaluate the performance of the Leon3 soft-core processor in a radiation environment while instantiated in a radiation-hardened static random-access memory based field-programmable gate array. This evaluation will look at the differences between two soft-core processors: the open-source Leon3 core and the fault-tolerant Leon3 core. Radiation testing of these two cores was conducted at the Texas A&M University Cyclotron facility and Lawrence Berkeley National Laboratory. The results of these tests are included within the report along with designs intended to improve the mitigation of the open-source Leon3. The test setup used for evaluating both versions of the Leon3 is also included within this document.

Learn, Mark Walter

2012-01-01

207

Multimode power processor  

DOEpatents

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

O'Sullivan, George A. (Pottersville, NJ); O'Sullivan, Joseph A. (St. Louis, MO)

1999-01-01

208

Multimode power processor  

DOEpatents

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.

O'Sullivan, G.A.; O'Sullivan, J.A.

1999-07-27

209

Parallel Random Number Generation: Long-Range Correlations Among Multiple Processors  

Microsoft Academic Search

We use an empirical study based on simple Monte Carlo integrations to exhibit the well known long-range correlations between linear congruential random numbers. In contrast to former studies, our long-range correlation test is carried out to assess more than only two parallel streams. In addition we perform our test also with explicit inversive generators which from the theoretical point of

Karl Entacher; Andreas Uhl; Stefan Wegenkittl

1999-01-01

210

A Study on Shortest Path Routing Algorithm on Dataflow Parallel Reconfigurable Processor DAPDNA2  

Microsoft Academic Search

In IP networks, we ordinarily use OSPF (Open Shortest Path First) as a routing protocol. OSPF finds the shortest path using Dijkstra's Shortest Path Algorithm. Dijkstra's Algorithm is suitable for program counter based CPUs; however, it does not scale with the number of nodes in the network since its computational complexity is O(n^2). In this paper, we propose a parallel shortest

Sho SHIMIZU; Yutaka ARAKAWA; Naoaki YAMANAKA; Kosuke SHIBA
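For reference, the O(n^2) cost attributed to Dijkstra's algorithm in the abstract above corresponds to the array-based form sketched below: n rounds, each with an O(n) scan for the closest unfinished node. This is a minimal sequential sketch with a hypothetical 4-node cost matrix; it is not the parallel algorithm proposed for the DAPDNA-2, whose description is truncated in this record.

```python
import math


def dijkstra(adj, source):
    """Shortest distances from source.

    adj is an n x n matrix; adj[u][v] is the link cost, or math.inf if
    there is no link from u to v.
    """
    n = len(adj)
    dist = [math.inf] * n
    dist[source] = 0
    done = [False] * n
    for _ in range(n):
        # O(n) scan for the closest node not yet finalized
        u = min((i for i in range(n) if not done[i]), key=lambda i: dist[i])
        if dist[u] == math.inf:
            break  # remaining nodes are unreachable
        done[u] = True
        # Relax every link leaving u
        for v in range(n):
            if not done[v] and dist[u] + adj[u][v] < dist[v]:
                dist[v] = dist[u] + adj[u][v]
    return dist


INF = math.inf
# Hypothetical 4-node network
costs = [
    [0, 1, 4, INF],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [INF, 6, 3, 0],
]
distances = dijkstra(costs, 0)
```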

211

Reconfigurable Parallel VLSI CoProcessor for Space Robot Using FPGA  

Microsoft Academic Search

This paper proposes hardware solutions to the computation for the trigonometric and square root functions of inverse kinematics. They are based on an existing pipeline arithmetic which employs the CORDIC (Coordinate Rotation Digital Computer) algorithm. This integrated approach enhances computational efficiency by reducing the duplicate calculations of these functions and maximizing the parallel/pipelining processing for real-time robot control. The reliability of

R. Wei; M. H. Jin; J. J. Xia; Z. W. Xie; Hong Liu

2006-01-01
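The CORDIC algorithm mentioned in the abstract above can be sketched in its rotation mode, which yields sine and cosine using only additions and scalings by powers of two. This is a floating-point software sketch with hypothetical names and a default of 32 iterations; it covers only the trigonometric case, not the square root computation, and it is not the pipelined FPGA arithmetic described in the paper.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    # Elementary rotation angles atan(2^-i); a lookup table in hardware
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-scale to compensate
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        # Rotate toward the residual angle z
        d = 1.0 if z >= 0 else -1.0
        # In hardware the factor 2^-i is a right shift by i bits
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(math.pi / 6)
```

Because sine and cosine come out of the same iteration, a shared CORDIC unit avoids computing them separately, which is in the spirit of the duplicate-calculation reduction the abstract describes.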

212

Pulsenet - A Parallel Flash Sampler and Digital Processor IC for Optical SETI  

Microsoft Academic Search

PulseNet is a full-custom IC with parallel flash ADC and digital processing that enables an all-sky optical search for extraterrestrial intelligence. It integrates 448 sense amplifiers that digitize 32 analog signals at 1 GS/s, and other circuits that filter samples, store candidate signals, and perform astronomical observations. Its ~250,000 CMOS transistors (TSMC 0.25 μm) dissipate 1.1 W at 400 MHz and 2.5 V.

Andrew W. Howard; Gu-Yeon Wei; William J. Dally; Paul Horowitz

2006-01-01

213

Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations  

NASA Astrophysics Data System (ADS)

We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo, both domain replicated and decomposed simulations, will run their particles in a different order during different runs of the same simulation because of the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositions will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums which were subsequently rounded to double precision at the end of every time-step.

Cleveland, Mathew A.; Brunner, Thomas A.; Gentile, Nicholas A.; Keasler, Jeffrey A.

2013-10-01
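The summation-order problem described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-point scale `SCALE` and the helper names are assumptions chosen for illustration. Each double is rounded once to an integer tally, and because integer addition is associative, the accumulated result no longer depends on the order in which contributions arrive.

```python
# Hypothetical fixed-point scale (an assumption, not taken from the paper):
# each value is rounded to a multiple of 2**-40 before it is tallied.
SCALE = 2 ** 40


def to_tally(value: float) -> int:
    """Round a double to an integer tally (this is where accuracy is lost)."""
    return round(value * SCALE)


def float_sum(values):
    """Plain double precision accumulation; the result can depend on order."""
    total = 0.0
    for value in values:
        total += value
    return total


def integer_tally_sum(values):
    """Accumulate integer tallies; integer addition is associative,
    so the result is identical for every ordering of the inputs."""
    return sum(to_tally(value) for value in values) / SCALE


if __name__ == "__main__":
    values = [0.1 * i for i in range(1, 1000)]
    shuffled = list(reversed(values))  # stands in for a different particle order
    print("float sums equal:  ", float_sum(values) == float_sum(shuffled))
    print("integer sums equal:", integer_tally_sum(values) == integer_tally_sum(shuffled))
```

The trade-off matches the one named in the abstract: the rounding in `to_tally` discards low-order bits, so reproducibility is bought with some loss of accuracy.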

214

Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations  

SciTech Connect

We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo, both domain replicated and decomposed simulations, will run their particles in a different order during different runs of the same simulation because of the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositions will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work, double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums, which were subsequently rounded to double precision at the end of every time-step.

Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Brunner, Thomas A.; Gentile, Nicholas A.; Keasler, Jeffrey A.

2013-10-15

215

Hardware Implementation of Skeletonization Algorithm for Parallel Asynchronous Image Processing  

Microsoft Academic Search

This paper presents an FPGA realisation of an application-specific cellular processor array designed for asynchronous skeletonization of binary images. The skeletonization algorithm is based on iterative thinning utilizing a 'grassfire' transformation approach. The purpose of this work was to test the performance of a fully parallel asynchronous processor array and to evaluate the inhomogeneity of wave propagation velocity.

Alexey Lopich; Piotr Dudek

2009-01-01

216

Design and numerical evaluation of a volume coil array for parallel MR imaging at ultrahigh fields.  

PubMed

In this work, we propose and investigate a volume coil array design method using different types of birdcage coils for MR imaging. Unlike the conventional radiofrequency (RF) coil arrays of which the array elements are surface coils, the proposed volume coil array consists of a set of independent volume coils including a conventional birdcage coil, a transverse birdcage coil, and a helix birdcage coil. The magnetic fluxes of these three birdcage coils are intrinsically cancelled, yielding a highly decoupled volume coil array. In contrast to conventional non-array type volume coils, the volume coil array would be beneficial in improving MR signal-to-noise ratio (SNR) and also gain the capability of implementing parallel imaging. The volume coil array is evaluated at the ultrahigh field of 7T using FDTD numerical simulations, and the g-factor map at different acceleration rates was also calculated to investigate its parallel imaging performance. PMID:24649435

Pang, Yong; Wong, Ernest W H; Yu, Baiying; Zhang, Xiaoliang

2014-02-01

217

Parallel plate lens with metal hole array for terahertz wave band  

NASA Astrophysics Data System (ADS)

Optical devices for the terahertz wave band from 0.1 to 10 THz are rapidly expanding and require better designs. This paper proposes and designs a parallel plate lens with metal hole array for the terahertz wave band. The fast wave effect is due to the parallel plate. For this lens, the parallel plate spacing and hole array dimensions control the phase velocity and the focusing effect. It is not necessary to control the phase through the shape of the lens itself, which is flat. The periodic analysis model extracted from the full model confirms the phase control by the metal hole array dimensions. The periodic model can be used for efficient iterative design. The full wave analysis results are also obtained by ANSYS HFSS and the focusing effect is confirmed. Phase control using both the parallel plate and the hole array enhances the focusing effect over the focusing effect controlled only by the metal hole array dimensions.

Suzuki, Takehito; Yonamine, Hiroki; Konno, Takuya; Young, John C.; Takano, Keisuke; Hangyo, Masanori

2014-05-01

218

Method of up-front load balancing for local memory parallel processors  

NASA Technical Reports Server (NTRS)

In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent.

Baffes, Paul Thomas (inventor)

1990-01-01

219

Design and analysis of real-time wavefront processor  

NASA Astrophysics Data System (ADS)

Latency of the wavefront processor is an important factor in closed-loop adaptive optical systems. For an adaptive optical system using Shack-Hartmann wavefront sensing and a point beam, a multi-processor structure based on modern parallelism theory is built by means of a task queue, subtask arithmetic decomposition, and subtask structure design, to realize a pipeline of wavefront gradient, wavefront reconstruction, and wavefront control. Exploiting the traits of the field programmable gate array (FPGA) and the digital signal processor (DSP), a pipelined wavefront processor based on an FPGA+DSP structure is built with highly real-time performance. The clocks of the FPGA and DSP and the "age" of the correctors are the primary sources of this wavefront processor's latency. For a 61-element adaptive optical system whose sampling frequency is 2900 Hz, the latency of this wavefront processor is less than 100 µs.

Zhou, Luchun; Wang, Chunhong; Li, Mei; Jiang, Wenhan

2004-12-01

220

Acoustooptic linear algebra processors - Architectures, algorithms, and applications  

NASA Technical Reports Server (NTRS)

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

Casasent, D.

1984-01-01

221

Massively parallel computation of lattice associative memory classifiers on multicore processors  

NASA Astrophysics Data System (ADS)

Over the past quarter century, concepts and theory derived from neural networks (NNs) have featured prominently in the literature of pattern recognition. Implementationally, classical NNs based on the linear inner product can present performance challenges due to the use of multiplication operations. In contrast, NNs having nonlinear kernels based on Lattice Associative Memories (LAM) theory tend to concentrate primarily on addition and maximum/minimum operations. More generally, the emergence of LAM-based NNs, with their superior information storage capacity, fast convergence and training due to relatively lower computational cost, as well as noise-tolerant classification has extended the capabilities of neural networks far beyond the limited applications potential of classical NNs. This paper explores theory and algorithmic approaches for the efficient computation of LAM-based neural networks, in particular lattice neural nets and dendritic lattice associative memories. Of particular interest are massively parallel architectures such as multicore CPUs and graphics processing units (GPUs). Originally developed for video gaming applications, GPUs hold the promise of high computational throughput without compromising numerical accuracy. Unfortunately, currently-available GPU architectures tend to have idiosyncratic memory hierarchies that can produce unacceptably high data movement latencies for relatively simple operations, unless careful design of theory and algorithms is employed. Advantageously, some GPUs (e.g., the Nvidia Fermi GPU) are optimized for efficient streaming computation (e.g., concurrent multiply and add operations). As a result, the linear or nonlinear inner product structures of NNs are inherently suited to multicore GPU computational capabilities. 
In this paper, the authors' recent research in lattice associative memories and their implementation on multicores is overviewed, with results that show utility for a wide variety of pattern classification applications using classical NNs or lattice-based NNs. Dataflow diagrams are presented in terms of a parameterized model of data burden and LAM partitioning.

Ritter, Gerhard X.; Schmalz, Mark S.; Hayden, Eric T.

2011-09-01
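The abstract above notes that lattice associative memories rely on addition and maximum/minimum operations rather than multiplication. A minimal sketch of that idea follows, assuming the standard min-based autoassociative construction from the lattice-memory literature; it is not necessarily the formulation or the parallel implementation used by the authors, and the function names are hypothetical.

```python
def build_memory(patterns):
    """Build an autoassociative lattice memory W with
    W[i][j] = min over stored patterns p of (p[i] - p[j]).
    Only subtraction and minimum are used."""
    size = len(patterns[0])
    return [
        [min(p[i] - p[j] for p in patterns) for j in range(size)]
        for i in range(size)
    ]


def recall(memory, probe):
    """Max-plus product: output[i] = max over j of (W[i][j] + probe[j]).
    Only addition and maximum are used."""
    return [max(w + x for w, x in zip(row, probe)) for row in memory]


if __name__ == "__main__":
    stored = [[1, 5, 3], [4, 2, 7], [0, 0, 9]]
    memory = build_memory(stored)
    for pattern in stored:
        # Each stored pattern is a fixed point of the recall operation.
        print(pattern, "->", recall(memory, pattern))
```

Every row of `recall` is independent of the others, which is the property that makes this kind of computation a natural fit for the multicore and GPU mappings the abstract discusses.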

222

Experience in highly parallel processing using DAP  

NASA Technical Reports Server (NTRS)

Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.

Parkinson, D.

1987-01-01

223

Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System  

Microsoft Academic Search

The known fast sequential algorithms for multiplying two N×N matrices (over an arbitrary ring) have time complexity O(N^α), where 2 < α < 3. [...] p-processor linear array with a reconfigurable pipelined bus system (LARPBS) in O(N^α/p + (N^2/p^(2/α)) log p)

Keqin Li; Victor Y. Pan

1999-01-01

224

Electrostatic quadrupole array for focusing parallel beams of charged particles  

DOEpatents

An array of electrostatic quadrupoles, capable of providing strong electrostatic focusing simultaneously on multiple beams, is easily fabricated from a single array element comprising a support rod and multiple electrodes spaced at intervals along the rod. The rods are secured to four terminals which are isolated by only four insulators. This structure requires bias voltage to be supplied to only two terminals and eliminates the need for individual electrode bias and insulators, as well as increases life by eliminating beam plating of insulators.

Brodowski, John (Smithtown, NY)

1982-11-23

225

Intermediate Level Computer Vision Processing Algorithm Development for the Content Addressable Array Parallel Processor.  

National Technical Information Service (NTIS)

During this quarter a set of seven benchmark problems were developed and analyzed for the IUA. These included Hough Transform, Convex Hull, Voronoi Diagram, Minimal Spanning Tree, Visibility of Vertices in a projected 3-dimensional model, sub-graph isomor...

1986-01-01

226

Porting the parallel array programming language ZPL to an embedded multicomputing system  

Microsoft Academic Search

This paper describes the port of the ZPL parallel array language to the Mercury-RACE, a multicomputing system designed for embedded real-time applications. We discuss the design of the language runtime system and our strategy on mapping ZPL operators to hardware communication. We also show performance results of the ZPL parallel matrix inverse algorithm on the target architecture.

Demetrio Rey; Joss Stubblefield; James Canning

2002-01-01

227

390–480 GHz photon-assisted tunneling steps generated by parallel Josephson tunnel junction arrays  

Microsoft Academic Search

We report on the first direct detection of submillimeter waves emitted by small parallel tunnel junction arrays. The arrays, made up of 10 and 20 Nb/Al-AlOx/Nb junctions of 6 µm², are integrated and coupled in RF to an Nb/Al-AlOx/Nb twin junction-based detector by a microstrip/slotline transition. The detector's I-V curve clearly exhibits photon-assisted steps when the array is biased on the

F. Boussaha; A. Féret; C. Chaumont; L. Pelay; M. Batrung; B. Lecomte; M. Salez; D. Bouville; F. Dauplay; J. Krieg; G. Beaudin; L. Lapierre

2010-01-01

228

A 4×4 radial dipole array fed by double-sided parallel-strip line  

Microsoft Academic Search

This work presents a 4×4 array of radial dipole antennas fed by double-sided parallel-strip transmission lines. This dipole array has a center frequency of 3.00 GHz and a 530 MHz bandwidth, corresponding to a fractional bandwidth of 17.7%. The side-lobes of the array were minimized by choosing a λ/2 element spacing and by unequally feeding the antenna elements with a

Travis W. Eubanks; Kai Chang

2010-01-01

229

High-performance ultra-low power VLSI analog processor for data compression  

NASA Technical Reports Server (NTRS)

An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.

Tawel, Raoul (Inventor)

1996-01-01
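The codebook search described in the abstract above can be sketched sequentially in software. This is only an illustration of the data flow, not the patented analog circuit: the squared-difference closeness measure and the function name are assumptions, since the abstract does not specify how closeness is computed.

```python
def closest_codebook_index(input_vector, codebook):
    """Return the index of the codebook vector closest to input_vector.

    Each entry of `codebook` plays the role of one column of the array,
    and each component comparison plays the role of one processor cell.
    The squared difference is an assumed closeness measure.
    """
    best_index = 0
    best_distance = float("inf")
    for index, code_vector in enumerate(codebook):
        # "Combination device": sum the per-cell outputs of one column.
        distance = sum((a - b) ** 2 for a, b in zip(input_vector, code_vector))
        # "Closeness determination device": keep the smallest combined signal.
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


if __name__ == "__main__":
    codebook = [[0, 0], [10, 10], [5, 5]]  # N = 3 codebook vectors, M = 2 components
    print(closest_codebook_index([4, 6], codebook))
    print(closest_codebook_index([9, 9], codebook))
```

In the apparatus, all N columns and all M cells per column operate at once; the loop here serializes work that the hardware performs in parallel. Compression comes from transmitting only the returned index instead of the full input vector.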

230

Optically reconfigurable processors  

Microsoft Academic Search

Reconfigurable processors, like the field programmable gate arrays (FPGAs), open new computational paradigms where the processor is able to tailor its internal structure to better implement a given application. A typical FPGA consists of an array of configurable logic blocks and a mesh of interconnections fully programmable by the user to perform a given application. By just changing its internal

J. Mumbru; G. Panotopulos; D. Psaltis; Gan Zhou; Xin An; Fai Mok

2000-01-01

231

Maskless, parallel patterning with zone-plate array lithography  

SciTech Connect

Zone-plate array lithography (ZPAL) is a maskless lithography scheme that uses an array of shuttered zone plates to print arbitrary patterns on a substrate. An experimental ultraviolet ZPAL system has been constructed and used to simultaneously expose nine different patterns with a 3×3 array of zone plates in a quasidot-matrix fashion. We present exposed patterns, describe the system design and construction, and discuss issues essential to a functional ZPAL system. We also discuss another ZPAL system which operates with 4.5 nm x-ray radiation from a point source. We present simulations which show that, with our existing x-ray zone plates and this system, we should be able to achieve 55 nm resolution. (c) 1999 American Vacuum Society.

Carter, D. J. D. [Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (United States)]; Gil, Dario [Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (United States)]; Menon, Rajesh [Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (United States)]; Mondol, Mark K. [Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (United States)]; Smith, Henry I. [Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (United States)]; Anderson, Erik H. [Lawrence Berkeley Laboratory, Berkeley, California 94720 (United States)]

1999-11-01

232

Stream Processors  

NASA Astrophysics Data System (ADS)

Stream processors, like other multicore architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.

Erez, Mattan; Dally, William J.
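The gather-compute-scatter form mentioned in the abstract above can be shown with a small sketch. The names `gather`, `kernel`, and `scatter`, the list standing in for off-chip memory, and the arithmetic inside the kernel are all illustrative assumptions; real stream processors express these phases through their own programming models.

```python
def gather(memory, indices):
    """Gather: copy the requested elements into a local (on-chip) block."""
    return [memory[i] for i in indices]


def kernel(block):
    """Compute: operate only on local data (arbitrary example arithmetic)."""
    return [2 * x + 1 for x in block]


def scatter(memory, indices, results):
    """Scatter: write the results back to their destinations in memory."""
    for i, value in zip(indices, results):
        memory[i] = value


if __name__ == "__main__":
    memory = list(range(10))      # stands in for off-chip memory
    indices = [7, 2, 5]           # coarse-grained, explicitly chosen accesses
    local_block = gather(memory, indices)
    scatter(memory, indices, kernel(local_block))
    print(memory)
```

Separating the three phases makes all data movement explicit in the program, which is the software-managed locality the abstract contrasts with a hardware cache hierarchy.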

233

The LISA design environment for the synthesis of array processors including memories for the data transfer and fault tolerance by reconfiguration and coding techniques  

Microsoft Academic Search

The LISA design environment transforms computation extensive digital signal processing algorithms into array processor architectures. It supports the complete design flow from algorithmic specification in a high-level programming language to circuit description at the gate level. From the input description a graph representation is derived by symbolic execution and further mapped onto different architectures. Netlists in different formats can be

Mirjam Schönfeld; Jens Franzen; Markus Schwiegershausen; Peter Pirsch; Uwe Vehlies; ANDREAS MCONZNER

1995-01-01

234

Submillimeter mixers based on superconductive parallel junction arrays  

Microsoft Academic Search

Observation and analysis of submillimeter-wave radiation (300 GHz-3 THz) in astronomy and atmospheric sciences requires increasingly performant receivers. The most sensitive receivers working in this range of the electromagnetic spectrum use superconductor-insulator-superconductor (SIS) junctions. In order to increase the bandwidth and the sensitivity, we are developing a quantum-noise limited heterodyne receiver based on several parallel SIS junctions with broad (larger than 30%) fixed

Faouzi Boussaha; Morvan C. Salez; Yan Delorme; Alexandre Feret; Benoit Lecomte; Karl Westerberg; Michel Chaubet

2003-01-01

235

Prototype implementation of array-processor extensible over multiple FPGAs for scalable stencil computation  

Microsoft Academic Search

This paper demonstrates and evaluates the performance and the scalability of the systolic computational-memory array (SCMA) for stencil computation, which is a typical computing kernel of scientific simulation. We describe the basic architecture of the SCMA, and show the requirements and the design of SCMAs to scalably operate over multiple devices. We implement a prototype of the SCMA with three

Kentaro Sano; Luzhou Wang; Satoru Yamamoto

2011-01-01
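For readers unfamiliar with the term, the stencil computation named in the abstract above can be sketched as follows. This is a generic 5-point Jacobi update chosen as an assumed example; the abstract does not say which stencils the SCMA prototype runs, and the sequential loops here only show the arithmetic that an array processor would distribute across its cells.

```python
def jacobi_step(grid):
    """Apply one 5-point stencil update to the interior of a 2-D grid.

    Each interior cell becomes the average of its four nearest neighbors;
    boundary cells are left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    updated = [row[:] for row in grid]  # read old values, write new ones
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            updated[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return updated


if __name__ == "__main__":
    grid = [
        [4.0, 4.0, 4.0],
        [4.0, 0.0, 4.0],
        [4.0, 4.0, 4.0],
    ]
    print(jacobi_step(grid))
```

Because every cell reads only its nearest neighbors, the update for each cell can proceed independently, which is why stencils map well onto systolic and array architectures with local interconnect.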

236

Wideband Amorphous-Solid Debye-Sears Light Modulators for Array-Antenna Processors.  

National Technical Information Service (NTIS)

The use of fused-silica Debye-Sears light modulators in electro-optical array-antenna processing is investigated theoretically and experimentally, with the result that a significant gain in signal-processing capacity, over that which can be achieved with ...

J. Minkoff

1967-01-01

237

A 32-Channel Lattice Transmission Line Array for Parallel Transmit and Receive MRI at 7 Tesla  

PubMed Central

Transmit and receive RF coil arrays have proven to be particularly beneficial for ultra-high-field MR. Transmit coil arrays enable such techniques as B1+ shimming to substantially improve transmit B1 homogeneity compared to conventional volume coil designs, and receive coil arrays offer enhanced parallel imaging performance and SNR. Concentric coil arrangements hold promise for developing transceiver arrays incorporating large numbers of coil elements. At magnetic field strengths of 7 tesla and higher where the Larmor frequencies of interest can exceed 300 MHz, the coil array design must also overcome the problem of the coil conductor length approaching the RF wavelength. In this study, a novel concentric arrangement of resonance elements built from capacitively-shortened half-wavelength transmission lines is presented. This approach was utilized to construct an array with whole-brain coverage using 16 transceiver elements and 16 receive-only elements, resulting in a coil with a total of 16 transmit and 32 receive channels.

Adriany, Gregor; Auerbach, Edward J.; Snyder, Carl J.; Gozubuyuk, Ark; Moeller, Steen; Ritter, Johannes; van de Moortele, Pierre-Francois; Vaughan, Tommy; Ugurbil, Kamil

2010-01-01

238

CombinePlt and CombineThs user manual: Merging multiple, processor-local plot and time-history data bases produced during a parallel calculation  

SciTech Connect

The CombinePlt and CombineThs post-processing utilities are designed to merge the data in multiple, processor-local plot and time-history data bases produced by the parallel versions of the analysis codes DYNA3D, NIKE3D or PING into a serial data base which is compatible with the existing versions of the GRIZ and THUG visualization tools. These utilities make use of the partition assignment file produced by the PartMesh suite of pre-processing utilities to map the data from the processor-local order to global order. These utilities are also capable of translating 64-bit IEEE data bases into 32-bit IEEE data bases which are required for post-processing with GRIZ or THUG on an SGI workstation.

Procassini, R.J.; DeGroot, A.J.

1995-06-01

239

CombinePlt and CombineThs user manual: Merging multiple, processor-local plot and time-history data bases produced during a parallel calculation. Revision 1  

SciTech Connect

The CombinePlt and CombineThs post-processing utilities are designed to merge the data in multiple, processor-local plot and time-history data bases produced by the parallel versions of the analysis codes DYNA3D, NIKE3D or PING into a serial database which is compatible with the existing versions of the GRIZ and THUG visualization tools. These utilities make use of the partition assignment file produced by the PartMesh suite for pre-processing utilities to map the data from the processor-local order to global order. These utilities are also capable of translating 64-bit IEEE data bases into 32-bit IEEE data bases which are required for post-processing with GRIZ or THUG on an SGI workstation.

Procassini, R.J.; DeGroot, A.J.

1995-09-21

240

Achieving supercomputer performance for neural net simulation with an array of digital signal processors  

SciTech Connect

Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.

Muller, U.A.; Baumle, B.; Kohler, P.; Gunzinger, A.; Guggenbuhl, W. [Swiss Federal Inst. of Technology, Zurich (Switzerland)]

1992-10-01

241

Fundamental and harmonic submillimeter-wave emission from parallel Josephson junction arrays  

Microsoft Academic Search

We report heterodyne measurements of Josephson microwave radiation emitted by a parallel array of small superconductor-insulator-superconductor (SIS) junctions at submillimeter-wave frequencies. The array consists of five Nb/Al-AlOx/Nb junctions nonevenly distributed in a niobium superconducting stripline, and is optimized for rf coupling in the 450-640 GHz range. We observed Fiske-like resonant steps in its I-V curve in the presence of magnetic

Faouzi Boussaha; Morvan Salez; Alexandre Féret; Benoit Lecomte; Christine Chaumont; Michel Chaubet; Frédéric Dauplay; Yan Delorme; Jean-Michel Krieg

2009-01-01

242

Statics of non uniform Josephson junction parallel arrays: model vs. experiment  

Microsoft Academic Search

We study experimentally and numerically the zero-voltage supercurrent vs. magnetic field of non-uniform arrays of Josephson junctions parallel-connected by a superconducting stripline. The measured curves are complex, unique and in excellent agreement with numerical simulations using a specially developed model. Using this, we can optimize the arrays to have any desired interference pattern. Such new devices could find applications in

F. Boussaha; J. G. Caputo; L. Loukitch; M. Salez

2006-01-01

243

Near hexagonal close-packed optical fiber array for parallel optical interconnection  

Microsoft Academic Search

In conventional optical fiber based two-dimensional (2-D) parallel optical interconnection (POI), optical fiber arrays are aligned to 2-D light sources and detectors using precisely fabricated hole-plate or stacked V-groove plates. All these structures have problems in making positioning components and assembly with high accuracy. For polymer optical fibers with large diameter, the whole dimension of the array will be too

Chun Yang; Mingde Zhang; Zhichao Xia

2005-01-01

244

Extended Aperture 2-D Direction Finding With a Two-Parallel-Shape-Array Using Propagator Method  

Microsoft Academic Search

In this letter, we propose a two-parallel-shape array geometry, consisting of sensors spaced much farther apart than a half-wavelength, to improve estimation accuracy via aperture extension for two-dimensional (2D) direction finding. First, the subarray parallel with the x-axis is employed to extract automatically paired high-variance but unambiguous y-axis direction cosines and low-variance but cyclically ambiguous x-axis direction cosines. Then, the

Jin He; Zhong Liu

2009-01-01

245

A class of parallel algorithms for computation of the manipulator inertia matrix  

NASA Technical Reports Server (NTRS)

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.

Fijany, Amir; Bejczy, Antal K.

1989-01-01

246

Multiple scattering of electromagnetic waves by an array of parallel gyrotropic rods.  

PubMed

We study multiple scattering of electromagnetic waves by an array of parallel gyrotropic circular rods and show that such an array can exhibit fairly unusual scattering properties and provide, under certain conditions, a giant enhancement of the scattered field. Among the scattering patterns of such an array at its resonant frequencies, the most interesting is the distribution of the total field in the form of a perfect self-similar structure of chessboard type. The scattering characteristics of the array are found to be essentially determined by the resonant properties of its gyrotropic elements and cannot be realized for arrays of nongyrotropic rods. It is expected that the results obtained can lead to a wide variety of practical applications. PMID:23368086

Es'kin, V A; Kudrin, A V; Zaboronkova, T M; Krafft, C

2012-12-01

247

Generation of second optical harmonic in a macroscopic array of parallel nanowires  

NASA Astrophysics Data System (ADS)

The quadratic optical susceptibility tensor for a macroscopically ordered array of parallel nanowires is determined. An experimental investigation of the polarization properties of signals of the second harmonic from ferroelectric nanowires synthesized in channels of chrysotile asbestos demonstrates that its results can be useful in structural studies.

Belotitskii, V. I.; Kumzerov, Yu. A.; Fokin, A. V.

2009-09-01

248

Development of superconductive parallel junction arrays for Submm-wave local oscillator applications  

Microsoft Academic Search

In order to develop submillimeter-wave fully Integrated Superconducting Receivers (SIRs) based on parallel small SIS junction arrays (multijunction) operating as local oscillator, we investigate their performance through measurement and simulation. Multijunction may be an interesting alternative to LJJ because it allows wide LO tunability, wide impedance matching bandwidths and increased design flexibility and control of technological parameters.

F. Boussaha; A. Féret; B. Lecomte; M. Salez; L. Loukitch; C. Chaumont; J.-M. Krieg; M. Chaubet

2008-01-01

249

A MEMS nanoplotter with high-density parallel dip-pen nanolithography probe arrays  

Microsoft Academic Search

We report on the development of a nanoplotter that consists of an array of microfabricated probes for parallel dip-pen nanolithography. Two types of device have been developed by using microelectromechanical systems micromachining technology. The first consists of 32 silicon nitride cantilevers separated by 100 µm, while the second consists of eight boron-doped silicon tips separated by 310 µm. The former

Ming Zhang; David Bullen; Sung-Wook Chung; Seunghun Hong; Kee S. Ryu; Zhifang Fan; Chad A. Mirkin; Chang Liu

2002-01-01

250

Monotonic parallel and orthogonal routing for single-layer ball grid array packages  

Microsoft Academic Search

In this paper, we give the necessary and sufficient condition that all nets can be connected by monotonic routes when a net consists of a finger and a ball and fingers are on the two parallel boundaries of the Ball Grid Array package, and propose a monotonic routing method based on this condition. Moreover, we give a necessary condition and

Yoichi Tomioka; Atsushi Takahashi

2006-01-01

251

Monotonic parallel and orthogonal routing for single-layer ball grid array packages  

Microsoft Academic Search

In this paper, we give the necessary and sufficient condition that all nets can be connected by monotonic routes when a net consists of a finger and a ball and fingers are on the two parallel boundaries of the ball grid array package, and propose a monotonic routing method based on this condition. Moreover, we give a necessary condition and

Yoichi Tomioka; Atsushi Takahashi

2006-01-01

252

Parallel Assisted Assembly of Multilayer DNA and Protein Nanoparticle Structures Using a CMOS Electronic Array  

Microsoft Academic Search

A CMOS electronic microarray device was used to carry out the rapid parallel assembly of functionalized nanoparticles into multilayer structures. Electronic microarrays produce reconfigurable DC electric fields that allow DNA, proteins as well as charged molecules to be rapidly transported from the bulk solution and addressed to specifically activated sites on the array surface. Such a device was used to

Michael J. Heller; Dietrich A. Dehlinger; Benjamin D. Sullivan

2006-01-01

253

Two-dimensional wavelet processor  

NASA Astrophysics Data System (ADS)

An optical implementation of the two-dimensional (2-D) wavelet transform and inverse wavelet transform is performed in real time by the exploitation of a new multichannel system that processes the different daughter wavelets separately. The so-coined wavelet-processor system relies on a multichannel replication array generated using a Dammann grating and is able to handle every wavelet function. All channels are processed in parallel using a conventional 2-D correlator. Experimental results applying the Mexican-hat wavelet-decomposition technique are presented.

Ouzieli, Ido; Mendlovic, David

1996-10-01
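
The multichannel idea in the wavelet-processor record above (one correlation channel per daughter wavelet) can be illustrated with a small digital analogue. This is only a sketch under stated assumptions, not the authors' optical system: the function names, the unnormalized Mexican-hat form (2 - r^2) exp(-r^2 / 2), the 1/scale factor, and the zero-padded border handling are all choices made here for illustration.

```python
import math


def mexican_hat(x, y):
    """Unnormalized 2-D Mexican-hat mother wavelet (negative Laplacian of a Gaussian)."""
    r2 = x * x + y * y
    return (2.0 - r2) * math.exp(-r2 / 2.0)


def daughter(x, y, scale):
    """Daughter wavelet: the mother wavelet dilated by `scale` (assumed 1/scale factor)."""
    return mexican_hat(x / scale, y / scale) / scale


def wavelet_channel(image, scale, half_width):
    """Correlate `image` with one daughter wavelet; pixels outside the image count as zero."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in range(-half_width, half_width + 1):
                for dc in range(-half_width, half_width + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * daughter(dc, dr, scale)
            out[r][c] = acc
    return out


def wavelet_transform(image, scales, half_width=4):
    """One output plane per scale, mirroring the 'one channel per daughter wavelet' idea."""
    return {scale: wavelet_channel(image, scale, half_width) for scale in scales}
```

Each entry of the returned dictionary plays the role of one channel; in the optical system described in the abstract the channels are processed simultaneously, whereas this sketch computes them one after another.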

254

Multicoil resonance-based parallel array for smart wireless power delivery.  

PubMed

This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, and implanted devices. The proposed powering system is based on a 4-coil resonance-based inductive link, the resonance coil of which is formed by an array of several paralleled coils as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by the individual coil in the array. Thus, the transmitted power scales with the load of the electric/electronic system being powered, and is not divided equally over all the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables superior power, size and cost efficiency compared with other solutions, since it does not need complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and coverage of an area 4 times bigger than conventional topologies, with a power transfer efficiency of 55%, are the novelties of the presented paper. PMID:24109796

Mirbozorgi, S A; Sawan, M; Gosselin, B

2013-01-01

255

OpenMP Parallelization of a Mickens Time-Integration Scheme for a Mixed-Culture Biofilm Model and Its Performance on Multi-core and Multi-processor Computers  

Microsoft Academic Search

We document and compare the performance of an OpenMP parallelized simulation code for a mixed-culture biofilm model on a desktop workstation with two quad core Xeon processors, and on SGI Altix Systems with single core and dual core Itanium processors. The underlying model is a parabolic system of highly non-linear partial differential equations, which is discretized in time using a

Nasim Muhammad; Hermann J. Eberl

2009-01-01

256

dc properties of series-parallel arrays of Josephson junctions in an external magnetic field  

SciTech Connect

A detailed dc theory of superconducting multijunction interferometers has previously been developed by several authors for the case of parallel junction arrays. The theory is now extended to cover the case of a loop containing several junctions connected in series. The problem is closely associated with high-T_c superconductors and their clusters of intrinsic Josephson junctions. These materials exhibit spontaneous interferometric effects, and there is no reason to assume that the intrinsic junctions form only parallel arrays. A simple formalism of phase states is developed in order to express the superconducting phase differences across the junctions forming a series array as functions of the phase difference across the weakest junction of the system, and to relate the differences in critical currents of the junctions to gaps in the allowed ranges of their phase functions. This formalism is used to investigate the energy states of the array, which in the case of different junctions are split and separated by energy barriers of height depending on the phase gaps. Modifications of the washboard model of a single junction are shown. Next a superconducting inductive loop containing a series array of two junctions is considered, and this model is used to demonstrate the transitions between phase states and the associated instabilities. Finally, the critical current of a parallel connection of two series arrays is analyzed and shown to be a multivalued function of the externally applied magnetic flux. The instabilities caused by the presence of intrinsic serial junctions in granular high-T_c materials are pointed out as a potential source of additional noise.

Lewandowski, S.J. (Instytut Fizyki, Polska Akademia Nauk, Al. Lotnikow 32, PL 02-668 Warszawa, Poland (PL))

1991-04-01

257

Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit  

SciTech Connect

Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement called Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.

Daily, Jeffrey A.; Lewis, Robert R.

2011-11-30

258

The microarchitecture of superscalar processors  

Microsoft Academic Search

Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of superscalar processors. We begin with a discussion of the general problem solved by superscalar processors: converting an ostensibly sequential program into

James E. Smith; Gurindar S. Sohi

1995-01-01

259

Optical subassembly with 57°-angled fiber array and silicon optical bench for VCSEL array and parallel optical transmitter module  

NASA Astrophysics Data System (ADS)

This paper suggests a passive aligned optical subassembly (OSA) using 54.7° mirrors of a silicon optical bench (SiOB) and a 57° angled fiber array for a vertical cavity surface emitting laser (VCSEL). This OSA is very cost-effective because the OSA was fabricated by only one-axis alignment along the V-groove's direction and flip-chip-bonding the VCSEL. In addition, this paper describes a 2.5-Gbps x 12-channels parallel optical transmitter module fabricated with the OSA.

Hwang, Sung Hwan; Lee, Sang Hwan; Park, Hyo-Hoon

2006-10-01

260

A flexible programmable signal processor for next generation fighter aircraft  

NASA Astrophysics Data System (ADS)

The performance requirements of next generation Programmable Signal Processors (PSP) for military radar applications are examined. Consideration is given to processor performance criteria (throughput rate, parameter changes, mode changes) in connection with several air-to-air radar modes including long range search, single target tracking, track-while-scan, and ECCM. Air-to-ground radar modes are also examined, with emphasis given to Moving Target Indication (MTI), Doppler mode, SAR, and terrain following/avoidance. It is shown that next-generation PSPs will require processing speeds on the order of 1 billion complex operations per second. It is pointed out that conventional array processor architectures similar to those in current PSPs will need significantly larger memory bandwidths to achieve the required throughput rates. However, the use of parallel architectures such as systolic arrays and wavefront arrays can achieve such speeds with much lower memory bandwidth requirements.

Rowlett, R.; Stewart, C.; Mayor, M.

261

Parallel SPM cantilever arrays for large area surface metrology and lithography  

NASA Astrophysics Data System (ADS)

In this paper, the technology of scanning probe microscopy (SPM) surface metrology using arrays of piezoresistive thermally actuated cantilevers is discussed. The cantilever architecture presented here makes it possible to image surface topography using sensors operating in parallel. In this way the throughput of the sample imaging is increased, which is of crucial importance in measurements of large area samples. Application of a piezoresistive detection scheme makes it possible to investigate quantitatively the interaction between the microprobe and the imaged surface. Integration of the thermal deflection actuator with the spring beam decreases the response time and enables fast and high resolution control of the tip-sample distance. The results of parallel topography measurement using a 1×4 cantilever array will be presented.

Gotszalk, Teodor; Ivanov, Tzvetan; Rangelow, Ivo W.

2014-04-01

262

A Superconductive Parallel Junction Array Mixer for Very Wide Band Heterodyne Submillimeter-Wave Spectrometry  

Microsoft Academic Search

The study of submillimeter-wave radiation in astronomy and atmospheric sciences requires increasingly performant receivers, in particular allowing extended spectral line surveys. To this end, we are developing a quantum-noise limited heterodyne receiver based on SIS junction parallel arrays with broad (larger than 30%) fixed tuned bandwidth. Simulations show that networks of junctions (N>2) of micronic size, embedded in a superconducting

F. Boussaha; Y. Delorme; M. Salez; M. H. Chung; F. Dauplay; B. Lecomte; J. G. Caputo; V. Thevenet

2002-01-01

263

Excitation of a Parallel Plate Waveguide by an Array of Rectangular Waveguides  

NASA Technical Reports Server (NTRS)

This work addresses the problem of excitation of a parallel plate waveguide by an array of rectangular waveguides that arises in applications such as the continuous transverse stub (CTS) antenna and dual-polarized parabolic cylindrical reflector antennas excited by a scanning line source. In order to design the junction region between the parallel plate waveguide and the linear array of rectangular waveguides, waveguide sizes have to be chosen so that the input match is adequate for the range of scan angles for both polarizations. Electromagnetic wave scattering at the junction between a parallel plate waveguide and an array of rectangular waveguides is analyzed by formulating coupled integral equations for the aperture electric field at the junction. The integral equations are solved by the method of moments (MoM). In order to make the computational process efficient and accurate, the method of weighted averaging was used to evaluate rapidly oscillating integrals encountered in the moment matrix. In addition, the real axis spectral integral is evaluated in a deformed contour for speed and accuracy. The MoM results for a large finite array have been validated by comparing its reflection coefficients with corresponding results for an infinite array generated by the commercial finite element code, HFSS. Once the aperture electric field is determined by MoM, the input reflection coefficients at each waveguide port, and coupling for each polarization over the range of useful scan angles, are easily obtained. Results for the input impedance and coupling characteristics for both the vertical and horizontal polarizations are presented over a range of scan angles. It is shown that the scan range is limited to about 35° for both polarizations and therefore the optimum waveguide is a square of size equal to about 0.62 free space wavelength.

Rengarajan, Sembiam

2011-01-01

264

Design and characterization of harmonic diffractive microlens arrays with continuous relief for parallel laser direct writing  

NASA Astrophysics Data System (ADS)

To make the real-time focusing possible in the exposure process, diffractive microlens arrays with continuous relief are designed and fabricated using harmonic diffraction theory for parallel laser direct writing to integrate the exposing and autofocusing functions in one array by taking both the writing resolution and diffraction efficiency into consideration. A theoretical model is established using Rayleigh-Sommerfeld diffraction theory to accurately characterize the focusing characteristics of each harmonic diffractive microlens in the array so that the fidelity of pattern can be improved through exposure dose modulation. The measurements made indicate that the experimental results coincide well with the theoretical results when the writing laser with a wavelength of 441.6 nm and the autofocusing laser with a wavelength of 670 nm are normally incident on an array with an F-number of F/4 fabricated on fused silica, and the array developed can be used to synchronously focus the writing laser and the autofocusing laser into the same spot of the array.

Shan, Mingguang; Guo, Lili; Zhong, Zhi

2010-03-01

265

A 32-channel lattice transmission line array for parallel transmit and receive MRI at 7 tesla.  

PubMed

Transmit and receive RF coil arrays have proven to be particularly beneficial for ultra-high-field MR. Transmit coil arrays enable such techniques as B1+ shimming to substantially improve transmit B1 homogeneity compared to conventional volume coil designs, and receive coil arrays offer enhanced parallel imaging performance and SNR. Concentric coil arrangements hold promise for developing transceiver arrays incorporating large numbers of coil elements. At magnetic field strengths of 7 tesla and higher where the Larmor frequencies of interest can exceed 300 MHz, the coil array design must also overcome the problem of the coil conductor length approaching the RF wavelength. In this study, a novel concentric arrangement of resonance elements built from capacitively-shortened half-wavelength transmission lines is presented. This approach was utilized to construct an array with whole-brain coverage using 16 transceiver elements and 16 receive-only elements, resulting in a coil with a total of 16 transmit and 32 receive channels. PMID:20512850

Adriany, Gregor; Auerbach, Edward J; Snyder, Carl J; Gözübüyük, Ark; Moeller, Steen; Ritter, Johannes; Van de Moortele, Pierre-François; Vaughan, Tommy; Uğurbil, Kâmil

2010-06-01

266

Weak-Periodic Stochastic Resonance in a Parallel Array of Static Nonlinearities  

PubMed Central

This paper studies the output-input signal-to-noise ratio (SNR) gain of an uncoupled parallel array of static, yet arbitrary, nonlinear elements for transmitting a weak periodic signal in additive white noise. In the small-signal limit, an explicit expression for the SNR gain is derived. It serves to prove that the SNR gain is always a monotonically increasing function of the array size for any given nonlinearity and noisy environment. It also determines the SNR gain maximized by the locally optimal nonlinearity as the upper bound of the SNR gain achieved by an array of static nonlinear elements. With locally optimal nonlinearity, it is demonstrated that stochastic resonance cannot occur, i.e. adding internal noise into the array never improves the SNR gain. However, in an array of suboptimal but easily implemented threshold nonlinearities, we show the feasibility of situations where stochastic resonance occurs, and also the possibility of the SNR gain exceeding unity for a wide range of input noise distributions.

Ma, Yumei; Duan, Fabing; Chapeau-Blondeau, Francois; Abbott, Derek

2013-01-01

267

Optimal expression evaluation for data parallel architectures  

NASA Technical Reports Server (NTRS)

A data parallel machine represents an array or other composite data structure by allocating one processor per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum cost way to evaluate an expression, for several different data parallel architectures. The algorithm applies to any architecture in which the metric describing the cost of moving an array has a property called robustness. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes.

Gilbert, J. R.; Schreiber, R.

1990-01-01
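
The placement problem described in the record above (choosing where each intermediate operation is performed so that the total cost of moving arrays is minimal) can be sketched as a small dynamic program over an expression tree. This is a toy model, not the authors' algorithm: it assumes a one-dimensional mesh, takes the cost of moving an array from offset p to offset q to be |p - q|, and restricts candidate positions to the span of the leaf offsets. The names `Leaf`, `Op`, `move_cost`, and `min_evaluation_cost` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    offset: int  # alignment of an operand array on a 1-D mesh (toy model)


@dataclass
class Op:
    left: "Node"   # a pointwise binary operation on two sub-expressions
    right: "Node"


Node = Union[Leaf, Op]


def move_cost(p, q):
    """Assumed metric: shifting an array from offset p to offset q costs |p - q|."""
    return abs(p - q)


def leaf_offsets(node):
    """Collect the offsets of all operand arrays in the expression."""
    if isinstance(node, Leaf):
        return [node.offset]
    return leaf_offsets(node.left) + leaf_offsets(node.right)


def delivered_cost(node, positions):
    """Map each position q to the minimum cost of having node's value at q."""
    if isinstance(node, Leaf):
        return {q: move_cost(node.offset, q) for q in positions}
    left = delivered_cost(node.left, positions)
    right = delivered_cost(node.right, positions)
    # Cost of performing this operation at p: both operands must be at p.
    performed_at = {p: left[p] + right[p] for p in positions}
    # The result may then be moved from p to wherever the parent needs it.
    return {
        q: min(performed_at[p] + move_cost(p, q) for p in positions)
        for q in positions
    }


def min_evaluation_cost(root):
    """Minimum total movement cost over all choices of where to operate."""
    offsets = leaf_offsets(root)
    positions = range(min(offsets), max(offsets) + 1)
    return min(delivered_cost(root, positions).values())
```

For example, for (A + B) + C with A at offset 0, B at offset 10, and C at offset 5, always moving operands to the left operand's position would cost 10 + 5 = 15, while `min_evaluation_cost(Op(Op(Leaf(0), Leaf(10)), Leaf(5)))` finds a cost of 10 by performing both operations at offset 5.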

268

Multithread video coding processor for the videophone  

Microsoft Academic Search

The architecture of a programmable video codec IC is described that employs multiple vector processors in a single chip. The vector processors operate in parallel and communicate with one another through on-chip shared memories. A single scalar control processor schedules each vector processor independently to achieve real-time video coding with special vector instructions. With programmable interconnection buses, the proposed architecture

Jeong-Min Kim; Seok-Kyun Hong; Eel-Wan Lee; Soo-Ik Chae

1995-01-01

269

High-performance SPAD array detectors for parallel photon timing applications  

NASA Astrophysics Data System (ADS)

Over the past few years there has been a growing interest in monolithic arrays of single photon avalanche diodes (SPAD) for spatially resolved detection of faint ultrafast optical signals. SPADs implemented in planar technologies offer the typical advantages of microelectronic devices (small size, ruggedness, low voltage, low power, etc.). Furthermore, they have inherently higher photon detection efficiency than PMTs and are able to provide, besides sensitivities down to single photons, very high acquisition speeds. To make SPAD arrays more competitive in time-resolved applications, problems such as electrical crosstalk between adjacent pixels must be addressed, and single-photon timing electronics with picosecond resolution must be developed. In this paper we present a new instrument suitable for single-photon imaging applications and made up of 32 time-resolved parallel channels. The 32x1 pixel array that includes SPAD detectors represents the system core, and an embedded data elaboration unit performs on-board data processing for single-photon counting applications. Photon-timing information is exported through a custom parallel cable that can be connected to an external multichannel TCSPC system.

Rech, I.; Cuccato, A.; Antonioli, S.; Cammi, C.; Gulinatti, A.; Ghioni, M.

2012-02-01

270

Parallel and series FED microstrip array with high efficiency and low cross polarization  

NASA Astrophysics Data System (ADS)

A microstrip array antenna for a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications with a physical area of 1.7 m by 0.17 m comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, are terminated with half-wavelength-long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

Huang, John

1995-06-01

271

Parallel and series fed microstrip array with high efficiency and low cross polarization  

NASA Astrophysics Data System (ADS)

A microstrip array antenna for a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications is described. With a physical area of 1.7 m by 0.17 m, it comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, are terminated with half-wavelength-long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

Huang, John

1993-04-01

272

Parallel and series FED microstrip array with high efficiency and low cross polarization  

NASA Technical Reports Server (NTRS)

A microstrip array antenna for a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications with a physical area of 1.7 m by 0.17 m comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, are terminated with half-wavelength-long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

Huang, John (inventor)

1995-01-01

273

Proceedings of the 1983 international conference on parallel processing  

Microsoft Academic Search

The following topics were dealt with: the performance of existing supercomputers on computationally intensive tasks; multistage networks; numerical algorithms; network connection capabilities; special purpose systems; node-to-node networks; nonnumerical algorithms; tree structured systems; parallel programming and languages; images and speech; expressing parallelism; database machines and signal processing; data flow; simulation and operating systems; models; scheduling resources; system performance; VLSI processor arrays;

H. J. Siegel; L. Siegel

1983-01-01

274

The Imagine Stream Processor  

Microsoft Academic Search

The Imagine Stream Processor is a single-chip programmable media processor with 48 parallel ALUs. At 400 MHz, this translates to a peak arithmetic rate of 16 GFLOPS on single-precision data and 32 GOPS on 16-bit fixed-point data. The scalability of Imagine's programming model and architecture enable it to achieve such high arithmetic rates. Imagine executes applications that

Ujval J. Kapasi; William J. Dally; Brucek Khailany; John D. Owens; Scott Rixner

2002-01-01

275

Design and implementation of a parallel array operator for the arbitrary remapping of data.  

SciTech Connect

The data redistribution or remapping functions, gather and scatter, are long-standing in high-performance computing, having been included in Cray Fortran for decades. In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched in other array languages. We discuss an efficient parallel implementation, introducing several new optimizations (run length encoding, dead array reuse, and direct communication) that lessen the costs associated with the operator's wide applicability. In our implementation of this operator in ZPL, we demonstrate comparable performance to the highly tuned, hand-coded Fortran plus MPI versions of the NAS FT and NAS CG benchmarks.

Dietz, Steven; Choi, S. E. (Sung-Eun); Chamberlain, B. L. (Bradford L.); Snyder, Lawrence

2003-01-01
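
For readers unfamiliar with the gather and scatter functions named in the record above, their serial semantics can be sketched in a few lines. This is a generic illustration, not the ZPL operator or its parallel implementation: the function names, the later-write-wins rule for duplicate scatter indices, and the (start, length) form of the run-length-encoded index map are assumptions made here.

```python
def gather(source, index):
    """Gather: result[i] = source[index[i]]."""
    return [source[j] for j in index]


def scatter(values, index, size, fill=0):
    """Scatter: result[index[i]] = values[i].

    Positions never written keep `fill`; on duplicate indices the later
    write wins (an assumption of this sketch).
    """
    result = [fill] * size
    for value, j in zip(values, index):
        result[j] = value
    return result


def run_length_encode(index):
    """Compress an index map into (start, length) runs of consecutive indices.

    Shown only to illustrate why regular remappings can be described
    compactly; how the paper applies run-length encoding is not specified
    in the abstract.
    """
    runs = []
    for j in index:
        if runs and runs[-1][0] + runs[-1][1] == j:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            runs.append((j, 1))
    return runs
```

A regular remapping such as a shift or a block copy collapses to a handful of runs, whereas a fully irregular permutation yields one run per element.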

276

Dynamic scheduling and planning parallel observations on large Radio Telescope Arrays with the Square Kilometre Array in mind  

NASA Astrophysics Data System (ADS)

Scheduling, the task of producing a time table for resources and tasks, is well known to become more difficult as more resources are involved (an NP-hard problem). This is about to become an issue in radio astronomy as observatories consisting of hundreds to thousands of telescopes are planned and operated. The Square Kilometre Array (SKA), which Australia and New Zealand bid to host, is aiming for scales where current approaches -- in construction, operation but also scheduling -- are insufficient. Although manual scheduling is common today, the problem is becoming complicated by the demand for (1) independent sub-arrays doing simultaneous observations, which requires the scheduler to plan parallel observations and (2) dynamic re-scheduling on changed conditions. Both of these requirements apply to the SKA, especially in the construction phase. We review the scheduling approaches taken in the astronomy literature, as well as investigate techniques from human schedulers and today's observatories. The scheduling problem is specified in general for scientific observations and in particular on radio telescope arrays. Also taken into account is the fact that the observatory may be oversubscribed, requiring the scheduling problem to be integrated with a planning process. We solve this long-term scheduling problem using a time-based encoding that works in the very general case of observation scheduling. This research then compares algorithms from various approaches, including fast heuristics from CPU scheduling, Linear Integer Programming, Genetic algorithms, and Branch-and-Bound enumeration schemes. Measures include not only goodness of the solution, but also scalability and re-scheduling capabilities. In conclusion, we have identified a fast and good scheduling approach that allows (re-)scheduling difficult and changing problems by combining heuristics with a Genetic algorithm using block-wise mutation operations. We are able to explain and eradicate two problems in the literature: the inability of a GA to properly improve schedules and the generation of schedules with frequent interruptions. Finally, we demonstrate the scheduling framework for several operating telescopes: (1) dynamic re-scheduling with the AUT Warkworth 12m telescope, (2) scheduling for the Australian Mopra 22m telescope and (3) scheduling for the Allen Telescope Array. Furthermore, we discuss the applicability of the presented scheduling framework to the Atacama Large Millimeter/submillimeter Array (ALMA, in construction) and the SKA. In particular, during the development phase of the SKA, this dynamic, scalable scheduling framework can accommodate changing conditions.

Buchner, Johannes

2011-12-01

277

Non-Phi0-periodic macroscopic quantum interference in one-dimensional parallel Josephson junction arrays with unconventional grating structure  

Microsoft Academic Search

A theoretical study is presented for a number N of Josephson junctions connected as a one-dimensional (1D) parallel array in such a manner that N-1 individual superconducting loops of arbitrary shape are formed. In the resistive array mode, for bias currents I > I_c, all Josephson junctions in the array oscillate at the same magnetic-field-dependent frequency ν_B which is,

J. Oppenländer; Ch. Häussler; N. Schopohl

2001-01-01

278

Voxel based parallel post processor for void nucleation and growth analysis of atomistic simulations of material fracture.  

PubMed

Molecular dynamics (MD) simulations are used in the study of void nucleation and growth in crystals that are subjected to tensile deformation. These simulations are run for typically several hundred thousand time steps depending on the problem. We output the atom positions at a required frequency for post processing to determine the void nucleation, growth and coalescence due to tensile deformation. The simulation volume is broken up into voxels of size equal to the unit cell size of crystal. In this paper, we present the algorithm to identify the empty unit cells (voids), their connections (void size) and dynamic changes (growth and coalescence of voids) for MD simulations of large atomic systems (multi-million atoms). We discuss the parallel algorithms that were implemented and discuss their relative applicability in terms of their speedup and scalability. We also present the results on scalability of our algorithm when it is incorporated into MD software LAMMPS. PMID:24793054

Hemani, H; Warrier, M; Sakthivel, N; Chaturvedi, S

2014-05-01
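
The voxel approach in the record above (mark unit-cell-sized voxels that contain no atom, then join neighbouring empty voxels into voids) can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the authors' parallel implementation and not part of LAMMPS: the function name `find_voids`, the 6-neighbour connectivity, and the non-periodic box are all choices made here.

```python
from collections import deque

# Face-sharing neighbours of a voxel (6-connectivity, an assumption of this sketch).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def find_voids(positions, box, cell):
    """Return a list of voids, each a set of empty voxel indices (ix, iy, iz)."""
    n = tuple(int(round(length / cell)) for length in box)
    occupied = set()
    for atom in positions:
        # Map each atom to the voxel that contains it; clamp atoms on the upper face.
        occupied.add(tuple(min(int(c // cell), n[d] - 1) for d, c in enumerate(atom)))
    empty = {(i, j, k) for i in range(n[0]) for j in range(n[1]) for k in range(n[2])}
    empty -= occupied
    voids = []
    while empty:
        seed = empty.pop()
        cluster, queue = {seed}, deque([seed])
        # Breadth-first flood fill over empty voxels that share a face.
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                nb = (i + di, j + dj, k + dk)
                if nb in empty:
                    empty.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        voids.append(cluster)
    return voids


if __name__ == "__main__":
    # Toy 3 x 3 x 3 lattice with three atoms removed.
    missing = {(0, 0, 0), (1, 0, 0), (2, 2, 2)}
    atoms = [(i + 0.5, j + 0.5, k + 0.5)
             for i in range(3) for j in range(3) for k in range(3)
             if (i, j, k) not in missing]
    voids = find_voids(atoms, box=(3.0, 3.0, 3.0), cell=1.0)
    print(sorted(len(v) for v in voids))  # void sizes in voxels
```

Running the same routine on successive output frames and comparing the voxel sets would give a rough view of growth and coalescence; how the original code tracks voids across frames is not stated in the abstract.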

279

Optoelectronics technology consortium 32-channel parallel fiber optic transmitter/receiver array testbed  

NASA Astrophysics Data System (ADS)

In the data communications industry, there are emerging requirements for a short distance (tens to hundreds of meters), high-speed (200 Mbit/sec to 1 Gbit/sec) data bus for large computing environments, clustered parallel computing systems, and datacom switching. In response to these requirements, a parallel optical fiber interconnect has been developed by the Optoelectronics Technology Consortium (OETC), an ARPA-funded industry alliance including IBM, AT&T, Honeywell, and Lockheed Martin. This year, IBM completed testing of a 32-channel OETC fiber optic transmitter/receiver array in a product testbed, and announced future availability of a commercial product called "Jitney" based on the OETC prototype [1-5]. The OETC modules have been described in detail elsewhere [3-4]. The datalink consists of a 32-channel vertical cavity surface emitting laser (VCSEL) array transmitter, operating at 850 nm, and corresponding 32-channel metal-semiconductor-metal (MSM) photodiode receivers, interconnected by a fiber ribbon cable. Using multi-mode 62.5/125 µm fiber, link distances range from about 100 m at 1 Gbit/sec to over 250 m at 200 Mbit/sec.

Decusatis, Casimer; Quinn, Terrence; Pepeljugoski, Petar; Kuchta, Daniel; Crow, John (IBM Corp., Yorktown Heights, N.Y.)

1996-12-01

280

Highly parallel CMOS lock-in optical sensor array for hyperspectral recording in scanned imaging systems  

NASA Astrophysics Data System (ADS)

Many optical measurements that are subject to high levels of background illumination rely on phase sensitive lock-in detection to extract the useful signal. If modulation is applied to the portion of the signal that contains information, lock-in detection can perform very narrowband (and hence low noise) detection at frequencies well away from noise sources such as 1/f and instrumental drift. Lock-in detection is therefore used in many optical imaging and measurement techniques, including optical coherence tomography, heterodyne interferometry, optoacoustic tomography and a range of pump-probe techniques. Phase sensitive imaging is generally performed sequentially with a single photodetector and a lock-in amplifier. However, this approach severely limits the rate of multi-dimensional image acquisition. We present a novel linear array chip that can perform phase sensitive, shot-noise limited optical detection in up to 256 parallel channels. This has been achieved by employing four independent wells in each pixel, and massively enhancing the intrinsic well depth to suppress the effect of optical shot noise. Thus the array can reduce the number of dimensions that need to be sequentially scanned and greatly speed up acquisition. Results demonstrating spatial and spectral parallelism in pump-probe experiments are presented where the a.c. amplitude to background ratio approaches 1 part in one million.

Light, Roger A.; Smith, Richard J.; Johnston, Nicholas S.; Sharples, Steve D.; Somekh, Michael G.; Pitter, Mark C.

2010-02-01
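
One common way to perform lock-in detection with four samples per pixel is the four-bucket (quadrature) scheme sketched below. The abstract above does not say how the four wells are clocked, so this rests on an assumption: each well is treated as a point sample taken a quarter of a modulation period after the previous one. The function name and the test values are hypothetical.

```python
import math


def four_bucket_demodulate(s0, s1, s2, s3):
    """Recover background, a.c. amplitude and phase from four quarter-period samples.

    Assumed signal model (not taken from the paper):
        s_k = background + amplitude * cos(phase + k * pi / 2),  k = 0..3
    """
    in_phase = s0 - s2      # equals 2 * amplitude * cos(phase); background cancels
    quadrature = s3 - s1    # equals 2 * amplitude * sin(phase); background cancels
    background = (s0 + s1 + s2 + s3) / 4.0
    amplitude = 0.5 * math.hypot(in_phase, quadrature)
    phase = math.atan2(quadrature, in_phase)
    return background, amplitude, phase


if __name__ == "__main__":
    # One pixel with a weak modulated signal on a large static background.
    true_background, true_amplitude, true_phase = 1.0e6, 1.0, 0.3
    wells = [true_background + true_amplitude * math.cos(true_phase + k * math.pi / 2)
             for k in range(4)]
    print(four_bucket_demodulate(*wells))
```

Because the static background drops out of both differences, the scheme can in principle resolve an a.c. signal far smaller than the background, provided the wells are deep enough that shot noise on the background does not dominate. If the wells integrate over a quarter period instead of point-sampling, the recovered amplitude and phase pick up a fixed scale factor and offset.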

281

Real-time processor for staring receivers  

NASA Technical Reports Server (NTRS)

The design, fabrication, and testing of a state-of-the-art, high-throughput on-focal plane IR-image signal processor is described. The processing functions performed are frame differencing and thresholding. The final focal plane array will consist of a 128 x 128-pixel platinum-silicide detector bump-mounted to an on-chip CCD multiplexer. The processor is in a 128-channel parallel-pipeline format. Each channel consists of a pixel regenerator (charge differencer), 128-pixel frame store CCD memory, pixel differencer, second pixel regenerator, thresholder (analog comparator), and digital latch. Four parallel analog outputs and four parallel digital outputs are included. The digital outputs provide a bit map of the image. All analog clock signals (128 kHz, 256 kHz, and 5 MHz) are generated by on-chip TTL-input clock drivers. TTL clock driver inputs are generated off-chip. The technology is low-temperature surface and buried channel CCD/CMOS/indium bump. The design goal was 8-bit resolution at 77 K and 1000 frames/s. Applications include point- or extended-target motion detection with thresholding. Design trade-offs and enhancements (such as on-chip detector gain compensation and a simple window processor) are discussed.

Hanzal, Brian; Peczalski, Andrzej; Schwanebeck, James; Sanderson, Richard; Fossum, Eric

1992-01-01
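
The two processing functions named in the record above, frame differencing and thresholding, reduce to a few lines in software. The sketch below is only a functional illustration of what the analog pipeline computes; taking the absolute difference is an assumption (the abstract does not say whether the comparator is one-sided), and the function names are hypothetical.

```python
def detect_motion(previous, current, threshold):
    """Return a bit map: 1 where |current - previous| exceeds the threshold, else 0."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def per_channel_pixel_rate(rows, cols, frames_per_s, channels):
    """Pixels per second that each parallel channel must handle."""
    return rows * cols * frames_per_s // channels


if __name__ == "__main__":
    previous = [[10, 10], [10, 10]]
    current = [[10, 50], [5, 10]]
    print(detect_motion(previous, current, threshold=8))
    # Design goal from the abstract: 128 x 128 pixels, 1000 frames/s, 128 channels.
    print(per_channel_pixel_rate(128, 128, 1000, 128))
```

With the stated design goal, each of the 128 channels has to process 128,000 pixels per second. That figure happens to equal the lowest clock rate quoted in the abstract (128 kHz), although the abstract does not say the two are related.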

282

Large-scale parallel arrays of silicon nanowires via block copolymer directed self-assembly  

NASA Astrophysics Data System (ADS)

Extending the resolution and spatial proximity of lithographic patterning below critical dimensions of 20 nm remains a key challenge with very-large-scale integration, especially if the persistent scaling of silicon electronic devices is sustained. One approach, which relies upon the directed self-assembly of block copolymers by chemical-epitaxy, is capable of achieving high density 1 : 1 patterning with critical dimensions approaching 5 nm. Herein, we outline an integration-favourable strategy for fabricating high areal density arrays of aligned silicon nanowires by directed self-assembly of a PS-b-PMMA block copolymer nanopatterns with a L0 (pitch) of 42 nm, on chemically pre-patterned surfaces. Parallel arrays (5 × 10^6 wires per cm) of uni-directional and isolated silicon nanowires on insulator substrates with critical dimension ranging from 15 to 19 nm were fabricated by using precision plasma etch processes; with each stage monitored by electron microscopy. This step-by-step approach provides detailed information on interfacial oxide formation at the device silicon layer, the polystyrene profile during plasma etching, final critical dimension uniformity and line edge roughness variation nanowire during processing. The resulting silicon-nanowire array devices exhibit Schottky-type behaviour and a clear field-effect. The measured values for resistivity and specific contact resistance were ((2.6 ± 1.2) × 10^5 Ω cm) and ((240 ± 80) Ω cm^2) respectively. These values are typical for intrinsic (un-doped) silicon when contacted by high work function metal albeit counterintuitive as the resistivity of the starting wafer (~10 Ω cm) is 4 orders of magnitude lower. In essence, the nanowires are so small and consist of so few atoms, that statistically, at the original doping level each nanowire contains less than a single dopant atom and consequently exhibits the electrical behaviour of the un-doped host material. 
Moreover this indicates that the processing successfully avoided unintentional doping. Therefore our approach permits tuning of the device steps to contact the nanowires functionality through careful selection of the initial bulk starting material and/or by means of post processing steps e.g. thermal annealing of metal contacts to produce high performance devices. We envision that such a controllable process, combined with the precision patterning of the aligned block copolymer nanopatterns, could prolong the scaling of nanoelectronics and potentially enable the fabrication of dense, parallel arrays of multi-gate field effect transistors.

Farrell, Richard A.; Kinahan, Niall T.; Hansel, Stefan; Stuen, Karl O.; Petkov, Nikolay; Shaw, Matthew T.; West, Laetitia E.; Djara, Vladimir; Dunne, Robert J.; Varona, Olga G.; Gleeson, Peter G.; Jung, Soon-Jung; Kim, Hye-Young; Kole?nik, Maria M.; Lutz, Tarek; Murray, Christopher P.; Holmes, Justin D.; Nealey, Paul F.; Duesberg, Georg S.; Krstić, Vojislav; Morris, Michael A.

2012-05-01

283

High Density Single-Molecule-Bead Arrays for Parallel Single Molecule Force Spectroscopy  

PubMed Central

The assembly of a highly-parallel force spectroscopy tool requires careful placement of single-molecule targets on the substrate and the deliberate manipulation of a multitude of force probes. Since the probe must approach the target biomolecule for covalent attachment, while avoiding irreversible adhesion to the substrate, the use of polymer microspheres as force probes to create the tethered bead array poses a problem. Therefore, the interactions between the force probe and the surface must be repulsive at very short distances (< 5 nm) and attractive at long distances. To achieve this balance, the chemistry of the substrate, force probe, and solution must be tailored to control the probe-surface interactions. In addition to an appropriately designed chemistry, it is necessary to control the surface density of the target molecule in order to ensure that only one molecule is interrogated by a single force probe. We used gold-thiol chemistry to control both the substrate’s surface chemistry and the spacing of the studied molecules, through a competitive binding of the thiol-terminated DNA and an inert thiol forming a blocking layer. For our single molecule array, we modeled the forces between the probe and the substrate using DLVO theory and measured their magnitude and direction with colloidal probe microscopy. The practicality of each system was tested using a probe binding assay to evaluate the proportion of the beads remaining adhered to the surface after application of force. We have translated the results specific for our system to general guiding principles for preparation of tethered bead arrays and demonstrated the ability of this system to produce a high yield of active force spectroscopy probes in a microwell substrate. This study outlines the characteristics of the chemistry needed to create such a force spectroscopy array.

Barrett, Michael J.; Oliver, Piercen M.; Cheng, Peng; Cetin, Deniz; Vezenov, Dmitri

2012-01-01

284

Computation and parallel implementation for early vision  

NASA Technical Reports Server (NTRS)

The problem of early vision is to transform one or more retinal illuminance images (pixel arrays) into image representations built out of primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation on SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.

Gualtieri, J. Anthony

1990-01-01

285

Large-scale parallel arrays of silicon nanowires via block copolymer directed self-assembly.  

PubMed

Extending the resolution and spatial proximity of lithographic patterning below critical dimensions of 20 nm remains a key challenge with very-large-scale integration, especially if the persistent scaling of silicon electronic devices is sustained. One approach, which relies upon the directed self-assembly of block copolymers by chemical-epitaxy, is capable of achieving high density 1 : 1 patterning with critical dimensions approaching 5 nm. Herein, we outline an integration-favourable strategy for fabricating high areal density arrays of aligned silicon nanowires by directed self-assembly of a PS-b-PMMA block copolymer nanopatterns with a L0 (pitch) of 42 nm, on chemically pre-patterned surfaces. Parallel arrays (5 × 10^6 wires per cm) of uni-directional and isolated silicon nanowires on insulator substrates with critical dimension ranging from 15 to 19 nm were fabricated by using precision plasma etch processes; with each stage monitored by electron microscopy. This step-by-step approach provides detailed information on interfacial oxide formation at the device silicon layer, the polystyrene profile during plasma etching, final critical dimension uniformity and line edge roughness variation nanowire during processing. The resulting silicon-nanowire array devices exhibit Schottky-type behaviour and a clear field-effect. The measured values for resistivity and specific contact resistance were ((2.6 ± 1.2) × 10^5 Ω cm) and ((240 ± 80) Ω cm^2) respectively. These values are typical for intrinsic (un-doped) silicon when contacted by high work function metal albeit counterintuitive as the resistivity of the starting wafer (~10 Ω cm) is 4 orders of magnitude lower. In essence, the nanowires are so small and consist of so few atoms, that statistically, at the original doping level each nanowire contains less than a single dopant atom and consequently exhibits the electrical behaviour of the un-doped host material. 
Moreover this indicates that the processing successfully avoided unintentional doping. Therefore our approach permits tuning of the device steps to contact the nanowires functionality through careful selection of the initial bulk starting material and/or by means of post processing steps e.g. thermal annealing of metal contacts to produce high performance devices. We envision that such a controllable process, combined with the precision patterning of the aligned block copolymer nanopatterns, could prolong the scaling of nanoelectronics and potentially enable the fabrication of dense, parallel arrays of multi-gate field effect transistors. PMID:22481430

Farrell, Richard A; Kinahan, Niall T; Hansel, Stefan; Stuen, Karl O; Petkov, Nikolay; Shaw, Matthew T; West, Laetitia E; Djara, Vladimir; Dunne, Robert J; Varona, Olga G; Gleeson, Peter G; Jung, Soon-Jung; Kim, Hye-Young; Kole?nik, Maria M; Lutz, Tarek; Murray, Christopher P; Holmes, Justin D; Nealey, Paul F; Duesberg, Georg S; Krstić, Vojislav; Morris, Michael A

2012-05-21

286

Adiabatic evolution of light in an array of parallel curved optical waveguides  

NASA Astrophysics Data System (ADS)

Adiabatic evolution of light in parallel curved optical waveguide array is investigated theoretically. This problem is shown to bear a close connection with the process of coherent population transfer in a “bow-tie” model in quantum physics. Under certain conditions on the geometry of the waveguides and the optical properties of the system complete light transfer between the outer waveguides is achieved. Special attention is paid to the case of three waveguides, which is analyzed using the solutions of the well-known bow-tie model. The analytic solution is used to design recipes for creating arbitrary superpositions of light intensity between the waveguides, with possible applications in achromatic optical multiple-beam splitters. For more than three waveguides complete light transfer between the outer waveguides and beam splitting is demonstrated numerically.

Hristova, H. S.; Rangelov, A. A.; Guérin, S.; Vitanov, N. V.

2013-07-01

287

Parallelization of the NAS Conjugate Gradient Benchmark Using the Global Arrays Shared Memory Programming Model  

SciTech Connect

The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared memory programming paradigm (even on distributed memory systems) and offers the programmer control over the distribution and locality that are important for optimizing performance on scalable architectures. In this paper, we describe and compare two different parallelization strategies of the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. However, with GA these benefits are extended to distributed memory systems and demonstrated on a Linux cluster with Myrinet.

Zhang, Yeliang; Tipparaju, Vinod; Nieplocha, Jarek; Hariri, Salim

2005-04-08
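
For readers unfamiliar with the kernel discussed in the record above, the conjugate gradient iteration itself is short. The sketch below is a serial, dense, pure-Python version for a small symmetric positive-definite system; it is not the NAS benchmark code and does not use the Global Arrays API. The helper names are hypothetical. The comments mark the two operations (matrix-vector product and dot product) that a parallel version has to distribute.

```python
def dot(u, v):
    """Dot product; in a parallel code this needs a global reduction."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    """Matrix-vector product; in a parallel code this is where remote data is read."""
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=100):
    """Solve a x = b for a symmetric positive-definite matrix given as a list of rows."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - a x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction: current residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(a, b))  # exact solution is [1/11, 7/11]
```

In the benchmark the matrix is large and sparse, so the access pattern of the matrix-vector product is irregular; that irregularity is what makes the choice between message passing and a shared-memory style model interesting.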

288

Exploiting Processor Groups to Extend Scalability of the GA Shared Memory Programming Model  

SciTech Connect

Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality and development of multilevel-parallelism based on processor groups in the context of the Global Array (GA) shared memory programming model. The main effort involves management of shared data, rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient benchmark and a molecular dynamics (MD) application are presented for a Linux cluster with Myrinet and illustrate the value of the proposed approach for improving scalability. While the original GA version of the CG benchmark lagged MPI, the processor-group version outperforms MPI in all cases, except for a few points on the smallest problem size. Similarly, the group version of the MD application improves execution time by 58% on 32 processors.

Nieplocha, Jarek; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Zhang, Yeliang

2005-05-04

289

Modeling of the phase lag causing fluidelastic instability in a parallel triangular tube array  

NASA Astrophysics Data System (ADS)

Fluidelastic instability is considered a critical flow induced vibration mechanism in tube and shell heat exchangers. It is believed that a finite time lag between tube vibration and fluid response is essential to predict the phenomenon. However, the physical nature of this time lag is not fully understood. This paper presents a fundamental study of this time delay using a parallel triangular tube array with a pitch ratio of 1.54. A computational fluid dynamics (CFD) model was developed and validated experimentally in an attempt to investigate the interaction between tube vibrations and flow perturbations at lower reduced velocities Ur=1-6 and Reynolds numbers Re=2000-12 000. The numerical predictions of the phase lag are in reasonable agreement with the experimental measurements for the range of reduced velocities Ug/fd=6-7. It was found that there are two propagation mechanisms; the first is associated with the acoustic wave propagation at low reduced velocities, Ur<2, and the second mechanism for higher reduced velocities is associated with the vorticity shedding and convection. An empirical model of the two mechanisms is developed and the phase lag predictions are in reasonable agreement with the experimental and numerical measurements. The developed phase lag model is then coupled with the semi-analytical model of Lever and Weaver to predict the fluidelastic stability threshold. Improved predictions of the stability boundaries for the parallel triangular array were achieved. In addition, the present study has explained why fluidelastic instability does not occur below some threshold reduced velocity.

Khalifa, Ahmed; Weaver, David; Ziada, Samir

2013-11-01

290

Parallel Optical-Transmission Module Using Vertical-Cavity Surface-Emitting Laser Array and Micro-Optical Bench (MOB)  

Microsoft Academic Search

In this paper, we present a scheme of a parallel optical transmission module using a vertical-cavity surface-emitting laser (VCSEL) array and a micro-optical bench (MOB). We have fabricated a prototype using a VCSEL array emitting at 850 nm and a MOB for the first time, and demonstrated the feasibility of a two-dimensional alignment-free optical interconnect module. No noticeable degradation of

Yuji Shimada; Yasuhiko Aoki; Kenichi Iga

2001-01-01

291

High-Precision Digital-to-Analog Tunable Capacitors With Improved Quality Factor Using a Parallel Digital Actuator Array  

Microsoft Academic Search

We present a micromechanical digital-to-analog (DA) tunable capacitor using a parallel digital actuator array, which is capable of accomplishing high-precision tuning and quality factor improvement. The present DA tunable capacitor uses a parallel interconnection of digital actuators and thus achieves a low-resistive structure for wireless communications. Based on the criteria for the capacitance range (0.348-1.932 pF) and the actuation voltage

Won Han; Young-Ho Cho

2009-01-01

292

Parallel multipoint recording of aligned and cultured neurons on micro channel array toward cellular network analysis.  

PubMed

This paper describes an advanced Micro Channel Array (MCA) for recording electrophysiological signals of neuronal networks at multiple points simultaneously. The developed MCA is designed for neuronal network analysis which has been studied by the co-authors using the Micro Electrode Arrays (MEA) system, and employs the principles of extracellular recordings. A prerequisite for extracellular recordings with good signal-to-noise ratio is a tight contact between cells and electrodes. The MCA described herein has the following advantages. The electrodes integrated around individual micro channels are electrically isolated to enable parallel multipoint recording. Reliable clamping of a targeted cell through micro channels is expected to improve the cellular selectivity and the attachment between the cell and the electrode toward steady electrophysiological recordings. We cultured hippocampal neurons on the developed MCA. As a result, the spontaneous and evoked spike potentials could be recorded by sucking and clamping the cells at multiple points. In this paper, we describe the design and fabrication of the MCA and the successful electrophysiological recordings leading to the development of an effective cellular network analysis device. PMID:20414807

Tonomura, Wataru; Moriguchi, Hiroyuki; Jimbo, Yasuhiko; Konishi, Satoshi

2010-08-01

293

First programmable digital optical processor: optical cellular logic image processor  

NASA Astrophysics Data System (ADS)

The construction of digital optical processors based on the cellular logic image processor (CLIP) architecture is discussed. Both a single-channel processor and a parallel version incorporating 256 information channels have been constructed. The single channel version of the processor allows eight different combinatorial logic processes to be carried out under electronic control and can be programmed in real time. Several algorithms including pattern recognition, byte comparison, full addition and subtraction have been implemented with this machine. The 256 channel version operates similarly to the single channel version except that a reduced instruction set internal processor with four selectable logic processes is used. A nearest neighbor interconnect provides the communication required between the different information channels. More advanced processing capability can be achieved with the introduction of such non-local interconnects as shuffle networks. Results and simulations obtained with these processors are presented. Advances in the various components of the O-CLIP circuit, future goals, and potential application are also discussed.

Craig, Robert G.; Wherrett, Brian S.; Walker, Andrew C.; McKnight, Douglas J.; Redmond, Ian R.; Snowdon, John F.; Buller, Gerald S.; Restall, Edward J.; Wilson, R. A.; Wakelin, Suzanne; McArdle, Neil; Meredith, P.; Miller, J. M.; Taghizadeh, Mohammad R.; MacKinnon, G.; Smith, S. Desmond

1991-09-01

294

Peripheral processors for high-speed simulation. [helicopter cockpit simulator]  

NASA Technical Reports Server (NTRS)

This paper describes some of the results of a study directed to the specification and procurement of a new cockpit simulator for an advanced class of helicopters. A part of the study was the definition of a challenging benchmark problem, and detailed analyses of it were made to assess the suitability of a variety of simulation techniques. The analyses showed that a particularly cost-effective approach to the attainment of adequate speed for this extremely demanding application is to employ a large minicomputer acting as host and controller for a special-purpose digital peripheral processor. Various realizations of such peripheral processors, all employing state-of-the-art electronic circuitry and a high degree of parallelism and pipelining, are available or under development. The types of peripheral processors (array processors, simulation-oriented processors, and arrays of processing elements) are analyzed and compared. They are particularly promising approaches which should be suitable for high-speed simulations of all kinds, the cockpit simulator being a case in point.

Karplus, W. J.

1977-01-01

295

Experiences with Soft-Core Processor Design  

Microsoft Academic Search

Soft-core processors exploit the flexibility of Field Programmable Gate Arrays (FPGAs) to allow a system designer to customize the processor to the needs of a target application. This paper describes the UT Nios implementation of Altera's Nios architecture. A benchmark set appropriate for soft-core processors is defined. Using the benchmark set, the performance of UT

Franjo Plavec; Blair Fort; Zvonko G. Vranesic; Stephen Dean Brown

2005-01-01

296

Optimizing Compiler for the CELL Processor  

Microsoft Academic Search

Developed for multimedia and game applications, as well as other numerically intensive workloads, the CELL processor provides support both for highly parallel codes, which have high computation and memory requirements, and for scalar codes, which require fast response time and a full-featured programming environment. This first generation CELL processor implements on a single chip a Power Architecture processor with two

Alexandre E. Eichenberger; Kathryn M. O'Brien; Kevin O'Brien; Peng Wu; Tong Chen; Peter H. Oden; Daniel A. Prener; Janice C. Shepherd; Byoungro So; Zehra Sura; Amy Wang; Tao Zhang; Peng Zhao; Michael Gschwind

2005-01-01

297

480-GMACS/mW Resonant Adiabatic Mixed-Signal Processor Array for Charge-Based Pattern Recognition  

Microsoft Academic Search

A resonant adiabatic mixed-signal VLSI array delivers 480 GMACS (10^9 multiply-and-accumulates per second) throughput for every mW of power, a 25-fold improvement over the energy efficiency obtained when resonant clock generator and line drivers are replaced with static CMOS drivers. Losses in resonant clock generation are minimized by activating switches between the LC tank and DC supply with a periodic

Rafal Karakiewicz; Roman Genov; Gert Cauwenberghs

2007-01-01

298

Sixteen-element Ge-on-SOI PIN photo-detector arrays for parallel optical interconnects  

NASA Astrophysics Data System (ADS)

We describe the structure and testing of one-dimensional array parallel-optics photo-detectors with 16 photodiodes of which each diode operates up to 8 Gb/s. The single element is vertical and top illuminated 30-µm-diameter silicon on insulator (Ge-on-SOI) PIN photodetector. High-quality Ge absorption layer is epitaxially grown on SOI substrate by the ultra-high vacuum chemical vapor deposition (UHV-CVD). The photodiode exhibits a good responsivity of 0.20 A/W at a wavelength of 1550 nm. The dark current is as low as 0.36 µA at a reverse bias of 1 V, and the corresponding current density is about 51 mA/cm^2. The detector with a diameter of 30 µm is measured at an incident light of 1.55 µm and 0.5 mW, and the 3-dB bandwidth is 7.39 GHz without bias and 13.9 GHz at a reverse bias of 3 V. The 16 devices show a good consistency.

Li, Chong; Xue, Chun-Lai; Liu, Zhi; Cheng, Bu-Wen; Wang, Qi-Ming

2014-03-01
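
The dark current density quoted in the record above follows directly from the dark current and the device diameter, assuming the current flows through the full circular area of the 30 µm detector. A short check of that arithmetic (the function name is hypothetical):

```python
import math


def current_density_mA_per_cm2(current_uA, diameter_um):
    """Current density for a circular device, in mA/cm^2."""
    radius_cm = diameter_um * 1e-4 / 2.0      # 1 um = 1e-4 cm
    area_cm2 = math.pi * radius_cm ** 2
    return current_uA * 1e-3 / area_cm2       # 1 uA = 1e-3 mA


if __name__ == "__main__":
    # 0.36 uA dark current at 1 V reverse bias, 30 um diameter.
    print(round(current_density_mA_per_cm2(0.36, 30.0), 1))
```

The area is about 7.07 × 10^-6 cm^2, giving roughly 51 mA/cm^2, in agreement with the value stated in the abstract.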

299

Development of micropump-actuated negative pressure pinched injection for parallel electrophoresis on array microfluidic chip.  

PubMed

A micropump-actuated negative pressure pinched injection method is developed for parallel electrophoresis on a multi-channel LIF detection system. The system has a home-made device that could individually control 16-port solenoid valves and a high-voltage power supply. The excitation laser beam is distributed across the array of separation channels for detection. The hybrid Glass-PDMS microfluidic chip comprises two common reservoirs, four separation channels coupled to their respective pneumatic micropumps and two reference channels. Because pressure is used as the driving force, the proposed method has no sample bias effect for separation. Only one high-voltage supply is needed for separation regardless of the number of channels, which is significant for high-throughput analysis, and the time for sample loading is shortened to 1 s. In addition, the integrated micropumps can provide a versatile interface for coupling with other function units to satisfy complicated demands. The performance is verified by separation of DNA marker and Hepatitis B virus DNA samples. This method is also expected to show potential for high-throughput DNA analysis in the field of disease diagnosis. PMID:19681052

Li, Bowei; Jiang, Lei; Xie, Hua; Gao, Yan; Qin, Jianhua; Lin, Bingcheng

2009-09-01

300

An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C  

SciTech Connect

Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern cluster architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks limiting UPC performance on all platforms. We account for the root causes of these performance anomalies and show that they can be remedied with additional compiler improvements, in particular we show that many of these obstacles can be resolved with adequate optimizations by the backend C compilers.

Coarfa, Cristian; Dotsenko, Yuri; Mellor-Crummey, John M.; Cantonnet, François; El-Ghazawi, Tarek; Mohanti, Ashrujit; Yao, Yiyi; Chavarría-Miranda, Daniel

2005-06-10
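
The abstract's point about bulk communication can be illustrated with a simple latency-plus-bandwidth cost model. This is a minimal sketch, not taken from the paper: the function name and the default latency (1 µs), element size (8 bytes), and bandwidth (1 GB/s) are hypothetical values chosen only for illustration.

```python
def transfer_time(n_elements, elements_per_message,
                  latency_s=1e-6, bytes_per_element=8,
                  bandwidth_bytes_per_s=1e9):
    """Estimate transfer time as (messages * latency) + (bytes / bandwidth).

    All default parameters are illustrative assumptions, not measurements.
    """
    # Ceiling division: number of messages needed to move all elements.
    n_messages = -(-n_elements // elements_per_message)
    payload_s = n_elements * bytes_per_element / bandwidth_bytes_per_s
    return n_messages * latency_s + payload_s


# One element per message (fine-grained shared-variable access)
fine_grained = transfer_time(1_000_000, 1)
# The whole array in a single message (bulk communication)
bulk = transfer_time(1_000_000, 1_000_000)
print(f"fine-grained: {fine_grained:.6f} s, bulk: {bulk:.6f} s")
```

Under these assumed parameters the per-message latency dominates the fine-grained case, which is consistent with the abstract's observation that scalable performance required bulk communication.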

301

Multimedia processors  

Microsoft Academic Search

This paper describes large-scale-integration programmable processors designed for multimedia processing such as real-time compression and decompression of audio and video as well as the generation of computer graphics. As the target of these processors is to handle audio and video in real time, the processing capability must be increased tenfold compared to that of conventional microprocessors, which were designed to

ICHIRO KURODA; TAKAO NISHITANI

1998-01-01

302

Signal processor packaging design  

NASA Astrophysics Data System (ADS)

The Signal Processor Packaging Design (SPPD) program was a technology development effort to demonstrate that a miniaturized, high throughput programmable processor could be fabricated to meet the stringent environment imposed by high speed kinetic energy guided interceptor and missile applications. This successful program culminated with the delivery of two very small processors, each about the size of a large pin grid array package. Rockwell International's Tactical Systems Division in Anaheim, California developed one of the processors, and the other was developed by Texas Instruments' (TI) Defense Systems and Electronics Group (DSEG) of Dallas, Texas. The SPPD program was sponsored by the Guided Interceptor Technology Branch of the Air Force Wright Laboratory's Armament Directorate (WL/MNSI) at Eglin AFB, Florida and funded by SDIO's Interceptor Technology Directorate (SDIO/TNC). These prototype processors were subjected to rigorous tests of their image processing capabilities, and both successfully demonstrated the ability to process 128 X 128 infrared images at a frame rate of over 100 Hz.

McCarley, Paul L.; Phipps, Mickie A.

1993-10-01

303

Hardware multiplier processor  

DOEpatents

A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

Pierce, Paul E. (Albuquerque, NM)

1986-01-01

304

Atmospheric plasma jet array in parallel electric and gas flow fields for three-dimensional surface treatment  

NASA Astrophysics Data System (ADS)

This letter reports on electrical and optical characteristics of a ten-channel atmospheric pressure glow discharge jet array in parallel electric and gas flow fields. Challenged with complex three-dimensional substrates including surgical tissue forceps and sloped plastic plate of up to 15°, the jet array is shown to achieve excellent jet-to-jet uniformity both in time and in space. Its spatial uniformity is four times better than a comparable single jet when both are used to treat a 15° sloped substrate. These benefits are likely from an effective self-adjustment mechanism among individual jets facilitated by individualized ballast and spatial redistribution of surface charges.

Cao, Z.; Walsh, J. L.; Kong, M. G.

2009-01-01

305

Atmospheric plasma jet array in parallel electric and gas flow fields for three-dimensional surface treatment  

SciTech Connect

This letter reports on electrical and optical characteristics of a ten-channel atmospheric pressure glow discharge jet array in parallel electric and gas flow fields. Challenged with complex three-dimensional substrates including surgical tissue forceps and sloped plastic plate of up to 15 deg., the jet array is shown to achieve excellent jet-to-jet uniformity both in time and in space. Its spatial uniformity is four times better than a comparable single jet when both are used to treat a 15 deg. sloped substrate. These benefits are likely from an effective self-adjustment mechanism among individual jets facilitated by individualized ballast and spatial redistribution of surface charges.

Cao, Z.; Walsh, J. L.; Kong, M. G. [Department of Electronic and Electrical Engineering, Loughborough University, Leicestershire LE11 3TU (United Kingdom)]

2009-01-12

306

Parallel self-mixing imaging system based on an array of vertical-cavity surface-emitting lasers  

NASA Astrophysics Data System (ADS)

In this paper we investigate the feasibility of a massively parallel self-mixing imaging system based on an array of vertical-cavity surface-emitting lasers (VCSELs) to measure surface profiles of displacement, distance, velocity, and liquid flow rate. The concept of the system is demonstrated using a prototype to measure the velocity at different radial points on a rotating disk, and the velocity profile of diluted milk in a custom built diverging-converging planar flow channel. It is envisaged that a scaled up version of the parallel self-mixing imaging system will enable real-time surface profiling, vibrometry, and flowmetry.

Tucker, John R.; Baque, Johnathon L.; Leng Lim, Yah; Zvyagin, Andrei V.; Rakić, Aleksandar D.

2007-09-01

307

Multilevel Parallelism in Computational Chemistry using Common Component Architecture and Global Arrays  

Microsoft Academic Search

The development of complex scientific applications for high-end systems is a challenging task. Addressing complexity of the involved software and algorithms is becoming increasingly difficult and requires appropriate software engineering approaches to address interoperability, maintenance, and software composition challenges. At the same time, the requirements for performance and scalability to thousand processor configurations magnify the level of difficulties facing the

Manojkumar Krishnan; Yuri Alexeev; Theresa L. Windus; Jarek Nieplocha

2005-01-01

308

PDDP: A data parallel programming model. Revision 1  

SciTech Connect

PDDP, the Parallel Data Distribution Preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared-memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

Warren, K.H.

1995-06-01
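
To make the idea of data distribution directives concrete, the following sketch computes which processor owns a given array element under the two standard High Performance Fortran distribution formats, BLOCK and CYCLIC. It uses 0-based indices and hypothetical function names; it illustrates the mapping only and is not PDDP's implementation.

```python
def block_owner(index, n_elements, n_processors):
    """Owner of `index` under a BLOCK distribution (0-based indices)."""
    # Each processor holds one contiguous block of ceil(n/p) elements.
    block_size = -(-n_elements // n_processors)
    return index // block_size


def cyclic_owner(index, n_processors):
    """Owner of `index` under a CYCLIC distribution (0-based indices)."""
    # Elements are dealt out to processors round-robin.
    return index % n_processors


n, p = 10, 4
print([block_owner(i, n, p) for i in range(n)])
print([cyclic_owner(i, p) for i in range(n)])
```

With 10 elements on 4 processors, BLOCK gives the last processor a short block, while CYCLIC spreads elements evenly at the cost of locality.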

309

Inner-product array processor for retrieval of stored images represented by bipolar binary (+1,-1) pixels using partial input trinary pixels represented by (+1,-1)  

NASA Technical Reports Server (NTRS)

An inner-product array processor is provided with thresholding of the inner product during each iteration to make more significant the inner product employed in estimating a vector to be used as the input vector for the next iteration. While stored vectors and estimated vectors are represented in bipolar binary (1,-1), only those elements of an initial partial input vector that are believed to be common with those of a stored vector are represented in bipolar binary; the remaining elements of a partial input vector are set to 0. This mode of representation, in which the known elements of a partial input vector are in bipolar binary form and the remaining elements are set equal to 0, is referred to as trinary representation. The initial inner products corresponding to the partial input vector will then be equal to the number of known elements. Inner-product thresholding is applied to accelerate convergence and to avoid convergence to a negative input product.

Liu, Hua-Kuang (Inventor); Awwal, Abdul A. S. (Inventor); Karim, Mohammad A. (Inventor)

1993-01-01
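
The retrieval scheme in the abstract above can be sketched in software. This is a minimal illustration under stated assumptions: the function name `recall`, the rule "discard inner products below a fixed threshold", and the sign-based update are choices made here for clarity, not the patented circuit.

```python
def recall(stored, partial, threshold, max_iters=10):
    """Iteratively estimate a stored bipolar vector from a trinary input.

    stored:    list of vectors with elements in {+1, -1}
    partial:   input vector with known elements in {+1, -1}, unknown = 0
    threshold: inner products below this value are set to 0 (assumed rule)
    """
    x = list(partial)
    for _ in range(max_iters):
        # Inner product of the current estimate with every stored vector.
        products = [sum(s_i * x_i for s_i, x_i in zip(s, x)) for s in stored]
        # Thresholding: keep only sufficiently large inner products.
        kept = [p if p >= threshold else 0 for p in products]
        # Weighted sum of stored vectors, then sign to get the next estimate.
        summed = [sum(p * s[i] for p, s in zip(kept, stored))
                  for i in range(len(x))]
        new_x = [(v > 0) - (v < 0) for v in summed]
        if new_x == x:
            break
        x = new_x
    return x


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
# First four elements of stored[0] are known; the rest are set to 0.
partial = [1, 1, 1, 1, 0, 0, 0, 0]
print(recall(stored, partial, threshold=2))
```

In this example the initial inner product with the matching stored vector is 4, the number of known elements, as the abstract states; the other two inner products are 0 and are removed by the threshold.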

310

Parallel multispot smFRET analysis using an 8-pixel SPAD array  

NASA Astrophysics Data System (ADS)

Single-molecule Förster resonance energy transfer (smFRET) is a powerful tool for extracting distance information between two fluorophores (a donor and acceptor dye) on a nanometer scale. This method is commonly used to monitor binding interactions or intra- and intermolecular conformations in biomolecules freely diffusing through a focal volume or immobilized on a surface. The diffusing geometry has the advantage to not interfere with the molecules and to give access to fast time scales. However, separating photon bursts from individual molecules requires low sample concentrations. This results in long acquisition time (several minutes to an hour) to obtain sufficient statistics. It also prevents studying dynamic phenomena happening on time scales larger than the burst duration and smaller than the acquisition time. Parallelization of acquisition overcomes this limit by increasing the acquisition rate using the same low concentrations required for individual molecule burst identification. In this work we present a new two-color smFRET approach using multispot excitation and detection. The donor excitation pattern is composed of 4 spots arranged in a linear pattern. The fluorescent emission of donor and acceptor dyes is then collected and refocused on two separate areas of a custom 8-pixel SPAD array. We report smFRET measurements performed on various DNA samples synthesized with various distances between the donor and acceptor fluorophores. We demonstrate that our approach provides identical FRET efficiency values to a conventional single-spot acquisition approach, but with a reduced acquisition time. Our work thus opens the way to high-throughput smFRET analysis on freely diffusing molecules.

Ingargiola, A.; Colyer, R. A.; Kim, D.; Panzeri, F.; Lin, R.; Gulinatti, A.; Rech, I.; Ghioni, M.; Weiss, S.; Michalet, X.

2012-02-01

311

Parallel detection of harmful algae using reverse transcription polymerase chain reaction labeling coupled with membrane-based DNA array.  

PubMed

Harmful algal blooms (HABs) are a global problem, which can cause economic loss to the aquaculture industry and pose a potential threat to human health. More attention must be paid to the development of effective detection methods for the causative microalgae. Traditional microscopic examination has many disadvantages, such as low efficiency, inaccuracy, and the need for specialized identification skills, and it is especially unsuited to parallel analysis of several morphologically similar microalgae to species level at one time. This study aimed at exploring the feasibility of using a membrane-based DNA array for parallel detection of several microalgae by selecting five microalgae, including Heterosigma akashiwo, Chaetoceros debilis, Skeletonema costatum, Prorocentrum donghaiense, and Nitzschia closterium as test species. Five species-specific (taxonomic) probes were designed from variable regions of the large subunit ribosomal DNA (LSU rDNA) by visualizing the alignment of LSU rDNA of related species. The specificity of the probes was confirmed by dot blot hybridization. The membrane-based DNA array was prepared by spotting the tailed taxonomic probes onto a positively charged nylon membrane. Digoxigenin (Dig) labeling of target molecules was performed by multiple PCR/RT-PCR using an RNA/DNA mixture of the five microalgae as template. The Dig-labeled amplification products were hybridized with the membrane-based DNA array to produce a visible hybridization signal indicating the presence of target algae. Detection sensitivity comparison showed that RT-PCR labeling (RPL) coupled with hybridization was tenfold more sensitive than DNA-PCR labeling coupled with hybridization. Finally, the effectiveness of RPL coupled with membrane-based DNA array was validated by testing with simulated and natural water samples, respectively. All of these results indicated that RPL coupled with membrane-based DNA array is specific, simple, and sensitive for parallel detection of microalgae, which shows promise for monitoring natural samples in the future. PMID:24338073

Zhang, Chunyun; Chen, Guofu; Ma, Chaoshuai; Wang, Yuanyuan; Zhang, Baoyu; Wang, Guangce

2014-03-01

312

Execution models of prolog for parallel computers  

SciTech Connect

The research described in this book addresses the semantic gap between logic programming languages and the architecture of parallel computers: the problem of how to implement logic programming languages on parallel computers in a way that can most effectively exploit the inherent parallelism of the language and efficiently utilize the parallel architecture of the computer. Following a review of other research results, the first project explores the possibilities of implementing logic programs on MIMD, nonshared memory massively parallel computers containing 100 to 1,000 processing elements. The second investigates the possibility of implementing Prolog on a distributed processor array. The author's objectives are to define a parallel computational paradigm (the extended cellular-dataflow model) that can be used to create a parallel Prolog abstract machine.

Kacsuk, P.

1990-01-01

313

Parallelizing the Data Cube  

Microsoft Academic Search

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead

Frank K. H. A. Dehne; Todd Eavis; Susanne E. Hambrusch; Andrew Rau-Chaplin

2001-01-01
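
The load-balancing goal stated in the abstract above can be illustrated with a generic greedy heuristic: assign the most expensive remaining subcube to the currently least-loaded processor. This is a swapped-in textbook technique (longest-processing-time-first scheduling), not the partitioning strategy of the paper, and the subcube names and cost estimates are hypothetical.

```python
import heapq


def assign_subcubes(costs, n_processors):
    """Greedy balanced assignment of subcubes to processors.

    costs: dict mapping a subcube name to its estimated cost (hypothetical).
    Returns (assignment, loads), both keyed by processor id.
    """
    # Min-heap of (current load, processor id).
    heap = [(0, p) for p in range(n_processors)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_processors)}
    # Place the largest subcubes first.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(name)
        heapq.heappush(heap, (load + cost, p))
    loads = {p: sum(costs[n] for n in names) for p, names in assignment.items()}
    return assignment, loads


costs = {"ABC": 7, "AB": 5, "AC": 4, "BC": 3, "A": 1}
assignment, loads = assign_subcubes(costs, 2)
print(assignment)
print(loads)
```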

314

FPGA-Based Coprocessor for Singular Value Array Reconciliation Tomography  

Microsoft Academic Search

We present an FPGA-based co-processor for accelerating computations associated with Singular Value Array Reconciliation Tomography (SART), a recently developed method for RF source localization. The co-processor allows this relatively complex computational task to be performed using less hardware and less power than would be required by a microprocessor-based computing cluster with comparable throughput and accuracy. The architecture exploits parallelism of

Jack Coyne; David Cyganski; R. James Duckworth

2008-01-01

315

Parallel Optical-Transmission Module Using Vertical-Cavity Surface-Emitting Laser Array and Micro-Optical Bench (MOB)  

NASA Astrophysics Data System (ADS)

In this paper, we present a scheme of a parallel optical transmission module using a vertical-cavity surface-emitting laser (VCSEL) array and a micro-optical bench (MOB). We have fabricated a prototype using a VCSEL array emitting at 850 nm and a MOB for the first time, and demonstrated the feasibility of a two-dimensional alignment-free optical interconnect module. No noticeable degradation of I-V, and I-L characteristics of the packaged VCSEL array was observed after the proposed packaging process. The excess loss of the MOB alignment scheme using a multi-mode fiber was evaluated to be about 0.5 dB.

Shimada, Yuji; Aoki, Yasuhiko; Iga, Kenichi

2001-02-01

316

Parallel array of YBa2Cu3O7-δ superconducting Josephson vortex-flow transistors with high current gains  

NASA Astrophysics Data System (ADS)

We have developed a Josephson vortex-flow transistor based on a parallel array of 440 YBa2Cu3O7-δ bicrystal grain boundary Josephson junctions. The array's critical current Ic was measured as a function of the control current Ictrl through a control line that is inductively coupled to the array. The device has a highly asymmetric Ic(Ictrl) curve with several regions where a switching behaviour is observed characterized by a maximum current gain gmax = ∂Ic/∂Ictrl of 19 and a significant dynamic range of 20 μA at 77 K. In the range 4.7-92 K gmax versus temperature is non-monotonic with a maximum recorded at 77 K.

Chesca, Boris; John, Daniel; Kemp, Matthew; Brown, Jeffrey; Mellor, Christopher

2013-08-01

317

Quadrature transmit array design using single-feed circularly polarized patch antenna for parallel transmission in MR imaging  

PubMed Central

Quadrature coils are often desired in MR applications because they can improve MR sensitivity and also reduce excitation power. In this work, we propose, for the first time, a quadrature array design strategy for parallel transmission at 298 MHz using single-feed circularly polarized (CP) patch antenna technique. Each array element is a nearly square ring microstrip antenna and is fed at a point on the diagonal of the antenna to generate quadrature magnetic fields. Compared with conventional quadrature coils, the single-feed structure is much simple and compact, making the quadrature coil array design practical. Numerical simulations demonstrate that the decoupling between elements is better than –35 dB for all the elements and the RF fields are homogeneous with deep penetration and quadrature behavior in the area of interest. Bloch equation simulation is also performed to simulate the excitation procedure by using an 8-element quadrature planar patch array to demonstrate its feasibility in parallel transmission at the ultrahigh field of 7 Tesla.

Pang, Yong; Yu, Baiying; Vigneron, Daniel B.

2014-01-01

318

Taming Compiler to Work with Multicore Processors  

Microsoft Academic Search

We present a parallelization scheme that extracts intra-block parallelism within sequential programs in SSA form and schedules blocks onto a multicore processor. Since we work on programs in SSA form, we are able to exploit more parallelism than existing parallelizing compilers. Also an attempt is made to schedule to multiple cores taking by number of registers

D. C. Kiran; S. Gurunarayanan; J. P. Misra

2011-01-01

319

Parallel algorithms for planar graph isomorphism and related problems  

Microsoft Academic Search

Parallel algorithms for planar graph isomorphism and several related problems are presented. Two models of parallel computation are considered: the CREW-PRAM model and the two-dimensional array of processors. The results include O(√n)-time mesh algorithms for finding a good separating cycle and the triconnected components of a planar graph, and for solving the single-function coarsest partitioning problem

J. Jaja; S. R. Kosaraju

1988-01-01

320

Scalable Parallel Programming with CUDA  

Microsoft Academic Search

The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with

John Nickolls; Ian Buck; Michael Garland; Kevin Skadron

2008-01-01

321

Generalized Cannon's algorithm for parallel matrix multiplication  

Microsoft Academic Search

Cannon’s algorithm is a memory-efficient matrix multiplication technique for parallel computers with toroidal mesh interconnections. This algorithm assumes that input matrices are block distributed, but it is not clear how it can deal with block-cyclic distributed matrices. This paper generalizes Cannon’s algorithm for the case when input matrices are block-cyclic distributed across a two-dimensional processor array with an arbitrary number

Hyuk-Jae Lee; James P. Robertson; José A. B. Fortes

1997-01-01
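
For orientation, the original (non-generalized) Cannon's algorithm can be simulated sequentially. The sketch below assumes a q × q torus with one matrix element per processor, which is the simplest block-distributed case; it does not cover the block-cyclic generalization that the paper addresses, and the function name is hypothetical.

```python
def cannon_matmul(A, B):
    """Simulate Cannon's algorithm on a q x q torus, one element per processor."""
    q = len(A)
    # Initial alignment: row i of A is shifted left by i positions,
    # column j of B is shifted up by j positions.
    a = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Every processor multiplies its local pair and accumulates.
        for i in range(q):
            for j in range(q):
                c[i][j] += a[i][j] * b[i][j]
        # Shift A one step left and B one step up (with wraparound).
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return c


print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each processor only ever holds one element of A, one of B, and one of C, which is the memory efficiency the abstract refers to; all communication is between nearest neighbors on the torus.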

322

SIMD massively parallel processing system for real-time image processing  

Microsoft Academic Search

This paper will describe the embedded SIMD massively parallel processor that we have developed for real-time image processing applications, such as real-time small target detection and tracking and video processing. The processor array is based on the SIMD chip BAP-128 of our own design, and uses the high performance DSP TMS320C31, which can effectively perform serial and floating point calculations, as the

Xiaochu Chen; Ming Zhang; Qingdong Yao; Jilin Liu; Hong Ye; Song Wu; Dongxiao Li; Yong Zhang; Lei Ding; Zhongyang Yao; Weijian Yang; Qiaohai Pan

1998-01-01

323

Highly efficient oxide-confined VCSEL arrays for parallel optical interconnects  

Microsoft Academic Search

We have designed and fabricated [formula image 1] and [formula image 2] VCSEL arrays at 850 and 980 nm operation wavelength, respectively, which are designed for maximum single-mode output power and high-frequency applications. GaAs VCSELs in the [formula image 1] array show record high single-mode CW powers up to 4.8 mW. Individual devices of the [formula image 2] InGaAs VCSEL array exhibit small-signal modulation bandwidths exceeding 10 GHz.

C. Jung; R. King; R. Jäger; M. Grabherr; F. Eberhard; R. Michalzik; K. J. Ebeling

1999-01-01

324

Parallel dip-pen nanolithography with arrays of individually addressable cantilevers  

NASA Astrophysics Data System (ADS)

In dip-pen nanolithography (DPN), nanoscale chemical patterns are created by directly transferring chemical molecules from the tip of an atomic force microscope probe to a surface. We report the development of a thermally actuated probe array for DPN applications. The array consists of ten thermal bimorph actuated probes, each 300 μm long, with a lateral spacing of 100 μm. The probes are actuated by passing dc current through a heater embedded in the probe base. The array is demonstrated by using it to simultaneously write ten different octadecanethiol patterns on a gold surface.

Bullen, David; Chung, Sung-Wook; Wang, Xuefeng; Zou, Jun; Mirkin, Chad A.; Liu, Chang

2004-02-01

325

Parallel multipoint recording of aligned and cultured neurons on micro channel array toward cellular network analysis  

Microsoft Academic Search

This paper describes an advanced Micro Channel Array (MCA) for recording electrophysiological signals of neuronal networks at multiple points simultaneously. The developed MCA is designed for neuronal network analysis which has been studied by the co-authors using the Micro Electrode Arrays (MEA) system, and employs the principles of extracellular recordings. A prerequisite for extracellular recordings with good signal-to-noise ratio is

Wataru Tonomura; Hiroyuki Moriguchi; Yasuhiko Jimbo; Satoshi Konishi

2010-01-01

326

Conjoining soft-core FPGA processors  

Microsoft Academic Search

Soft-core programmable processors on field-programmable gate arrays (FPGAs) can be custom synthesized to instantiate only those hardware units, such as multipliers and floating-point units, that an application requires to meet performance demands, thus minimizing soft-core size on the FPGA. Conjoining processors, meaning to share hardware units among two or more processors, can further reduce soft-core size, leaving more resources for

David Sheldon; Rakesh Kumar; Frank Vahid; Dean M. Tullsen; Roman L. Lysecky

2006-01-01

327

3D optical interconnect mesh network for on-board parallel multiprocessor system based on EOPCB  

NASA Astrophysics Data System (ADS)

A three-dimensional (3-D) 4×4×4 optical interconnect Mesh network scheme for a parallel multiprocessor system based on a polymer light waveguide electro-optical printed circuit board (EOPCB) is proposed in this paper. The Mesh topological structures of light waveguide interconnects for processor element chip-to-chip on a board, and board-to-board on a backplane, are constructed. The system consists of 64 processor element chips interconnected in a 3-D Mesh network configuration. Every processor board comprises 4×4 processor element chips with Mesh interconnection. Board-to-board Mesh interconnects are established on a backplane through a light waveguide Mesh interconnect topological structure. An additional optical layer with a light waveguide structure is added to a conventional PCB to construct the EOPCB. A vertical cavity surface emitting laser (VCSEL) array is used as the optical transmitter array. A PIN photodiode array is used as the optical receiver array. An MT-compatible direct coupling method is presented to couple the light beam between the optical transmitter/receiver and the light waveguide layer. The optical signals from a processor element chip on a board can be transmitted to another processor element chip on another board through the light waveguide interconnection in the backplane. Thus a 3-D optical interconnection Mesh network for a parallel multiprocessor system can be realized by EOPCB.

Luo, Fengguang; Cao, Mingcui; Zhou, Xinjun; Xu, Jun; Luo, Zhixiang; Yuan, Jing; Zong, Liangjia; Feng, Yonghua; Chen, Chao; Zhang, Conghui

2007-11-01

328

Parallel imaging performance investigation of an 8-channel common-mode differential-mode (CMDM) planar array for 7T MRI  

PubMed Central

An 8-channel planar phased array was proposed based on the common-mode differential-mode (CMDM) structure for ultrahigh field MRI. The parallel imaging performance of the 8-channel CMDM planar array was numerically investigated based on electromagnetic simulations and Cartesian sensitivity encoding (SENSE) reconstruction. The signal-to-noise ratio (SNR) of multichannel images combined using root-sum-of-squares (rSoS) and covariance weighted root-sum-of-squares (Cov-rSoS) at various reduction factors were compared between the 8-channel CMDM array and the 4-channel CM and DM arrays. The results of the study indicated that the 8-channel CMDM array outperformed the 4-channel CM and DM arrays in SNR. The g-factor maps and artifact power were calculated to evaluate parallel imaging performance of the proposed 8-channel CMDM array. The artifact power of the 8-channel CMDM array was reduced dramatically compared with the 4-channel CM and DM arrays, demonstrating the parallel imaging feasibility of the CMDM array.

Hu, Xiaoqing; Chen, Xiao; Liu, Xin; Zheng, Hairong

2014-01-01
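
The root-sum-of-squares (rSoS) combination named in the abstract above is simple enough to sketch directly: each output pixel is the square root of the summed squared magnitudes across channels. The function name and the nested-list image layout are assumptions made for this illustration; the covariance-weighted variant (Cov-rSoS) additionally requires the channel noise covariance matrix and is not shown.

```python
import math


def rsos_combine(channel_images):
    """Combine multichannel images pixel by pixel using root-sum-of-squares.

    channel_images: list of equally sized 2-D lists, one per receive channel;
    pixel values may be real or complex.
    """
    rows = len(channel_images[0])
    cols = len(channel_images[0][0])
    return [
        [math.sqrt(sum(abs(img[r][c]) ** 2 for img in channel_images))
         for c in range(cols)]
        for r in range(rows)
    ]


# Two single-pixel channels with magnitudes 3 and 4 combine to magnitude 5.
print(rsos_combine([[[3.0]], [[4j]]]))
```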

329

Dynamically reconfigurable optical morphological processor and its applications  

NASA Technical Reports Server (NTRS)

An innovative optically implemented morphological processor is introduced. With the use of a large space-bandwidth-product Dammann grating and a high-speed shutter spatial light modulator, effective structuring elements of large size and arbitrary shape can be constructed with dynamic reconfigurability. This reconfigurability is a major improvement over the conventional correlator-based morphological processor in which fixed holographic filters are used as structuring elements (Casasent and Botha, 1988). A novel two-dimensional thresholding photodetector array, capable of performing parallel thresholding and feedback, is utilized in this system and makes possible the implementation of many complex morphological operations requiring iterative feedback and full programmability. The optical architecture and the principle of operation are presented. Experimental demonstrations of binary image morphological erosion, dilation, opening, and closing are also presented. A method for extending this technique to gray-scale images using thresholding decomposition is also discussed.

Chao, Tien-Hsin

1993-01-01
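
The four binary operations named in the abstract above have standard digital definitions, sketched here for reference. This is a software illustration only, not the optical implementation; the structuring element is given as a list of (row, column) offsets, pixels outside the image are treated as background, and a symmetric element is assumed so that reflection of the element can be ignored.

```python
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # example structuring element


def dilate(image, offsets):
    """Set every pixel reachable from a foreground pixel via an offset."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


def erode(image, offsets):
    """Keep a pixel only if the element, centered there, fits in the foreground."""
    rows, cols = len(image), len(image[0])
    return [
        [int(all(0 <= r + dr < rows and 0 <= c + dc < cols
                 and image[r + dr][c + dc] for dr, dc in offsets))
         for c in range(cols)]
        for r in range(rows)
    ]


def opening(image, offsets):
    """Erosion followed by dilation: removes features smaller than the element."""
    return dilate(erode(image, offsets), offsets)


def closing(image, offsets):
    """Dilation followed by erosion: fills gaps smaller than the element."""
    return erode(dilate(image, offsets), offsets)


image = [
    [1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in opening(image, CROSS):
    print(row)
```

In the example, opening removes the isolated pixel in the top-left corner and reduces the 3 × 3 block to the cross shape of the structuring element.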

330

Fast and flexible genetic algorithm processor  

Microsoft Academic Search

In this paper a generic genetic algorithm processor (GAP) with high flexibility in parameter tuning is introduced. The proposed processor utilizes a pipeline structure to achieve low processing time. To further increase the speed, the genetic population has been duplicated, one copy for the replacement stage of the genetic algorithm (GA) and another for the selection phase. Additionally, parallel processing method in the

Pourya Hoseini; Abdollah Khoei; Khayrollah Hadidi; Sajjad Moshfe

2011-01-01

331

Implementation and Assessment of Advanced Analog Vector-Matrix Processor  

NASA Technical Reports Server (NTRS)

This paper discusses the design and implementation of an analog optical vector-matrix coprocessor with a throughput of 128 Mops for a personal computer. Vector matrix calculations are inherently parallel, providing a promising domain for the use of optical calculators. However, to date, digital optical systems have proven too cumbersome to replace electronics, and analog processors have not demonstrated sufficient accuracy in large scale systems. The goal of the work described in this paper is to demonstrate a viable optical coprocessor for linear operations. The analog optical processor presented has been integrated with a personal computer to provide full functionality and is the first demonstration of an optical linear algebra processor with a throughput greater than 100 Mops. The optical vector matrix processor consists of a laser diode source, an acoustooptical modulator array to input the vector information, a liquid crystal spatial light modulator to input the matrix information, an avalanche photodiode array to read out the result vector of the vector matrix multiplication, as well as transport optics and the electronics necessary to drive the optical modulators and interface to the computer. The intent of this research is to provide a low cost, highly energy efficient coprocessor for linear operations. Measurements of the analog accuracy of the processor performing 128 Mops are presented along with an assessment of the implications for future systems. A range of noise sources, including cross-talk, source amplitude fluctuations, shot noise at the detector, and non-linearities of the optoelectronic components are measured and compared to determine the most significant source of error. The possibilities for reducing these sources of error are discussed. Also, the total error is compared with that expected from a statistical analysis of the individual components and their relation to the vector-matrix operation. The sufficiency of the measured accuracy of the processor is compared with that required for a range of typical problems. Calculations resolving alloy concentrations from spectral plume data of rocket engines are implemented on the optical processor, demonstrating its sufficiency for this problem. We also show how this technology can be easily extended to a 100 x 100 10 MHz (200 Gops) processor.

Gary, Charles K.; Bualat, Maria G.; Lum, Henry, Jr. (Technical Monitor)

1994-01-01
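
The throughput figure quoted for the extended 100 x 100, 10 MHz processor in the abstract above can be checked with a short calculation. The sketch assumes that each matrix element contributes one multiply and one add per input vector, which is a common operation-counting convention and not something the abstract states; the function name is hypothetical.

```python
def vector_matrix_ops_per_second(rows, cols, vectors_per_second):
    """Operations per second for a rows x cols vector-matrix multiplier.

    Assumes one multiply plus one add per matrix element per input vector.
    """
    return 2 * rows * cols * vectors_per_second


ops = vector_matrix_ops_per_second(100, 100, 10_000_000)
print(f"{ops / 1e9:.0f} Gops")
```

Under that counting convention a 100 x 100 array clocked at 10 MHz performs 2 × 10^11 operations per second.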

332

Application of affymetrix array and massively parallel signature sequencing for identification of genes involved in prostate cancer progression  

PubMed Central

Background Affymetrix GeneChip Array and Massively Parallel Signature Sequencing (MPSS) are two high throughput methodologies used to profile transcriptomes. Each method has certain strengths and weaknesses; however, no comparison has been made between the data derived from Affymetrix arrays and MPSS. In this study, two lineage-related prostate cancer cell lines, LNCaP and C4-2, were used for transcriptome analysis with the aim of identifying genes associated with prostate cancer progression. Methods Affymetrix GeneChip array and MPSS analyses were performed. Data was analyzed with GeneSpring 6.2 and in-house perl scripts. Expression array results were verified with RT-PCR. Results Comparison of the data revealed that both technologies detected genes the other did not. In LNCaP, 3,180 genes were only detected by Affymetrix and 1,169 genes were only detected by MPSS. Similarly, in C4-2, 4,121 genes were only detected by Affymetrix and 1,014 genes were only detected by MPSS. Analysis of the combined transcriptomes identified 66 genes unique to LNCaP cells and 33 genes unique to C4-2 cells. Expression analysis of these genes in prostate cancer specimens showed CA1 to be highly expressed in bone metastasis but not expressed in primary tumor and EPHA7 to be expressed in normal prostate and primary tumor but not bone metastasis. Conclusion Our data indicates that transcriptome profiling with a single methodology will not fully assess the expression of all genes in a cell line. A combination of transcription profiling technologies such as DNA array and MPSS provides a more robust means to assess the expression profile of an RNA sample. Finally, genes that were differentially expressed in cell lines were also differentially expressed in primary prostate cancer and its metastases.

Oudes, Asa J; Roach, Jared C; Walashek, Laura S; Eichner, Lillian J; True, Lawrence D; Vessella, Robert L; Liu, Alvin Y

2005-01-01

333

Processor Scheduling for Multiprocessor Joins  

Microsoft Academic Search

A family of practical algorithms is presented to schedule join execution in a shared-memory multiprocessor environment. The algorithms are based on page connectivity graphs and determine when to read each data page into memory and how to schedule page joins on the available processors. The goal is to overlap page reads with parallel join execution in such a way that

Marguerite C. Murphy; Doron Rotem

1989-01-01

334

Controlled Growth of Parallel Oriented ZnO Nanostructural Arrays on Ga2O3 Nanowires.  

National Technical Information Service (NTIS)

Novel hierarchical ZnO-Ga2O3 nanostructures were fabricated via a two stage growth process. Nanowires of Ga2O3 were obtained in the first stage by the vapor-liquid-solid mechanism and used as the foundation for growth of self-assembled ordered arrays of Z...

L. Mazeina; S. M. Prokes; Y. N. Picard

2008-01-01

335

Active ink-jet nozzles equipped with arrayed visual sensors for parallel alignment control  

Microsoft Academic Search

We have made a smart inkjet nozzle array designed for multilayer patterning of functional materials. Each nozzle comprises a visual sensor and a mechanically moving actuator to achieve independent alignment control of the inkjets. The landing positions of the ejected inkjets were successfully controlled by the actuated nozzle. Waveform patterns of a fluorescent dye were printed by the fabricated prototype. The

K. Hoshino; T. Nagai; Y. Mita; M. Sugiyama; Kiyoshi Matsumoto; Isao Shimoyama

2005-01-01

336

Parallel Microoptics for Interconnections Based on Vertical Cavity Surface Emitting Lasers Arrays  

Microsoft Academic Search

Low-cost vertical cavity surface emitting lasers (VCSELs) may allow the development of broadband optical networks. Designs suitable for microoptics and “jisso” (a Japanese term that includes alignment, assembly, mounting and packaging) technologies are required for low-cost and small interconnection modules. Optical interconnections based on VCSEL arrays including both space and wavelength division multiplexing technologies are described.

Shigeru Kawai

2003-01-01

337

Parallel and Distributed Computing Architectures and Algorithms for Fault-Tolerant Sonar Arrays.  

National Technical Information Service (NTIS)

This report summarizes the progress and results of the third year of a three-year study whose goal is the use of fault-tolerant distributed and parallel processing techniques to decrease the cost and improve the performance and reliability of large, batte...

A. D. George

1999-01-01

338

Disposable micro-fluidic biosensor array for online parallelized cell adhesion kinetics analysis on quartz crystal resonators  

NASA Astrophysics Data System (ADS)

In this contribution we present a new disposable micro-fluidic biosensor array for the online analysis of adherent Madin Darby canine kidney (MDCK-II) cells on quartz crystal resonators (QCRs). The device was conceived for the parallel cultivation of cells providing the same experimental conditions among all the sensors of the array. As well, dedicated sensor interface electronics were developed and optimized for fast spectra acquisition of all 16 QCRs with a miniaturized impedance analyzer. This allowed performing cell cultivation experiments for the observation of fast cellular reaction kinetics with focus on the comparison of the resulting sensor signals influenced by different cell distributions on the sensor surface. To prove the assumption of equal flow circulation within the symmetric micro-channel network and support the hypothesis of identical cultivation conditions for the cells living above the sensors, the influence of fabrication tolerances on the flow regime has been simulated. As well, the shear stress on the adherent cell layer due to the flowing media was characterized. Injection molding technology was chosen for the cheap mass production of disposable devices. Furthermore, the injection molding process was simulated in order to optimize the mold geometry and minimize the shrinkage and the warpage of the parts. MDCK-II cells were cultivated in the biosensor array. Parallel cultivation of cells on the gold surface of the QCRs led to first observations of the impact of the cell distribution on the sensor signals during cell cultivation. Indeed, the initial cell distribution revealed a significant influence on the changes in the measured acoustic load on the QCRs suggesting dissimilar cell migrations as well as proliferation kinetics of a non-confluent MDCK-II cell layer.

Cama, G.; Jacobs, T.; Dimaki, M. I.; Svendsen, W. E.; Hauptmann, P.; Naumann, M.

2010-08-01

339

Dual flux-to-voltage response of YBa2Cu3O7−δ asymmetric parallel arrays of Josephson junctions  

NASA Astrophysics Data System (ADS)

We fabricated a parallel array of 440 YBa2Cu3O7−δ bicrystal grain boundary Josephson junctions having an inductive asymmetric loop configuration within the array. Families of current–voltage characteristics (IVCs) have been measured in the temperature range (4.7–92) K for various values of a magnetic flux applied via a control current Ictrl. For both positive and negative current biases I, current-driven chains of magnetic vortices propagate along the array, producing flux-flow current resonances on the IVCs. However, at 77 K and above, due to the system’s inductive asymmetry the flux flow is suppressed (enhanced) for negative (positive) I. Consequently, the system shows a dual flux-to-voltage response. For negative I it operates like a flux-interferometer having a rather sinusoidal V (Ictrl) response. In contrast, for positive I the device’s response V (Ictrl) remains periodic but highly non-sinusoidal due to the interplay between multiple flux-flow modes. Below 60 K such a dual behaviour is far less pronounced as a result of flux-flow modes being suppressed due to a decrease of the dissipation coefficient with temperature.

Chesca, Boris; John, Daniel; Mellor, Christopher J.

2014-05-01

340

Parallel Recognition of Series-Parallel Graphs  

Microsoft Academic Search

Recently, He and Yesha gave an algorithm for recognizing directed series parallel graphs, in time O(log² n) with linearly many EREW processors. We give a new algorithm for this problem, based on a structural characterization of series parallel graphs in terms of their ear decompositions. Our algorithm can recognize undirected as well as directed series parallel graphs. It can be implemented

David Eppstein

1992-01-01

341

Characterization of parallel computers and algorithms  

NASA Astrophysics Data System (ADS)

The principal current development in computing is the advent of the parallel computer in all its various forms - for example pipelined vector computers (CRAY-1 and CYBER 205) and arrays of processors (ICL DAP). This paper defines a two-parameter characterization of such computers that measures both the maximum performance and the amount of hardware parallelism. This allows a rational comparison of the performance of alternative algorithms on widely differing computers. As an example we consider the choice of the best algorithm for the solution of tridiagonal systems of equations.

Hockney, R. W.

1982-06-01
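
The abstract above does not name its two parameters. In Hockney's related writing they are usually denoted r∞ (asymptotic performance) and n½ (the vector length at which half of r∞ is reached), with the time for one vector operation of length n modelled as t = (n + n½) / r∞. The following is a minimal Python sketch under that assumption; the function names and the numbers in the usage note are illustrative and are not taken from the record.

```python
def vector_time(n, r_inf, n_half):
    """Time for one vector operation on n elements under the assumed
    linear timing model t = (n + n_half) / r_inf."""
    return (n + n_half) / r_inf


def achieved_rate(n, r_inf, n_half):
    """Average rate (elements per unit time) actually delivered for length n."""
    return n / vector_time(n, r_inf, n_half)


def total_time(vector_lengths, r_inf, n_half):
    """Time for an algorithm expressed as a list of vector-operation lengths."""
    return sum(vector_time(n, r_inf, n_half) for n in vector_lengths)
```

With hypothetical values r∞ = 100 and n½ = 50, `achieved_rate(50, 100.0, 50.0)` gives exactly half of r∞. An algorithm made of many short vector operations pays the n½ start-up term once per operation, which is why the preferred tridiagonal solver can differ between machines with small and large n½.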

342

A three-dimensional architecture for a parallel processing photosensing array.  

PubMed

A three-dimensional architecture for a photosensing array has been developed. This silicon based architecture consists of a 10 x 10 array of photosensors with 80 microns diameter, through chip interconnects to the back side of a 500 microns thick silicon wafer. Each photosensor consists of a 300 x 300 microns pn-junction photodiode. The following processes were used to create this photosensing architecture: 1) thermomigration of aluminum pads through an n-type silicon wafer; 2) creation of pn-junction photosensors on one side of the wafer; and 3) creation of aluminum pad ohmic contacts to the thermomigrated, through chip interconnects and the substrate on the back side of the wafer. The electrical and optical characteristics of the three-dimensional architecture indicate that it should be well suited as a photosensing framework around which a "silicon retina" could be built. PMID:1487292

Johansson, T; Abbasi, M; Huber, R J; Normann, R A

1992-12-01

343

High performance selectively oxidized VCSELs and arrays for parallel high-speed optical interconnects  

Microsoft Academic Search

High-bandwidth single-mode selectively oxidized vertical-cavity surface-emitting laser (VCSEL) arrays operate at 980 nm or 850 nm emission wavelength for substrate or epitaxial side emission. Coplanar feeding lines and polyimide passivation are used to reduce electrical parasitics in top-emitting GaAs and bottom-emitting InGaAs VCSELs. To enhance fundamental single-mode emission for larger devices of reduced series resistance a surface relief transverse mode

Felix Mederer; Irene Ecker; Jürgen Joos; Max Kicherer; Heiko J. Unold; Karl Joachim Ebeling; Martin Grabherr; Roland Jäger; Roger King; Dieter Wiedenmann

2001-01-01

344

Low-crosstalk 10Gb/s flip-chip array module for parallel optical interconnects  

Microsoft Academic Search

A 10-Gb/s optical receiver array with low interchannel crosstalk is realized by exploiting an InGaP-GaAs heterojunction bipolar transistor technology in a three-dimensional multilayer low-temperature cofiring ceramics (LTCC) module. Neutralization feedback circuit with LTCC embedded bus structure is proposed to suppress significant high-frequency crosstalk from on-chip bus and intermetallic capacitance. This module demonstrates 5 dB better suppressed-coupling than a conventional on-chip

Sang Hyun Park; Sung Min Park; Hyo-Hoon Park; Chul Soon Park

2005-01-01

345

1 × 4 Ge-on-SOI PIN Photodetector Array for Parallel Optical Interconnects  

Microsoft Academic Search

High-quality Ge film was epitaxially grown on silicon on insulator using the ultrahigh vacuum chemical vapor deposition. In this paper, we demonstrated that the efficient 1 × 4 germanium-on-silicon p-i-n photodetector arrays with 1.0 μm Ge film had a responsivity as high as 0.65 A/W at 1.31 μm and 0.32 A/W at 1.55 μm, respectively. The dark current density was

Chunlai Xue; Haiyun Xue; Buwen Cheng; Weixuan Hu; Yude Yu; Qiming Wang

2009-01-01

346

Ultra-Wideband Tapered Slot Antenna Arrays with Parallel-Plate Waveguides  

Microsoft Academic Search

Owing to their ultra-wideband characteristics, tapered slot antennas (TSAs) are used as element antennas in wideband phased arrays. However, when the size of a TSA is reduced in order to prevent the generation of a grating lobe during wide-angle beam scanning, the original ultra-wideband characteristics are degraded because of increased reflections from the ends of the tapered slot aperture. To

Satoshi Yamaguchi; Hiroaki Miyashita; Toru Takahashi; Masataka Otsuka; Yoshihiko Konishi

2010-01-01

347

Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays  

Microsoft Academic Search

We describe a novel sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. After constructing a microbead library of DNA templates by in vitro cloning, we assembled a planar array of a million template-containing microbeads in a flow cell at a density greater than 3 × 10^6 microbeads/cm².

Maria Johnson; John Bridgham; George Golda; David H. Lloyd; Davida Johnson; Shujun Luo; Sarah McCurdy; Michael Foy; Mark Ewan; Rithy Roth; Dave George; Sam Eletr; Glenn Albrecht; Eric Vermaas; Steven R. Williams; Keith Moon; Timothy Burcham; Michael Pallas; Robert B. DuBridge; James Kirchner; Karen Fearon; Jen-i Mao; Kevin Corcoran; Sydney Brenner

2000-01-01

348

Development of parallel architectures for sensor array-processing algorithms. Semi-Annual report  

SciTech Connect

High-resolution direction of arrival (DOA) estimation has been an important area of research for a number of years. Many researchers have developed a variety of algorithms to estimate the direction of arrival. Another important aspect of the DOA estimation area is the development of high speed hardware capable of computing the DOA in real time. In this research the authors have first focussed on the development of parallel architecture for multiple signal classification (MUSIC) and estimation of signal parameters by rotational invariance technique (ESPRIT) algorithms for the narrow band sources. These algorithms are substituted with computationally efficient modules and converted to pipelined and parallel algorithms. For example, one important computation, the eigendecomposition of the covariance matrix, has been performed using Householder transformations and the QR method.

Jamali, M.M.; Kwatra, S.C.; Djoudi, A.; Sheelvant, R.; Rao, M.

1991-08-01

349

Database Reorganization in Parallel Disk Arrays with I/O Service Stealing  

NASA Technical Reports Server (NTRS)

We present a model for data reorganization in parallel disk systems that is geared towards load balancing in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e. migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs versus benefits of cooling.

Zabback, Peter; Onyuksel, Ibrahim; Scheuermann, Peter; Weikum, Gerhard

1996-01-01
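
The disk cooling idea in the record above (migrating files or extents from the hottest disks to the coldest ones) can be illustrated with a small greedy step. This Python sketch is only an illustration: the "heat" values, the data layout and the choice of which file to move are assumptions made here, and the paper's queueing model for the arrival rate of cooling requests is not reproduced.

```python
def disk_heat(files):
    """Total heat (for example, access rate) of the files stored on one disk."""
    return sum(files.values())


def cooling_step(disks):
    """Move one file from the hottest disk to the coldest one.

    disks maps disk name -> {file name: heat}. Returns (file, source, target),
    or None when no single move would reduce the imbalance.
    """
    hottest = max(disks, key=lambda d: disk_heat(disks[d]))
    coldest = min(disks, key=lambda d: disk_heat(disks[d]))
    gap = disk_heat(disks[hottest]) - disk_heat(disks[coldest])
    # Moving a file with heat h changes the gap to |gap - 2h|,
    # so only files with 0 < h < gap reduce it.
    candidates = {f: h for f, h in disks[hottest].items() if 0 < h < gap}
    if not candidates:
        return None
    # Assumed policy: pick the file whose heat is closest to half the gap.
    chosen = min(candidates, key=lambda f: abs(candidates[f] - gap / 2))
    disks[coldest][chosen] = disks[hottest].pop(chosen)
    return chosen, hottest, coldest
```

Calling `cooling_step` repeatedly until it returns `None` balances the two-disk example `{"d0": {"a": 6.0, "b": 3.0, "c": 1.0}, "d1": {"d": 2.0}}` in two moves. A real system would also weigh the cost of each migration against its benefit, which is the trade-off the record's model is meant to assess.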

350

Field Programmable Gate Array Based Parallel Strapdown Algorithm Design for Strapdown Inertial Navigation Systems  

PubMed Central

A new generalized optimum strapdown algorithm with coning and sculling compensation is presented, in which the position, velocity and attitude updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are independent of the number of gyro incremental angle samples and the number of accelerometer incremental velocity samples. When the output sampling rate of the inertial sensors remains constant, this algorithm allows the updating rate of the coning and sculling compensation to be increased while using more gyro incremental angle and accelerometer incremental velocity samples, in order to improve the accuracy of the system. Then, in order to implement the new strapdown algorithm in a single FPGA chip, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the FPGA device XC6VLX550T hardware platform on the basis of some fighter data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of the algorithm, relative to the existing implementation on the DSP platform, to meet the real-time and high-precision requirements of the system in a high dynamic environment.

Li, Zong-Tao; Wu, Tie-Jun; Lin, Can-Long; Ma, Long-Hua

2011-01-01

351

Field programmable gate array based parallel strapdown algorithm design for strapdown inertial navigation systems.  

PubMed

A new generalized optimum strapdown algorithm with coning and sculling compensation is presented, in which the position, velocity and attitude updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are independent of the number of gyro incremental angle samples and the number of accelerometer incremental velocity samples. When the output sampling rate of the inertial sensors remains constant, this algorithm allows the updating rate of the coning and sculling compensation to be increased while using more gyro incremental angle and accelerometer incremental velocity samples, in order to improve the accuracy of the system. Then, in order to implement the new strapdown algorithm in a single FPGA chip, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the FPGA device XC6VLX550T hardware platform on the basis of some fighter data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of the algorithm, relative to the existing implementation on the DSP platform, to meet the real-time and high-precision requirements of the system in a high dynamic environment. PMID:22164058

Li, Zong-Tao; Wu, Tie-Jun; Lin, Can-Long; Ma, Long-Hua

2011-01-01

352

Optimal expression evaluation for data parallel architectures  

NASA Technical Reports Server (NTRS)

A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.

Gilbert, John R.; Schreiber, Robert

1990-01-01
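
To make the placement problem in the record above concrete, here is a brute-force dynamic program in Python over a small expression tree. It is not the efficient algorithm the abstract refers to; it only illustrates the cost model, assuming that each array lives at one "position", that moving an array from p to q costs a metric distance dist(p, q), and that an operation is free once both operands sit at the same position. The tree encoding and all names are hypothetical.

```python
def min_cost(node, positions, dist):
    """Return {position: minimum movement cost to have node's value there}.

    A leaf is an int giving the position where that array already resides;
    an internal node is a pair (left, right) of sub-expressions.
    """
    if isinstance(node, int):
        return {p: dist(node, p) for p in positions}
    left, right = node
    cost_left = min_cost(left, positions, dist)
    cost_right = min_cost(right, positions, dist)
    # Performing the operation at p requires both operands to be at p.
    at = {p: cost_left[p] + cost_right[p] for p in positions}
    # The result may then be moved on to any position q.
    return {q: min(at[p] + dist(p, q) for p in positions) for q in positions}


def line_distance(p, q):
    """Distance on a one-dimensional mesh, used here as the example metric."""
    return abs(p - q)
```

For the expression `((0, 4), 4)` on a five-cell line, `min(min_cost(((0, 4), 4), range(5), line_distance).values())` is 4: performing the inner operation at position 4 avoids moving the intermediate result, whereas performing it at position 0 and then moving the result would cost 8. The table-per-node approach takes time proportional to the number of nodes times the square of the number of positions.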

353

Electro-optical microwave signal processor for high-frequency wideband frequency channelization  

NASA Astrophysics Data System (ADS)

An electro-optic microwave signal processor for activity monitoring in an electronic warfare receiver, offering wideband operation, parallel output in real time and 100 percent probability of intercept is presented, along with results from a prototype system. Requirements on electronic warfare receiver systems are demanding, because they have to detect and identify potential threats across a large frequency bandwidth and in the high pulse density expected of the battlefield environment. A technique of processing signals across a wide bandwidth is to use a channelizer in the receiver front-end, in order to produce a number of narrow band outputs that can be individually processed. In the presented signal processor, received microwave signals are upconverted onto an optical carrier using an electro-optic modulator and then spatially separated into a series of spots. The position and intensity of the spots are determined by the received signal(s) frequency and strength. Finally a photodiode array can be used for fast parallel data readout. Thus the signal processor output is fully channelized according to frequency. A prototype signal processor has been constructed, which can process microwave frequencies from 500MHz to 8GHz. A standard telecommunications electro-optic intensity modulator with a 3dB bandwidth of approximately 2.5GHz provides frequency upconversion. Readout is achieved using either a near IR camera or a 16 element linear photodiode array.

Dawber, William N.; Webster, Ken

1998-08-01

354

Supercomputing on massively parallel bit-serial architectures  

NASA Technical Reports Server (NTRS)

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.

Iobst, Ken

1985-01-01

355

Development of large scale neural network processor toward Brainway computing  

Microsoft Academic Search

Reports on a processor that is suitable for neural networking and the processor simulates 5000 neurons with 1000 connections each and flexible learning rules. Up to 100 of such processors can establish parallel connections with each other, so that the system performance will almost linearly increase 100 times by using 100 chips together. The modeling, learning rule, hardware, software and

H. Yamada; Michinori Ichikawa; Gen Matsumoto

1998-01-01

356

Forced convective air cooling from electronic component arrays in a parallel plate channel  

NASA Astrophysics Data System (ADS)

This paper discusses air forced convection heat transfer from inline protruding elements arranged in eight rows. The streamwise and spanwise spacings between elements were varied using a splitter plate that can be positioned at three different modular configurations. A set of empirical formulas was presented to correlate the experimental data for the design of air cooling systems. Arrays of components with one odd-size module have been tested also. Experimental results show that blocks near the entrance and behind the odd-size module have improved performance compared with uniform arrangements. Accordingly, temperature sensitive components are suggested to be arranged in these locations.

Cai, D. Y.; Gan, Y. P.; Ma, C. F.; Li, Q. X.

1994-09-01

357

Eigenvalue computation of large symmetric tridiagonal matrices on concurrent processors  

NASA Technical Reports Server (NTRS)

Symmetric tridiagonal eigenvalue problems may arise indirectly in structural dynamic analysis. An algorithm for eigenvalue computation of large symmetric tridiagonal matrices on concurrent processors to meet the challenge of the new emerging computer hardware technology is presented. A standard bisection method in conjunction with Sylvester's Theorem is chosen to be converted into a parallel N-section algorithm. This parallel algorithm takes advantage of the multi-processor environment by carrying out N (number of processors) triangular factorizations of chosen shifted matrices in all processors concurrently and by minimizing communication between processors. The algorithm is designed for local-memory concurrent processors, i.e. message passing type processors. The efficiency and speed-up are given in terms of problem and machine parameters. The algorithm is very efficient when both the number of processors and the number of eigenvalues to be extracted are much smaller than the order of the tridiagonal matrix.

Chang, H. Y.; Utku, S.; Salama, M.

1988-01-01
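
The N-section idea in the record above can be sketched as follows. By Sylvester's law of inertia, the number of negative pivots in the triangular factorization of T - xI equals the number of eigenvalues of T below x, so N shifts can be evaluated independently and the bracketing interval cut into N + 1 pieces per round. This Python sketch is an illustration under assumptions made here: a thread pool stands in for the message-passing processors, and the function names, tolerance and zero-pivot guard are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def count_below(diag, off, x):
    """Number of eigenvalues below x for the symmetric tridiagonal matrix
    with diagonal `diag` and off-diagonal `off` (negative pivots of T - x*I)."""
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        d = a - x - (off[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = 1e-300  # guard against division by zero on the next pivot
        if d < 0.0:
            count += 1
    return count


def n_section_eigenvalue(diag, off, k, lo, hi, n_workers=4, tol=1e-10):
    """k-th smallest eigenvalue (k = 0, 1, ...), assumed to lie in [lo, hi]."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while hi - lo > tol:
            step = (hi - lo) / (n_workers + 1)
            shifts = [lo + step * (j + 1) for j in range(n_workers)]
            # One independent factorization per worker.
            counts = list(pool.map(lambda x: count_below(diag, off, x), shifts))
            new_lo, new_hi = lo, hi
            for x, c in zip(shifts, counts):
                if c > k:          # more than k eigenvalues lie below x
                    new_hi = x
                    break
                new_lo = x
            lo, hi = new_lo, new_hi
    return 0.5 * (lo + hi)
```

For the 3 x 3 matrix with diagonal `[2, 2, 2]` and off-diagonal `[-1, -1]`, calling `n_section_eigenvalue(diag, off, 1, 0.0, 4.0)` converges to the middle eigenvalue, 2. In CPython the threads do not give a real speed-up for this arithmetic; the pool only shows where the per-processor work would be distributed.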

358

Parallel array of nanochannels grafted with polymer-brushes-stabilized Au nanoparticles for flow-through catalysis  

NASA Astrophysics Data System (ADS)

Smart systems on the nanometer scale for continuous flow-through reaction present fascinating advantages in heterogeneous catalysis, in which a parallel array of straight nanochannels offers a platform with high surface area for assembling and stabilizing metallic nanoparticles working as catalysts. Herein we demonstrate a method for finely modifying the nanoporous anodic aluminum oxide (AAO), and further integration of nanoreactors. By using atomic transfer radical polymerization (ATRP), polymer brushes were successfully grafted on the inner wall of the nanochannels of the AAO membrane, followed by exchanging counter ions with a precursor for nanoparticles (NPs), and used as the template for deposition of well-defined Au NPs. The membrane was used as a functional nanochannel for novel flow-through catalysis. High catalytic performance and instantaneous separation of products from the reaction system was achieved in reduction of 4-nitrophenol.

Liu, Jianxi; Ma, Shuanhong; Wei, Qiangbing; Jia, Lei; Yu, Bo; Wang, Daoai; Zhou, Feng

2013-11-01

359

Parallel array of nanochannels grafted with polymer-brushes-stabilized Au nanoparticles for flow-through catalysis.  

PubMed

Smart systems on the nanometer scale for continuous flow-through reaction present fascinating advantages in heterogeneous catalysis, in which a parallel array of straight nanochannels offers a platform with high surface area for assembling and stabilizing metallic nanoparticles working as catalysts. Herein we demonstrate a method for finely modifying the nanoporous anodic aluminum oxide (AAO), and further integration of nanoreactors. By using atomic transfer radical polymerization (ATRP), polymer brushes were successfully grafted on the inner wall of the nanochannels of the AAO membrane, followed by exchanging counter ions with a precursor for nanoparticles (NPs), and used as the template for deposition of well-defined Au NPs. The membrane was used as a functional nanochannel for novel flow-through catalysis. High catalytic performance and instantaneous separation of products from the reaction system was achieved in reduction of 4-nitrophenol. PMID:24129356

Liu, Jianxi; Ma, Shuanhong; Wei, Qiangbing; Jia, Lei; Yu, Bo; Wang, Daoai; Zhou, Feng

2013-12-01

360

A 167-Processor Computational Platform in 65 nm CMOS  

Microsoft Academic Search

A 167-processor computational platform consists of an array of simple programmable processors capable of per-processor dynamic supply voltage and clock frequency scaling, three algorithm-specific processors, and three 16 KB shared memories; and is implemented in 65 nm CMOS. All processors and shared memories are clocked by local fully independent, dynamically haltable, digitally-programmable oscillators and are interconnected by a

Dean N. Truong; Wayne H. Cheng; Tinoosh Mohsenin; Zhiyi Yu; Anthony T. Jacobson; Michael J. Meeuwsen; Anh T. Tran; Zhibin Xiao; Eric W. Work; Jeremy W. Webb; Paul Mejia; Bevan M. Baas

2009-01-01

361

Parallel image-acquisition in continuous-wave electron paramagnetic resonance imaging with a surface coil array: Proof-of-concept experiments.  

PubMed

This article describes a feasibility study of parallel image-acquisition using a two-channel surface coil array in continuous-wave electron paramagnetic resonance (CW-EPR) imaging. Parallel EPR imaging was performed by multiplexing of EPR detection in the frequency domain. The parallel acquisition system consists of two surface coil resonators and radiofrequency (RF) bridges for EPR detection. To demonstrate the feasibility of this method of parallel image-acquisition with a surface coil array, three-dimensional EPR imaging was carried out using a tube phantom. Technical issues in the multiplexing method of EPR detection were also clarified. We found that degradation in the signal-to-noise ratio due to the interference of RF carriers is a key problem to be solved. PMID:24374749

Enomoto, Ayano; Hirata, Hiroshi

2014-02-01

362

International exploration describes how the natural massive parallelism of the brain inspires novel VLSI designs  

SciTech Connect

Artificial neural networks have experienced spectacular growth in the last few years. However, only very large scale integration (VLSI) processor arrays can realize the true computing potential of massively parallel neural networks. The realization of these neurocomputers--which are optimized to the computation of neural models--follows one of two approaches: general-purpose neurocomputers that consist of programmable processor arrays for emulating a range of neural network models, or special-purpose neurocomputers that are dedicated hardware implementations of a specific neural network model.

Treleaven, P.; Pacheco, M.; Vellasco, M. (University Coll., London (UK))

1989-12-01

363

Compact multi-channel LED/PD array modules using new assembly techniques for hundred Mb/s/ch parallel optical transmission  

Microsoft Academic Search

Compact 12-channel LED/PD (light emitting diode/photodiode) array modules using novel assembly techniques have been developed for high-speed parallel optical transmission. Optical and electronic devices were mounted on a lateral point and the common submount surfaces, respectively, for high-speed operation and module package miniaturizing. The flip-chip technique by solder bumps was employed for optical array chip bonding, in order to simplify

M. Itoh; T. Nagahori; H. Kohashi; H. Haneko; H. Honmou; I. Watanabe; T. Uji; M. Fujiwara

1991-01-01

364

Parallelizing the Data Cube  

Microsoft Academic Search

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by

Frank K. H. A. Dehne; Todd Eavis; Susanne E. Hambrusch; Andrew Rau-chaplin

2002-01-01

365

High-speed (2.5 Gbps) reconfigurable inter-chip optical interconnects using opto-VLSI processors.  

PubMed

Reconfigurable optical interconnects enable flexible and high-performance communication in multi-chip architectures to be arbitrarily adapted, leading to efficient parallel signal processing. The use of Opto-VLSI processors as beam steerers and multicasters for reconfigurable inter-chip optical interconnection is discussed. We demonstrate, as proof-of-concept, 2.5 Gbps reconfigurable optical interconnects between an 850 nm vertical-cavity surface-emitting laser (VCSEL) array and a photodiode (PD) array integrated onto a PCB by driving two Opto-VLSI processors with steering and multicasting digital phase holograms. The architecture is experimentally demonstrated through three scenarios showing its flexibility to perform single, multicasting, and parallel reconfigurable optical interconnects. To our knowledge, this is the first reported high-speed reconfigurable N-to-N optical interconnect architecture, which will have a significant impact on the flexibility and efficiency of large shared-memory multiprocessor machines. PMID:19516864

Aljada, Muhsen; Alameh, Kamal E; Lee, Yong-Tak; Chung, Il-Sug

2006-07-24

366

Photorefractive processing for large adaptive phased arrays.  

PubMed

An adaptive null-steering phased-array optical processor that utilizes a photorefractive crystal to time integrate the adaptive weights and null out correlated jammers is described. This is a beam-steering processor in which the temporal waveform of the desired signal is known but the look direction is not. The processor computes the angle(s) of arrival of the desired signal and steers the array to look in that direction while rotating the nulls of the antenna pattern toward any narrow-band jammers that may be present. We have experimentally demonstrated a simplified version of this adaptive phased-array-radar processor that nulls out the narrow-band jammers by using feedback-correlation detection. In this processor it is assumed that we know a priori only that the signal is broadband and the jammers are narrow band. These are examples of a class of optical processors that use the angular selectivity of volume holograms to form the nulls and look directions in an adaptive phased-array-radar pattern and thereby to harness the computational abilities of three-dimensional parallelism in the volume of photorefractive crystals. The development of this processing in volume holographic system has led to a new algorithm for phased-array-radar processing that uses fewer tapped-delay lines than does the classic time-domain beam former. The optical implementation of the new algorithm has the further advantage of utilization of a single photorefractive crystal to implement as many as a million adaptive weights, allowing the radar system to scale to large size with no increase in processing hardware. PMID:21085246

Weverka, R T; Wagner, K; Sarto, A

1996-03-10

367

Upset Characterization of the PowerPC405 Hard-core Processor Embedded in Virtex-II Pro Field Programmable Gate Arrays  

NASA Technical Reports Server (NTRS)

Shown in this presentation are recent results for the upset susceptibility of the various types of memory elements in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively through appropriate design triplication and configuration scrubbing, these upsets of processor elements can dominate the system error rate. Data from irradiations with both protons and heavy ions are given and compared using available models.

Swift, Gary M.; Allen, Gregory S.; Farmanesh, Farhad; George, Jeffrey; Petrick, David J.; Chayab, Fayez

2006-01-01

368

Parallel Mandelbrot Set Model  

NSDL National Science Digital Library

The Parallel Mandelbrot Set Model is a parallelization of the sequential MandelbrotSet model, which does all the computations on a single processor core. This parallelization is able to use a computer with more than one core (or processor) to carry out the same computation, thus speeding up the process. The parallelization is done using the model elements in the Parallel Java group. These model elements allow easy use of the Parallel Java library created by Alan Kaminsky. In particular, the parallelization used for this model is based on code in Chapters 11 and 12 of Kaminsky's book Building Parallel Java. The Parallel Mandelbrot Set Model was developed using the Easy Java Simulations (EJS) modeling tool. It is distributed as a ready-to-run (compiled) Java archive. Double click the ejs_chaos_ParallelMandelbrotSet.jar file to run the program if Java is installed.

Francisco Esquembre

2011-11-24
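
As an illustration of the record above, the idea of splitting a Mandelbrot computation across cores can be sketched in Python. This is not the Parallel Java code the model is based on: the function names, the viewing window, and the iteration cap are assumptions made for the sketch. A thread pool is used only to keep the example self-contained; in CPython a process pool would likely be needed to see a real speed-up on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITER = 50  # illustrative iteration cap, not taken from the model


def escape_count(c, max_iter=MAX_ITER):
    """Number of iterations of z -> z*z + c before |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter


def compute_row(y, width, height):
    """Escape counts for one image row; each row is independent of the others."""
    im = -1.5 + 3.0 * y / (height - 1)
    return [escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
            for x in range(width)]


def mandelbrot_sequential(width, height):
    """Reference version: all rows computed one after another."""
    return [compute_row(y, width, height) for y in range(height)]


def mandelbrot_parallel(width, height, workers=4):
    """Hand rows to a pool of workers; map() returns the rows in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: compute_row(y, width, height),
                             range(height)))
```

Because no row depends on any other, the parallel and sequential versions produce the same image; only the scheduling of the rows differs.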

369

Parallel recognition of cancer cells using an addressable array of solid-state micropores.  

PubMed

Early stage detection and precise quantification of circulating tumor cells (CTCs) in the peripheral blood of cancer patients are important for early diagnosis. Early diagnosis improves the effectiveness of the therapy and results in better prognosis. Several techniques have been used for CTC detection but are limited by their need for dye tagging, low throughput and lack of statistical reliability at the single-cell level. Solid-state micropores can characterize each cell in a sample, providing interesting information about cellular populations. We report a multi-channel device that used a solid-state micropore array assembly for simultaneous measurement of cell translocation. This increased the throughput of measurement, and as the cells passed the micropores, tumor cells showed distinctive current blockade pulses when compared to leukocytes. The ionic current across each micropore channel was continuously monitored and recorded. The measurement system not only increased throughput but also provided on-chip cross-relation. The whole blood was lysed to get rid of red blood cells, so blood dilution was not needed. The approach facilitated faster processing of blood samples with tumor cell detection efficiency of about 70%. The design provided a simple and inexpensive method for rapid and reliable detection of tumor cells without any cell staining or surface functionalization. The device can also be used for high throughput electrophysiological analysis of other cell types. PMID:25038540

Ilyas, Azhar; Asghar, Waseem; Kim, Young-Tae; Iqbal, Samir M

2014-12-15

370

Efficient Implementation of OFDM Inner Receiver on a Programmable Multi-Core Processor Platform  

NASA Astrophysics Data System (ADS)

This paper presents an efficient implementation of an OFDM inner receiver on a programmable multi-core processor platform with CMMB as an application. The platform consists of an array of programmable SIMD processors interconnected in a 2-D mesh network, which can provide high performance and is well suited to wireless communication applications. Implemented on one cluster with 8 cores, the receiver includes symbol timing, carrier frequency offset and sampling frequency offset synchronization, channel estimation and equalization. Multiple optimization techniques are explored to improve system throughput, such as task-level parallelism on many cores, data-level parallelism on SIMD cores, minimization of memory access, and route-length-minimization task mapping. In addition, an efficient memory strategy and specific instructions for complex computation increase the performance. The simulation results show that the inner receiver could achieve a throughput of up to 120 Mbps when operating at 750 MHz.

Fan, Wenhua; Chen, Chen; Chen, Yun; Yu, Zhiyi; Zeng, Xiaoyang

371

Parallel Merge Sort  

Microsoft Academic Search

We give a parallel implementation of merge sort on a CREW PRAM that uses n processors and O(log n) time; the constant in the running time is small. We also give a more complex version of the algorithm for the EREW PRAM; it also uses n processors and O(log n) time. The constant in the running time is still moderate, though not

Richard Cole

1986-01-01

372

High-speed (2.5 Gbps) reconfigurable inter-chip optical interconnects using opto-VLSI processors  

Microsoft Academic Search

Reconfigurable optical interconnects enable flexible and high-performance communication in multi-chip architectures to be arbitrarily adapted, leading to efficient parallel signal processing. The use of Opto-VLSI processors as beam steerers and multicasters for reconfigurable inter-chip optical interconnection is discussed. We demonstrate, as proof-of-concept, 2.5 Gbps reconfigurable optical interconnects between an 850 nm vertical-cavity surface-emitting laser (VCSEL) array and a photodiode

Muhsen Aljada; Kamal E. Alameh; Yong-Tak Lee; Il-Sug Chung

2006-01-01

373

Dynamic Partial Self-Reconfiguration on Spartan-III FPGAs via a Parallel Configuration Access Port (PCAP)  

Microsoft Academic Search

This paper presents an alternative approach for dynamic partial self-reconfiguration that enables a Field Programmable Gate Array (FPGA) to reconfigure itself dynamically and partially through a parallel configuration access port (PCAP) under the control of the stand-alone PCAP core within the FPGA instead of using an embedded processor. The reconfiguration process is accomplished without an internal configuration

Salih Bayar; Arda Yurdakul

374

Parallel Optimisation  

NSDL National Science Digital Library

An introduction to optimisation techniques that may improve parallel performance and scaling on HECToR. It assumes that the reader has some experience of parallel programming, including basic MPI and OpenMP. Scaling is a measure of the ability of a parallel code to use increasing numbers of cores efficiently. A scalable application is one that, when the number of processors is increased, performs better by a factor which justifies the additional resource employed. Making a parallel application scale to many thousands of processes requires careful attention not only to the communication, data and work distribution but also to the choice of algorithms. Since the choice of algorithm is too broad a subject, and too particular to the application domain, to include in this brief guide, we concentrate on general good practices towards parallel optimisation on HECToR.
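
The notion of scaling in the record above is commonly quantified with speedup and parallel efficiency. The sketch below uses those standard textbook definitions; the guide's own metrics are not quoted in the abstract, so the function names and the timing format are assumptions.

```python
def speedup(t_serial, t_parallel):
    """Standard speedup: serial wall-clock time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency: speedup per core (1.0 means ideal scaling)."""
    return speedup(t_serial, t_parallel) / cores


def scaling_table(timings):
    """Build (cores, speedup, efficiency) rows from measured timings.

    `timings` is a hypothetical dict mapping core count -> seconds and
    must contain an entry for 1 core, which serves as the baseline.
    """
    t1 = timings[1]
    return [(p, speedup(t1, t), efficiency(t1, t, p))
            for p, t in sorted(timings.items())]
```

An efficiency that stays close to 1.0 as the core count grows is one way to express that the improvement "justifies the additional resource employed".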

375

Programmable pipelined image processor  

NASA Technical Reports Server (NTRS)

A pipelined image processor selectively interconnects modules in a column of a two-dimensional array to modules of the next column of the array of modules 1,1 through M,N, where M is the number of modules in one dimension and N is the number of modules in the other direction. Each module includes two input selectors for A and B inputs, two convolvers, a binary function operator, a neighborhood comparison operator which produces an A output and an output selector which may select as a B output the output of any one of the components in the module, including the A output of the neighborhood comparison operator. Each module may be connected to as many as eight modules in the next column, preferably with the majority always in a different row that is up (or down) in the array for a generally spiral data path around the torus thus formed. The binary function operator is implemented as a look-up table addressed by the most significant 8 bits of each 12-bit argument. The table output includes a function value and the slopes for interpolation of the two arguments by multiplying the 4 least significant bits in multipliers and adding the products to the function value through adders.

Gennery, Donald B. (inventor); Wilcox, Brian (inventor)

1988-01-01
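
The look-up-table scheme described in the record above (table addressed by the top 8 bits of each 12-bit argument, with slopes applied to the low 4 bits) can be sketched in software as follows. This is only an illustration of the arithmetic, not the patented hardware: the function names and the use of forward differences for the slopes are assumptions.

```python
STEP = 16  # 2**4: the low 4 bits of a 12-bit argument span one table cell


def build_table(func):
    """Tabulate func(a, b) on a 256 x 256 grid of 12-bit arguments.

    Each entry holds the function value plus one slope per argument,
    estimated here by a forward difference across the cell (an assumption;
    the source does not say how the slopes are obtained).
    """
    table = {}
    for hi_a in range(256):
        for hi_b in range(256):
            a, b = hi_a * STEP, hi_b * STEP
            value = func(a, b)
            slope_a = (func(a + STEP, b) - value) / STEP
            slope_b = (func(a, b + STEP) - value) / STEP
            table[(hi_a, hi_b)] = (value, slope_a, slope_b)
    return table


def lookup(table, a, b):
    """Approximate func(a, b) for 12-bit integers a and b."""
    hi_a, lo_a = a >> 4, a & 0xF   # most significant 8 bits, least significant 4
    hi_b, lo_b = b >> 4, b & 0xF
    value, slope_a, slope_b = table[(hi_a, hi_b)]
    # Two multiplies and two adds, mirroring the multipliers and adders
    # mentioned in the abstract.
    return value + slope_a * lo_a + slope_b * lo_b
```

For a function that is linear in both arguments the interpolation is exact; for other functions the error depends on how much the function curves within one 16 x 16 cell.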

376

A cost-effective methodology for the design of massively-parallel VLSI functional units  

NASA Technical Reports Server (NTRS)

In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

Venkateswaran, N.; Sriram, G.; Desouza, J.

1993-01-01

377

Self-Tuning Parallelism  

Microsoft Academic Search

Assigning additional processors to a parallel application may slow it down or lead to poor computer utilization. This paper demonstrates that it is possible for an application to automatically choose its own, optimal degree of parallelism. The technique is based on a simple binary search procedure for finding the optimal number of processors, subject to one of the following criteria:

Otilia Werner-Kytölä; Walter F. Tichy

2000-01-01
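
A binary search over the processor count, as mentioned in the record above, might look like the sketch below. The abstract is cut off before it lists its criteria, so this example assumes one plausible criterion (minimum run time) and assumes that run time first decreases and then increases as processors are added. The function name and the cost model in the usage note are hypothetical.

```python
def best_processor_count(run_time, max_procs):
    """Return the p in [1, max_procs] that minimises run_time(p).

    Assumes run_time is unimodal in p: it falls while extra processors
    help and rises once overhead dominates. In practice run_time(p) would
    be a measurement of the application, which is noisy; this sketch
    treats it as an exact function.
    """
    lo, hi = 1, max_procs
    while lo < hi:
        mid = (lo + hi) // 2
        if run_time(mid) <= run_time(mid + 1):
            hi = mid       # adding a processor no longer helps: look left
        else:
            lo = mid + 1   # still improving: the optimum lies to the right
    return lo
```

With a made-up cost model such as `lambda p: 100 / p + 2 * p` (work divided across processors plus a per-processor overhead), the search needs only a logarithmic number of probes rather than a trial at every processor count.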

378

Fault-tolerant computer architecture based on INMOS transputer processor  

NASA Technical Reports Server (NTRS)

Redundant processing has been used for several years in mission flight systems. In these systems, more than one processor performs the same task at the same time, but only one processor is actually in real use. A fault-tolerant computer architecture based on the features provided by INMOS Transputers is presented. The Transputer architecture provides several communication links that allow data and command communication with other Transputers without the use of a bus. Additionally, the Transputer allows the use of parallel processing to increase the system speed considerably. The processor architecture consists of three processors working in parallel, keeping all the processors at the same operational level, but only one processor is in real control of the process. The design allows each Transputer to test the other two Transputers and report the operating condition of the neighboring processors. A graphic display was developed to facilitate the identification of any problem by the user.

Ortiz, Jorge L.

1987-01-01

379

Implementing Access to Data Distributed on Many Processors  

NASA Technical Reports Server (NTRS)

A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets with the low-level implementation details being beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow for the arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. 
Since distributions have a great effect on the performance of applications, it is important that the distribution strategy is flexible, so its behavior can change depending on the needs of the application. At the same time, high productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.

James, Mark

2006-01-01

380

Processors for generalized stack filters  

Microsoft Academic Search

New processor structures for generalized stack filters are proposed. They can be implemented using different numbers of Boolean function units. The class of pipeline-parallel structures for generalized stack filters is simple and modular in structure, and suitable for VLSI implementation. Coder and decoder networks are developed for the mutual transform of binary-weighted and unary-weighted codes as the thresholding and addition

D. Akopian; O. Vainio; S. Agaian; J. Astola

1995-01-01

381

SCAN secure processor and its biometric capabilities  

NASA Astrophysics Data System (ADS)

This paper presents the design of the SCAN secure processor and its extended instruction set to enable secure biometric authentication. The SCAN secure processor is a modified SparcV8 processor architecture with a new instruction set to handle voice, iris, and fingerprint-based biometric authentication. The algorithms for processing biometric data are based on the local global graph methodology. The biometric modules are synthesized in reconfigurable logic and the results of the field-programmable gate array (FPGA) synthesis are presented. We propose to implement the above-mentioned modules in an off-chip FPGA co-processor. Further, the SCAN-secure processor will offer a SCAN-based encryption and decryption of 32 bit instructions and data.

Kannavara, Raghudeep; Mertoguno, Sukarno; Bourbakis, Nikolaos

2011-04-01

382

Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results  

NASA Technical Reports Server (NTRS)

There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years, and this trend may continue for the near future. However, it is well known that there are major obstacles, i.e., the physical limitation of feature size reduction and the ever increasing cost of foundry, that would prevent the long term continuation of this trend. This has motivated the exploration of some fundamentally new technologies that are not dependent on the conventional feature size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum dot-based computing, DNA based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum dot-based computing using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum dot-based computing and memory in general, and QCA specifically, are intriguing to NASA due to their high packing density (10^11 to 10^12 per square cm), low power consumption (no transfer of current), and potentially higher radiation tolerance. Under the Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for coplanar (i.e., single layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor.
Building upon these circuits, we have developed novel algorithms and QCA-based architectures for highly parallel and systolic computation of signal/image processing applications, such as FFT and Wavelet and Walsh-Hadamard Transforms.

Fijany, Amir; Toomarian, Benny N.

2000-01-01

383

Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data]

NASA Technical Reports Server (NTRS)

A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.

Smith, B. W.; Siegel, H. J.; Swain, P. H.

1981-01-01

384

MicroPhotonic Reconfigurable RF Signal Processor  

Microsoft Academic Search

In this paper, we discuss the use of MicroPhotonic processors to control the optical power distribution in photonic signal processing structures, achieving adaptive photonic RF filtering with arbitrary transfer functions. A new MicroPhotonics-based photonic signal processing architecture is presented, in which fibre collimator arrays, Opto-VLSI processors, and a WDM combiner are integrated within an optical substrate to control the gains

Kamal E. Alameh; Selam T. Ahderom; Mehrdad Raisi; Rong Zheng; Kamran Eshraghian

2004-01-01

385

On Darcy-Brinkman Equation: Viscous Flow Between Two Parallel Plates Packed with Regular Square Arrays of Cylinders  

NASA Astrophysics Data System (ADS)

Effects of the bounding solid walls are examined numerically for slow flow over regular, square arrays of circular cylinders between two parallel plates. A local magnitude of the rate of entropy generation is used effectively to determine the flow region affected by the presence of the solid boundary. Computed axial pressure gradients are compared to the corresponding solution based on the Darcy-Brinkman equation for porous media in which the effective viscosity appears as an additional property to be determined from the flow characteristics. Results indicate that, between two limits of the Darcian porous medium and the viscous flow, the magnitude of μ̂ (the ratio of the effective viscosity to the fluid viscosity) needs to be close to unity in order to satisfy the non-slip boundary conditions at the bounding walls. Although the study deals with a specific geometric pattern of the porous structure, it suggests a restriction on the validity of the Darcy-Brinkman equation to model high porosity porous media. The non-slip condition at the bounding solid walls may be accounted for by introducing a thin porous layer with μ̂ = 1 near the solid walls.

Liu, Haidong; Patil, Prabhamani R.; Narusawa, Uichiro

2007-09-01

386

Efficient resources assignment schemes for clustered multithreaded processors  

Microsoft Academic Search

New feature sizes provide a larger number of transistors per chip that architects could use in order to further exploit instruction level parallelism. However, these technologies also bring new challenges that complicate conventional monolithic processor designs. On the one hand, exploiting instruction level parallelism is leading us to diminishing returns and therefore exploiting other sources of parallelism like thread level parallelism

Fernando Latorre; José González; Antonio González

2008-01-01

387

Arrays  

NSDL National Science Digital Library

This interactive Flash applet helps students develop the concept of equal groups as a foundation for multiplication and division. The applet displays an array of dots, some of which are covered by a card. Students use the visible number of rows and columns to determine the total number of dots. Clicking on the card reveals the full array, and a voice announces the total.

2011-01-01

388

Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers  

Microsoft Academic Search

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity , where . We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in time by using processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC

Keqin Li

2000-01-01

389

FFT Computation with Systolic Arrays, A New Architecture  

NASA Technical Reports Server (NTRS)

The use of the Cooley-Tukey algorithm for computing the 1-D FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.

Boriakoff, Valentin

1994-01-01

390

Signal Processor Performance Comparison.  

National Technical Information Service (NTIS)

The performance of several signal processors used or potentially useful in modern active sonar has been investigated. This report compares these processors on a basis which includes both the output statistics and data rates. These comparisons are based on...

P. B. Brown

1967-01-01

391

A multi-optical-fiber array with charge-coupled device image detection for parallel processing of light signals and spectra  

SciTech Connect

We describe a new two-dimensional multioptical-fiber array (MOFA) with 144 elements, its use with charge-coupled device (CCD) image detection to perform parallel processing for light intensity measurements, and its use with a concave aspherical grating and CCD image detection to perform parallel processing for spectrometry. Each 2-m fiber extending from the MOFA is individually sheathed in a flexible, opaque cover. The light signals can be measured simultaneously with cross-talk ranging from 0.4% to 0.03% of the peak signal, depending upon configuration. We characterize the instrument in both the intensity and the spectrometric configurations, and present techniques for optimizing performance.

Piccard, R.D.; Vo-Dinh, T. (Advanced Monitoring Development Group, Health and Safety Research Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6101 (USA))

1991-03-01

392

Parallel architectures for iterative methods on adaptive, block structured grids  

NASA Technical Reports Server (NTRS)

A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

Gannon, D.; Vanrosendale, J.

1983-01-01

393

Parallelizing Monte Carlo with PMC  

SciTech Connect

PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described.

Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.

1994-11-01
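
One of the services the record above attributes to PMC is providing independent, reproducible random number sequences to the processors. A minimal sketch of that idea follows, using a Monte Carlo estimate of pi as a stand-in for a physics package. It does not reproduce PMC's interface: the function names and the seeding rule are assumptions, and giving each worker a distinct seed only approximates statistical independence.

```python
import random


def worker_hits(base_seed, rank, samples):
    """Count random points inside the unit quarter circle for one worker.

    Each worker owns a private generator whose seed is derived from the
    run's base seed and the worker's rank, so its stream does not depend
    on how many other workers exist or in what order they run.
    """
    rng = random.Random(base_seed * 1_000_003 + rank)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(base_seed, workers, samples_per_worker):
    """Combine per-worker tallies; shown serially, but each call to
    worker_hits could run on a different processor."""
    total = sum(worker_hits(base_seed, rank, samples_per_worker)
                for rank in range(workers))
    return 4.0 * total / (workers * samples_per_worker)
```

Since each tally depends only on the base seed and the rank, rerunning with the same arguments gives the same estimate regardless of scheduling, which is the property that makes parallel Monte Carlo results reproducible.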

394

Micro Time Cost Analysis of Parallel Computations  

Microsoft Academic Search

The authors investigate the modeling and analysis of time cost behavior of parallel computations. It is assumed parallel computations reside in a computer system in which there is a limited number of processors, all the processors have the same speed, and they communicate with each other through a shared memory. It has been found that the time costs of parallel

Bin Qin; Howard A. Sholl; Reda A. Ammar

1991-01-01

395

Applications of parallel processing to astrodynamics  

Microsoft Academic Search

Parallel processing is being used to improve the catalog of earth orbiting satellites and for problems associated with the catalog. Initial efforts centered around using SIMD parallel processors to perform debris conjunction analysis and satellite dynamics studies. More recently, the availability of cheap supercomputing processors and parallel processing software such as PVM have enabled the reutilization of existing astrodynamics software

S. Coffey; L. Healy; H. Neal

1996-01-01

396

Functional unit level parallelism in RISC architecture  

Microsoft Academic Search

This paper presents the design and implementation of a RISC processor with a five-stage pipelined architecture. Functional unit parallelism is exploited through the implementation of pipelining in the five stages of the RISC processor. The hazards that arise from this parallelism are data, structural, and control hazards. In order to achieve the true benefits of the parallelism through pipelining, these hazards

Ajmal Khan; Muhammad Saqib; Zeeshan Kaleem

2009-01-01

397

A neural net implementation of SPCA pre-processor for gas/odor classification using the responses of thick film gas sensor array  

Microsoft Academic Search

In this paper, an artificial neural net (ANN) implementation of SPCA pre-processing is presented for its use with a neural classifier trained with SPCA transformed data. Here, a SPCA transforming neural stage (Net ISPCA) is placed before a SPCA trained neural classifier stage (Net IISPCA). Accordingly, newer sensor array response of respective gas/odor can now be classified, more precisely, using

N. S. Rajput; R. R. Das; V. N. Mishra; K. P. Singh; R. Dwivedi

2010-01-01

398

Primitive operations for a hierarchical parallel processor  

SciTech Connect

Pyramid data structures make some image processing operations easier to compute. This paper discusses the programming strategies for pyramid machines with a view toward a set of primitive operations. One such set of operations is described. 8 references.

Tanimoto, S.L.

1982-01-01

399

Efficient wiring of reconfigurable parallel processors  

SciTech Connect

The advent of chips which include one or more CPUs, some local memory, and rudimentary communications and routing hardware has opened a new area in computer architecture design. What is the best way to connect these chips to solve particular problems? This paper defines the efficiency of a wiring scheme for a set of communication patterns. It then gives upper and lower bounds on the best efficiency achievable. It also presents simple wiring schemes for some stencil patterns used in mesh-based discrete simulations.

Greenberg, D.S.

1992-08-28

401

SUDS: Automatic Parallelization for Raw Processors  

Microsoft Academic Search

A computer can never be too fast or too cheap. Computer systems pervade nearly every aspect of science, engineering, communications and commerce because they perform certain tasks at rates unachievable by any other kind of system built by humans. A computer system's throughput, however, is constrained by that system's ability to find concurrency. Given a particular target

Matthew Ian Frank

2003-01-01

402

Image Correlation on a Parallel Processor.  

National Technical Information Service (NTIS)

Digital photogrammetry requires that conjugate imagery be located by image correlation. Image correlation involves many computations and can be the most time consuming part of the digital photogrammetry process. This report, in line and area correlations,...

D. L. Ackerman; M. A. Crombie; M. L. Powers

1976-01-01

403

Design limitations of highly parallel free-space optical interconnects based on arrays of vertical cavity surface-emitting laser diodes, microlenses, and photodetectors  

Microsoft Academic Search

We utilize a novel diffraction formalism to study the crosstalk effect in a highly parallel free-space optical interconnect based on two-dimensional arrays of surface-emitting laser diodes, microlenses, and photodetectors. The diffraction induced crosstalk between adjacent laser diodes in each detector to the system limitations is investigated. Optimum design rules and formulas are given for the first time, to include the

Suning Tang; Ray T. Chen; Lara Garrett; Dave Gerold; Maggie M. Li; Srikanth Natarajan; Jielun Lin

1994-01-01

404

High performance parallel computers for science: New developments at the Fermilab advanced computer program  

SciTech Connect

Fermilab's Advanced Computer Program (ACP) has been developing highly cost-effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs.

Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

1988-08-01

405

Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505  

SciTech Connect

The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a program to provide access to the co-array model through access to the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing.
A separate collaborative project with LANL and PNNL showed how to extend the co-array model to other languages in a small experimental version of Co-array Python. Another collaborative project defined a Fortran 95 interface to ARMCI to encourage Fortran programmers to use the one-sided communication model in anticipation of their conversion to the co-array model later. A collaborative project with the Earth Sciences community at NASA Goddard and GFDL experimented with the co-array model within computational kernels related to their climate models, first using CafLib and then extending the co-array model to use design patterns. Future work will build on the design-pattern idea with a redesign of CafLib as a true object-oriented library using Fortran 2003 and as a parallel numerical library using Fortran 2008.

Robert W. Numrich

2008-04-22

406

Electrically reconfigurable logic array  

NASA Technical Reports Server (NTRS)

To compose the complicated systems using algorithmically specialized logic circuits or processors, one solution is to perform relational computations such as union, division and intersection directly on hardware. These relations can be pipelined efficiently on a network of processors having an array configuration. These processors can be designed and implemented with a few simple cells. In order to determine the state-of-the-art in Electrically Reconfigurable Logic Array (ERLA), a survey of the available programmable logic array (PLA) and the logic circuit elements used in such arrays was conducted. Based on this survey some recommendations are made for ERLA devices.

Agarwal, R. K.

1982-01-01

407

Spatio-temporal operator formalism for holographic recording and diffraction in a photorefractive-based true-time-delay phased-array processor.  

PubMed

We present a spatio-temporal operator formalism and beam propagation simulations that describe the broadband efficient adaptive method for a true-time-delay array processing (BEAMTAP) algorithm for an optical beamformer by use of a photorefractive crystal. The optical system consists of a tapped-delay line implemented with an acoustooptic Bragg cell, an accumulating scrolling time-delay detector achieved with a traveling-fringes detector, and a photorefractive crystal to store the adaptive spatio-temporal weights as volume holographic gratings. In this analysis, linear shift-invariant integral operators are used to describe the propagation, interference, grating accumulation, and volume holographic diffraction of the spatio-temporally modulated optical fields in the system to compute the adaptive array processing operation. In addition, it is shown that the random fluctuation in time and phase delays of the optically modulated and transmitted array signals produced by fiber perturbations (temperature fluctuations, vibrations, or bending) are dynamically compensated for through the process of holographic wavefront reconstruction as a byproduct of the adaptive beam-forming and jammer-excision operation. The complexity of the cascaded spatial-temporal integrals describing the holographic formation, and subsequent readout processes, is shown to collapse to a simple imaging condition through standard operator manipulation. We also present spatio-temporal beam propagation simulation results as an illustrative demonstration of our analysis and the operation of a BEAMTAP beamformer. PMID:14503701

Kiruluta, Andrew; Pati, Gour S; Kriehn, Gregory; Silveira, Paulo E X; Sarto, Anthony W; Wagner, Kelvin

2003-09-10

408

Optimal processor assignment for pipeline computations  

NASA Technical Reports Server (NTRS)

The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p-processor system and a series-parallel precedence graph with n constituent tasks, an O(np^2) algorithm is provided that finds the optimal assignment for the response time optimization problem; the assignment optimizing the constrained throughput is found in O(np^2 log p) time. Special cases of linear, independent, and tree graphs are also considered.

Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath

1991-01-01

409

Magnetic arrays  

DOEpatents

Electromagnet arrays which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness.

Trumper, David L. (Plaistow, NH); Kim, Won-jong (Cambridge, MA); Williams, Mark E. (Pelham, NH)

1997-05-20

410

Magnetic arrays  

DOEpatents

Electromagnet arrays are disclosed which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness. 12 figs.

Trumper, D.L.; Kim, W.; Williams, M.E.

1997-05-20

411

Integrated sensor and range-finding analog signal processor  

NASA Astrophysics Data System (ADS)

The authors present experimental results from an array of cells, each of which contains a photodiode and the analog signal-processing circuitry needed for light-stripe range finding. Prototype circuits were fabricated through MOSIS in a 2-micron CMOS p-well double-metal, double-poly process. This design builds on some of the ideas that have been developed for ICs that integrate signal-processing circuitry with photosensors. In the case of light-stripe range finding, the increase in cell complexity from sensing only to sensing and processing makes the modification of the operational principle of range finding practical, which in turn results in a dramatic improvement in performance. The IC array of photosensor and analog signal processor cells acquires 1000 frames of light-stripe range data per second, two orders of magnitude faster than conventional light-stripe range-finding methods. The highly parallel range-finding algorithm used requires that the output of each photosensor site be continuously monitored. Prototype high-speed range-finding systems have been built using a 5 x 5 array and a 28 x 32 array of these sensing elements.

Gruss, Andrew; Carley, L. Richard; Kanade, Takeo

1991-03-01

412

Planarity Testing in Parallel  

Microsoft Academic Search

We present a parallel algorithm based on open ear decomposition to construct an embedding of a graph onto the plane or report that the graph is nonplanar. Our parallel algorithm runs on a CRCW PRAM in logarithmic time with a number of processors bounded by that needed for finding connected components in a graph and for performing bucket

Vijaya Ramachandran; John H. Reif

1994-01-01

413

Graph-Based Dynamic Assignment Of Multiple Processors  

NASA Technical Reports Server (NTRS)

Algorithm-to-architecture mapping model (ATAMM) is strategy minimizing time needed to periodically execute graphically described, data-driven application algorithm on multiple data processors. Implemented as operating system managing flow of data and dynamically assigning nodes of graph to processors. Predicts throughput versus number of processors available to execute given application algorithm. Includes rules ensuring application algorithm represented by graph executed periodically without deadlock and in shortest possible repetition time. ATAMM proves useful in maximizing effectiveness of parallel computing systems.

Hayes, Paul J.; Andrews, Asa M.

1994-01-01

414

Parallel programming interface for distributed data  

NASA Astrophysics Data System (ADS)

The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.
Program summary
Program title: PPIDD
Catalogue identifier: AEEF_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEF_1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 17 698
No. of bytes in distributed program, including test data, etc.: 166 173
Distribution format: tar.gz
Programming language: Fortran, C
Computer: Many parallel systems
Operating system: Various
Has the code been vectorised or parallelized?: Yes. 2-256 processors used
RAM: 50 Mbytes
Classification: 6.5
External routines: Global Arrays or MPI-2
Nature of problem: Many scientific applications require management and communication of data that is global, and the standard MPI-2 protocol provides only low-level methods for the required one-sided remote memory access.
Solution method: The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.
Running time: Problem dependent. The test provided with the distribution takes only a few seconds to run.

Wang, Manhui; May, Andrew J.; Knowles, Peter J.

2009-12-01

415

Unstructured Adaptive Grid Computations on an Array of SMPs  

NASA Technical Reports Server (NTRS)

Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We have presented such a dynamic load balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.

Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

1996-01-01

416

Transitive closure on the imagine stream processor  

SciTech Connect

The increasing gap between processor and memory speeds is a well-known problem in modern computer architecture. The Imagine system is designed to address the processor-memory gap through streaming technology. Stream processors are best-suited for computationally intensive applications characterized by high data parallelism and producer-consumer locality with minimal data dependencies. This work examines an efficient streaming implementation of the computationally intensive Transitive Closure (TC) algorithm on the Imagine platform. We develop a tiled TC algorithm specifically for the Imagine environment, which efficiently reuses streams to minimize expensive off-chip data transfers. The implementation requires complex stream programming since the memory hierarchy and cluster organization of the underlying architecture are exposed to the Imagine programmer. Results demonstrate that limited performance of TC is achieved primarily due to the complicated data-dependencies of the blocked algorithm. This work is an ongoing effort to identify classes of scientific problems well-suited for streaming processors.

Griem, Gorden; Oliker, Leonid

2003-11-11
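
The transitive closure computation named in the abstract above can be illustrated with a minimal sequential sketch. This is plain Warshall's algorithm on a boolean adjacency matrix, not the tiled streaming implementation the authors describe for Imagine; the function name `transitive_closure` and the four-vertex graph are assumptions made only for illustration.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix (list of lists).

    Illustrative sequential sketch only; it does not model tiling or streams.
    """
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is left unchanged
    for k in range(n):               # allow vertex k as an intermediate hop
        row_k = reach[k]
        for i in range(n):
            if reach[i][k]:          # i reaches k, so i reaches whatever k reaches
                row_i = reach[i]
                for j in range(n):
                    if row_k[j]:
                        row_i[j] = True
    return reach


# Hypothetical example graph with edges 0->1, 1->2, 3->0.
edges = [
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, False],
    [True, False, False, False],
]
closure = transitive_closure(edges)
```

The triple loop is the source of the data dependencies the abstract mentions: pass k reads row k while updating every other row, which is what a blocked or tiled variant has to schedule around.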

417

Acoustooptic processor for adaptive radar noise environment characterization.  

PubMed

A new 2-D acoustooptic processor that estimates the angular as well as spectral distributions of jammers in the far field of an adaptive phased array radar is described. The operating modes of the system are discussed together with the estimation accuracy achieved. Experimental results are presented to illustrate the operation of the processor, and different acoustooptic cell operating modes are discussed. PMID:18213313

Goutzoulis, A P; Casasent, D; Kumar, B V

1984-12-01

418

Parallel I/O Systems  

NSDL National Science Digital Library

* Redundant disk array architectures
* Fault tolerance issues in parallel I/O systems
* Caching and prefetching
* Parallel file systems
* Parallel I/O systems
* Parallel I/O programming paradigms
* Parallel I/O applications and environments
* Parallel programming with parallel I/O

Apon, Amy

419

Building the 4 Processor SB-PRAM Prototype  

Microsoft Academic Search

The SB-PRAM is a massively parallel, uniform memory access (UMA) shared memory computer. The main ideas of the design are multithreading on instruction level, hashing of the address space, and combining in the butterfly network. We have built a first research prototype with 4 physical processors, thus 128 virtual processors, to demonstrate the feasibility of the concept. The

Peter Bach; Michael Braun; Arno Formella; Jörg Friedrich; Thomas Grün; Cédric Lichtenau

1997-01-01

420

An efficient microcode compiler for application specific DSP processors  

Microsoft Academic Search

A computer program for microcode compilation for custom digital signal processors is presented. This tool is part of the CATHEDRAL II silicon compiler. The following optimization problems are highlighted: scheduling, hardware assignment, and loop folding. Efficient techniques to solve these problems are developed. This allows for the automatic synthesis of processor architectures which simultaneously exploit pipelining and parallelism. A demonstrator

Gert Goossens; Jan M. Rabaey; Joos Vandewalle; Hugo De Man

1990-01-01

421

Enhancing forced air convection heat transfer from an array of parallel plate fins using a heat pipe  

Microsoft Academic Search

An experimental study of heat transfer from an array of copper plate fins supported by a copper heat pipe and cooled by forced air flow is presented. The results are compared to an identical array of copper fins, but supported by a solid copper rod. The primary variable is the height of the fin stack, while the fin pitch, air

Z. Zhao; C. T. Avedisian

1997-01-01

422

Broadcasting collective operation contributions throughout a parallel computer  

DOEpatents

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.

Faraj, Ahmad (Rochester, MN)

2012-02-21
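
The two-phase scheme in the abstract above (intra-node exchange, then turn-taking on a designated link for inter-node traffic) can be sketched as a small simulation. This is a toy model under stated assumptions, not the patented method: the function `broadcast_contributions`, the nested-list layout, and the sample values are all hypothetical, and no real network or MPI calls are involved.

```python
def broadcast_contributions(contributions):
    """Simulate the broadcast; contributions[node][proc] is one processor's value.

    Returns received[node][proc], the set of (node, proc, value) triples that
    processor holds afterwards. Toy model only: no real communication occurs.
    """
    # Every processor starts out holding only its own contribution.
    received = [[{(n, p, v)} for p, v in enumerate(procs)]
                for n, procs in enumerate(contributions)]

    # Phase 1: intra-node. Each processor shares with peers on its own node.
    for n, procs in enumerate(contributions):
        for p, v in enumerate(procs):
            for inbox in received[n]:
                inbox.add((n, p, v))

    # Phase 2: inter-node. Processors on a node take turns (a serial
    # transmission sequence) sending to every processor on the other nodes.
    max_procs = max((len(procs) for procs in contributions), default=0)
    for turn in range(max_procs):
        for n, procs in enumerate(contributions):
            if turn < len(procs):
                item = (n, turn, procs[turn])
                for m, node_inboxes in enumerate(received):
                    if m != n:
                        for inbox in node_inboxes:
                            inbox.add(item)
    return received


# Hypothetical machine: two compute nodes with two processors each.
values = [[1, 2], [3, 4]]
result = broadcast_contributions(values)
total = sum(v for _, _, v in result[0][0])  # any processor can now reduce locally
```

After both phases every processor holds all contributions, so a collective such as a global sum can be finished locally on each processor.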

423

How scaling will change processor architecture  

Microsoft Academic Search

For the past 30 years processors have hidden scaling from the programmer, presenting the same sequential computational interface. Power and wire scaling issues are causing this interface to change, exposing more parallelism. For efficiency, future machines must be distributed and heterogeneous and will add at least a

Mark Horowitz; William Dally

2004-01-01

424

SPROC: A Multiple-Processor DSP IC.  

National Technical Information Service (NTIS)

A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data ...

R. Davis

1991-01-01

425

Fast parallel sorting algorithms  

Microsoft Academic Search

A parallel bucket-sort algorithm is presented that requires time O(log n) and the use of n processors. The algorithm makes use of a technique that requires more space than the product of processors and time. A realistic model is used in which no memory contention is permitted. A procedure is also presented to sort n numbers in time O(k log

Daniel S. Hirschberg; R. L. Rivest

1978-01-01

426

Optimistic parallelism requires abstractions  

Microsoft Academic Search

The problem of writing software for multicore processors would be greatly simplified if we could automatically parallelize sequential programs. Although auto-parallelization has been studied for many decades, it has succeeded only in a few application areas such as dense matrix computations. In particular, auto-parallelization of irregular programs, which are organized around large, pointer-based data structures like graphs, has seemed intractable.

Milind Kulkarni; Keshav Pingali; Bruce Walter; Ganesh Ramanarayanan; Kavita Bala; L. Paul Chew

2007-01-01

427

Highly parallel signal processing architectures; Proceedings of the Nineteenth Critical Reviews of Technology Conference, Los Angeles, CA, Jan. 21, 22, 1986  

Microsoft Academic Search

The present conference considers topics in the fields of highly parallel computing systems' algorithms and architectures, hardware implementation-related problems, software-related issues, and the design of optical matrix processors. Attention is given to eigenvector signal processing methods, architectures for the computation of eigenvalues, bit-serial architectures for parallel arrays, a systolic algorithm, and the application of hypercube architectures to signal processing. Also

Bromley

1986-01-01

428

Itanium 2 Processor Microarchitecture  

Microsoft Academic Search

On 8 July 2002, Intel introduced the Itanium 2 processor, the Itanium architecture's second implementation. This event was a milestone in the cooperation between Intel and Hewlett-Packard to establish the Itanium architecture as a key workstation, server, and supercomputer building block. The Itanium 2 processor may appear similar to the Itanium processor, yet it represents significant advances in performance and

Cameron Mcnairy; Don Soltis

2003-01-01

429

Rapid geodesic mapping of brain functional connectivity: implementation of a dedicated co-processor in a field-programmable gate array (FPGA) and application to resting state functional MRI.  

PubMed

Graph theory-based analyses of brain network topology can be used to model the spatiotemporal correlations in neural activity detected through fMRI, and such approaches have wide-ranging potential, from detection of alterations in preclinical Alzheimer's disease through to command identification in brain-machine interfaces. However, due to prohibitive computational costs, graph-based analyses to date have principally focused on measuring connection density rather than mapping the topological architecture in full by exhaustive shortest-path determination. This paper outlines a solution to this problem through parallel implementation of Dijkstra's algorithm in programmable logic. The processor design is optimized for large, sparse graphs and provided in full as synthesizable VHDL code. An acceleration factor between 15 and 18 is obtained on a representative resting-state fMRI dataset, and maps of Euclidean path length reveal the anticipated heterogeneous cortical involvement in long-range integrative processing. These results enable high-resolution geodesic connectivity mapping for resting-state fMRI in patient populations and real-time geodesic mapping to support identification of imagined actions for fMRI-based brain-machine interfaces. PMID:23746911

Minati, Ludovico; Cercignani, Mara; Chan, Dennis

2013-10-01
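
The exhaustive shortest-path determination mentioned in the abstract above amounts to running Dijkstra's algorithm from every vertex of a sparse graph. The following is a minimal sequential software sketch of that idea, not the authors' VHDL co-processor; the adjacency-list layout, the function names, and the toy graph are assumptions for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths; graph maps vertex -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already settled
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_pairs_shortest_paths(graph):
    """Run Dijkstra from every vertex; each run is independent of the others."""
    return {s: dijkstra(graph, s) for s in graph}


# Hypothetical sparse, undirected toy graph; vertex "d" is isolated.
toy_graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("a", 4.0), ("b", 2.0)],
    "d": [],
}
paths = all_pairs_shortest_paths(toy_graph)
```

Because the per-source runs share no state, they can be distributed across parallel units, which is one reason the problem suits a parallel implementation.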

430

A Mapping Method for Multi-Process Execution on Dynamically Reconfigurable Processors  

NASA Astrophysics Data System (ADS)

The multi-process execution in dynamically reconfigurable processors is a technique to enhance throughput by trying to exploit more inherent parallelism of applications. Basically, a total process for an application is divided into small processes, assigned into limited areas of a reconfigurable array, and concurrently executed in a pipelined manner. In order to improve the efficiency of the multi-process execution, a systematic method for mapping processes onto a reconfigurable array consisting of multiple hardware execution units is essential. This paper proposes and investigates a systematic method for mapping an application modeled as a Kahn Process Network onto a dynamically reconfigurable processing array. In order to execute streaming applications in a pipelined manner, the size of Tiles, which is a unit area of dynamically reconfigurable array, and the grouping of processes are adjusted. Using real applications such as DCT, JPEG encoder and Turbo encoder, the impact of different versions mapped onto the NEC Dynamically Reconfigurable Processor on performance is evaluated. Evaluation results show that our proposed mapping algorithm achieves the best performance in terms of the throughput and the execution time.

Tuan, Vu Manh; Amano, Hideharu

431

Doppler-free, multiwavelength acousto-optic deflector for two-photon addressing arrays of Rb atoms in a quantum information processor.  

PubMed

We demonstrate a dual wavelength acousto-optic deflector (AOD) designed to deflect two wavelengths to the same angles by driving with two RF frequencies. The AOD is designed as a beam scanner to address two-photon transitions in a two-dimensional array of trapped neutral Rb87 atoms in a quantum computer. Momentum space is used to design AODs that have the same diffraction angles for two wavelengths (780 and 480 nm) and have nonoverlapping Bragg-matched frequency response at these wavelengths, so that there will be no cross talk when proportional frequencies are applied to diffract the two wavelengths. The appropriate crystal orientation, crystal shape, transducer size, and transducer height are determined for an AOD made with a tellurium dioxide (TeO2) crystal. The designed and fabricated AOD has more than 100 resolvable spots, widely separated band shapes for the two wavelengths within an overall octave bandwidth, spatially overlapping diffraction angles for both wavelengths (780 and 480 nm), and a 4 µs or less access time. Cascaded AODs in which the first device upshifts and the second downshifts allow Doppler-free scanning as required for addressing the narrow atomic resonance without detuning. We experimentally show the diffraction-limited Doppler-free scanning performance and spatial resolution of the designed AOD. PMID:18404181

Kim, Sangtaek; Mcleod, Robert R; Saffman, M; Wagner, Kelvin H

2008-04-10

432

Multilist Scheduling. A New Parallel Programming Model.  

National Technical Information Service (NTIS)

Parallel programming requires task scheduling to optimize performance; this primarily involves balancing the load over the processors. In many cases, it is critical to perform task scheduling at runtime. For example, (1) in many parallel applications the ...

I. C. Wu; H. T. Kung; P. Steenkiste; D. O'Hallaron; G. Thompson

1993-01-01

433

The first IA64 microprocessor: a design for highly-parallel execution  

Microsoft Academic Search

The first implementation of the IA-64 architecture achieves high performance by implementing a highly parallel execution core, while maintaining binary compatibility with the IA-32 instruction set. The processor contains 25.4 M transistors. The chip is fabricated in a 0.18 µm CMOS process with 6 metal layers and packaged in a 1012-pad organic land grid array using C4 (flip-chip) assembly technology

G. Singer; S. Rusu

2000-01-01

434

Energy Estimation for Extensible Processors  

Microsoft Academic Search

This paper presents an efficient methodology for estimating the energy consumption of application programs running on extensible processors. Extensible processors, which are increasingly popular in embedded system design, allow a designer to customize a base processor core through instruction set extensions. Existing processor energy macro-modeling techniques are not applicable to extensible processors, since they assume that the instruction set architecture

Yunsi Fei; Srivaths Ravi; Anand Raghunathan; Niraj K. Jha

2003-01-01

435

Thermal solutions to Pentium processors in TCP in notebooks and sub-notebooks  

Microsoft Academic Search

Less than one year after the introduction of the 90 MHz Pentium Processor in Pin Grid Array (PGA) package, 75 MHz Pentium Processor in Tape Carrier Package (TCP) has been introduced for applications in mobile products. Notebooks and sub-notebooks using the 75 MHz Pentium Processor in TCP are expected to be on the market early 1995. In this paper, we

H. Xie; M. Aghazadeh; W. Lui; K. Haley

1995-01-01

436

Accuracy Limitations in Optical Linear Algebra Processors  

NASA Astrophysics Data System (ADS)

One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.

Batsell, Stephen Gordon

1990-01-01

437

Adjunct processors in embedded medical imaging systems  

NASA Astrophysics Data System (ADS)

Adjunct processors have traditionally been used for certain tasks in medical imaging systems. Often based on application-specific integrated circuits (ASICs), these processors formed X-ray image-processing pipelines or constituted the backprojectors in computed tomography (CT) systems. We examine appropriate functions to perform with adjunct processing and draw some conclusions about system design trade-offs. These trade-offs have traditionally focused on the required performance and flexibility of individual system components, with increasing emphasis on time-to-market impact. Typically, front-end processing close to the sensor has the most intensive processing requirements. However, the performance capabilities of each level are dynamic and the system architect must keep abreast of the current capabilities of all options to remain competitive. Designers are searching for the most efficient implementation of their particular system requirements. We cite algorithm characteristics that point to effective solutions by adjunct processors. We have developed a field-programmable gate array (FPGA) adjunct-processor solution for a Cone-Beam Reconstruction (CBR) algorithm that offers significant performance improvements over a general-purpose processor implementation. The same hardware could efficiently perform other image processing functions such as two-dimensional (2D) convolution. The potential performance, price, operating power, and flexibility advantages of an FPGA adjunct processor over an ASIC, DSP or general-purpose processing solutions are compelling.

Trepanier, Marc; Goddard, Iain

2002-05-01

438

Communication-efficient parallel architectures and algorithms for image computations  

SciTech Connect

The main purpose of this dissertation is the design of efficient parallel techniques for image computations which require global operations on image pixels, as well as the development of parallel architectures with special communication features which can support global data movement efficiently. The class of image problems considered in this dissertation involves global operations on image pixels, and irregular (data-dependent) data movement operations. Such problems include histogramming, component labeling, proximity computations, computing the Hough Transform, computing convexity of regions and related properties such as computing the diameter and a smallest area enclosing rectangle for each region. Images with multiple figures and multiple labeled-sets of pixels are also considered. Efficient solutions to such problems involve integer sorting, graph theoretic techniques, and techniques from computational geometry. Although such solutions are not computationally intensive (they all require O(n²) operations to be performed on an n × n image), they require global communications. The emphasis here is on developing parallel techniques for data movement, reduction, and distribution, which lead to processor-time optimal solutions for such problems on the proposed organizations. The proposed parallel architectures are based on a memory array which can be viewed as an arrangement of memory modules in a k-dimensional space such that the modules are connected to buses placed parallel to the orthogonal axes of the space, and each bus is connected to one processor or a group of processors. It will be shown that such organizations are communication-efficient and are thus highly suited to the image problems considered here, and also to several other classes of problems. The proposed organizations have p processors and O(n²) words of memory to process n × n images.

Alnuweiri, H.M.

1989-01-01
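
The histogramming problem named in the abstract above can be illustrated with a partition-then-reduce pattern. The following is only a minimal sequential sketch in Python, not the dissertation's bus-connected memory-array architecture: the function names (`local_histogram`, `merge`, `partitioned_histogram`) and the row-block partitioning are assumptions chosen for illustration. Each of the p "processors" is simulated by one call to `local_histogram`, and the merge step stands in for the global communication the abstract emphasizes.

```python
from functools import reduce


def local_histogram(block, levels):
    """Count pixel values in one block of rows (the work of one simulated processor)."""
    hist = [0] * levels
    for row in block:
        for pixel in row:
            hist[pixel] += 1
    return hist


def merge(h1, h2):
    """Combine two partial histograms; this is the reduction (communication) step."""
    return [a + b for a, b in zip(h1, h2)]


def partitioned_histogram(image, levels, p):
    """Split the rows among at most p workers, histogram each block, then reduce."""
    n = len(image)
    rows_per_worker = max(1, (n + p - 1) // p)  # ceiling division, at least one row
    blocks = [image[i:i + rows_per_worker] for i in range(0, n, rows_per_worker)]
    partials = [local_histogram(block, levels) for block in blocks]
    return reduce(merge, partials, [0] * levels)
```

The local counting touches each of the n² pixels once, which matches the O(n²) total work stated in the abstract; on a real parallel machine the cost of interest would be the merge step, which this sketch performs serially.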

439

Models for Dynamic Load Balancing in a Heterogeneous Multiple Processor System  

Microsoft Academic Search

Queueing models for a simple heterogeneous multiple processor system are presented, analyzed, and compared. Each model is distinguished by a job routing strategy which is designed to reduce the average job turnaround time by balancing the total load among the processors. In each case an arriving job is routed by a job dispatcher to one of m parallel processors. The

Yuan-chieh Chow; Walter H. Kohler

1979-01-01
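
The effect of a routing strategy on average turnaround time can be illustrated with a small deterministic trace simulation. This is a hedged sketch, not the queueing models analyzed in the paper: the policies `least_finish_time` and `fastest_only`, the `simulate` function, and the representation of jobs as `(arrival_time, work)` pairs are all assumptions made for illustration. Each processor is assumed to serve its jobs first-come, first-served at a fixed speed.

```python
def least_finish_time(backlog, speeds, work):
    """Route to the processor on which this job would finish earliest."""
    return min(range(len(speeds)), key=lambda k: backlog[k] + work / speeds[k])


def fastest_only(backlog, speeds, work):
    """Baseline policy: always route to the fastest processor, ignoring load."""
    return max(range(len(speeds)), key=lambda k: speeds[k])


def simulate(jobs, speeds, policy):
    """Return the mean turnaround time for jobs routed by the given policy.

    jobs   -- list of (arrival_time, work) pairs, sorted by arrival time
    speeds -- work units per time unit for each of the m processors
    """
    free_at = [0.0] * len(speeds)  # time at which each processor clears its backlog
    total_turnaround = 0.0
    for arrival, work in jobs:
        backlog = [max(0.0, t - arrival) for t in free_at]
        k = policy(backlog, speeds, work)
        start = max(arrival, free_at[k])
        free_at[k] = start + work / speeds[k]
        total_turnaround += free_at[k] - arrival
    return total_turnaround / len(jobs)
```

For three jobs of 4 work units arriving at time 0 on processors with speeds 2.0 and 1.0, the load-aware policy sends the third job to the slower processor and achieves a mean turnaround of 10/3, against 4.0 when every job is sent to the fastest processor.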

440

Parallelism and Scalability in an Image Processing Application  

Microsoft Academic Search

The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced

Morten S. Rasmussen; Matthias B. Stuart; Sven Karlsson

2009-01-01

441

VLIW: a case study of parallelism verification  

Microsoft Academic Search

Parallelism in processor architecture and design imposes a verification challenge as the exponential growth in the number of execution combinations becomes unwieldy. In this paper we report on the verification of a Very Large Instruction Word processor. The verification team used a sophisticated test program generator that modeled the parallel aspects as sequential constraints, and augmented the tool with manually

Allon Adir; Yaron Arbetman; Bella Dubrov; Yossi Lichtenstein; Michal Rimon; Michael Vinov; Massimo A. Calligaro; Andrew Cofler; Gabriel Duffy

2005-01-01

442

VLIW - a case study of parallelism verification  

Microsoft Academic Search

Parallelism in processor architecture and design imposes a verification challenge as the exponential growth in the number of execution combinations becomes unwieldy. In this paper we report on the verification of a very large instruction word processor. The verification team used a sophisticated test program generator that modeled the parallel aspects as sequential constraints, and augmented the tool with manually

Allon Adir; Yaron Arbetman; Bella Dubrov; Michal Rimon; Michael Vinov; Massimo A Calligaro; Andrew Cofler; Gabriel Duffy

2005-01-01

443

Master/slave speculative parallelization  

Microsoft Academic Search

Master/Slave Speculative Parallelization (MSSP) is an execution paradigm for improving the execution rate of sequential programs by parallelizing them speculatively for execution on a multiprocessor. In MSSP, one processor (the master) executes an approximate version of the program to compute selected values that the full program's execution is expected to compute. The master's results are checked by slave processors that execute the

Craig B. Zilles; Gurindar S. Sohi

2002-01-01

444

Parallel Parsing of Arithmetic Expressions  

Microsoft Academic Search

Parallel algorithms for parsing expressions on mesh, shuffle, cube, and cube-connected cycle parallel computers are presented. With n processors, it requires O(√n) time on the mesh-connected model and O(log² n) time on others. For the mesh-connected computer, the author uses a wrap-around row-major ordering. For the shuffle computer, he uses an extra connection between adjacent processors,

Y. N. Srikant

1990-01-01

445

Switch for serial or parallel communication networks  

DOEpatents

A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random bursts of high-density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.

Crosette, D.B.

1994-07-19

446

Switch for serial or parallel communication networks  

DOEpatents

A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random bursts of high-density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.

Crosette, Dario B. (DeSoto, TX)

1994-01-01

447

Dry thermal processor  

SciTech Connect

A dry thermal processor for recovering vaporized substances from particulate host solids is described, comprising: inner and outer, radially spaced apart, interconnected tubular members which rotate together in response to a drive applied to the outer tubular member; said inner tubular member comprising a plurality of substantially parallel spaced apart preheat tubes each having first and second ends, a vaporization tube having first and second ends, and tubular junction means whereby sequential preheat and reaction zones are provided by the inner tubular member; means for passing solids from the preheat zone to the reaction zone and restricting gas movement there between; said tubular members forming an annular space with first and second ends between them to provide sequential open combustion and heat transfer zones proximate the reaction zone and preheat zone, respectively; means for closing the second end of the vaporization tube; means for passing coked solids from the second end of the reaction zone, through said vaporization tube closing means, into the combustion zone and restricting gas movement between said zones; means for recycling hot solids from the second end of the combustion zone into the first end of the reaction zone and restricting gas movement there between; means for drawing gases separately from the preheat zone, the reaction zone, and the annular space; means extending into the combustion zone for injecting oxidizing gas and for supplying supplemental heat there into; said outer tubular member carrying internal lifters in the combustion zone for lifting and dropping coked solids passing therethrough to assist combustion; stationary end frames associated with the tubular members and closing the first and second ends of the annular space; means for feeding feedstock into the preheat tubes; means for removing cooled solids from the annular space through the first end frame; and means for rotating the outer tubular member.

Taciuk, W.; Caple, R.; Goodwin, S.; Taciuk, G.

1993-06-08

448

Construction of response patterns for metal cations by using a fluorescent conjugated polymer sensor array from parallel combinatorial synthesis.  

PubMed

Pattern-based strategy is an emerging field of interest for effective sensing applications. Seven different conjugated polymers from combinatorial synthesis were combined into a sensor array, and seven metal cations were selected as representative analytes. The response patterns for each cation were constructed by collecting the individual fluorescence responses from seven polymers in the array. Each ion owns a characteristic pattern. Some of them have similar modes of response with subtle differences, while some patterns are distinctively different. The family/period the metal cations belong to and the charges/electronic configurations they possess may account for such similarity/difference in the pattern. PMID:24611913

Xu, Haibo; Wu, Wei; Chen, Yun; Qiu, Tian; Fan, Li-Juan

2014-04-01

449

Efficient execution of Kahn process networks on multi-processor systems using protothreads and windowed FIFOs  

Microsoft Academic Search

As single-processor systems are ceasing to scale effectively, multi-processor systems are becoming more and more popular. While there are many challenges in designing multi-processor systems in hardware, writing efficient parallel applications that utilize the computing capability of multiple processors may prove to be even more challenging. In this paper, we introduce a framework for efficiently executing applications expressed

Wolfgang Haid; Lars Schor; Kai Huang; Iuliana Bacivarov; Lothar Thiele

2009-01-01
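
The general idea of running a Kahn process network with lightweight, cooperatively scheduled processes can be sketched with Python generators, which behave much like stackless protothreads in that a process gives up the processor only at explicit yield points. This is an illustrative sketch under stated assumptions, not the framework described in the paper: the `Fifo` class is a plain bounded queue standing in for the paper's windowed FIFOs, and `producer`, `square`, `consumer`, and `run` are hypothetical names. A process blocks (yields) when it tries to read an empty FIFO or write a full one.

```python
from collections import deque


class Fifo:
    """Bounded FIFO channel between two processes (a stand-in, not a windowed FIFO)."""

    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity

    def can_read(self):
        return len(self.items) > 0

    def can_write(self):
        return len(self.items) < self.capacity


def producer(out, values):
    for v in values:
        while not out.can_write():
            yield  # blocked on a full FIFO: hand control back to the scheduler
        out.items.append(v)


def square(inp, out, count):
    for _ in range(count):
        while not inp.can_read():
            yield  # blocked on an empty input FIFO
        v = inp.items.popleft()
        while not out.can_write():
            yield  # blocked on a full output FIFO
        out.items.append(v * v)


def consumer(inp, count, result):
    for _ in range(count):
        while not inp.can_read():
            yield
        result.append(inp.items.popleft())


def run(processes):
    """Round-robin scheduler: resume each process in turn until all have finished."""
    active = list(processes)
    while active:
        for proc in list(active):
            try:
                next(proc)
            except StopIteration:
                active.remove(proc)
```

Because each process only ever blocks on a single channel and reads are destructive, the output is the same regardless of FIFO capacity or scheduling order, which is the determinacy property that makes Kahn process networks attractive. Note that this simple scheduler would spin forever on a deadlocked network; it assumes a well-formed one.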

450

Gang scheduling a parallel machine  

SciTech Connect

Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quanta are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128-processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

Gorda, B.C.; Brooks, E.D. III.

1991-03-01
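
The time-sharing behavior described above, in which whole gangs are put to sleep and awakened with priority-dependent time quanta, can be sketched as a round-robin loop. This is a simplified illustration and not the scheduler described in the report: it assumes that one gang occupies the whole machine during its slice, that the quantum is simply `base_quantum` multiplied by an integer priority, and it omits fair share accounting entirely. The function name `gang_schedule` and the job tuple layout are hypothetical.

```python
from collections import deque


def gang_schedule(jobs, base_quantum):
    """Round-robin time sharing of whole gangs.

    jobs         -- list of (name, work, priority) tuples; work is in time units
    base_quantum -- slice length for priority 1; higher priority gets longer slices
    Returns a timeline of (name, start, end) slices in execution order.
    """
    queue = deque(jobs)
    clock = 0
    timeline = []
    while queue:
        name, remaining, priority = queue.popleft()  # awaken the next gang
        ran = min(base_quantum * priority, remaining)
        timeline.append((name, clock, clock + ran))
        clock += ran
        remaining -= ran
        if remaining > 0:
            # Quantum expired: put the gang back to sleep at the end of the queue.
            queue.append((name, remaining, priority))
    return timeline
```

With `jobs = [("A", 5, 2), ("B", 3, 1)]` and `base_quantum = 2`, gang A runs for 4 time units, B for 2, and then each finishes its last unit in turn.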

451

Communication-Minimal Partitioning of Parallel Loops and Data Arrays for Cache-Coherent Distributed-Memory Multiprocessors  

Microsoft Academic Search

Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage multiple levels of memory hierarchy. This paper describes a working compiler for such machines that automatically partitions loops and data arrays to optimize locality of access. The compiler implements a solution to the problem of finding

Rajeev Barua; David A. Kranz; Anant Agarwal

1996-01-01

452

Utilizing Maximum Power Point Trackers in Parallel to Maximize the Power Output of a Solar (Photovoltaic) Array.  

National Technical Information Service (NTIS)

It is common when optimizing a photovoltaic (PV) system to use a maximum power point tracker (MPPT) to increase the power output of the solar array. Currently, most military applications that utilize solar energy omit or use only a single MPPT per PV syst...

C. A. Stephenson

2012-01-01

453

Texas Instruments' DLP products massively paralleled MOEMS arrays for display applications: a distant second to Mother Nature  

NASA Astrophysics Data System (ADS)

This paper describes the business scope under which DLP® Products works, with emphasis placed upon some of the technological complications and challenges present when developing an actuator array with the ultimate intention of rendering visual content at high-definition and standard video rates. Additionally, some general thoughts on alternative applications of this spatial light modulation technology are provided.

Oden, P. I.

2008-05-01

454

Stochastic propagation of an array of parallel cracks: Exploratory work on matrix fatigue damage in composite laminates  

Microsoft Academic Search

Transverse cracking of polymeric matrix materials is an important fatigue damage mechanism in continuous-fiber composite laminates. The propagation of an array of these cracks is a stochastic problem usually treated by Monte Carlo methods. However, this exploratory work proposes an alternative approach wherein the Monte Carlo method is replaced by a more closed-form recursion relation based on fractional Brownian motion.

Williford

1989-01-01

455

Parallel nearest neighbor calculations  

NASA Astrophysics Data System (ADS)

We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.

Trease, Harold

456

The Function Processor: An Architecture for Efficient Execution of Recursive Functions  

Microsoft Academic Search

The Function Processor is a wavefront array architecture, i.e., a regular structure of locally interconnected processing elements called Function Cells, which operate according to the data flow execution principle. By means of a compilation method developed for this architecture, data flow graphs for functional programs can be created and mapped onto the processor array, so that each Function Cell is

Jonas Vasell; Jesper Vasell

1991-01-01