Science.gov

Sample records for parallel processor array

  1. Integration of IR focal plane arrays with massively parallel processor

    NASA Astrophysics Data System (ADS)

    Esfandiari, P.; Koskey, P.; Vaccaro, K.; Buchwald, W.; Clark, F.; Krejca, B.; Rekeczky, C.; Zarandy, A.

    2008-04-01

    The intent of this investigation is to replace the low fill factor visible sensor of a Cellular Neural Network (CNN) processor with an InGaAs Focal Plane Array (FPA) using both bump bonding and epitaxial layer transfer techniques for use in the Ballistic Missile Defense System (BMDS) interceptor seekers. The goal is to fabricate a massively parallel digital processor with a local as well as a global interconnect architecture. Currently, this unique CNN processor is capable of processing a target scene in excess of 10,000 frames per second with its visible sensor. What makes the CNN processor so unique is that each processing element includes memory, local data storage, local and global communication devices and a visible sensor supported by a programmable analog or digital computer program.

  2. Digital Parallel Processor Array for Optimum Path Planning

    NASA Technical Reports Server (NTRS)

    Kemeny, Sabrina E. (Inventor); Fossum, Eric R. (Inventor); Nixon, Robert H. (Inventor)

    1996-01-01

    The invention computes the optimum path across a terrain or topology represented by an array of parallel processor cells interconnected between neighboring cells by links extending along different directions to the neighboring cells. Such an array is preferably implemented as a high-speed integrated circuit. The computation of the optimum path is accomplished by, in each cell, receiving stimulus signals from neighboring cells along corresponding directions, determining and storing the identity of a direction along which the first stimulus signal is received, broadcasting a subsequent stimulus signal to the neighboring cells after a predetermined delay time, whereby stimulus signals propagate throughout the array from a starting one of the cells. After propagation of the stimulus signal throughout the array, a master processor traces back from a selected destination cell to the starting cell along an optimum path of the cells in accordance with the identity of the directions stored in each of the cells.
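    The propagation-and-traceback scheme can be sketched in software. The Python below replaces the patent's hardware cells with a breadth-first wavefront and assumes equal link delays, so the first stimulus to arrive marks the shortest path; `plan_path`, the grid encoding, and 4-neighbor connectivity are illustrative choices, not the patent's design.

    ```python
    from collections import deque

    def plan_path(grid, start, goal):
        """Software sketch of the wavefront idea: propagate a stimulus
        outward from `start`, record in each cell the direction (here, the
        cell) the first stimulus arrived from, then trace back from `goal`.
        Cells with value 1 are obstacles."""
        rows, cols = len(grid), len(grid[0])
        came_from = {start: None}            # per-cell stored arrival source
        frontier = deque([start])
        while frontier:                      # wavefront propagation (BFS)
            r, c = frontier.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                        and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                    came_from[nxt] = (r, c)  # first stimulus wins
                    frontier.append(nxt)
        if goal not in came_from:
            return None                      # goal unreachable
        path = [goal]                        # master-processor trace-back
        while path[-1] != start:
            path.append(came_from[path[-1]])
        return path[::-1]
    ```

    With unequal, direction-dependent delays, which the patent's predetermined delay time allows, the first-arrival rule generalizes to a weighted shortest-path search such as Dijkstra's algorithm.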

  3. Parallel processing in a host plus multiple array processor system for radar

    NASA Technical Reports Server (NTRS)

    Barkan, B. Z.

    1983-01-01

    Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.

  4. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
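    The bit-slice operation described can be emulated in NumPy by treating each array element as one processing element. `bitserial_add` is a hypothetical illustration of how a single instruction stream drives a one-bit full adder in every element, one bit plane at a time; it is not code from the patent.

    ```python
    import numpy as np

    def bitserial_add(a, b, nbits=8):
        """Emulate bit-serial SIMD addition: every processing element adds
        one bit slice per step with a per-element carry bit, as a 128x128
        array of 1-bit ALUs would under a single instruction stream."""
        carry = np.zeros(a.shape, dtype=np.uint32)
        result = np.zeros(a.shape, dtype=np.uint32)
        for i in range(nbits + 1):           # one pass per bit slice
            abit = (a >> i) & 1
            bbit = (b >> i) & 1
            s = abit ^ bbit ^ carry          # full-adder sum, all PEs at once
            carry = (abit & bbit) | (carry & (abit ^ bbit))
            result |= s << i
        return result

    a = np.array([[3, 200], [255, 0]], dtype=np.uint32)
    b = np.array([[5, 55], [1, 0]], dtype=np.uint32)
    # matches ordinary elementwise addition
    ```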

  5. Fast, Massively Parallel Data Processors

    NASA Technical Reports Server (NTRS)

    Heaton, Robert A.; Blevins, Donald W.; Davis, Ed

    1994-01-01

    Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other over "X" interconnection grid and with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.

  6. Spaceborne Processor Array

    NASA Technical Reports Server (NTRS)

    Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

    2008-01-01

    A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor-memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system comprises a distributed Gilgamesh array built into the MFS, interfaces to instrument and communication subsystems, a mass-storage interface, and a radiation-hardened flight computer.

  7. QUEN - The APL wavefront array processor

    SciTech Connect

    Dolecek, Q. E.

    1989-09-01

    Developments in computer networks are making parallel processing machines accessible to an increasing number of scientists and engineers. Several vector and array processors are already commercially available, as are costly systolic, wavefront, and massive parallel processors. This article discusses the Applied Physics Laboratory's entry: a low-cost, memory-linked wavefront array processor that can be used as a peripheral on existing computers. Available today as the family of QUEN processors, it is the first commercial parallel processor to bring Cray 1 computation speeds into the minicomputer price range. 5 refs.

  8. Intermediate Level Computer Vision Processing Algorithm Development for the Content Addressable Array Parallel Processor.

    DTIC Science & Technology

    1986-11-29


  9. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  10. Parallel Analog-to-Digital Image Processor

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C.

    1987-01-01

    Proposed integrated-circuit network of many identical units converts analog outputs of imaging arrays of x-ray or infrared detectors to digital outputs. Converter located near imaging detectors, within cryogenic detector package. Because converter output is digital, it lends itself well to multiplexing and to postprocessing for correction of gain and offset errors peculiar to each picture element and its sampling and conversion circuits. Analog-to-digital image processor is massively parallel system for processing data from array of photodetectors. System built as compact integrated circuit located near focal plane. Buffer amplifier for each picture element has different offset.
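    The postprocessing step can be sketched with a simple linear detector model; the names `offset` and `gain`, and the per-pixel error magnitudes, are illustrative assumptions, not specifications from the tech brief.

    ```python
    import numpy as np

    def correct(raw, offset, gain):
        """Per-pixel calibration: each picture element has its own offset
        and gain (from its sampling and conversion circuits), which
        digital post-processing removes after on-focal-plane conversion."""
        return (raw.astype(np.float64) - offset) * gain

    rng = np.random.default_rng(0)
    truth = rng.uniform(0, 100, size=(4, 4))      # ideal scene
    offset = rng.normal(0, 2, size=(4, 4))        # per-pixel offset error
    gain = rng.uniform(0.9, 1.1, size=(4, 4))     # per-pixel gain error
    raw = truth / gain + offset                   # what the converter reports
    ```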

  11. Plasma simulation using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Lin, C. S.; Thring, A. L.; Koga, J.; Janetzke, R. W.

    1987-01-01

    Two dimensional electrostatic simulation codes using the particle-in-cell model are developed on the Massively Parallel Processor (MPP). The conventional plasma simulation procedure that computes electric fields at particle positions by means of a gridded system is found inefficient on the MPP. The MPP simulation code is thus based on the gridless system in which particles are assigned to processing elements and electric fields are computed directly via Discrete Fourier Transform. Currently, the gridless model on the MPP in two dimensions is about nine times slower than the gridded system on the CRAY X-MP without considering I/O time. However, the gridless system on the MPP can be improved by incorporating a faster I/O between the staging memory and Array Unit and a more efficient procedure for taking floating point sums over processing elements. The initial results suggest that the parallel processors have the potential for performing large scale plasma simulations.
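    A minimal 1-D sketch of the gridless idea, assuming periodic boundary conditions and units with epsilon_0 = 1 (the actual MPP code is two-dimensional and distributes particles over processing elements): charge-density Fourier coefficients are accumulated directly from particle positions, and the field is evaluated back at arbitrary points, with no spatial grid in between.

    ```python
    import numpy as np

    def gridless_field(q, xsrc, xeval, nmodes=64, L=2 * np.pi):
        """Gridless electrostatics: rho_k = (1/L) * sum_j q_j exp(-i k x_j)
        is a direct DFT over particles; Gauss's law in Fourier space,
        i k E_k = rho_k, gives the field, summed over +/-k mode pairs."""
        k = np.arange(1, nmodes + 1) * 2 * np.pi / L
        rho_k = (q[None, :] * np.exp(-1j * np.outer(k, xsrc))).sum(axis=1) / L
        E_k = rho_k / (1j * k)
        # factor 2*Re(...) accounts for the conjugate -k modes
        return 2 * np.real(E_k[:, None] * np.exp(1j * np.outer(k, xeval))).sum(axis=0)

    q = np.array([1.0, -1.0])                    # neutral charge pair
    x = np.array([np.pi / 2, 3 * np.pi / 2])
    E_at_particles = gridless_field(q, x, x)     # vanishes by symmetry here
    ```

    For this antipodal pair the midpoint field converges to -1/2 as modes are added, a quick sanity check on the truncated Fourier sum.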

  12. Globality and speed of optical parallel processors.

    PubMed

    Lohmann, A W; Marathay, A S

    1989-09-15

    The chances of optical computing are probably best if a large number of processing elements act in parallel. The efficiency of parallel processors depends, among other things, on the time it takes to communicate signals from one processor to any other processor. In an optical parallel processor one hopes to be able to transmit a signal from one processor to any other processor within only one cycle period, no matter how far apart the processors are. Such a global communications network is desirable especially for algorithms with global interactions. The fast Fourier algorithm is an example. We define a degree of globality and we show how speed and globality are related. Our result applies to a specific architecture based on spatial filtering.

  13. Molecular fingerprinting on the SIMD parallel processor Kestrel.

    PubMed

    Rice, E; Hughey, R

    2001-01-01

    In combinatorial library design and use, the conformation space of molecules can be represented using three-dimensional (3-D) pharmacophores. For large libraries of flexible molecules, the calculation of these 3-D pharmacophoric fingerprints can require examination of trillions of pharmacophores, presenting a significant practical challenge. Here we describe the mapping of this problem to the UCSC Kestrel parallel processor, a single-instruction multiple-data (SIMD) processor. Data parallelism is achieved by simultaneous processing of multiple conformations and by careful representation of the fingerprint structure in the array. The resulting application achieved a speedup of more than 35 over an SGI 2000 processor on the prototype Kestrel board.

  14. Assignment of job modules onto array processors

    SciTech Connect

    Fukunaga, K.; Yamada, S.; Kasai, T.

    1987-07-01

    This paper deals with the optimum assignment of job modules onto array processors. In array processors it is important to assign job modules onto processors such that the modules that communicate with each other are assigned to adjacent processors, because communication overhead increases as communications occur between processors that are remotely connected. The authors propose an efficient algorithm to solve this assignment problem for a specific array of processors. The algorithm reduces the quadratic problem to a solvable linear problem that produces a good, but not necessarily optimal solution. This is followed by a phase of iterations in which the solution is improved by small perturbation of the assignment.
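    The iterative-improvement phase can be illustrated as follows. The `traffic` dictionary (inter-module communication volumes) and the Manhattan-distance cost are assumptions standing in for the paper's exact formulation, not its algorithm.

    ```python
    import itertools

    def placement_cost(pos, traffic):
        """Communication cost: traffic volume times Manhattan distance
        between the processors two communicating modules occupy."""
        return sum(t * (abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]))
                   for (i, j), t in traffic.items())

    def improve_by_swaps(pos, traffic, sweeps=20):
        """Sketch of the improvement phase: repeatedly apply small
        perturbations (module swaps) that lower the communication cost."""
        pos = dict(pos)
        cost = placement_cost(pos, traffic)
        for _ in range(sweeps):
            improved = False
            for a, b in itertools.combinations(list(pos), 2):
                pos[a], pos[b] = pos[b], pos[a]      # try swapping two modules
                new_cost = placement_cost(pos, traffic)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    pos[a], pos[b] = pos[b], pos[a]  # undo the swap
            if not improved:
                break
        return pos, cost
    ```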

  15. Array Processor Has Power and Flexibility

    NASA Technical Reports Server (NTRS)

    Barnes, G. H.; Lundstrom, S. F.; Shafer, P. E.

    1982-01-01

    Proposed processor architecture would have flexibility of a multiprocessor and computational power of a lockstep array. Using an efficient interconnection network, it accommodates a large number of individual processors and memory modules. Array architecture would be suitable for very large scientific simulation problems and other applications.

  16. Balancing Loads Among Parallel Data Processors

    NASA Technical Reports Server (NTRS)

    Baffes, Paul Thomas

    1990-01-01

    Heuristic algorithm minimizes amount of memory used by multiprocessor system. Distributes load of many identical, short computations among multiple parallel digital data processors, each of which has its own (local) memory. Each processor operates on distinct and independent set of data in larger shared memory. As integral part of load-balancing scheme, total amount of space used in shared memory minimized. Possible applications include artificial neural networks or image processors for which "pipeline" and vector methods of load balancing inappropriate.
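    The abstract does not spell out the heuristic, so the following is only a generic stand-in: the classic longest-processing-time greedy rule for distributing many short computations across processors, always handing the next-largest batch to the least-loaded processor.

    ```python
    import heapq

    def balance(task_sizes, nproc):
        """Greedy LPT load balancing: keep processors in a min-heap keyed
        by current load; assign tasks in decreasing size order to the
        least-loaded processor. Returns (load, processor_id, task_ids)."""
        heap = [(0, p, []) for p in range(nproc)]
        heapq.heapify(heap)
        for size, tid in sorted(((s, i) for i, s in enumerate(task_sizes)),
                                reverse=True):
            load, p, tasks = heapq.heappop(heap)   # least-loaded processor
            tasks.append(tid)
            heapq.heappush(heap, (load + size, p, tasks))
        return heap
    ```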

  17. Adapting implicit methods to parallel processors

    SciTech Connect

    Reeves, L.; McMillin, B.; Okunbor, D.; Riggins, D.

    1994-12-31

    When numerically solving many types of partial differential equations, it is advantageous to use implicit methods because of their better stability and more flexible parameter choice (e.g., larger time steps). However, since implicit methods usually require simultaneous knowledge of the entire computational domain, these methods are difficult to implement directly on distributed memory parallel processors. This leads to infrequent use of implicit methods on parallel/distributed systems. The usual implementation of implicit methods is inefficient due to the nature of parallel systems, where it is common to take the computational domain and distribute the grid points over the processors so as to maintain a relatively even workload per processor. This creates a problem at the locations in the domain where adjacent points are not on the same processor: in order for the values at these points to be calculated, messages have to be exchanged between the corresponding processors. Without special adaptation, this results in idle processors during part of the computation, and the more processors sit idle, the lower the effective speedup obtained from the parallel machine.
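    The boundary-exchange problem can be made concrete with a toy 1-D relaxation step: splitting the grid across processors forces each piece to obtain ghost copies of its neighbors' edge points before it can update its own. This sketch simulates the exchange in a single process; the Jacobi step and the chunking are illustrative, not from the paper.

    ```python
    import numpy as np

    def serial_step(u):
        """One serial Jacobi relaxation step on a 1-D grid (ends fixed)."""
        v = u.copy()
        v[1:-1] = 0.5 * (u[:-2] + u[2:])
        return v

    def decomposed_step(u, nproc):
        """The same step with the grid split across `nproc` 'processors':
        each piece needs one ghost point from each neighbor, standing in
        for the message exchange the text describes."""
        chunks = np.array_split(u, nproc)
        out = []
        for p, c in enumerate(chunks):
            left = chunks[p - 1][-1:] if p > 0 else np.empty(0)
            right = chunks[p + 1][:1] if p < nproc - 1 else np.empty(0)
            ext = np.concatenate([left, c, right])   # attach ghost cells
            res = serial_step(ext)
            lo = 1 if p > 0 else 0                   # strip ghosts back off
            hi = len(ext) - 1 if p < nproc - 1 else len(ext)
            out.append(res[lo:hi])
        return np.concatenate(out)
    ```

    The decomposed update reproduces the serial one exactly; the cost of the ghost exchange, and processors idling while they wait for it, is what the paper's adaptation targets.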

  18. Optical logic array processor using shadowgrams

    NASA Astrophysics Data System (ADS)

    Tanida, J.; Ichioka, Y.

    1983-06-01

    On the basis of a lensless shadow-casting technique, a new, simple method of optically implementing digital logic gates has been developed. These gates are capable of performing a complete set of logical operations on a large array of binary variables in parallel, i.e., the pattern logics. A light-emitting diode (LED) array is used as an incoherent light source in the lensless shadow-casting system. Sixteen possible functions of two binary variables are simply realizable with these gates in parallel by controlling the switching modes of the LEDs. Experimental results demonstrate the feasibility of various gate arrays, such as AND, OR, NOR, XOR, and NAND. As an example of application of the proposed method, an optical logic array processor is constructed that can implement parallel operations of addition or subtraction for two binary variables without considering the carry mechanisms. Use of the light-modulated LED array means that the proposed method can be applied to combinational circuits.
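    The sixteen two-input functions have a compact digital analogue: a 4-bit gate number is itself the truth table, playing the role the LED switching pattern plays optically. A NumPy sketch (an illustration of the enumeration, not a model of the optics):

    ```python
    import numpy as np

    def logic_gate(n, a, b):
        """Evaluate boolean function number n (0..15) of two binary
        arrays, elementwise: bit (2a+b) of n is the gate's output for
        inputs (a, b), so n enumerates all sixteen possible gates."""
        return (n >> (2 * a + b)) & 1

    a = np.array([0, 0, 1, 1])
    b = np.array([0, 1, 0, 1])
    # n = 8 -> AND, n = 14 -> OR, n = 6 -> XOR, n = 7 -> NAND, n = 1 -> NOR
    ```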

  19. The Use of a Microcomputer Based Array Processor for Real Time Laser Velocimeter Data Processing

    NASA Technical Reports Server (NTRS)

    Meyers, James F.

    1990-01-01

    The application of an array processor to laser velocimeter data processing is presented. The hardware is described along with the method of parallel programming required by the array processor. A portion of the data processing program is described in detail. The increase in computational speed of a microcomputer equipped with an array processor is illustrated by comparative testing with a minicomputer.

  20. Optical Interferometric Parallel Data Processor

    NASA Technical Reports Server (NTRS)

    Breckinridge, J. B.

    1987-01-01

    Image data processed faster than in present electronic systems. Optical parallel-processing system effectively calculates two-dimensional Fourier transforms in time required by light to travel from plane 1 to plane 8. Coherence interferometer at plane 4 splits light into parts that form double image at plane 6 if projection screen placed there.

  1. Mapping between parallel processor structures and programs

    NASA Technical Reports Server (NTRS)

    Ngai, Tin-Fook; Yan, Jerry C.; Mak, Victor W. K.; Flynn, Michael J.; Lundstrom, Stephen F.

    1987-01-01

    This paper reports some ongoing research efforts at Stanford in allocation of parallel processing resources. Both processor structures and program structures have their own characteristics. Resource allocation binds the two structures during program execution. The mapping problem determines what processor structure and program structure may be combined to obtain maximum speedup. Three approaches to this mapping problem are considered. Two important factors, granularity and interaction delay, are also considered. A new hierarchical approach to structure definition is outlined. Effective and efficient tools are necessary for the study of the mapping problem. A fast turn-around simulation environment developed for investigating partition strategies for distributed computations and a computationally efficient method to predict performance of parallel processor structures are described.

  2. Joint Experimentation on Scalable Parallel Processors (JESPP)

    DTIC Science & Technology

    2006-04-01

    Authors: Dan M. Davis, Robert F. Lucas, Ke-Thia Yao, Gene Wagenbreth. The record's list of papers includes: Robert J. Graebener, Gregory Rafuse, Robert Miller & Ke-Thia Yao, "The Road to Successful Joint Experimentation Starts at the..." (title truncated), 2003; Robert F. Lucas & Dan M. Davis, "Joint Experimentation on Scalable Parallel Processors", Interservice/Industry Training, Simulation, and...

  3. Assignment Of Finite Elements To Parallel Processors

    NASA Technical Reports Server (NTRS)

    Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.

    1990-01-01

    Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
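    A toy version of simulated-annealing mapping, assuming a cost that simply counts element adjacencies split across processors; the paper's cost models communication time on a hypercube, and a real mapper would also penalize load imbalance, so this is only a sketch of the annealing mechanism.

    ```python
    import math
    import random

    def anneal(nelems, nproc, adjacency, t0=2.0, cooling=0.995, steps=4000):
        """Simulated annealing: move a random element to a random
        processor; accept uphill moves with probability exp(-delta/T),
        cooling T geometrically. Cost = number of adjacent element pairs
        assigned to different processors."""
        random.seed(1)
        assign = [random.randrange(nproc) for _ in range(nelems)]
        cost = sum(assign[i] != assign[j] for i, j in adjacency)
        best, best_cost, t = list(assign), cost, t0
        for _ in range(steps):
            e, p = random.randrange(nelems), random.randrange(nproc)
            old = assign[e]
            delta = sum((p != assign[j]) - (old != assign[j])
                        for i, j in adjacency if i == e)
            delta += sum((p != assign[i]) - (old != assign[i])
                         for i, j in adjacency if j == e)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                assign[e], cost = p, cost + delta      # accept the move
                if cost < best_cost:
                    best, best_cost = list(assign), cost
            t *= cooling
        return best, best_cost
    ```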

  4. Processor arrays with asynchronous TDM optical buses

    NASA Astrophysics Data System (ADS)

    Li, Y.; Zheng, S. Q.

    1997-04-01

    We propose a pipelined asynchronous time division multiplexing optical bus. Such a bus can use one of two hardwired priority schemes: the linear priority scheme and the round-robin priority scheme. Our simulation results show that the performance of the proposed buses is significantly better than that of known pipelined synchronous time division multiplexing optical buses. We also propose a class of processor arrays connected by pipelined asynchronous time division multiplexing optical buses. We claim that the proposed processor arrays not only have better performance but also better scalability than existing processor arrays connected by pipelined synchronous time division multiplexing optical buses.

  5. Scalable Unix tools on parallel processors

    SciTech Connect

    Gropp, W.; Lusk, E.

    1994-12-31

    The introduction of parallel processors that run a separate copy of Unix on each processor has introduced new problems in managing the user's environment. This paper discusses some generalizations of common Unix commands for managing files (e.g., ls) and processes (e.g., ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use them with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
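    A minimal prototype of such a scalable tool might fan a command out to every node in parallel and gather the replies; `parallel_run` and the host names are illustrative. A real version would prefix the command with `ssh host`; here the command runs locally (appending the host name as an argument) so the sketch is self-contained.

    ```python
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def parallel_run(hosts, argv):
        """Run the same command once per host concurrently and collect
        {host: output}. Stand-in for a 'pps'-style scalable Unix tool."""
        def run(host):
            out = subprocess.run(argv + [host], capture_output=True, text=True)
            return host, out.stdout.strip()
        with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
            return dict(pool.map(run, hosts))

    # e.g. parallel_run(["node1", "node2"], ["echo", "up:"])
    ```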

  6. Broadband monitoring simulation with massively parallel processors

    NASA Astrophysics Data System (ADS)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Moreover, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when the optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production-yield estimations require many hundreds or even thousands of computational manufacturing experiments; as a result, reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is a sum of terms over a wavelength grid. These terms can be computed simultaneously in different computational threads, which opens great opportunities for parallelization. Multi-core and multi-processor systems can provide speedups of several times. Additional potential for further acceleration comes from Graphics Processing Units (GPUs): a modern GPU consists of hundreds of massively parallel processors and can perform floating-point operations efficiently.
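    Since the discrepancy function is a plain sum over the wavelength grid, the parallelization can be sketched by splitting the grid into chunks and summing the partial results concurrently. The names and the thread-based approach are illustrative; the paper targets multi-core CPUs and GPUs.

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def discrepancy(measured, target, nthreads=4):
        """Sum-of-squares discrepancy over a wavelength grid, computed as
        concurrent partial sums over chunks of the grid. Each term depends
        only on its own wavelength, so the split is embarrassingly
        parallel and the result matches the serial sum."""
        m_chunks = np.array_split(measured, nthreads)
        t_chunks = np.array_split(target, nthreads)
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            partials = pool.map(lambda mt: float(((mt[0] - mt[1]) ** 2).sum()),
                                zip(m_chunks, t_chunks))
        return sum(partials)
    ```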

  7. Phased array antenna beamforming using optical processor

    NASA Technical Reports Server (NTRS)

    Anderson, L. P.; Boldissar, F.; Chang, D. C. D.

    1991-01-01

    The feasibility of optical processor based beamforming for microwave array antennas is investigated. The primary focus is on systems utilizing the 20/30 GHz communications band and a transmit configuration exclusively to serve this band. A mathematical model is developed for computation of candidate design configurations. The model is capable of determination of the necessary design parameters required for spatial aspects of the microwave 'footprint' (beam) formation. Computed example beams transmitted from geosynchronous orbit are presented to demonstrate network capabilities. The effect of the processor on the output microwave signal to noise quality at the antenna interface is also considered.

  8. Real time processor for array speckle interferometry

    NASA Technical Reports Server (NTRS)

    Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

    1989-01-01

    The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.

  9. Efficient searching and sorting applications using an associative array processor

    NASA Technical Reports Server (NTRS)

    Pace, W.; Quinn, M. J.

    1978-01-01

    The purpose of this paper is to describe a method of searching and sorting data by using some of the unique capabilities of an associative array processor. To understand the application, the associative array processor is described in detail. In particular, the content addressable memory and flip network are discussed because these two unique elements give the associative array processor the power to rapidly sort and search. A simple alphanumeric sorting example is explained in hardware and software terms. The hardware used to explain the application is the STARAN (Goodyear Aerospace Corporation) associative array processor. The software used is the APPLE (Array Processor Programming Language) programming language. Some applications of the array processor are discussed. This summary tries to differentiate between the techniques of the sequential machine and the associative array processor.

  10. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality and placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine-grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message passing or one-sided communication, offer performance and scalability but are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data, and allows data locality to be specified by the programmer and hence managed. GA is related to global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving parallel efficiency.
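    A toy model of the programming model described, not the real Global Arrays API: a logically shared 1-D array physically split across "nodes", accessed through explicit get/put on global index ranges, with the library (here, `_locate`) translating global indices to node-local storage.

    ```python
    import numpy as np

    class ToyGlobalArray:
        """Sketch of a GA-style distributed array: data lives in per-node
        blocks, but callers address it through a single global index
        space via explicit put/get transfers."""
        def __init__(self, n, nnodes):
            sizes = [len(c) for c in np.array_split(np.empty(n), nnodes)]
            self.blocks = [np.zeros(s) for s in sizes]   # per-node storage
            self.starts = np.cumsum([0] + sizes)         # block boundaries

        def _locate(self, i):
            """Map a global index to (node, local offset)."""
            node = int(np.searchsorted(self.starts, i, side="right")) - 1
            return node, i - self.starts[node]

        def put(self, lo, hi, data):
            for k, i in enumerate(range(lo, hi)):        # element-wise for clarity
                node, off = self._locate(i)
                self.blocks[node][off] = data[k]

        def get(self, lo, hi):
            return np.array([self.blocks[self._locate(i)[0]][self._locate(i)[1]]
                             for i in range(lo, hi)])
    ```

    A real implementation transfers whole blocks rather than single elements, precisely because remote access is slower than local access, which is the locality the GA model exposes to the programmer.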

  11. Scan line graphics generation on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1988-01-01

    Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z-buffer in large groups. Performing the pixel-value calculations, balancing the load across the processors, and applying the results to the Z-buffer efficiently in parallel require special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.

  12. MILP model for resource disruption in parallel processor system

    NASA Astrophysics Data System (ADS)

    Nordin, Syarifah Zyurina; Caccetta, Louis

    2015-02-01

    In this paper, we consider the existence of disruption in an unrelated parallel processor scheduling system. The disruption occurs due to a resource shortage in which one of the parallel processors breaks down during task allocation, disrupting the initial scheduling plan. Our objective is to reschedule the original unrelated parallel processor schedule after the resource disruption so as to minimize the makespan. A mixed integer linear programming model is presented for the recovery scheduling that considers the post-disruption policy. We conduct a computational experiment with different stopping time limits to assess the performance of the model using the CPLEX 12.1 solver in AIMMS 3.10 software.
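    The recovery objective can be illustrated without a MILP solver by a greedy heuristic, an illustrative stand-in for the paper's CPLEX model: reassign the failed processor's tasks to the surviving processors so as to keep the makespan low, using unrelated (per-processor) task times.

    ```python
    def reschedule(schedule, proc_time, failed):
        """After processor `failed` breaks down, reassign its tasks
        greedily: take the displaced tasks in decreasing order of their
        best remaining processing time, and place each on the processor
        that minimizes the resulting load. proc_time[p][t] gives the
        unrelated per-processor processing time of task t."""
        alive = {p: list(ts) for p, ts in schedule.items() if p != failed}
        loads = {p: sum(proc_time[p][t] for t in ts) for p, ts in alive.items()}
        for t in sorted(schedule[failed],
                        key=lambda t: -min(proc_time[p][t] for p in alive)):
            p = min(alive, key=lambda p: loads[p] + proc_time[p][t])
            alive[p].append(t)
            loads[p] += proc_time[p][t]
        return alive, max(loads.values())    # recovered schedule, makespan
    ```

    The MILP model instead searches over all reassignments (and may also move tasks that were not displaced), so it can only match or beat this heuristic's makespan, at the cost of solver time, which is why the paper studies different stopping time limits.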

  13. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P. S.

    1990-01-01

    This book presents a completely new approach to the problem of systolic array parallelizing compilation. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler that can generate efficient parallel code for complete LINPACK routines. The book begins by analyzing the architectural strengths of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. A complete listing of the source program and the compiler-generated parallel code is given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  14. Chemical network problems solved on NASA/Goddard's massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Cho, Seog Y.; Carmichael, Gregory R.

    1987-01-01

    The single instruction stream, multiple data stream Massively Parallel Processor (MPP) unit consists of 16,384 bit serial arithmetic processors configured as a 128 x 128 array whose speed can exceed that of current supercomputers (Cyber 205). The applicability of the MPP for solving reaction network problems is presented and discussed, including the mapping of the calculation to the architecture, and CPU timing comparisons.

  15. Breadboard Signal Processor for Arraying DSN Antennas

    NASA Technical Reports Server (NTRS)

    Jongeling, Andre; Sigman, Elliott; Chandra, Kumar; Trinh, Joseph; Soriano, Melissa; Navarro, Robert; Rogstad, Stephen; Goodhart, Charles; Proctor, Robert; Jourdan, Michael; Rayhrer, Benno

    2008-01-01

    A recently developed breadboard version of an advanced signal processor for arraying many antennas in NASA's Deep Space Network (DSN) can accept inputs in a 500-MHz-wide frequency band from six antennas. The next breadboard version is expected to accept inputs from 16 antennas, and a subsequent version is expected to use an architecture scalable to as many as 400 antennas. These and similar signal processors could also be used for combining multiple wide-band signals in non-DSN applications, including very-long-baseline interferometry and telecommunications. This signal processor performs the functions of a wide-band FX correlator and a beam-forming signal combiner. [The term "FX" signifies that the digital samples of two given signals are fast Fourier transformed (F), then the fast Fourier transforms of the two signals are multiplied (X) prior to accumulation.] In this processor, the signals from the various antennas are broken up into channels in the frequency domain (see figure). In each frequency channel, the data from each antenna are correlated against the data from each other antenna; this is done for all antenna baselines (that is, for all antenna pairs). The results of the correlations are used to obtain calibration data to align the antenna signals in both phase and delay. Data from the various antenna frequency channels are also combined and calibration corrections are applied. The frequency-domain data thus combined are then synthesized back to the time domain for passing on to a telemetry receiver.
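A minimal sketch of the FX principle for a single baseline, with a naive DFT standing in for the FFT. This illustrates only the F-then-X ordering, not the DSN processor's implementation.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT; stands in for the FFT 'F' stage."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fx_correlate(sig_a, sig_b):
    """FX correlation for one antenna pair (baseline): transform each
    signal (F), then multiply one spectrum by the conjugate of the
    other (X), channel by channel."""
    A, B = dft(sig_a), dft(sig_b)
    return [a * b.conjugate() for a, b in zip(A, B)]
```

For identical inputs the cross-spectrum reduces to the (real, non-negative) power spectrum; phase differences between antennas show up as complex phases per frequency channel, which is what the calibration step aligns.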

  16. High density packaging and interconnect of massively parallel image processors

    NASA Technical Reports Server (NTRS)

    Carson, John C.; Indin, Ronald J.

    1991-01-01

    This paper presents conceptual designs for high density packaging of parallel processing systems. The systems fall into two categories: global memory systems where many processors are packaged into a stack, and distributed memory systems where a single processor and many memory chips are packaged into a stack. Thermal behavior and performance are discussed.

  17. Digital image processing software system using an array processor

    SciTech Connect

    Sherwood, R.J.; Portnoff, M.R.; Journeay, C.H.; Twogood, R.E.

    1981-03-10

    A versatile array processor-based system for general-purpose image processing was developed. At the heart of this system is an extensive, flexible software package that incorporates the array processor for effective interactive image processing. The software system is described in detail, and its use in a diverse set of applications at LLNL is briefly discussed. 4 figures, 1 table.

  18. Direct simulation Monte Carlo analysis on parallel processors

    NASA Technical Reports Server (NTRS)

    Wilmoth, Richard G.

    1989-01-01

    A method is presented for executing a direct simulation Monte Carlo (DSMC) analysis using parallel processing. The method is based on using domain decomposition to distribute the work load among multiple processors, and the DSMC analysis is performed completely in parallel. Message passing is used to transfer molecules between processors and to provide the synchronization necessary for the correct physical simulation. Benchmark problems are described for testing the method and results are presented which demonstrate the performance on two commercially available multicomputers. The results show that reasonable parallel speedup and efficiency can be obtained if the problem is properly sized to the number of processors. It is projected that with a massively parallel system, performance exceeding that of current supercomputers is possible.

  19. Parallel processor-based raster graphics system architecture

    DOEpatents

    Littlefield, Richard J.

    1990-01-01

    An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.

  20. Joint Experiment on Scalable Parallel Processors (JESPP) Parallel Data Management

    DTIC Science & Technology

    2006-05-01

    …thousand entities to a WAN including multiple Beowulf clusters and hundreds of processors simulating hundreds of thousands of entities. … To support larger simulations on Beowulf clusters, ISI implemented a distributed logger; data is logged locally on each processor running a simulator. … development and execution effort (Lucas, 2003). Common SPPs include the IBM SP, SGI Origin, Cray T3E, and "Beowulf" Linux clusters. Traditionally…

  1. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend toward higher channel counts, drawing on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and the changes in RF methodology needed to construct massively parallel MRI detector arrays, and show some state-of-the-art examples of highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  2. Diagnosis and reconfiguration of VLSI/WSI array processors

    SciTech Connect

    Wang, M.

    1988-01-01

    Some fault-tolerant techniques and analytical methods are presented for linear, mesh, and tree array processors implemented in Very Large Scale Integration (VLSI) or Wafer Scale Integration (WSI) circuits. Several techniques are developed for testing, diagnosis, on-line fault detection, and reconfiguration of array processors. A built-in self-test strategy is presented for array processors to achieve C-testability, by which the test length is independent of the size of the array. The signature comparison approach is used for the diagnostic algorithms. Reconfiguration schemes with two-level redundancy for mesh and tree arrays are described. An on-line fault detection scheme using redundant cells and blocks is developed. Analytical tools are given for evaluating the reliability of the proposed schemes. A yield estimation model for WSI mesh array processors with two-level redundancy is presented; distributed as well as clustered defects are considered in this model.

  3. Global synchronization of parallel processors using clock pulse width modulation

    DOEpatents

    Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

    2013-04-02

    A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

  4. DFT algorithms for bit-serial GaAs array processor architectures

    NASA Technical Reports Server (NTRS)

    Mcmillan, Gary B.

    1988-01-01

    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.

  5. Wafer Scale Integration of Parallel Processors.

    DTIC Science & Technology

    1982-11-01

    [Garbled OCR of a DTIC report documentation page. Recoverable details: Department of Computer Science, West Lafayette, Indiana 47907; controlling office: Office of Naval Research; report date: November 1982; keywords: price model, CHiP computer, switch lattice, two-level hierarchy, reflective switch, highly parallel computers.]

  6. Staging memory for massively parallel processor

    NASA Technical Reports Server (NTRS)

    Batcher, Kenneth E. (Inventor)

    1988-01-01

    The invention herein relates to a computer organization capable of rapidly processing extremely large volumes of data. A staging memory is provided having a main stager portion consisting of a large number of memory banks which are accessed in parallel to receive, store, and transfer data words simultaneous with each other. Substager portions interconnect with the main stager portion to match input and output data formats with the data format of the main stager portion. An address generator is coded for accessing the data banks for receiving or transferring the appropriate words. Input and output permutation networks arrange the lineal order of data into and out of the memory banks.

  7. Real-time trajectory optimization on parallel processors

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm suitable for real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an Intel iPSC/860 message-passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search-direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on three example problems: the Goddard problem; the acceleration-limited, planar minimum-time-to-the-origin problem; and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall-clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors, and a 32-stage National Aerospace Plane problem required 2 hours on 32 processors. A speed-up factor of 7.2 has been achieved by using 32 nodes instead of 1 node to solve a 64-stage Goddard problem.

  8. Potential of minicomputer/array-processor system for nonlinear finite-element analysis

    NASA Technical Reports Server (NTRS)

    Strohkorb, G. A.; Noor, A. K.

    1983-01-01

    The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.

  9. Computations on the massively parallel processor at the Goddard Space Flight Center

    NASA Technical Reports Server (NTRS)

    Strong, James P.

    1991-01-01

    Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
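The bitonic sort mentioned above can be sketched sequentially; on the MPP, each compare-exchange stage would run across the processor array in parallel. A minimal recursive version for power-of-two inputs (an illustration of the algorithm, not the MPP implementation):

```python
def bitonic_sort(data, ascending=True):
    """Batcher's bitonic sort; len(data) must be a power of two.
    Sort one half ascending and the other descending, then merge."""
    n = len(data)
    if n <= 1:
        return list(data)
    half = n // 2
    first = bitonic_sort(data[:half], True)
    second = bitonic_sort(data[half:], False)
    return _bitonic_merge(first + second, ascending)

def _bitonic_merge(data, ascending):
    """Merge a bitonic sequence: compare-exchange elements half apart,
    then recurse on both halves. Each loop below is one parallel stage."""
    n = len(data)
    if n <= 1:
        return list(data)
    half = n // 2
    d = list(data)
    for i in range(half):
        if (d[i] > d[i + half]) == ascending:
            d[i], d[i + half] = d[i + half], d[i]
    return (_bitonic_merge(d[:half], ascending)
            + _bitonic_merge(d[half:], ascending))
```

Because every compare-exchange in a stage touches a disjoint pair at a fixed stride, the pattern maps onto a mesh with only regular, bounded-distance data movement, which is why it suits the MPP's nearest-neighbor interconnect.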

  10. Method and structure for skewed block-cyclic distribution of lower-dimensional data arrays in higher-dimensional processor grids

    DOEpatents

    Chatterjee, Siddhartha; Gunnels, John A.

    2011-11-08

    A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.
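A toy mapping function illustrates the idea of adding a skew to a block-cyclic distribution. The skew rule here (shift the column mapping by the block row) is an assumption for illustration, not the patented scheme itself.

```python
def skewed_block_cyclic(i, j, nb, P, Q):
    """Map array element (i, j) to a processor (p, q) in a P x Q grid
    using a block-cyclic distribution with an illustrative skew in the
    column dimension."""
    bi, bj = i // nb, j // nb      # block coordinates of the element
    p = bi % P                     # plain cyclic mapping of block rows
    q = (bj + bi) % Q              # skew: column mapping shifts with block row
    return p, q
```

Without the skew, a whole column of blocks lands on a single processor column; with it, successive block rows of the same array column rotate across processor columns, spreading both row-wise and column-wise traffic over groups of processors.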

  11. Parallel-pipeline 2-D DCT/IDCT processor chip

    NASA Astrophysics Data System (ADS)

    Ruiz, G. A.; Michell, J. A.; Buron, A.

    2005-06-01

    This paper describes an 8x8 2-D DCT/IDCT processor with high throughput and a cost-effective architecture. The 2-D DCT/IDCT is calculated using the separability property, so the design is made up of two 1-D processors and a transpose buffer (TB) as intermediate memory. The transpose buffer has a regular structure based on D-type flip-flops, with a double serial input/output data flow well suited to pipeline architectures. The processor combines parallel and pipeline techniques to attain high throughput, reduced hardware, and maximum efficiency in all arithmetic elements. This architecture allows the processing elements and arithmetic units to work in parallel at half the frequency of the data input rate, except for the normalization of the transform, which is done in a multiplier operating at the maximum frequency. Moreover, it has been verified that the precision of the proposed processor meets the demands of IEEE Std. 1180-1990, used in the ITU-T H.261 and H.263 video codecs. The processor was designed with a standard-cell methodology and manufactured in a 0.35-μm CMOS CSD 3M/2P 3.3-V process. It has an area of 6.25 mm2 (3 mm2 core) and contains a total of 11.7k gates, of which 5.8k are flip-flops. A data input rate of 300 MHz has been established, with a latency of 172 cycles for the 2-D DCT and 178 cycles for the 2-D IDCT; the computing time for a block is close to 580 ns. Its computing speed and hardware complexity indicate that the proposed design is suitable for HDTV applications.
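The separability property the processor exploits (1-D DCT on the rows, transpose, 1-D DCT again) can be sketched directly; the naive O(N^2) 1-D DCT below stands in for the chip's pipelined datapath.

```python
import math

def dct_1d(x):
    """Naive 1-D DCT-II with orthonormal scaling."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: 1-D DCT on rows, transpose, 1-D DCT on the
    result's rows (i.e., the original columns), transpose back."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # the transpose-buffer stage
    return [list(r) for r in zip(*cols)]
```

The intermediate transpose is exactly what the chip's transpose buffer provides between its two 1-D processors.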

  12. Photorefractive phased array antenna beam-forming processor

    NASA Astrophysics Data System (ADS)

    Sarto, Anthony W.; Wagner, Kelvin H.; Weverka, Robert T.; Blair, Steven M.; Weaver, Samuel P.

    1996-11-01

    A high-bandwidth, large-degree-of-freedom photorefractive phased-array antenna beam-forming processor is described that uses 3D dynamic volume holograms in photorefractive crystals to time-integrate the adaptive weights for beam-steering and jammer-cancellation signal-processing tasks. The processor calculates the angle of arrival of a desired signal of interest and steers the antenna pattern in the direction of this signal by forming a dynamic holographic grating proportional to the correlation between the incoming signal from the antenna array and the temporal waveform of the desired signal. Experimental results of main-beam formation, the measured array functions, the holographic index gratings, and the resulting processor output are presented.

  13. Hardware reconfiguration for fault-tolerant processor arrays

    SciTech Connect

    Chean, M.

    1989-01-01

    In large VLSI/WSI arrays, improved reliability and yield can be obtained through reconfiguration techniques. In fault-tolerance design, redundancy is used to offset faults when they occur in the arrays. Since redundant components are themselves susceptible to faults, their number must be kept to a minimum. This also implies that an efficient reconfiguration scheme is preferred, i.e., one that can use as many spare components as possible so that unnecessary waste of spares is reduced. In this thesis, hardware reconfiguration for fault-tolerant processor arrays is discussed. First, a taxonomy for reconfiguration techniques is introduced, and several schemes are surveyed and classified. This taxonomy can be used to introduce, explain, compare, study, and classify new reconfiguration schemes. Next, an extension to an existing reconfiguration technique is presented; two special cases of the scheme are simulated and their results compared and studied. Finally, a new approach to hardware reconfiguration, called FUSS (Full Use of Suitable Spares), is proposed for VLSI/WSI fault-tolerant processor arrays. FUSS uses an indicator vector, the surplus vector, to guide the replacement of faulty processors within an array. Analytical study of the general FUSS algorithm shows that a linear relationship between the array size and the area of interconnect is required for the reconfiguration to be 100% successful. In an instance of FUSS, called simple FUSS, reconfiguration is done by simply shifting faulty processors up or down along their corresponding columns according to the surplus vector's entries. The surplus vector is progressively updated after each column is reconfigured, and the reconfiguration is successful when the surplus vector becomes the null vector. Simulations show that when the number of faulty processors equals the number of spare processors, simple FUSS achieves a probability of survival as high as 99%.

  14. Ring-array processor distribution topology for optical interconnects

    NASA Technical Reports Server (NTRS)

    Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

    1992-01-01

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.

  15. Optimal mapping of irregular finite element domains to parallel processors

    NASA Technical Reports Server (NTRS)

    Flower, J.; Otto, S.; Salama, M.

    1987-01-01

    Mapping the solution domain of n finite elements into N subdomains that may be processed in parallel by N processors is optimal if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.
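A toy version of the simulated-annealing mapping can be sketched with an illustrative cost: load imbalance plus cut edges (element adjacencies split across processors). The cost model and cooling schedule are assumptions for illustration, not the paper's exact formulation.

```python
import math
import random

def anneal_mapping(edges, n_elems, n_procs, steps=20000, seed=1):
    """Assign elements to processors by simulated annealing, minimizing
    load imbalance plus the number of cut (inter-processor) edges."""
    rng = random.Random(seed)

    def cost(a):
        loads = [0] * n_procs
        for p in a:
            loads[p] += 1
        cut = sum(1 for u, v in edges if a[u] != a[v])
        return (max(loads) - min(loads)) + cut

    assign = [rng.randrange(n_procs) for _ in range(n_elems)]
    cur = cost(assign)
    best, best_cost = list(assign), cur
    temp = 5.0
    for _ in range(steps):
        e = rng.randrange(n_elems)          # move one element to a random processor
        old = assign[e]
        assign[e] = rng.randrange(n_procs)
        new = cost(assign)
        # Metropolis rule: always accept improvements, sometimes accept worse
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best_cost:
                best, best_cost = list(assign), cur
        else:
            assign[e] = old                 # reject the move
        temp = max(0.01, temp * 0.999)      # geometric cooling
    return best, best_cost
```

On a graph made of two cliques joined by one edge, the minimizer puts each clique on its own processor, cutting only the bridge.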

  16. Analog parallel processor hardware for high speed pattern recognition

    NASA Technical Reports Server (NTRS)

    Daud, T.; Tawel, R.; Langenbacher, H.; Eberhardt, S. P.; Thakoor, A. P.

    1990-01-01

    A VLSI-based analog processor for fully parallel, associative, high-speed pattern matching is reported. The processor consists of two main components: an analog memory matrix for storage of a library of patterns, and a winner-take-all (WTA) circuit for selection of the stored pattern that best matches an input pattern. An inner product is generated between the input vector and each of the stored memories. The resulting values are applied to a WTA network for determination of the closest match. Patterns with up to 22 percent overlap are successfully classified with a WTA settling time of less than 10 microsec. Applications such as star pattern recognition and mineral classification with bounded overlap patterns have been successfully demonstrated. This architecture has a potential for an overall pattern matching speed in excess of 10^9 bits per second for a large memory.
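The match-then-select structure (inner products against every stored pattern, followed by a winner-take-all) can be sketched digitally; this models the behavior, not the analog VLSI circuit.

```python
def best_match(input_vec, memory):
    """Associative match: inner product of the input against every stored
    pattern (done in parallel on the chip, a loop here), then a
    winner-take-all pick of the largest response."""
    scores = [sum(x * w for x, w in zip(input_vec, row)) for row in memory]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return winner, scores
```

The WTA circuit implements the `max` step in analog hardware, so the selection time is set by circuit settling rather than by a sequential comparison over the library.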

  17. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  18. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  19. Phase space simulation of collisionless stellar systems on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    White, Richard L.

    1987-01-01

    A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.

  20. Bit-parallel arithmetic in a massively-parallel associative processor

    NASA Technical Reports Server (NTRS)

    Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

    1992-01-01

    A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.

  1. Digital signal processor and programming system for parallel signal processing

    SciTech Connect

    Van den Bout, D.E.

    1987-01-01

    This thesis describes an integrated assault upon the problem of designing high-throughput, low-cost digital signal-processing systems. The dual prongs of this assault consist of: (1) the design of a digital signal processor (DSP) which efficiently executes signal-processing algorithms in either a uniprocessor or multiprocessor configuration, (2) the PaLS programming system which accepts an arbitrary algorithm, partitions it across a group of DSPs, synthesizes an optimal communication link topology for the DSPs, and schedules the partitioned algorithm upon the DSPs. The results of applying a new quasi-dynamic analysis technique to a set of high-level signal-processing algorithms were used to determine the uniprocessor features of the DSP design. For multiprocessing applications, the DSP contains an interprocessor communications port (IPC) which supports simple, flexible, dataflow communications while allowing the total communication bandwidth to be incrementally allocated to achieve the best link utilization. The net result is a DSP with a simple architecture that is easy to program for both uniprocessor and multi-processor modes of operation. The PaLS programming system simplifies the task of parallelizing an algorithm for execution upon a multiprocessor built with the DSP.

  2. An informal introduction to program transformation and parallel processors

    SciTech Connect

    Hopkins, K.W.

    1994-08-01

    In the summer of 1992, I had the opportunity to participate in a Faculty Research Program at Argonne National Laboratory. I worked under Dr. Jim Boyle on a project transforming code written in pure functional Lisp to Fortran code to run on distributed-memory parallel processors. To perform this project, I had to learn three things: the transformation system, the basics of distributed-memory parallel machines, and the Lisp programming language. Each of these topics in computer science was unfamiliar to me as a mathematician, but I found that they (especially parallel processing) are greatly impacting many fields of mathematics and science. Since most mathematicians have some exposure to computers but certainly are not computer scientists, I felt it was appropriate to write a paper summarizing my introduction to these areas and how they fit together. This paper is not meant to be a full explanation of the topics, but an informal introduction for the "mathematical layman." I place myself in that category as well, since my previous use of computers was as a classroom demonstration tool.

  3. Semantic network array processor and its applications to image understanding

    SciTech Connect

    Dixit, V.; Moldovan, D.I.

    1987-01-01

    The problems in computer vision range from edge detection and segmentation at the lowest level to the problem of cognition at the highest level. This correspondence describes the organization and operation of a semantic network array processor (SNAP) as applicable to high level computer vision problems. The architecture consists of an array of identical cells each containing a content addressable memory, microprogram control, and a communication unit. The applications discussed in this paper are the two general techniques, discrete relaxation and dynamic programming. While the discrete relaxation is discussed with reference to scene labeling and edge interpretation, the dynamic programming is tuned for stereo.

  4. Yield model for WSI restructurable homogeneous processor arrays

    SciTech Connect

    Lee, J.J.

    1987-01-01

    A model for a comprehensive yield analysis of WSI processor arrays was developed. The yield model is partitioned into three logically well-separated submodels: failures, testing and embedding. Failures are clearly divided into physical failures (defects) and logical failures (faults), and the relation is defined by the sensitivity of faults to defects. The completeness of cell-level testing is reflected in the yield model as a reduction coefficient, the probability that all the processor cells in the array are good. A new hierarchical definition of fault coverage is introduced to remove drawbacks in the traditional definition. A specific embedding program for two-dimensional arrays is designed based on the divide-and-conquer strategy, and then Monte Carlo simulation is conducted to show the effects of key parameters in the model. The embedding probability is defined from the subunit embedding probability and combining probability. Finally, those results are integrated successfully into a mathematical expression. The equation is used to study possible approaches for higher wafer yield.
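A Monte Carlo yield estimate in the spirit of the submodels above can be sketched in a few lines (the parameters and the "at least `needed` good cells" criterion are illustrative stand-ins, not the thesis's actual failure, testing, and embedding submodels):

```python
import math
import random

# Each cell receives a Poisson number of physical defects; each defect
# becomes a logical fault with probability `sensitivity`; the wafer counts
# as good when at least `needed` cells are fault-free (a crude stand-in
# for the embedding step).

def poisson_sample(rng, lam):
    # Knuth's method, adequate for small defect rates
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def wafer_yield(n_cells, needed, defect_rate, sensitivity,
                trials=5000, seed=1):
    rng = random.Random(seed)
    good_wafers = 0
    for _ in range(trials):
        good_cells = 0
        for _ in range(n_cells):
            defects = poisson_sample(rng, defect_rate)
            faults = sum(1 for _ in range(defects)
                         if rng.random() < sensitivity)
            if faults == 0:
                good_cells += 1
        if good_cells >= needed:
            good_wafers += 1
    return good_wafers / trials
```

Sweeping `defect_rate` with the other parameters held fixed reproduces the expected monotone drop in wafer yield.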

  5. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, Robert J.; Brooks, III, Eugene D.; Haigh, Ronald E.; DeGroot, Anthony J.

    1999-01-01

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

  6. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

    1999-08-24

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

  7. Nonlinear dynamics in a neural network (parallel) processor

    NASA Astrophysics Data System (ADS)

    Perera, A. G. Unil; Matsik, S. G.; Betarbet, S. R.

    1995-06-01

    We consider an iterative map derived from the device equations for a silicon p+-n-n+ diode, which simulates a biological neuron. This map has been extended to a coupled neuron circuit consisting of two of these artificial neurons connected by a filter circuit, which could be used as a single channel of a parallel asynchronous processor. The extended map output is studied under different conditions to determine the effect of various parameters on the pulsing pattern. As the control parameter is increased, fixed points (both stable and unstable) as well as a limit cycle appear. On further increase, a Hopf bifurcation is seen, causing the disappearance of the limit cycle. The increasing control parameter, which is related to a decrease in the bias applied to the circuit, also causes variation in the location of the fixed points. This variation could be important in applications to neural networks. The control parameter value at which the fixed points appear and the bifurcation occurs can be varied by changing the weighting of the filter circuit. The modeling outputs are compared with the experimental outputs.
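The abstract does not give the diode map itself, so as an illustrative stand-in the logistic map can show how a fixed point and its linear stability depend on a control parameter r:

```python
# Nonzero fixed point of the logistic map x -> r*x*(1-x) and its stability.
# A fixed point x* solves x* = r*x*(1-x*), giving x* = 1 - 1/r for r > 1;
# it is linearly stable when |f'(x*)| < 1, with f'(x*) = r*(1 - 2*x*) = 2 - r.

def logistic(x, r):
    return r * x * (1.0 - x)

def fixed_point_and_stability(r):
    x_star = 1.0 - 1.0 / r
    slope = r * (1.0 - 2.0 * x_star)   # derivative at the fixed point
    return x_star, abs(slope) < 1.0
```

As r passes 3 the slope crosses -1 and the fixed point loses stability, the one-dimensional analogue of the parameter-driven bifurcations described above.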

  8. Parallel optical Walsh expansion in a pattern recognition preprocessor using planar microlens array

    NASA Astrophysics Data System (ADS)

    Murashige, Kimio; Akiba, Atsushi; Baba, Toshihiko; Iga, Kenichi

    1992-05-01

    A parallel optical processor for a pattern recognition system using a planar microlens array and a Walsh orthogonal expansion spatial filter is developed. The parallel optical Walsh expansion of multiple images made by the planar microlens array with good accuracy, which assures 99-percent recognition of simple numeral characters in the system, is demonstrated. A novel selection method of Walsh expansion coefficients is proposed in order to enlarge the tolerance of the recognition rate against the deformation of input patterns.

  9. Partitioning: An essential step in mapping algorithms into systolic array processors

    SciTech Connect

    Navarro, J.J.; Llaberia, J.M.; Valero, M.

    1987-07-01

    Many scientific and technical applications require high computing speed; those involving matrix computations are typical. For applications involving matrix computations, algorithmically specialized, high-performance, low-cost architectures have been conceived and implemented. Systolic array processors (SAPs) are a good example of these machines. An SAP is a regular array of simple processing elements (PEs) that have a nearest-neighbor interconnection pattern. The simplicity, modularity, and expandability of SAPs make them suitable for VLSI/WSI implementation. Algorithms that are efficiently executed on SAPs are called systolic algorithms (SAs). An SA uses an array of systolic cells whose parallel operations must be specified. When an SA is executed on an SAP, the specified computations of each cell are carried out by a PE of the SAP.
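The cell-level behavior of a systolic algorithm can be mimicked cycle by cycle in software. The sketch below simulates a generic output-stationary n×n array for matrix multiplication (an illustration of the SA/SAP idea, not any specific machine from the paper): operands skewed by one cycle per row and column arrive at each PE, which performs one multiply-accumulate per cycle using only nearest-neighbor data.

```python
# Output-stationary systolic matrix multiply, C = A @ B, simulated cycle
# by cycle. At cycle t, PE (i, j) sees operand index k = t - i - j: the
# skewed entry a[i][k] flowing from the left and b[k][j] flowing from
# the top.

def systolic_matmul(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):           # enough cycles to drain the array
        for i in range(n):
            for j in range(n):
                k = t - i - j            # input skewing aligns the operands
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

The 3n-2 cycle count (versus n^3 sequential multiply-adds) is exactly the parallelism a physical SAP exploits.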

  10. Application of bistable optical logic gate arrays to all-optical digital parallel processing

    NASA Astrophysics Data System (ADS)

    Walker, A. C.

    1986-05-01

    Arrays of bistable optical gates can form the basis of an all-optical digital parallel processor. Two classes of signal input geometry exist - on- and off-axis - and lead to distinctly different device characteristics. The optical implementation of multisignal fan-in to an array of intrinsically bistable optical gates using the more efficient off-axis option is discussed together with the construction of programmable read/write memories from optically bistable devices. Finally the design of a demonstration all-optical parallel processor incorporating these concepts is presented.

  11. On nonlinear finite element analysis in single-, multi- and parallel-processors

    NASA Technical Reports Server (NTRS)

    Utku, S.; Melosh, R.; Islam, M.; Salama, M.

    1982-01-01

    Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasionally shared data, just as it does due to the shared facilities.
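The observation that each Newton-Raphson step solves a linearized problem can be seen in a one-degree-of-freedom sketch (the cubic spring law and its coefficients here are a hypothetical example, not from the paper):

```python
# Newton-Raphson for a single-DOF nonlinear spring k(u) = k0 + k2*u**2
# under load P. The residual (out-of-balance force) is
#   R(u) = (k0 + k2*u**2)*u - P,
# and each iteration solves the linear problem Kt * du = -R with the
# tangent stiffness Kt = dR/du = k0 + 3*k2*u**2.

def newton_raphson(P, k0=1.0, k2=0.5, u0=0.0, tol=1e-10, max_iter=50):
    u = u0
    for _ in range(max_iter):
        R = (k0 + k2 * u * u) * u - P      # residual force
        if abs(R) < tol:
            return u
        Kt = k0 + 3.0 * k2 * u * u         # tangent stiffness
        u -= R / Kt                         # the linear solve of this step
    raise RuntimeError("Newton-Raphson did not converge")
```

For P = 1.5 the exact equilibrium is u = 1 (since 1 + 0.5·1³ = 1.5), which the iteration reaches in a handful of steps.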

  12. Design of a dataway processor for a parallel image signal processing system

    NASA Astrophysics Data System (ADS)

    Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

    1995-04-01

    Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.

  13. Serial multiplier arrays for parallel computation

    NASA Technical Reports Server (NTRS)

    Winters, Kel

    1990-01-01

    Arrays of systolic serial-parallel multiplier elements are proposed as an alternative to conventional SIMD mesh serial adder arrays for applications that are multiplication intensive and require few stored operands. The design and operation of a number of multiplier and array configurations featuring locality of connection, modularity, and regularity of structure are discussed. A design methodology combining top-down and bottom-up techniques is described to facilitate development of custom high-performance CMOS multiplier element arrays as well as rapid synthesis of simulation models and semicustom prototype CMOS components. Finally, a differential version of NORA dynamic circuits requiring a single-phase uncomplemented clock signal is introduced for this application.

  14. Periodic Application of Concurrent Error Detection in Processor Array Architectures. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Chen, Paul Peichuan

    1993-01-01

    Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.

  15. Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer

    1997-01-01

    A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with neighboring neurons and their associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.

  16. Track recognition in 4 µs by a systolic trigger processor using a parallel Hough transform

    SciTech Connect

    Klefenz, F.; Noffz, K.H.; Conen, W.; Zoz, R.; Kugel, A. (Lehrstuhl fuer Informatik V); Maenner, R. (Univ. Heidelberg, Lehrstuhl fuer Informatik V; Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen)

    1993-08-01

    A parallel Hough transform processor has been developed that identifies circular particle tracks in a 2D projection of the OPAL jet chamber. The high-speed requirements imposed by the 8 bunch crossing mode of LEP could be fulfilled by computing the starting angle and the radius of curvature for each well defined track in less than 4 µs. The system consists of a Hough transform processor that determines well defined tracks, and a Euler processor that counts their number by applying the Euler relation to the thresholded result of the Hough transform. A prototype of a systolic processor has been built that handles one sector of the jet chamber. It consists of 35 × 32 processing elements that were loaded into 21 programmable gate arrays (XILINX). This processor runs at a clock rate of 40 MHz. It has been tested offline with about 1,000 original OPAL events. No deviations from the off-line simulation have been found. A trigger efficiency of 93% has been obtained. The prototype together with the associated drift time measurement unit has been installed at the OPAL detector at LEP and 100k events have been sampled to evaluate the system under detector conditions.
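The underlying voting scheme can be sketched for circular tracks through the origin (the beam vertex): a hit (x, y) on such a circle of radius R whose center lies at polar angle theta satisfies x² + y² = 2R(x·cos θ + y·sin θ), so each hit casts one vote per theta bin, and bin counts peak at real tracks. The binning choices below are illustrative, not the OPAL trigger's.

```python
import math

# Hough accumulation for circles through the origin: each hit votes for
# one radius R per angle bin; the accumulator cell with the most votes
# gives the track's starting angle and radius of curvature.

def hough_circles(hits, n_theta=180, r_max=10.0, n_r=100):
    acc = {}
    for x, y in hits:
        rr = x * x + y * y
        for it in range(n_theta):
            theta = math.pi * it / n_theta
            d = x * math.cos(theta) + y * math.sin(theta)
            if abs(d) < 1e-9:
                continue
            R = rr / (2.0 * d)
            if 0.0 < R < r_max:
                ir = int(round(R / r_max * n_r))
                acc[(it, ir)] = acc.get((it, ir), 0) + 1
    # return the best (theta_bin, r_bin) cell and its vote count
    return max(acc.items(), key=lambda kv: kv[1])
```

Hits sampled from a circle of radius 2 centered at (2, 0), which passes through the origin, all vote into the theta = 0, R = 2 cell.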

  17. Some parallel algorithms on the four processor Cray X-MP4 supercomputer

    SciTech Connect

    Kincaid, D.R.; Oppe, T.C.

    1988-05-01

    Three numerical studies of parallel algorithms on a four processor Cray X-MP4 supercomputer are presented. These numerical experiments involve the following: a parallel version of ITPACKV 2C, a package for solving large sparse linear systems, a parallel version of the conjugate gradient method with line Jacobi preconditioning, and several parallel algorithms for computing the LU-factorization of dense matrices. 27 refs., 4 tabs.

  18. An iterative expanding and shrinking process for processor allocation in mixed-parallel workflow scheduling.

    PubMed

    Huang, Kuo-Chan; Wu, Wei-Ya; Wang, Feng-Jian; Liu, Hsiao-Ching; Hung, Chun-Hao

    2016-01-01

    Parallel computation has been widely applied in a variety of large-scale scientific and engineering applications. Many studies indicate that exploiting both task and data parallelism, i.e. mixed-parallel workflows, to solve large computational problems can achieve better efficiency than either pure task parallelism or pure data parallelism. Scheduling traditional workflows of pure task parallelism on parallel systems has long been known to be an NP-complete problem. Mixed-parallel workflow scheduling has to deal with an additional challenging issue of processor allocation. In this paper, we explore the processor allocation issue in scheduling mixed-parallel workflows of moldable tasks, called M-tasks, and propose an Iterative Allocation Expanding and Shrinking (IAES) approach. Compared to previous approaches, our IAES has two distinguishing features. The first is allocating more processors to the tasks on allocated critical paths for effectively reducing the makespan of workflow execution. The second is allowing the processor allocation of an M-task to shrink during the iterative procedure, resulting in a more flexible and effective process for finding better allocations. The proposed IAES approach has been evaluated with a series of simulation experiments and compared to several well-known previous methods, including CPR, CPA, MCPA, and MCPA2. The experimental results indicate that our IAES approach outperforms those previous methods significantly in most situations, especially when nodes of the same layer in a workflow might have unequal workloads.
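Why allocation matters for moldable tasks can be shown with a toy model (an illustration of the idea, not the IAES algorithm itself; it assumes ideal linear speedup t = work / procs): a layer of independent tasks finishes when its slowest task does, so shifting processors toward heavier tasks shortens the layer.

```python
# Greedy allocation for one layer of moldable tasks: every task gets one
# processor, then each remaining processor goes to the current bottleneck
# (the task with the largest work/procs ratio).

def layer_time(works, alloc):
    return max(w / p for w, p in zip(works, alloc))

def proportional_alloc(works, total_procs):
    alloc = [1] * len(works)
    for _ in range(total_procs - len(works)):
        i = max(range(len(works)), key=lambda i: works[i] / alloc[i])
        alloc[i] += 1
    return alloc
```

For the unequal workloads [9, 1, 1, 1] on 8 processors, the greedy allocation [5, 1, 1, 1] finishes the layer in 1.8 time units versus 4.5 for an equal split, mirroring the paper's point about unequal workloads within a layer.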

  19. Mathematical and numerical models to achieve high speed with special-purpose parallel processors

    SciTech Connect

    Cheng, H.S.; Wulff, W.; Mallen, A.N.

    1986-07-01

    One simulation facility that has been developed is the BNL Plant Analyzer, currently set up for BWR plant simulations at up to seven times faster than real-time process speeds. The principal hardware components of the BNL Plant Analyzer are two units of special-purpose parallel processors, the AD10 of Applied Dynamics International and a PDP-11/34 host computer. The AD10 is specifically designed for time-critical system simulations, utilizing the modern parallel processing technology with pipeline architecture. The simulator employs advanced modeling techniques and efficient integration techniques in conjunction with the parallel processors to achieve high speed performance.

  20. Effects of error sources on the parallelism of an optical matrix-vector processor

    NASA Technical Reports Server (NTRS)

    Perlee, Caroline J.; Casasent, David P.

    1990-01-01

    The error sources in a high accuracy optical matrix-vector processor are analyzed by numerical simulation in terms of their effects on the parallelism and speed of the processor. These effects are detailed for radices 2, 4, and 8. Radix 4 is shown to provide maximum parallel processing capabilities under the effects of the system's error sources. Processing speed is shown to be a function of matrix partitioning and the number of parallel processing channels. Consequently, radix-4 operation provides a higher processing speed than radix-2 and radix-8 for most matrix-vector multiplications when error source effects are considered.

  1. High speed vision processor with reconfigurable processing element array based on full-custom distributed memory

    NASA Astrophysics Data System (ADS)

    Chen, Zhe; Yang, Jie; Shi, Cong; Qin, Qi; Liu, Liyuan; Wu, Nanjian

    2016-04-01

    In this paper, a hybrid vision processor based on a compact full-custom distributed memory for near-sensor high-speed image processing is proposed. The proposed processor consists of a reconfigurable processing element (PE) array, a row processor (RP) array, and a dual-core microprocessor. The PE array includes two-dimensional processing elements with a compact full-custom distributed memory. It supports real-time reconfiguration between the PE array and the self-organized map (SOM) neural network. The vision processor is fabricated using a 0.18 µm CMOS technology. The circuit area of the distributed memory is reduced markedly to 1/3 of that of a conventional memory, so that the circuit area of the vision processor is reduced by 44.2%. Experimental results demonstrate that the proposed design achieves correct functions.

  2. JESPP: Joint Experimentation on Scalable Parallel Processors Supercomputers

    DTIC Science & Technology

    2010-03-01

    experiment needs, taught a course on its use, improved logging via the JLogger system, enabled faster analysis via better data management and investigated ... consequences (gaming systems). (Gottschalk, 2005) One of the tasks of a researcher is to participate in the community in such a way as to provide ... Shortest Path search algorithm; SQL: Structured Query Language; S/T: Sensor Target; STI: Sony, Toshiba and IBM (consortium for Cell Processor game CPUs); TDB

  3. Using algebra for massively parallel processor design and utilization

    NASA Technical Reports Server (NTRS)

    Campbell, Lowell; Fellows, Michael R.

    1990-01-01

    This paper summarizes the authors' advances in the design of dense processor networks. Reported within is a collection of recent constructions of dense symmetric networks that provide the largest known values for the number of nodes that can be placed in a network of a given degree and diameter. The constructions are in the range of current potential engineering significance and are based on groups of automorphisms of finite-dimensional vector spaces.

  4. HVDC control system based on parallel digital signal processors

    SciTech Connect

    Maharsi, Y.; Do, V.Q.; Sood, V.K.; Casoria, S.; Belanger, J.

    1995-05-01

    A numerical HVDC control system operating in real time has been developed for a simulator to be used for operator training. The control system, implemented with digital signal processors (DSPs), consists of typical HVDC control functions such as the synchronizing unit, the regulation unit, the protection unit, the firing unit, the tap changer and the reactive power regulation unit. Results from the steady-state and the transient performance validation tests carried out on the IREQ power system simulator are provided.

  5. Evaluation of fault-tolerant parallel-processor architectures over long space missions

    NASA Technical Reports Server (NTRS)

    Johnson, Sally C.

    1989-01-01

    The impact of a five year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^(-7). The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
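The 0.99 five-year requirement can be turned into a rough spare-count estimate with a simple binomial model (the per-processor survival probability below is an assumed illustrative figure, not the paper's, and the separate 10^(-7) half-hour requirement is not modeled):

```python
import math

# With n_total independent processors each surviving the mission with
# probability p, the system works if at least n_needed survive.

def survival_prob(n_total, n_needed, p):
    return sum(math.comb(n_total, k) * p**k * (1 - p)**(n_total - k)
               for k in range(n_needed, n_total + 1))

def spares_for_target(n_needed, p, target=0.99):
    # smallest number of spares that meets the mission reliability target
    n = n_needed
    while survival_prob(n, n_needed, p) < target:
        n += 1
    return n - n_needed
```

With an assumed 0.99 per-processor five-year survival, 256 processors alone give only about 0.08 mission reliability; a handful of spares recovers the 0.99 target, which is why architectures like the FTPP budget redundancy up front.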

  6. Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

    1994-01-01

    In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.

  7. On fault-tolerant structure, distributed fault-diagnosis, reconfiguration, and recovery of the array processors

    SciTech Connect

    Hosseini, S.H.

    1989-07-01

    The increasing need for the design of high-performance computers has led to the design of special purpose computers such as array processors. This paper studies the design of fault-tolerant array processors. First, it is shown how hardware redundancy can be employed in the existing structures in order to make them capable of withstanding the failure of some of the array links and processors. Then distributed fault-tolerance schemes are introduced for the diagnosis of the faulty elements, reconfiguration, and recovery of the array. Fault tolerance is maintained by the cooperation of processors in a decentralized form of control without the participation of any type of hardcore or fault-free central controller such as a host computer.

  8. Parallel calculation of multi-electrode array correlation networks.

    PubMed

    Ribeiro, Pedro; Simonotto, Jennifer; Kaiser, Marcus; Silva, Fernando

    2009-11-15

    When calculating correlation networks from multi-electrode array (MEA) data, one works with extensive computations. Unfortunately, as the MEAs grow bigger, the time needed for the computation grows even more: calculating pair-wise correlations for current 60 channel systems can take hours on normal commodity computers whereas for future 1000 channel systems it would take almost 280 times as long, given that the number of pairs increases with the square of the number of channels. Even taking into account the increase of speed in processors, soon it can be unfeasible to compute correlations in a single computer. Parallel computing is a way to sustain reasonable calculation times in the future. We provide a general tool for rapid computation of correlation networks which was tested for: (a) a single computer cluster with 16 cores, (b) the Newcastle Condor System utilizing idle processors of university computers and (c) an inter-cluster configuration with 192 cores. Our reusable tool provides a simple interface for neuroscientists, automating data partition and job submission, and also allowing coding in any programming language. It is also sufficiently flexible to be used in other high-performance computing environments.
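The pair partitioning such a tool automates can be sketched as follows (a simplified illustration, not the paper's tool: threads are used here for portability, whereas the actual system targets clusters and Condor pools, and the channel data is synthetic):

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# The n*(n-1)/2 channel pairs are the unit of work; each worker computes
# Pearson correlations for its share of the pairs.

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_network(channels, workers=4):
    pairs = list(combinations(range(len(channels)), 2))

    def work(ij):
        i, j = ij
        return i, j, pearson(channels[i], channels[j])

    with ThreadPoolExecutor(max_workers=workers) as ex:
        return {(i, j): r for i, j, r in ex.map(work, pairs)}
```

Because the pairs are independent, the same partitioning scales from a thread pool to cluster jobs without changing the per-pair computation.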

  9. Parallel processors and nonlinear structural dynamics algorithms and software

    NASA Technical Reports Server (NTRS)

    Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.

    1989-01-01

    The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the CONNECTION Machine is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the CONNECTION Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the CONNECTION Machine is capable of outperforming the CRAY XMP/14.

  10. Coupled cluster algorithms for networks of shared memory parallel processors

    NASA Astrophysics Data System (ADS)

    Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.

    2007-05-01

    As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363]. (General Atomic and Molecular Electronic Structure System) program suite and the Distributed Data Interface [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]. (DDI), however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm is presented on several large-scale clusters of SMPs.

  11. Performance evaluation and modeling techniques for parallel processors

    SciTech Connect

    Dimpsey, R.T.

    1992-01-01

    This thesis addresses the issue of application performance under real operational conditions. A technique is introduced which accurately models the behavior of an application in real workloads. The methodology can evaluate the performance of the application as well as predict the effects on performance of certain system design changes. The constructed model is based on measurements obtained during normal machine operation and captures various performance issues including multiprogramming and system overheads, and contentions for resources. Methodologies to measure multiprogramming overhead (MPO) are introduced and illustrated on an Alliant FX/8, an Alliant FX/80, and the Cedar parallel supercomputer. The measurements collected suggest that multiprogramming and system overheads can significantly impact application performance. The mean MPO incurred by PERFECT benchmarks executing in real workloads on an Alliant FX/80 is found to consume 16% of the processing power. For applications executing on Cedar, between 10% and 60% of the application completion time is attributable to overhead caused by multiprogramming. Measurements also identify a Cedar FORTRAN construct (SDOALL) which is susceptible to performance degradation due to multiprogramming. Using the MPO measurements, the application performance model discussed above is constructed for computationally bound, parallel jobs executing on an Alliant FX/80. It is shown that the model can predict application completion time under real workloads. This is illustrated with several examples from the Perfect Benchmark suite. It is also shown that the model can predict the performance impact of system design changes. For example, the completion times of applications under a new scheduling policy are predicted. The model-building methodology is then validated with a number of empirical experiments.

  12. Processing modes and parallel processors in producing familiar keying sequences.

    PubMed

    Verwey, Willem B

    2003-05-01

    Recent theorizing indicates that the acquisition of movement sequence skill involves the development of several independent sequence representations at the same time. To examine this for the discrete sequence production task, participants in Experiment 1 produced a highly practiced sequence of six key presses in two conditions that allowed little preparation so that interkey intervals were slowed. Analyses of the distributions of moderately slowed interkey intervals indicated that this slowing was caused by the occasional use of two slower processing modes, that probably rely on independent sequence representations, and by reduced parallel processing in the fastest processing mode. Experiment 2 addressed the role of intention for the fast production of familiar keying sequences. It showed that the participants, who were not aware they were executing familiar sequences in a somewhat different task, had no benefits of prior practice. This suggests that the mechanisms underlying sequencing skills are not automatically activated by mere execution of familiar sequences, and that some form of top-down, intentional control remains necessary.

  13. A parallel encryption algorithm for dual-core processor based on chaotic map

    NASA Astrophysics Data System (ADS)

    Liu, Jiahui; Song, Dahua; Xu, Yiqiu

    2011-12-01

    In this paper, we propose a parallel chaos-based encryption scheme in order to take advantage of the dual-core processor. The chaos-based cryptosystem is combinatorially generated by the logistic map and the Fibonacci sequence; the Fibonacci sequence is employed to convert the values of the logistic map to integer data. The parallel algorithm is designed with a master/slave communication model using the Message Passing Interface (MPI). The experimental results show that the chaotic cryptosystem possesses good statistical properties and that the parallel algorithm delivers enhanced performance over the serial version. It is suitable for encrypting and decrypting large volumes of sensitive data or multimedia.
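    The record above gives only a high-level description of the cipher. As a rough illustrative sketch (not the authors' implementation), a logistic-map keystream can drive a simple XOR cipher; the byte-conversion step below is a plain scaling stand-in for the paper's Fibonacci-sequence conversion, and the parameters `x0` and `r` are arbitrary illustrative values.

```python
def logistic_keystream(x0, r, n):
    """Generate n keystream bytes by iterating the logistic map x -> r*x*(1-x)."""
    x = x0
    out = bytearray()
    for _ in range(n):
        x = r * x * (1.0 - x)
        # Convert the chaotic value in (0, 1) to a byte. The paper uses a
        # Fibonacci-sequence-based conversion; this scaling is a stand-in.
        out.append(int(x * 256) % 256)
    return bytes(out)

def xor_crypt(data, x0=0.3141592, r=3.99):
    """Encrypt or decrypt by XOR with the chaotic keystream (symmetric)."""
    ks = logistic_keystream(x0, r, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

    A dual-core version in the spirit of the paper would split the data between a master and a slave process (e.g. over MPI) and run an independently seeded keystream on each half.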

  14. Reduction of solar vector magnetograph data using a microMSP array processor

    NASA Technical Reports Server (NTRS)

    Kineke, Jack

    1990-01-01

    The processing of raw data obtained by the solar vector magnetograph at NASA-Marshall requires extensive arithmetic operations on large arrays of real numbers. The objectives of this summer faculty fellowship study are to: (1) learn the programming language of the MicroMSP Array Processor and adapt some existing data reduction routines to exploit its capabilities; and (2) identify other applications and/or existing programs which lend themselves to array processor utilization which can be developed by undergraduate student programmers under the provisions of project JOVE.

  15. Digital signal array processor for NSLS booster power supply upgrade

    SciTech Connect

    Olsen, R.; Dabrowski, J.; Murray, J.

    1993-07-01

    The booster at the NSLS is being upgraded from 0.75 to 2 pulses per second. To accomplish this, new power supplies for the dipole, quadrupole, and sextupole magnets have been installed. This paper will outline the design and function of the digital signal processor used as the primary control element in the power supply control system.

  16. A site oriented supercomputer for theoretical physics: The Fermilab Advanced Computer Program Multi Array Processor System (ACPMAPS)

    SciTech Connect

    Nash, T.; Atac, R.; Cook, A.; Deppe, J.; Fischler, M.; Gaines, I.; Husby, D.; Pham, T.; Zmuda, T.; Eichten, E.

    1989-03-06

    The ACPMAPS multiprocessor is a highly cost-effective, local-memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating-point-intensive, grid-like problems, particularly those with extreme computing requirements. The processing nodes of the system are single-board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL chip set. The system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.

  17. On job assignment for a parallel system of processor sharing queues

    SciTech Connect

    Bonomi, F. )

    1990-07-01

    Interest in the job assignment problem for parallel queues has been recently stimulated by research in the area of load balancing in distributed systems, where one is concerned with assigning tasks or processes to processors in order to achieve optimal system performance. However, most of the studies found in the literature refer to a system of parallel queues with FCFS service discipline, while it is well known that the processor sharing (PS) service discipline is often a better model for CPU scheduling in time-shared computer systems. In this paper, the authors underline some interesting peculiarities of the assignment problem with PS queues as compared to the usual case of the FCFS systems. Also, they propose an approach to the design of assignment algorithms which, in this case, produces solutions performing better than the well-known join-the-shortest-queue (JSQ) assignment rule.
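    To make the processor-sharing discipline concrete, here is a minimal sketch (illustrative only, not from the paper): under PS, all jobs present share the server equally, while the join-the-shortest-queue (JSQ) rule mentioned above simply routes each arrival to the least-loaded queue.

```python
def ps_completion_times(sizes):
    """Completion times for jobs that all arrive at t=0 in one
    processor-sharing queue of unit rate: the server is divided
    evenly among the jobs still present."""
    s = sorted(sizes)
    n, t, prev, out = len(s), 0.0, 0.0, []
    for k, size in enumerate(s):
        # (n - k) jobs share the server until the next-smallest finishes.
        t += (n - k) * (size - prev)
        out.append(t)
        prev = size
    return out

def jsq_assign(queue_lengths):
    """Join-the-shortest-queue: send an arriving job to the server
    currently holding the fewest jobs."""
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])
```

    For example, three jobs of sizes 1, 2 and 3 sharing one server finish at times 3, 5 and 6 rather than at the FCFS times 1, 3 and 6.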

  18. Reconfiguration Schemes for Fault-Tolerant Processor Arrays

    DTIC Science & Technology

    1992-10-15


  19. Construction of a parallel processor for simulating manipulators and other mechanical systems

    NASA Technical Reports Server (NTRS)

    Hannauer, George

    1991-01-01

    This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

  20. Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor

    DTIC Science & Technology

    2010-05-01

    The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals... Broadband Engine Processor (Cell BE). The process of adapting the serial MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and... using the Multiple Signal Classification (MUSIC) algorithm [4] • Computation of focus matrix • Computation of number of sources • Separation of signal

  1. Array distribution in data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

    1994-01-01

    We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.

  2. Aligning parallel arrays to reduce communication

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Schreiber, Robert; Gilbert, John R.; Chatterjee, Siddhartha

    1994-01-01

    Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.

  3. Data flow analysis of a highly parallel processor for a level 1 pixel trigger

    SciTech Connect

    Cancelo, G.; Gottschalk, Erik Edward; Pavlicek, V.; Wang, M.; Wu, J.

    2003-01-01

    The present work describes the architecture and data flow analysis of a highly parallel processor for the Level 1 Pixel Trigger for the BTeV experiment at Fermilab. First the Level 1 Trigger system is described. Then the major components are analyzed by resorting to mathematical modeling. Also, behavioral simulations are used to confirm the models. Results from modeling and simulations are fed back into the system in order to improve the architecture, eliminate bottlenecks, allocate sufficient buffering between processes and obtain other important design parameters. An interesting feature of the current analysis is that the models can be extended to a large class of architectures and parallel systems.

  4. Parallel microfluidic arrays for SPRi detection

    NASA Astrophysics Data System (ADS)

    Ouellet, Eric; Lausted, Christopher; Lin, Tao; Yang, Cheng-Wei; Hood, Leroy; Lagally, Eric T.

    2010-04-01

    Surface Plasmon Resonance imaging (SPRi) is a label-free technique for the quantitation of binding affinities and concentrations for a wide variety of target molecules. Although SPRi is capable of determining binding constants for multiple ligands in parallel, current commercial instruments are limited to a single analyte stream and a limited number of ligand spots. Measurement of target concentration also requires the serial introduction of different target concentrations; such repeated experiments are conducted manually and are therefore time-intensive. Likewise, the equilibrium determination of concentration for known binding affinity requires long times due to diffusion-limited kinetics to a surface-immobilized ligand. We have developed an integrated microfluidic array using soft lithography techniques for SPRi-based detection and determination of binding affinities for DNA aptamers against human alpha-thrombin. The device consists of 264 element-addressable chambers of 700 pL each isolated by microvalves. The device also contains a dilution network for simultaneous interrogation of up to six different target concentrations, further speeding detection times. The element-addressable design of the array allows interrogation of multiple ligands against multiple targets, and analytes from individual chambers may be collected for downstream analysis.

  5. Parallel microfluidic arrays for SPRi detection

    NASA Astrophysics Data System (ADS)

    Ouellet, Eric; Lausted, Christopher; Hood, Leroy; Lagally, Eric T.

    2008-08-01

    Surface Plasmon Resonance imaging (SPRi) is a label-free technique for the quantitation of binding affinities and concentrations for a wide variety of target molecules. Although SPRi is capable of determining binding constants for multiple ligands in parallel, current commercial instruments are limited to a single analyte stream and a limited number of ligand spots. Measurement of target concentration also requires the serial introduction of different target concentrations; such repeated experiments are conducted manually and are therefore time-intensive. Likewise, the equilibrium determination of concentration for known binding affinity requires long times due to diffusion-limited kinetics to a surface-immobilized ligand. We have developed an integrated microfluidic array using soft lithography techniques for SPRi-based detection and determination of binding affinities for DNA aptamers against human alpha-thrombin. The device consists of 264 element-addressable chambers isolated by microvalves. The resulting 700 pL volumes surrounding each ligand spot promise to decrease measurement time through reaction rate-limited kinetics. The device also contains a dilution network for simultaneous interrogation of up to six different target concentrations, further speeding detection times. Finally, the element-addressable design of the array allows interrogation of multiple ligands against multiple targets.

  6. Parallel fabrication of plasmonic nanocone sensing arrays.

    PubMed

    Horrer, Andreas; Schäfer, Christian; Broch, Katharina; Gollmer, Dominik A; Rogalski, Jan; Fulmes, Julia; Zhang, Dai; Meixner, Alfred J; Schreiber, Frank; Kern, Dieter P; Fleischer, Monika

    2013-12-09

    A fully parallel approach for the fabrication of arrays of metallic nanocones and triangular nanopyramids is presented. Different processes utilizing nanosphere lithography for the creation of etch masks are developed. Monolayers of spheres are reduced in size and directly used as masks, or mono- and double layers are employed as templates for the deposition of aluminum oxide masks. The masks are transferred into an underlying gold or silver layer by argon ion milling, which leads to nanocones or nanopyramids with very sharp tips. Near the tips the enhancement of an external electromagnetic field is particularly strong. This fact is confirmed by numerical simulations and by luminescence imaging in a confocal microscope. Such localized strong fields can, among other applications, be utilized for high-resolution, high-sensitivity spectroscopy and sensing of molecules near the tip. Arrays of such plasmonic nanostructures thus constitute controllable platforms for surface-enhanced Raman spectroscopy. A thin film of pentacene molecules is evaporated onto both nanocone and nanopyramid substrates, and the observed Raman enhancement is evaluated.

  7. Overview of Enhanced Data Stream Array Processor (EDSAP)

    DTIC Science & Technology

    1993-05-06

    24-bit arithmetic instead of 16-bit), and more than 10 GOPS of processing capability will be available to create high-resolution (1 ft) fully...

  8. Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Gibson, Garth Alan

    1990-01-01

    During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
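    The parity coding described above is simple to state in code. The sketch below (illustrative, not from the thesis) computes an XOR parity block and rebuilds a single failed, self-identified data block from the survivors:

```python
def parity_block(blocks):
    """XOR parity across equal-length data blocks."""
    p = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            p[i] ^= byte
    return bytes(p)

def reconstruct(surviving_blocks, parity):
    """Rebuild one failed block: XOR of the survivors and the parity
    cancels everything except the missing block."""
    return parity_block(surviving_blocks + [parity])
```

    Because the failed disk identifies itself, parity alone suffices; correcting failures that are not self-identifying would require a stronger code.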

  9. Implementation of context independent code on a new array processor: The Super-65

    NASA Technical Reports Server (NTRS)

    Colbert, R. O.; Bowhill, S. A.

    1981-01-01

    The feasibility of rewriting standard uniprocessor programs into code which contains no context-dependent branches is explored. Context independent code (CIC) would contain no branches that might require different processing elements to branch different ways. In order to investigate the possibilities and restrictions of CIC, several programs were recoded into CIC and a four-element array processor was built. This processor (the Super-65) consisted of three 6502 microprocessors and the Apple II microcomputer. The results obtained were somewhat dependent upon the specific architecture of the Super-65 but within bounds, the throughput of the array processor was found to increase linearly with the number of processing elements (PEs). The slope of throughput versus PEs is highly dependent on the program and varied from 0.33 to 1.00 for the sample programs.

  10. Interconnection arrangement of routers of processor boards in array of cabinets supporting secure physical partition

    DOEpatents

    Tomkins, James L.; Camp, William J.

    2007-07-17

    A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure includes routers in service or compute processor boards distributed in an array of cabinets connected in series on each board and to respective routers in neighboring row cabinet boards with the routers in series connection coupled to routers in series connection in respective neighboring column cabinet boards. The array can include disconnect cabinets or respective routers in all boards in each cabinet connected in a toroid. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.

  11. Fast structural design and analysis via hybrid domain decomposition on massively parallel processors

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel

    1993-01-01

    A hybrid domain decomposition framework for static, transient and eigen finite element analyses of structural mechanics problems is presented. Its basic ingredients include physical substructuring and/or automatic mesh partitioning, mapping algorithms, 'gluing' approximations for fast design modifications and evaluations, and fast direct and preconditioned iterative solvers for local and interface subproblems. The overall methodology is illustrated with the structural design of a solar viewing payload that is scheduled to fly in March 1993. This payload has been entirely designed and validated by a group of undergraduate students at the University of Colorado using the proposed hybrid domain decomposition approach on a massively parallel processor. Performance results are reported on the CRAY Y-MP/8 and the iPSC-860/64 Touchstone systems, which represent two extremes of parallel architecture. The hybrid domain decomposition methodology is shown to outperform leading solution algorithms and to exhibit excellent parallel scalability.

  12. Mist collection on parallel fiber arrays

    NASA Astrophysics Data System (ADS)

    Labbé, Romain; Duprat, Camille

    2016-11-01

    Fog is an important source of fresh water in specific arid regions such as the Atacama Desert in Chile. The method used to collect water passively from fog, either for domestic consumption or research purposes, consists of erecting large porous fiber nets on which the mist droplets impact. The two main mechanisms involved in this process are the impact of the drops on the fibers and the drainage of the fluid from the net, while the main limiting factor is the clogging of the mesh by accumulated water. We consider a novel collection system, made of an array of parallel fibers, that we study experimentally with a wind mist tunnel. In addition, we develop theoretical models considering the coupling of wind flow, droplet trajectories and wetting of the fibers. We find that the collection efficiency strongly depends on the size and distribution of the drops formed on the fibers, and thus on the fiber diameter, inclination angle and wetting properties. In particular, we show that the collection efficiency is greater when large drops are formed on the fibers. By adjusting the fiber diameter and the inter-fiber spacing, we look for an optimal structure that maximizes the collection surface and the drainage, while avoiding flow deviations.

  13. Optimal piecewise linear schedules for LSGP- and LPGS-decomposed array processors via quadratic programming

    NASA Astrophysics Data System (ADS)

    Zimmermann, Karl-Heinz; Achtziger, Wolfgang

    2001-09-01

    The size of a systolic array synthesized from a uniform recurrence equation, whose computations are mapped by a linear function to the processors, matches the problem size. In practice, however, there exist several limiting factors on the array size. There are two dual schemes available to derive arrays of smaller size from large-size systolic arrays based on the partitioning of the large-size arrays into subarrays. In LSGP, the subarrays are clustered one-to-one into the processors of a small-size array, while in LPGS, the subarrays are serially assigned to a reduced-size array. In this paper, we propose a common methodology for both LSGP and LPGS based on polyhedral partitionings of large-size k-dimensional systolic arrays which are synthesized from n-dimensional uniform recurrences by linear mappings for allocation and timing. In particular, we address the optimization problem of finding optimal piecewise linear timing functions for small-size arrays. These are mappings composed of linear timing functions for the computations of the subarrays. We study a continuous approximation of this problem by passing from piecewise linear to piecewise quasi-linear timing functions. The resultant problem formulation is then a quadratic programming problem which can be solved by standard algorithms for nonlinear optimization problems.

  14. Parallel implementation of RX anomaly detection on multi-core processors: impact of data partitioning strategies

    NASA Astrophysics Data System (ADS)

    Molero, Jose M.; Garzón, Ester M.; García, Inmaculada; Plaza, Antonio

    2011-11-01

    Anomaly detection is an important task for remotely sensed hyperspectral data exploitation. One of the most widely used and successful algorithms for anomaly detection in hyperspectral images is the Reed-Xiaoli (RX) algorithm. Despite its wide acceptance, and its high computational complexity when applied to real hyperspectral scenes, few documented parallel implementations of this algorithm exist, in particular for multi-core processors. The advantage of multi-core platforms over other specialized parallel architectures is that they are a low-power, inexpensive, widely available and well-known technology. A critical issue in the parallel implementation of RX is the sample covariance matrix calculation, which can be approached in global or local fashion. This aspect is crucial for the RX implementation since the consideration of a local or global strategy for the computation of the sample covariance matrix is expected to affect both the scalability of the parallel solution and the anomaly detection results. In this paper, we develop new parallel implementations of the RX algorithm on multi-core processors and specifically investigate the impact of different data partitioning strategies when parallelizing its computations. For this purpose, we consider both global and local data partitioning strategies in the spatial domain of the scene, and further analyze their scalability in different multi-core platforms. The numerical effectiveness of the considered solutions is evaluated using receiver operating characteristic (ROC) curves, analyzing their capacity to detect thermal hot spots (anomalies) in hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer system over the World Trade Center in New York, five days after the terrorist attacks of September 11th, 2001.
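    The global-covariance variant of RX discussed above amounts to a squared Mahalanobis distance from the scene mean; a minimal NumPy sketch (illustrative, not the authors' code) is given below. A local or partitioned strategy would instead estimate the mean and covariance per spatial block.

```python
import numpy as np

def rx_global(cube):
    """Global RX anomaly scores for a (rows, cols, bands) hyperspectral cube:
    squared Mahalanobis distance of each pixel spectrum from the scene mean,
    using a single covariance estimated over the whole image."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    icov = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    scores = np.einsum('ij,jk,ik->i', d, icov, d)
    return scores.reshape(rows, cols)
```

    A data-parallel version can split the pixel rows across cores for the distance computation; only the covariance estimation step differs between the global and local strategies.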

  15. Series-parallel method of direct solar array regulation

    NASA Technical Reports Server (NTRS)

    Gooder, S. T.

    1976-01-01

    A 40 watt experimental solar array was directly regulated by shorting out appropriate combinations of series and parallel segments of a solar array. Regulation switches were employed to control the array at various set-point voltages between 25 and 40 volts. Regulation to within + or - 0.5 volt was obtained over a range of solar array temperatures and illumination levels as an active load was varied from open circuit to maximum available power. A fourfold reduction in regulation switch power dissipation was achieved with series-parallel regulation as compared to the usual series-only switching for direct solar array regulation.
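    As a toy illustration of the idea (not the actual NASA controller), choosing which series segments to short can be treated as a greedy selection that drops the open-circuit voltage toward a set point; real regulation also switches parallel segments and must respect the array's I-V characteristics.

```python
def segments_to_short(segment_volts, v_open, v_target):
    """Greedily short series segments, largest first, while the output
    voltage stays at or above the set point. Returns the indices of the
    shorted segments and the resulting voltage. (Hypothetical sketch.)"""
    shorted, v = [], v_open
    for k, sv in sorted(enumerate(segment_volts), key=lambda t: -t[1]):
        if v - sv >= v_target:
            v -= sv
            shorted.append(k)
    return shorted, v
```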

  16. Stochastic simulation of charged particle transport on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Earl, James A.

    1988-01-01

    Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively Parallel Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These simulations have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.

  17. Estimating water flow through a hillslope using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

    1988-01-01

    A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.

  18. Block iterative restoration of astronomical images with the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Heap, Sara R.; Lindler, Don J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.

  19. A 1,000 Frames/s Programmable Vision Chip with Variable Resolution and Row-Pixel-Mixed Parallel Image Processors

    PubMed Central

    Lin, Qingyu; Miao, Wei; Zhang, Wancheng; Fu, Qiuyu; Wu, Nanjian

    2009-01-01

    A programmable vision chip with variable resolution and row-pixel-mixed parallel image processors is presented. The chip consists of a CMOS sensor array with row-parallel 6-bit algorithmic ADCs, row-parallel gray-scale image processors, a pixel-parallel SIMD Processing Element (PE) array, and an instruction controller. The resolution of the image in the chip is variable: high resolution for a focused area and low resolution for the general view. It implements gray-scale and binary mathematical morphology algorithms in series to carry out low-level and mid-level image processing and sends out features of the image for various applications. It can perform image processing at over 1,000 frames/s (fps). A prototype chip with 64 × 64 pixel resolution and 6-bit gray scale was fabricated in a 0.18 μm standard CMOS process. The chip area is 1.5 mm × 3.5 mm; each pixel is 9.5 μm × 9.5 μm and each processing element is 23 μm × 29 μm. The experimental results demonstrate that the chip can perform low-level and mid-level image processing and can be applied in real-time vision applications, such as high-speed target tracking. PMID:22454565

  20. The full-use-of-suitable-spares (FUSS) approach to hardware reconfiguration for fault-tolerant processor arrays

    SciTech Connect

    Chean, M. ); Fortes, J.A.B. . School of Electrical Engineering)

    1990-04-01

    A general approach to hardware reconfiguration is proposed for VLSI/WSI fault-tolerant processor arrays. The technique, called FUSS (full use of suitable spares), uses an indicator vector, the surplus vector, to guide the replacement of faulty processors within an array. Analytical study of the general FUSS algorithm shows that there is a linear relationship between the array size and the area of interconnect required for reconfiguration to be 100% successful.

  1. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    A capability was developed for rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets, by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP) using methods recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for understanding complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS), a data-independent environment for computer graphics data display, to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  2. The Square Kilometre Array Science Data Processor. Preliminary compute platform design

    NASA Astrophysics Data System (ADS)

    Broekema, P. C.; van Nieuwpoort, R. V.; Bal, H. E.

    2015-07-01

    The Square Kilometre Array is a next-generation radio-telescope, to be built in South Africa and Western Australia. It is currently in its detailed design phase, with procurement and construction scheduled to start in 2017. The SKA Science Data Processor is the high-performance computing element of the instrument, responsible for producing science-ready data. This is a major IT project, with the Science Data Processor expected to challenge the computing state-of-the art even in 2020. In this paper we introduce the preliminary Science Data Processor design and the principles that guide the design process, as well as the constraints to the design. We introduce a highly scalable and flexible system architecture capable of handling the SDP workload.

  3. Parallel scheduling of recursively defined arrays

    NASA Technical Reports Server (NTRS)

    Myers, T. J.; Gokhale, M. B.

    1986-01-01

    A new method of automatic generation of concurrent programs which constructs arrays defined by sets of recursive equations is described. It is assumed that the time of computation of an array element is a linear combination of its indices, and integer programming is used to seek a succession of hyperplanes along which array elements can be computed concurrently. The method can be used to schedule equations involving variable length dependency vectors and mutually recursive arrays. Portions of the work reported here have been implemented in the PS automatic program generation system.
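    The hyperplane idea can be illustrated for the classic two-dimensional recurrence in which A[i][j] depends on A[i-1][j] and A[i][j-1]: the linear schedule t(i, j) = i + j groups the iteration points into anti-diagonal wavefronts whose elements can be computed concurrently. (A hypothetical sketch; the method described above finds such schedules by integer programming rather than assuming one.)

```python
def hyperplane_schedule(n, m, schedule=lambda i, j: i + j):
    """Group the points of an n-by-m iteration space into wavefronts that
    share the same value of the linear schedule; points within a wavefront
    carry no dependences on one another and may run in parallel."""
    waves = {}
    for i in range(n):
        for j in range(m):
            waves.setdefault(schedule(i, j), []).append((i, j))
    return [waves[t] for t in sorted(waves)]
```

    For a 3 x 3 space this yields five wavefronts of sizes 1, 2, 3, 2, 1, each depending only on earlier ones.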

  4. Evaluation of soft-core processors on a Xilinx Virtex-5 field programmable gate array.

    SciTech Connect

    Learn, Mark Walter

    2011-04-01

    Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs; reducing size, weight, and power; and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable field programmable gate array (FPGA)-based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA-based soft-core processors for use in future NBA systems: the MicroBlaze (uB), the open-source Leon3, and the licensed Leon3. Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration.
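The roles of the two benchmarks can be illustrated with toy stand-in kernels (the real Dhrystone and Whetstone are standardized C programs; the kernel bodies and loop counts below are made up for illustration):

```python
# Toy stand-ins for the two benchmark styles: Dhrystone stresses integer
# and logic operations, Whetstone stresses floating-point operations.
# Timing each kernel at several loop counts mirrors the trials described.
import math
import time

def int_kernel(n):               # Dhrystone-like integer/logic mix
    acc = 0
    for i in range(n):
        acc = (acc + i * 3) ^ (i >> 1)
    return acc

def float_kernel(n):             # Whetstone-like floating-point mix
    x = 1.0
    for _ in range(n):
        x = math.sin(x) + math.cos(x) * 1.0001
    return x

for n in (10_000, 100_000):      # varying loop counts, as in the trials
    for name, kernel in (("Dhrystone-like", int_kernel),
                         ("Whetstone-like", float_kernel)):
        t0 = time.perf_counter()
        kernel(n)
        print(f"{name}, n={n}: {time.perf_counter() - t0:.4f} s")
```

On a soft core without a hardware floating-point unit, the second kernel's relative cost rises sharply, which is exactly the contrast the two benchmarks are chosen to expose.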

  5. Application of an array processor to the analysis of magnetic data for the Doublet III tokamak

    SciTech Connect

    Wang, T.S.; Saito, M.T.

    1980-08-01

    Discussed herein is a fast computational technique employing the Floating Point Systems AP-190L array processor to analyze magnetic data for the Doublet III tokamak, a fusion research device. Interpretation of the experimental data requires the repeated solution of a free-boundary nonlinear partial differential equation, which describes the magnetohydrodynamic (MHD) equilibrium of the plasma. For this particular application, we have found that the array processor is only 1.4 and 3.5 times slower than the CDC-7600 and CRAY computers, respectively. The overhead on the host DEC-10 computer was kept to a minimum by chaining the complete Poisson solver and free-boundary algorithm into one single-load module using the vector function chainer (VFC). A simple time-sharing scheme for using the MHD code is also discussed.

  6. Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1992-01-01

    In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, there are few evaluation efforts addressing this issue.

  7. High-performance parallel processors based on star-coupled wavelength division multiplexing optical interconnects

    DOEpatents

    Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.

    2002-01-01

    As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance has been shown to scale to ≈100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.

  8. Parallel pipeline networking and signal processing with field-programmable gate arrays (FPGAs) and VCSEL-MSM smart pixels

    NASA Astrophysics Data System (ADS)

    Kuznia, C. B.; Sawchuk, Alexander A.; Zhang, Liping; Hoanca, Bogdan; Hong, Sunkwang; Min, Chris; Pansatiankul, Dhawat E.; Alpaslan, Zahir Y.

    2000-05-01

    We present a networking and signal processing architecture called Transpar-TR (Translucent Smart Pixel Array-Token-Ring) that utilizes smart pixel technology to perform 2D parallel optical data transfer between digital processing nodes. Transpar-TR moves data through the network in the form of 3D packets (2D spatial and 1D time). By utilizing many spatial parallel channels, Transpar-TR can achieve high throughput, low latency communication between nodes, even with each channel operating at moderate data rates. The 2D array of optical channels is created by an array of smart pixels, each with an optical input and optical output. Each smart pixel consists of two sections, an optical network interface and an ALU-based processor with local memory. The optical network interface is responsible for transmitting and receiving optical data packets using a slotted token ring network protocol. The smart pixel array operates as a single-instruction multiple-data processor when processing data. The Transpar-TR network, consisting of networked smart pixel arrays, can perform pipelined parallel processing very efficiently on 2D data structures such as images and video. This paper discusses the Transpar-TR implementation in which each node is the printed circuit board integration of a VCSEL-MSM chip, a transimpedance receiver array chip and an FPGA chip.

  9. Parallel Spectral Acquisition with an ICR Cell Array

    PubMed Central

    Park, Sung-Gun; Anderson, Gordon A.; Navare, Arti T.; Bruce, James E.

    2016-01-01

    Mass measurement accuracy is a critical analytical figure-of-merit in most areas of mass spectrometry application. However, the time required for acquisition of high resolution, high mass accuracy data limits many applications and is an aspect under continual pressure for development. Current efforts target implementation of higher electrostatic and magnetic fields because ion oscillatory frequencies increase linearly with field strength. As such, the time required for spectral acquisition of a given resolving power and mass accuracy decreases linearly with increasing fields. Mass spectrometer developments to include multiple high resolution detectors that can be operated in parallel could further decrease the acquisition time by a factor of n, the number of detectors. Efforts described here resulted in development of an instrument with a set of Fourier transform ion cyclotron resonance (ICR) cells as detectors that constitute the first MS array capable of parallel high resolution spectral acquisition. ICR cell array systems consisting of three or five cells were constructed with printed circuit boards and installed within a single superconducting magnet and vacuum system. Independent ion populations were injected and trapped within each cell in the array. Upon filling the array, all ions in all cells were simultaneously excited and ICR signals from each cell were independently amplified and recorded in parallel. Presented here are the initial results of successful parallel spectral acquisition, parallel MS and MS/MS measurements, and parallel high resolution acquisition with the MS array system. PMID:26669509

  10. Parallel Spectral Acquisition with an Ion Cyclotron Resonance Cell Array.

    PubMed

    Park, Sung-Gun; Anderson, Gordon A; Navare, Arti T; Bruce, James E

    2016-01-19

    Mass measurement accuracy is a critical analytical figure-of-merit in most areas of mass spectrometry application. However, the time required for acquisition of high-resolution, high mass accuracy data limits many applications and is an aspect under continual pressure for development. Current efforts target implementation of higher electrostatic and magnetic fields because ion oscillatory frequencies increase linearly with field strength. As such, the time required for spectral acquisition of a given resolving power and mass accuracy decreases linearly with increasing fields. Mass spectrometer developments to include multiple high-resolution detectors that can be operated in parallel could further decrease the acquisition time by a factor of n, the number of detectors. Efforts described here resulted in development of an instrument with a set of Fourier transform ion cyclotron resonance (ICR) cells as detectors that constitute the first MS array capable of parallel high-resolution spectral acquisition. ICR cell array systems consisting of three or five cells were constructed with printed circuit boards and installed within a single superconducting magnet and vacuum system. Independent ion populations were injected and trapped within each cell in the array. Upon filling the array, all ions in all cells were simultaneously excited and ICR signals from each cell were independently amplified and recorded in parallel. Presented here are the initial results of successful parallel spectral acquisition, parallel mass spectrometry (MS) and MS/MS measurements, and parallel high-resolution acquisition with the MS array system.

  11. Beam-steering and jammer-nulling photorefractive phased-array radar processor

    NASA Astrophysics Data System (ADS)

    Sarto, Anthony W.; Weverka, Robert T.; Wagner, Kelvin H.

    1994-06-01

    We are developing a class of optical phased-array-radar processors which use the large number of degrees-of-freedom available in 3D photorefractive volume holograms to time integrate the adaptive weights to perform beam-steering and jammer-cancellation signal-processing tasks for very large phased-array antennas. We have experimentally demonstrated independently the two primary subsystems of the beam-steering and jammer-nulling phased-array radar processor, the beam-forming subsystem and the jammer-nulling subsystem, as well as simultaneous main beam formation and jammer suppression in the combined processor. The beam-steering subsystem calculates the angle of arrival of a desired signal of interest and steers the antenna pattern in the direction of this desired signal by forming a dynamic holographic grating proportional to the correlation between the incoming signal of interest from the antenna array and the temporal waveform of the desired signal. This grating is formed by repetitively applying the temporal waveform of the desired signal to a single acousto-optic Bragg cell and allowing the diffracted component from the Bragg cell to interfere with an optical mapping of the received phased-array antenna signal at a photorefractive crystal. The diffracted component from this grating is the antenna output modified by an array function pointed towards the desired signal of interest. This beam-steering task is performed with the only a priori information being knowledge of a temporal waveform that correlates well with the desired signal and that the delay of the desired signal remains within the time aperture of the Bragg cell. The jammer-nulling subsystem computes the angles-of-arrival of multiple interfering narrowband radar jammers and adaptively steers nulls in the antenna pattern in order to extinguish the jammers by implementing a modified LMS algorithm in the optical domain. This task is performed in a second photorefractive crystal where

  12. Feasibility of using the Massively Parallel Processor for large eddy simulations and other Computational Fluid Dynamics applications

    NASA Technical Reports Server (NTRS)

    Bruno, John

    1984-01-01

    The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations are presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from these were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This combined with the prospect of building a faster and larger MPP-like machine holds the promise of achieving sustained gigaflop rates that are required for the numerical simulations in CFD.

  13. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    DOE PAGES

    O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  14. Developments of 60 GHz antenna and wireless interconnect inside multi-chip module for parallel processor system

    NASA Astrophysics Data System (ADS)

    Yeh, Ho-Hsin

    To carry out the complicated computations inside high performance computing (HPC) systems, tens to hundreds of parallel processor chips and physical wires must be integrated inside the multi-chip package module (MCM). The physical wires serving as the electrical interconnects between the processor chips, however, pose placement and routing challenges because I/O sizes have not shrunk as quickly as semiconductor feature sizes. The primary goal of the research is to overcome these package design challenges by providing a hybrid computing architecture with implemented 60 GHz antennas as a highly efficient wireless interconnect capable of over 10 Gbps of bandwidth for data transmission. The dissertation is divided into three major parts. In the first part, two performance metrics for evaluating the antenna's system performance within the chip-to-chip wireless interconnect, the power loss required to be recovered (PRE) and the wireless link budget, are introduced to address the design challenges and define the design goals. The second part contains the design concept, fabrication procedure, and measurements of the implemented 60 GHz broadband antenna for multi-chip data transmission. The developed antenna utilizes a periodically patched artificial magnetic conductor (AMC) structure together with a ground-shielded conductor to enhance the antenna's impedance-matching bandwidth. Measurements show over 10 GHz of -10 dB S11 bandwidth, indicating the antenna's operating bandwidth, and confirm the horizontal data-transmission capability required by planar chip-to-chip interconnects. In order to reduce both the PRE and wireless link budget numbers, a 60 GHz two-element array for multi-chip communication is developed in the third part. The third section includes the combined-field analysis, the design concepts on two-element array and feeding

  15. Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions

    DTIC Science & Technology

    2005-01-01

    a 4-issue dynamically scheduled processor that can issue at most 2 floating-point operations. In comparison, an 8-issue processor, ignoring cycle time...then doubling only the floating-point issue width of a 4-issue processor with SIMD instructions gives the best performance among the architectural

  16. Parallel arrays of Josephson junctions for submillimeter local oscillators

    NASA Technical Reports Server (NTRS)

    Pance, Aleksandar; Wengler, Michael J.

    1992-01-01

    In this paper we discuss the influence of the DC biasing circuit on operation of parallel biased quasioptical Josephson junction oscillator arrays. Because of nonuniform distribution of the DC biasing current along the length of the bias lines, there is a nonuniform distribution of magnetic flux in superconducting loops connecting every two junctions of the array. These DC self-field effects determine the state of the array. We present analysis and time-domain numerical simulations of these states for four biasing configurations. We find conditions for the in-phase states with maximum power output. We compare arrays with small and large inductances and determine the low inductance limit for nearly-in-phase array operation. We show how arrays can be steered in the H-plane using an externally applied DC magnetic field.

  17. Some current uses of array processors for preprocessing of remote sensing data

    NASA Technical Reports Server (NTRS)

    Fischel, D.

    1984-01-01

    The preparation of remotely sensed data sets into a form useful to the analyst is a significant computational task, involving the processing of spacecraft data (e.g., orbit, attitude, temperatures, etc.), decommutation of the video telemetry stream, radiometric correction and geometric correction. Many of these processes are extremely well suited for implementation on attached array processors. Currently, at Goddard Space Flight Center a number of computer systems provide such capability for earth observations or are under development as test beds for future ground segment support. Six such systems will be discussed.

  18. An Analog Processor for Image Compression

    NASA Technical Reports Server (NTRS)

    Tawel, R.

    1992-01-01

    This paper describes a novel analog Vector Array Processor (VAP) that was designed for use in real-time and ultra-low power image compression applications. This custom CMOS processor is based architecturally on the Vector Quantization (VQ) algorithm in image coding, and the hardware implementation fully exploits the parallelism inherent in the VQ algorithm.

  19. Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor

    NASA Astrophysics Data System (ADS)

    Kumaki, Takeshi; Ishizaki, Masakatsu; Koide, Tetsushi; Mattausch, Hans Jürgen; Kuroda, Yasuto; Gyohten, Takayuki; Noda, Hideyuki; Dosaka, Katsumi; Arimoto, Kazutami; Saito, Kazunori

    This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is shown to be well suited to the repeated arithmetic operations typical of multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed novel architecture can consequently realize efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the example of the frequently used JPEG image-compression application show that the necessary number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performance in Mpixel/mm2 is a factor of 3.3 and 4.4 better than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
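The division of labor behind this result can be sketched abstractly: JPEG-style pipelines interleave uniform arithmetic steps (a natural fit for a SIMD matrix) with per-symbol table lookups (a natural fit for CAM). The quantization step size and the variable-length code table below are made up for illustration:

```python
# Toy JPEG-like pipeline stage: quantization applies the same arithmetic
# operation to every element (SIMD-friendly), while entropy coding is a
# per-symbol table lookup (CAM-friendly). Step size and codes are
# hypothetical, not taken from the JPEG standard.
QUANT_STEP = 16
CODE_TABLE = {0: "00", 1: "010", -1: "011", 2: "100"}

def quantize(block):
    """Uniform arithmetic over a whole block: one SIMD command's worth."""
    return [v // QUANT_STEP for v in block]

def entropy_code(symbols):
    """Per-symbol table lookup, the operation a CAM accelerates."""
    return "".join(CODE_TABLE.get(s, "111") for s in symbols)

block = [33, 17, -5, 0]
symbols = quantize(block)
print(symbols)                 # [2, 1, -1, 0]
print(entropy_code(symbols))   # 10001001100
```

Pairing the two engines lets the arithmetic stage and the lookup stage each run on hardware shaped for it, which is the integration argument the paper makes.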

  20. Feasibility study for the implementation of NASTRAN on the ILLIAC 4 parallel processor

    NASA Technical Reports Server (NTRS)

    Field, E. I.

    1975-01-01

    The ILLIAC IV, a fourth generation multiprocessor using parallel processing hardware concepts, is operational at Moffett Field, California. Its capability to excel at matrix manipulation makes the ILLIAC well suited for performing structural analyses using the finite element displacement method. The feasibility of modifying the NASTRAN (NASA structural analysis) computer program to make effective use of the ILLIAC IV was investigated. The characteristics of the ILLIAC and of the ARPANET, a telecommunications network which spans the continent and makes the ILLIAC accessible to nearly all major industrial centers in the United States, are summarized. Two distinct approaches are studied: retaining NASTRAN as it now operates on many of the host computers of the ARPANET to process the input and output while using the ILLIAC only for the major computational tasks, and installing NASTRAN to operate entirely in the ILLIAC environment. Though both alternatives offer similar and significant increases in computational speed over modern third generation processors, the full installation of NASTRAN on the ILLIAC is recommended. Specifications are presented for performing that task, with manpower estimates and schedules to correspond.

  1. Constructing higher order DNA origami arrays using DNA junctions of anti-parallel/parallel double crossovers

    NASA Astrophysics Data System (ADS)

    Ma, Zhipeng; Park, Seongsu; Yamashita, Naoki; Kawai, Kentaro; Hirai, Yoshikazu; Tsuchiya, Toshiyuki; Tabata, Osamu

    2016-06-01

    DNA origami provides a versatile method for the construction of nanostructures with defined shape, size and other properties; such nanostructures may enable hierarchical assembly of large-scale architectures for the placement of other nanomaterials with atomic precision. However, the effective use of these higher order structures as functional components depends on knowledge of their assembly behavior and mechanical properties. This paper demonstrates construction of higher order DNA origami arrays with controlled orientations based on the formation of two types of DNA junctions: anti-parallel and parallel double crossovers. A two-step assembly process, in which preformed rectangular DNA origami monomer structures themselves undergo further self-assembly to form numerically unlimited arrays, was investigated to reveal the influences of assembly parameters. AFM observations showed that when parallel double-crossover DNA junctions are used, the assembly of DNA origami arrays incorporates fewer monomers than for structures formed using anti-parallel double crossovers under the same assembly parameters, indicating that the parallel double-crossover configuration is not energetically preferred. However, direct measurement by AFM force-controlled mapping shows that both anti-parallel and parallel double-crossover DNA junctions have mechanical stability homogeneous with the rest of the DNA origami.

  2. Mobile and replicated alignment of arrays in data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

  3. Real-Time Adaptive Lossless Hyperspectral Image Compression using CCSDS on Parallel GPGPU and Multicore Processor Systems

    NASA Technical Reports Server (NTRS)

    Hopson, Ben; Benkrid, Khaled; Keymeulen, Didier; Aranki, Nazeeh; Klimesh, Matt; Kiely, Aaron

    2012-01-01

    The proposed CCSDS (Consultative Committee for Space Data Systems) Lossless Hyperspectral Image Compression Algorithm was designed to facilitate a fast hardware implementation. This paper analyses that algorithm with regard to available parallelism and describes fast parallel implementations in software for GPGPU and Multicore CPU architectures. We show that careful software implementation, using hardware acceleration in the form of GPGPUs or even just multicore processors, can exceed the performance of existing hardware and software implementations by up to 11x and break the real-time barrier for the first time for a typical test application.
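The band-level data parallelism such implementations exploit can be illustrated with a toy predictor (a plain previous-sample predictor, not the adaptive CCSDS predictor), with each spectral band handed to its own worker:

```python
# Toy illustration of band-parallel lossless compression: each spectral
# band's prediction residuals are computed independently, so bands map
# cleanly onto parallel workers (a thread pool here for simplicity; a
# real implementation would use GPU kernels or multiple cores).
from concurrent.futures import ThreadPoolExecutor

def band_residuals(band):
    """Previous-sample predictor residuals for one band of samples."""
    return [band[0]] + [band[i] - band[i - 1] for i in range(1, len(band))]

def compress_cube(cube):
    # Map each band of the hyperspectral cube to a worker in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(band_residuals, cube))

cube = [[10, 12, 11, 15],        # band 0
        [100, 98, 99, 97]]       # band 1
print(compress_cube(cube))       # [[10, 2, -1, 4], [100, -2, 1, -2]]
```

The residuals cluster near zero, which is what makes the subsequent entropy-coding stage effective; the independence of the per-band work is where GPGPU and multicore parallelism applies.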

  4. Investigations on the usefulness of the Massively Parallel Processor for study of electronic properties of atomic and condensed matter systems

    NASA Technical Reports Server (NTRS)

    Das, T. P.

    1988-01-01

    The usefulness of the Massively Parallel Processor (MPP) for investigation of electronic structures and hyperfine properties of atomic and condensed matter systems was explored. The major effort was directed towards the preparation of algorithms for parallelization of the computational procedure being used on serial computers for electronic structure calculations in condensed matter systems. Detailed descriptions of investigations and results are reported, including MPP adaptation of self-consistent charge extended Hueckel (SCCEH) procedure, MPP adaptation of the first-principles Hartree-Fock cluster procedure for electronic structures of large molecules and solid state systems, and MPP adaptation of the many-body procedure for atomic systems.

  5. General linear codes for fault-tolerant matrix operations on processor arrays

    NASA Technical Reports Server (NTRS)

    Nair, V. S. S.; Abraham, J. A.

    1988-01-01

    Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this paper, a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition with minimum numerical error. Encoding schemes are given for some example codes that fall within this general set. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
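The classic row/column-checksum scheme that such linear codes generalize can be sketched directly: append a row of column sums to one operand and a column of row sums to the other, and the product carries checks that expose a faulty element.

```python
# Minimal sketch of checksum-based (ABFT-style) fault detection for
# matrix multiply: A gets an extra checksum row, B an extra checksum
# column, and the full-checksum product C is verified after the fact.

def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def add_checksums(A, B):
    Ac = A + [[sum(col) for col in zip(*A)]]   # extra checksum row on A
    Br = [row + [sum(row)] for row in B]       # extra checksum column on B
    return Ac, Br

def check(C):
    """Verify every row and column checksum of a full-checksum product."""
    rows_ok = all(sum(r[:-1]) == r[-1] for r in C[:-1])
    cols = list(zip(*C))
    cols_ok = all(sum(c[:-1]) == c[-1] for c in cols[:-1])
    return rows_ok and cols_ok

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Ac, Br = add_checksums(A, B)
C = matmul(Ac, Br)
print(check(C))      # True: checksums consistent
C[0][0] += 1         # inject a single-element fault
print(check(C))      # False: the fault is detected
```

With integer data the checks are exact; once entries are floating point, roundoff can trip them falsely, which is precisely the problem that motivates choosing codes with minimum numerical error.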

  6. Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor

    NASA Technical Reports Server (NTRS)

    Moore, J. Strother

    1992-01-01

    Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.

  7. Microchannel cross load array with dense parallel input

    DOEpatents

    Swierkowski, Stefan P.

    2004-04-06

    An architecture or layout for microchannel arrays using T or Cross (+) loading for electrophoresis or other injection and separation chemistry that are performed in microfluidic configurations. This architecture enables a very dense layout of arrays of functionally identical, identically shaped channels, and it also solves the problem of simultaneously enabling efficient parallel shapes and biasing of the input wells, waste wells, and bias wells at the input end of the separation columns. One T load architecture uses circular holes with common rows, but not columns, which allows the flow paths for each channel to be identical in shape, using multiple mirror image pieces. Another T load architecture enables the access hole array to be formed on a biaxial, collinear grid suitable for EDM micromachining (square holes), with common rows and columns.

  8. Parallel Syntheses of Peptides on Teflon-Patterned Paper Arrays (SyntArrays).

    PubMed

    Deiss, Frédérique; Yang, Yang; Derda, Ratmir

    2016-01-01

Screening of peptides to find the ligands that bind to specific targets is an important step in drug discovery. These high-throughput screens require a large number of structural variants of peptides to be synthesized and tested. This chapter describes the generation of arrays of peptides on Teflon-patterned sheets of paper. First, the protocol describes the patterning of paper with a Teflon solution to produce arrays with solvophobic barriers that are able to confine organic solvents. Next, we describe the parallel syntheses of 96 peptides on Teflon-patterned arrays using the SPOT synthesis method.

  9. Fast space-filling molecular graphics using dynamic partitioning among parallel processors.

    PubMed

    Gertner, B J; Whitnell, R M; Wilson, K R

    1991-09-01

    We present a novel algorithm for the efficient generation of high-quality space-filling molecular graphics that is particularly appropriate for the creation of the large number of images needed in the animation of molecular dynamics. Each atom of the molecule is represented by a sphere of an appropriate radius, and the image of the sphere is constructed pixel-by-pixel using a generalization of the lighting model proposed by Porter (Comp. Graphics 1978, 12, 282). The edges of the spheres are antialiased, and intersections between spheres are handled through a simple blending algorithm that provides very smooth edges. We have implemented this algorithm on a multiprocessor computer using a procedure that dynamically repartitions the effort among the processors based on the CPU time used by each processor to create the previous image. This dynamic reallocation among processors automatically maximizes efficiency in the face of both the changing nature of the image from frame to frame and the shifting demands of the other programs running simultaneously on the same processors. We present data showing the efficiency of this multiprocessing algorithm as the number of processors is increased. The combination of the graphics and multiprocessor algorithms allows the fast generation of many high-quality images.
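The dynamic-repartitioning idea described above can be sketched simply: redistribute image rows each frame in proportion to each processor's measured speed (the reciprocal of its CPU time) on the previous frame. The function name and row-based partitioning are illustrative assumptions, not the paper's exact scheme.

```python
# Reassign scanline rows for the next frame based on per-processor CPU
# times measured on the previous frame: faster processors get more rows.
def repartition(n_rows, prev_times):
    """prev_times[i]: CPU seconds processor i spent on the previous frame.
    Returns the number of rows to assign to each processor next frame."""
    speeds = [1.0 / t for t in prev_times]
    total = sum(speeds)
    shares = [int(n_rows * s / total) for s in speeds]
    shares[-1] += n_rows - sum(shares)   # hand any remainder to the last one
    return shares

# A processor that took twice as long gets roughly half as many rows.
rows = repartition(600, [1.0, 2.0, 1.0])
assert sum(rows) == 600 and rows[1] < rows[0]
```

Because the shares are recomputed every frame, the partition tracks both scene changes and competing load on shared processors, as the abstract describes.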

  10. Parallelization Issues of Domain Specific Question Answering System on Cell B.E. Processors

    NASA Astrophysics Data System (ADS)

    Kumar, Tarun; Mittal, Ankush; Sondhi, Parikshit

A question answering system is an information retrieval application that allows users to directly obtain appropriate answers to a question. With the explosive growth of information on the internet and the increased number of processing stages in answer retrieval, the time and processing hardware required by question answering systems have increased. The need for hardware is currently served by connecting thousands of computers in clusters, but a faster and less complex alternative can be found in multi-core processors. This paper presents pioneering work by identifying the major issues involved in porting a general question answering framework to a Cell processor, along with their possible solutions. The work is evaluated by porting the indexing algorithm of our biomedical question answering system, INDOC (Internet Doctor), to Cell processors.

  11. Programmable retinal dynamics in a CMOS mixed-signal array processor chip

    NASA Astrophysics Data System (ADS)

    Carmona, Ricardo A.; Jimenez-Garrido, Francisco J.; Dominguez-Castro, Rafael; Espejo, Servando; Rodriguez-Vazquez, Angel

    2003-04-01

The retina is responsible for the treatment of visual information at its earliest stages. Visual stimuli generate patterns of activity that are transmitted through its layered structure up to the ganglion cells that interface it to the optic nerve. Over this journey of micrometers, information is carried by continuous signals that interact in excitatory and inhibitory ways. This low-level processing compresses the relevant information in the images to a manageable size. The behavior of the outer layers of the biological retina has been successfully modelled within the Cellular Neural Network framework. Interactions between cells are realized on a local basis: each cell interacts with its nearest neighbors, and every cell in the same layer follows the same interconnection pattern. Intra- and inter-layer interactions are continuous in magnitude and time. The evolution of the network can be described by a set of coupled nonlinear differential equations. A mixed-signal VLSI implementation of focal-plane low-level image processing based upon this biological model constitutes a feasible and cost-effective alternative to conventional digital processing in real-time applications. A CMOS Programmable Array Processor prototype chip has been designed and fabricated in a standard technology, and it has been successfully tested, validating the proposed design techniques. The integrated system consists of a network of 2 coupled layers, containing 32×32 elementary processors, running at different time constants. Image processing algorithms can be programmed on this chip by tuning the appropriate interconnection weights, which are internally coded as analog values but programmed via a digital interface. Propagative, active wave phenomena and retina-like effects can be observed in this chip. Low-level image processing tasks for early vision applications can be developed based on these high-order dynamics.

  12. Scalable Unix commands for parallel processors : a high-performance implementation.

    SciTech Connect

    Ong, E.; Lusk, E.; Gropp, W.

    2001-06-22

    We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
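The pattern behind commands like a parallel ls is a fan-out: run the same operation across many targets concurrently and merge the labeled results. A minimal sketch follows; the real Parallel Unix Commands are MPI programs, and threads stand in for MPI ranks here purely for illustration.

```python
# Fan out a listing operation over several directories concurrently and
# gather the results keyed by directory, mimicking a parallel `ls`.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def parallel_ls(dirs):
    """Return {directory: sorted entries}, gathered concurrently."""
    with ThreadPoolExecutor() as pool:
        listings = pool.map(os.listdir, dirs)
    return {d: sorted(entries) for d, entries in zip(dirs, listings)}

# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "a")
    os.mkdir(sub)
    open(os.path.join(sub, "f.txt"), "w").close()
    assert parallel_ls([sub])[sub] == ["f.txt"]
```

In the MPI versions, each rank lists its local filesystem and the results are gathered to rank 0, which is what makes the commands useful on a cluster where nodes have private disks.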

  13. A digital magnetic resonance imaging spectrometer using digital signal processor and field programmable gate array

    NASA Astrophysics Data System (ADS)

    Liang, Xiao; Binghe, Sun; Yueping, Ma; Ruyan, Zhao

    2013-05-01

A digital spectrometer for low-field magnetic resonance imaging is described. A digital signal processor (DSP) is utilized as the pulse programmer, on which a pulse sequence is executed as a subroutine. Field programmable gate array (FPGA) devices that are logically mapped into the external addressing space of the DSP work as auxiliary controllers of gradient control, radio frequency (rf) generation, and rf receiving separately. The pulse programmer triggers an event by setting the 32-bit control register of the corresponding FPGA, and then the FPGA automatically carries out the event function according to preset configurations in cooperation with other devices; accordingly, event control of the spectrometer is flexible and efficient. Digital techniques are in widespread use: gradient control is implemented in real-time by a FPGA; the rf source is constructed using the direct digital synthesis technique, and the rf receiver is constructed using the digital quadrature detection technique. The designed performance is achieved, including 1 μs time resolution of the gradient waveform, 1 μs time resolution of the soft pulse, and 2 MHz signal receiving bandwidth. Both rf synthesis and rf digitalization operate at the same 60 MHz clock; therefore, the frequency range of transmitting and receiving is from DC to ˜27 MHz. A majority of pulse sequences have been developed, and the imaging performance of the spectrometer has been validated through a large number of experiments. Furthermore, the spectrometer is also suitable for relaxation measurement in the nuclear magnetic resonance field.
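The direct-digital-synthesis technique mentioned above can be sketched with a phase accumulator. The 60 MHz clock follows the abstract, but the 32-bit accumulator width and function names are illustrative assumptions, not the instrument's actual design.

```python
# DDS sketch: a phase accumulator advances by a frequency tuning word each
# clock, and the accumulated phase indexes a sine lookup (computed here).
import math

F_CLK = 60e6       # system clock, per the abstract
ACC_BITS = 32      # assumed phase-accumulator width

def dds_samples(f_out, n):
    """Generate n sine samples of an f_out tone at the F_CLK sample rate."""
    ftw = round(f_out * 2**ACC_BITS / F_CLK)   # frequency tuning word
    acc = 0
    out = []
    for _ in range(n):
        out.append(math.sin(2 * math.pi * acc / 2**ACC_BITS))
        acc = (acc + ftw) % 2**ACC_BITS        # wraps like hardware phase
    return out

# The tuning word sets frequency in steps of F_CLK / 2**ACC_BITS (~14 mHz).
samples = dds_samples(1e6, 60)   # one full cycle of a 1 MHz tone
assert abs(samples[0]) < 1e-9 and abs(samples[30]) < 1e-3
```

The same accumulator, run against cosine and sine tables, also provides the quadrature pair used by the digital quadrature detection on the receive side.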

  14. A Processor for Two-Dimensional Symmetric Eigenvalue and Singular Value Arrays.

    DTIC Science & Technology

    1987-05-01

processors of the rotations θ and φ that diagonalize the 2 × 2 block stored in each of them:

    ( cos θ   sin θ ) ( a  b ) ( cos φ  -sin φ )
    ( -sin θ  cos θ ) ( c  d ) ( sin φ   cos φ )

The angles θ and φ are propagated to all processors in the same row and the same column, respectively, as the diagonal processor that has
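The per-processor kernel of such a two-sided Jacobi array computes rotation angles that annihilate the off-diagonal of a 2 × 2 block. For a symmetric block a single angle suffices; the standard formula below is a generic illustration, not the report's exact derivation.

```python
# Jacobi rotation for a symmetric 2x2 block [[a, b], [b, d]]: find the
# angle th such that R(th)^T A R(th) is diagonal, using tan(2th) = 2b/(a-d).
import math

def jacobi_angle(a, b, d):
    """Rotation angle that zeroes the off-diagonal of [[a,b],[b,d]]."""
    if b == 0.0:
        return 0.0            # already diagonal
    return 0.5 * math.atan2(2.0 * b, a - d)

a, b, d = 4.0, 3.0, 1.0
th = jacobi_angle(a, b, d)
c, s = math.cos(th), math.sin(th)
# Off-diagonal entry of R^T A R must vanish.
off = (d - a) * s * c + b * (c * c - s * s)
assert abs(off) < 1e-12
```

In the array, each diagonal processor computes such angles locally and broadcasts them along its row and column, exactly the propagation pattern the snippet describes.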

  15. Numerical methods for matrix computations using arrays of processors. Final report, 15 August 1983-15 October 1986

    SciTech Connect

    Golub, G.H.

    1987-04-30

    The basic objective of this project was to consider a large class of matrix computations with particular emphasis on algorithms that can be implemented on arrays of processors. In particular, methods useful for sparse matrix computations were investigated. These computations arise in a variety of applications such as the solution of partial differential equations by multigrid methods and in the fitting of geodetic data. Some of the methods developed have already found their use on some of the newly developed architectures.

  16. A longitudinal multi-bunch feedback system using parallel digital signal processors

    SciTech Connect

    Sapozhnikov, L.; Fox, J.D.; Olsen, J.J.; Oxoby, G.; Linscott, I.; Drago, A.; Serio, M.

    1993-12-01

    A programmable longitudinal feedback system based on four AT&T 1610 digital signal processors has been developed as a component of the PEP-II R&D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacing of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunchies at the 250 MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies correction signals back on the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented.

  17. O(1) time algorithms for computing histogram and Hough transform on a cross-bridge reconfigurable array of processors

    SciTech Connect

    Kao, T.; Horng, S.; Wang, Y.

    1995-04-01

Instead of using the base-2 number system, we use a base-m number system to represent the numbers used in the proposed algorithms. Such a strategy can be used to design an O(T) time, T = log_m N + 1, prefix sum algorithm for an N-bit binary sequence on a cross-bridge reconfigurable array of processors using N processors, where the data bus is m bits wide. This basic operation can then be used to compute the histogram of an n x n image with G gray levels in constant time using G x n x n processors, and to compute the Hough transform of an image with N edge pixels and an n x n parameter space in constant time using n x n x N processors, respectively. These results improve on those previously reported in the literature. Also, the execution time of the proposed algorithms is tunable by the bus bandwidth. 43 refs.
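The reduction underlying the histogram result is that counting each gray level is a prefix-sum (counting) problem over a binary indicator sequence, one per level. The plain sequential sketch below fixes that semantics only; the O(1)-time reconfigurable-bus machinery itself is not modeled.

```python
# Histogram as G counting problems: counts[g] is the total of the binary
# indicator sequence "pixel == g" over the whole n x n image.
def histogram(image, G):
    counts = [0] * G
    for row in image:
        for g in row:
            counts[g] += 1   # final prefix-sum value for gray level g
    return counts

img = [[0, 1, 1],
       [2, 1, 0],
       [2, 2, 2]]
assert histogram(img, 3) == [2, 3, 4]
```

On the array, the G x n x n processors evaluate all G indicator sequences at once and the base-m prefix sums collapse each count in constant time.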

  18. Fast String Search on Multicore Processors: Mapping fundamental algorithms onto parallel hardware

    SciTech Connect

    Scarpazza, Daniele P.; Villa, Oreste; Petrini, Fabrizio

    2008-04-01

String searching is one of the most fundamental algorithms. It has a host of applications, including search engines, network intrusion detection, virus scanners, spam filters, and DNA analysis, among others. The Cell processor, with its multiple cores, promises substantial speed-ups for string searching. In this article, we show how we mapped string searching efficiently onto the Cell. We present two implementations: • The fast implementation supports a small dictionary size (approximately 100 patterns) and provides a throughput of 40 Gbps, which is 100 times faster than reference implementations on x86 architectures. • The heavy-duty implementation is slower (3.3-4.3 Gbps), but supports dictionaries with tens of thousands of strings.
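Dictionary string search, the task being mapped onto the Cell, can be pinned down with a scalar reference. Real implementations use automata such as Aho-Corasick; this naive version merely fixes the input/output semantics and is our own illustration, not the article's code.

```python
# Report every (offset, pattern) pair at which a dictionary pattern
# occurs in the text, scanning each position against each pattern.
def search(text, patterns):
    hits = []
    for i in range(len(text)):
        for p in patterns:
            if text.startswith(p, i):
                hits.append((i, p))
    return hits

hits = search("the cat sat on the mat", ["cat", "mat", "dog"])
assert hits == [(4, "cat"), (19, "mat")]
```

The parallel versions partition the text across cores (with small overlaps so matches spanning a boundary are not lost) while preserving exactly this output.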

  19. Low-power, real-time digital video stabilization using the HyperX parallel processor

    NASA Astrophysics Data System (ADS)

    Hunt, Martin A.; Tong, Lin; Bindloss, Keith; Zhong, Shang; Lim, Steve; Schmid, Benjamin J.; Tidwell, J. D.; Willson, Paul D.

    2011-06-01

    Coherent Logix has implemented a digital video stabilization algorithm for use in soldier systems and small unmanned air / ground vehicles that focuses on significantly reducing the size, weight, and power as compared to current implementations. The stabilization application was implemented on the HyperX architecture using a dataflow programming methodology and the ANSI C programming language. The initial implementation is capable of stabilizing an 800 x 600, 30 fps, full color video stream with a 53ms frame latency using a single 100 DSP core HyperX hx3100TM processor running at less than 3 W power draw. By comparison an Intel Core2 Duo processor running the same base algorithm on a 320x240, 15 fps stream consumes on the order of 18W. The HyperX implementation is an overall 100x improvement in performance (processing bandwidth increase times power improvement) over the GPP based platform. In addition the implementation only requires a minimal number of components to interface directly to the imaging sensor and helmet mounted display or the same computing architecture can be used to generate software defined radio waveforms for communications links. In this application, the global motion due to the camera is measured using a feature based algorithm (11 x 11 Difference of Gaussian filter and Features from Accelerated Segment Test) and model fitting (Random Sample Consensus). Features are matched in consecutive frames and a control system determines the affine transform to apply to the captured frame that will remove or dampen the camera / platform motion on a frame-by-frame basis.
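The consensus step described above can be illustrated with a toy version: from noisy feature matches, recover the dominant inter-frame motion. The real system fits a full affine model with RANSAC; this sketch simply votes on a pure translation and is an illustration of the idea, not the product's algorithm.

```python
# Each matched feature pair votes for a displacement; the mode of the
# votes is the camera motion, and outlier matches are simply outvoted.
from collections import Counter

def dominant_shift(matches):
    """matches: list of ((x0, y0), (x1, y1)) feature pairs."""
    votes = Counter((x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in matches)
    return votes.most_common(1)[0][0]

matches = [((10, 10), (13, 11)), ((40, 5), (43, 6)),
           ((7, 30), (10, 31)), ((0, 0), (25, 99))]  # last pair: outlier
assert dominant_shift(matches) == (3, 1)
```

Applying the inverse of the recovered motion to each captured frame (with the control system damping the correction) is what removes the platform jitter frame by frame.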

  20. The Sequential Implementation of Array Processors when there is Directional Uncertainty

    DTIC Science & Technology

    1975-08-01

be completed. The primary processor uses a convenient description of a priori knowledge out of the class of natural conjugate densities (if such a... Furthermore, the mathematical tractability of the natural conjugate priors may simplify the design of the primary processor. Performance: The complete... detection receivers discussed in this dissertation will be evaluated and compared. In general, the entire ROC curve is necessary to completely specify

  1. Evaluation of the Leon3 soft-core processor within a Xilinx radiation-hardened field-programmable gate array.

    SciTech Connect

    Learn, Mark Walter

    2012-01-01

    The purpose of this document is to summarize the work done to evaluate the performance of the Leon3 soft-core processor in a radiation environment while instantiated in a radiation-hardened static random-access memory based field-programmable gate array. This evaluation will look at the differences between two soft-core processors: the open-source Leon3 core and the fault-tolerant Leon3 core. Radiation testing of these two cores was conducted at the Texas A&M University Cyclotron facility and Lawrence Berkeley National Laboratory. The results of these tests are included within the report along with designs intended to improve the mitigation of the open-source Leon3. The test setup used for evaluating both versions of the Leon3 is also included within this document.

  2. Multimode power processor

    DOEpatents

    O'Sullivan, George A.; O'Sullivan, Joseph A.

    1999-01-01

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

  3. Multimode power processor

    DOEpatents

    O'Sullivan, G.A.; O'Sullivan, J.A.

    1999-07-27

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.

  4. Fast 2D DOA Estimation Algorithm by an Array Manifold Matching Method with Parallel Linear Arrays

    PubMed Central

    Yang, Lisheng; Liu, Sheng; Li, Dong; Jiang, Qingping; Cao, Hailin

    2016-01-01

    In this paper, the problem of two-dimensional (2D) direction-of-arrival (DOA) estimation with parallel linear arrays is addressed. Two array manifold matching (AMM) approaches, in this work, are developed for the incoherent and coherent signals, respectively. The proposed AMM methods estimate the azimuth angle only with the assumption that the elevation angles are known or estimated. The proposed methods are time efficient since they do not require eigenvalue decomposition (EVD) or peak searching. In addition, the complexity analysis shows the proposed AMM approaches have lower computational complexity than many current state-of-the-art algorithms. The estimated azimuth angles produced by the AMM approaches are automatically paired with the elevation angles. More importantly, for estimating the azimuth angles of coherent signals, the aperture loss issue is avoided since a decorrelation procedure is not required for the proposed AMM method. Numerical studies demonstrate the effectiveness of the proposed approaches. PMID:26907301

  5. Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

    PubMed

    Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

    2012-01-01

    We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs.
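The closing point above is worth making concrete: integrating the ODE entirely on the device and transferring only the result avoids per-step memory traffic. The exponential-Euler update below is exact for a first-order gating equation dm/dt = (m_inf - m)/tau with constant coefficients; the function names are ours, not the paper's CUDA kernels.

```python
# Integrate a gating variable for many steps "on device": only the final
# value crosses the (simulated) host-device boundary.
import math

def integrate_gate(m0, m_inf, tau, dt, steps):
    m = m0
    k = 1.0 - math.exp(-dt / tau)   # exact per-step relaxation factor
    for _ in range(steps):          # no per-step transfer of m
        m += (m_inf - m) * k
    return m

# The gate relaxes toward m_inf; after ten time constants it has converged.
m = integrate_gate(0.0, 0.8, 2.0, 0.01, 2000)   # 20 ms with tau = 2 ms
assert abs(m - 0.8) < 1e-3
```

On a GPU the same loop runs in parallel for every channel model in the genetic algorithm's population, which is where the reported ~180x speed-up comes from.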

  6. Method of up-front load balancing for local memory parallel processors

    NASA Technical Reports Server (NTRS)

    Baffes, Paul Thomas (Inventor)

    1990-01-01

In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balanced load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent.
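The iterative merge described above can be sketched with a greedy heuristic: repeatedly combine the two lightest process sets until only as many sets remain as there are processing units. The names and the lightest-pair rule are illustrative assumptions, not the patent's exact partition-threshold criterion.

```python
# Greedily merge "process sets" down to the processor count, always
# combining the two lightest sets to keep the final loads balanced.
import heapq

def merge_to(processors, set_loads):
    """Merge the two lightest sets until len == processors."""
    heap = list(set_loads)
    heapq.heapify(heap)
    while len(heap) > processors:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, a + b)   # merged set carries the summed load
    return sorted(heap)

loads = merge_to(3, [5, 1, 2, 7, 3, 2])
assert len(loads) == 3 and sum(loads) == 20
```

Because merging is done up front, before execution, no work migrates between processors at run time; the balance is fixed by construction.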

  7. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    NASA Technical Reports Server (NTRS)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  8. Appendix E: Parallel Pascal development system

    NASA Technical Reports Server (NTRS)

    1985-01-01

The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays whose dimensions exceed those of the hardware. Programs can be conveniently tested with small arrays on the conventional computer before attempting to run on a parallel system.

  9. Performance evaluation of the JPL interim digital SAR processor

    NASA Technical Reports Server (NTRS)

    Wu, C.; Barkan, B.; Curlander, J.; Jin, M.; Pang, S.

    1983-01-01

    The performance of the Interim Digital SAR Processor (IDP) was evaluated. The IDP processor was originally developed for experimental processing of digital SEASAT SAR data. One phase of the system upgrade which features parallel processing in three peripheral array processors, automated estimation for Doppler parameters, and unsupervised image pixel location determination and registration was executed. The method to compensate for the target range curvature effect was improved. A four point interpolation scheme is implemented to replace the nearest neighbor scheme used in the original IDP. The processor still maintains its fast throughput speed. The current performance and capability of the processing modes now available on the IDP system are updated.

  10. Design and numerical evaluation of a volume coil array for parallel MR imaging at ultrahigh fields

    PubMed Central

    Pang, Yong; Wong, Ernest W.H.; Yu, Baiying

    2014-01-01

    In this work, we propose and investigate a volume coil array design method using different types of birdcage coils for MR imaging. Unlike the conventional radiofrequency (RF) coil arrays of which the array elements are surface coils, the proposed volume coil array consists of a set of independent volume coils including a conventional birdcage coil, a transverse birdcage coil, and a helix birdcage coil. The magnetic fluxes of these three birdcage coils are intrinsically cancelled, yielding a highly decoupled volume coil array. In contrast to conventional non-array type volume coils, the volume coil array would be beneficial in improving MR signal-to-noise ratio (SNR) and also gain the capability of implementing parallel imaging. The volume coil array is evaluated at the ultrahigh field of 7T using FDTD numerical simulations, and the g-factor map at different acceleration rates was also calculated to investigate its parallel imaging performance. PMID:24649435

  11. Massively parallel computation of lattice associative memory classifiers on multicore processors

    NASA Astrophysics Data System (ADS)

    Ritter, Gerhard X.; Schmalz, Mark S.; Hayden, Eric T.

    2011-09-01

    Over the past quarter century, concepts and theory derived from neural networks (NNs) have featured prominently in the literature of pattern recognition. Implementationally, classical NNs based on the linear inner product can present performance challenges due to the use of multiplication operations. In contrast, NNs having nonlinear kernels based on Lattice Associative Memories (LAM) theory tend to concentrate primarily on addition and maximum/minimum operations. More generally, the emergence of LAM-based NNs, with their superior information storage capacity, fast convergence and training due to relatively lower computational cost, as well as noise-tolerant classification has extended the capabilities of neural networks far beyond the limited applications potential of classical NNs. This paper explores theory and algorithmic approaches for the efficient computation of LAM-based neural networks, in particular lattice neural nets and dendritic lattice associative memories. Of particular interest are massively parallel architectures such as multicore CPUs and graphics processing units (GPUs). Originally developed for video gaming applications, GPUs hold the promise of high computational throughput without compromising numerical accuracy. Unfortunately, currently-available GPU architectures tend to have idiosyncratic memory hierarchies that can produce unacceptably high data movement latencies for relatively simple operations, unless careful design of theory and algorithms is employed. Advantageously, some GPUs (e.g., the Nvidia Fermi GPU) are optimized for efficient streaming computation (e.g., concurrent multiply and add operations). As a result, the linear or nonlinear inner product structures of NNs are inherently suited to multicore GPU computational capabilities. In this paper, the authors' recent research in lattice associative memories and their implementation on multicores is overviewed, with results that show utility for a wide variety of pattern
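The addition-and-min/max arithmetic the abstract contrasts with classical inner products can be shown in a few lines. This follows the standard lattice autoassociative memory construction from LAM theory and is illustrative, not the authors' GPU kernels.

```python
# Lattice associative memory: storage and recall use only addition and
# min/max, no multiplications. Undistorted stored patterns are recalled
# perfectly by the max-plus product with the min-based memory W.
def build_memory(patterns):
    n = len(patterns[0])
    # W[i][j] = min over stored patterns x of (x_i - x_j)
    return [[min(p[i] - p[j] for p in patterns) for j in range(n)]
            for i in range(n)]

def recall(W, x):
    # max-plus product: y_i = max_j (W[i][j] + x_j)
    return [max(W[i][j] + x[j] for j in range(len(x)))
            for i in range(len(x))]

patterns = [[1, 4, 2], [3, 0, 5], [2, 2, 2]]
W = build_memory(patterns)
for p in patterns:
    assert recall(W, p) == p   # perfect recall of every stored pattern
```

Since recall is a max of sums, it maps directly onto the streaming add/compare units of GPUs and multicore CPUs, which is the implementation point the paper pursues.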

  12. Design and evaluation of fault-tolerant VLSI/WSI processor arrays. Final technical report, 1 July 1985-31 December 1987

    SciTech Connect

    Fortes, J.A.

    1987-12-31

    This document is the final report of work performed under the project entitled Design and Evaluation of Fault-Tolerant VLSI/WSI Processor Arrays supported by the Innovative Science and Technology Office of the Strategic Defense Initiative Organization and administered through the Office of Naval Research under Contract No. 00014-85-k-0588. With the concurrence of Dr. Clifford Lau, the Scientific Officer for this project, this final report consists of reprints of publications reporting work performed under the project. In the attached list of publications are papers where fault-tolerant systems for processor arrays are proposed and studied. Studies on algorithmic and software aspects relevant to the systems are also reported, as well as hardware and reconfigurability issues for fault-tolerant processor arrays.

  13. High-speed, automatic controller design considerations for integrating array processor, multi-microprocessor, and host computer system architectures

    NASA Technical Reports Server (NTRS)

    Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.

    1985-01-01

    Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.

  14. A fast adaptive convex hull algorithm on two-dimensional processor arrays with a reconfigurable BUS system

    NASA Technical Reports Server (NTRS)

    Olariu, S.; Schwing, J.; Zhang, J.

    1991-01-01

A bus system that can change dynamically to suit computational needs is referred to as reconfigurable. We present a fast adaptive convex hull algorithm on a two-dimensional processor array with a reconfigurable bus system (2-D PARBS, for short). Specifically, we show that computing the convex hull of a planar set of n points takes O(log n/log m) time on a 2-D PARBS of size mn x n with 3 less than or equal to m less than or equal to n. Our result implies that the convex hull of n points in the plane can be computed in O(1) time on a 2-D PARBS of size n(exp 1.5) x n.
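For reference, the problem being solved is the standard planar convex hull. Below is a sequential O(n log n) version (Andrew's monotone chain); the constant-time PARBS algorithm itself depends on the reconfigurable-bus hardware and is not reproduced here.

```python
# Andrew's monotone chain: sort the points, then build the lower and
# upper hull chains with a cross-product turn test.
def convex_hull(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]   # counter-clockwise hull

hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
assert sorted(hull) == [(0, 0), (0, 2), (2, 0), (2, 2)]
```

The PARBS result amounts to replacing the sort and the chain scans with constant-time bus-configuration tricks, at the cost of mn x n processors.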

  15. High-performance ultra-low power VLSI analog processor for data compression

    NASA Technical Reports Server (NTRS)

    Tawel, Raoul (Inventor)

    1996-01-01

    An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
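In software terms, the array computes vector quantization: each input vector is compared against all N codebook vectors in parallel and the index of the closest one is emitted. The squared-distance measure below is an assumption for illustration; the patent's cells output an analog closeness signal.

```python
# Vector quantization sketch: map each input vector to the index of its
# nearest codebook vector (the codebook vector index of the patent).
def quantize(input_vectors, codebook):
    indices = []
    for v in input_vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))   # closest codebook vector
    return indices

codebook = [(0, 0), (10, 10), (0, 10)]
assert quantize([(1, 2), (9, 9), (2, 8)], codebook) == [0, 1, 2]
```

Transmitting only the index per input vector, rather than the vector itself, is what makes this a compression scheme.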

  16. Design of Robust Adaptive Array Processors for Non-Stationary Ocean Environments

    DTIC Science & Technology

    2009-02-20

    arrays deployed in the North Pacific . In 2007-2008 horizontal line array data from SWellEx-96, another ONR-sponsored experiment, was also analyzed...element shallow VLA deployed for SPICE04 was completed. Results of this analysis were presented at North Pacific Acoustic Laboratory (NPAL) Workshop...noise in the North Pacific , focusing primarily on whether an eigenanalysis of noise covariance matrices revealed anything about the underlying

  17. Field Programmable Gate Array Hysteresis Control of Parallel Connected Inverters

    DTIC Science & Technology

    2006-06-01

    voltage with respect to time FPGA Field Programmable Gate Array GIC Generalized Impedance Converter GTO Gate-Turn-Off Transistors HDL...C. Figure 12 SEMIKRON PEBB [After Ref 13] G. FIELD PROGRAMMABLE GATE ARRAYS An FPGA is a generic semiconductor device containing a large... generate reference voltage and current waves for each of the three phases. The time to complete one logical operation inside the FPGA is a function of how

  18. Electrostatic quadrupole array for focusing parallel beams of charged particles

    DOEpatents

    Brodowski, John

    1982-11-23

    An array of electrostatic quadrupoles, capable of providing strong electrostatic focusing simultaneously on multiple beams, is easily fabricated from a single array element comprising a support rod and multiple electrodes spaced at intervals along the rod. The rods are secured to four terminals which are isolated by only four insulators. This structure requires bias voltage to be supplied to only two terminals, eliminates the need for individual electrode biases and insulators, and increases lifetime by eliminating beam plating of insulators.

  19. NEUSORT2.0: a multiple-channel neural signal processor with systolic array buffer and channel-interleaving processing schedule.

    PubMed

    Chen, Tung-Chien; Yang, Zhi; Liu, Wentai; Chen, Liang-Gee

    2008-01-01

    An emerging class of neuroprosthetic devices aims to provide aggressive performance by integrating more complicated signal processing hardware into neural recording systems with large numbers of electrodes. However, the traditional parallel structure, which duplicates one neural signal processor (NSP) for each channel, places a heavy burden on chip area. The serial structure, which sequentially switches the processing task between channels, requires a bulky memory to store neural data and may have a long processing delay. In this paper, a memory hierarchy based on a systolic array buffer is proposed to support signal processing interleaved channel by channel on a cycle basis, matching the data flow of the optimized multiple-channel frontend interface circuitry. The NSP can thus be tightly coupled to the analog frontend interface circuitry and perform signal processing for multiple channels in real time without any bulky memory. Based on our previous one-channel NSP, NEUSORT1.0 [1], the proposed memory hierarchy is realized in NEUSORT2.0 for a 16-channel neural recording system. Compared to 16 instances of NEUSORT1.0, NEUSORT2.0 demonstrates an 81.50% saving in terms of the area × power factor.

  20. Maskless, parallel patterning with zone-plate array lithography

    SciTech Connect

    Carter, D. J. D.; Gil, Dario; Menon, Rajesh; Mondol, Mark K.; Smith, Henry I.; Anderson, Erik H.

    1999-11-01

    Zone-plate array lithography (ZPAL) is a maskless lithography scheme that uses an array of shuttered zone plates to print arbitrary patterns on a substrate. An experimental ultraviolet ZPAL system has been constructed and used to simultaneously expose nine different patterns with a 3x3 array of zone plates in a quasidot-matrix fashion. We present exposed patterns, describe the system design and construction, and discuss issues essential to a functional ZPAL system. We also discuss another ZPAL system which operates with 4.5 nm x-ray radiation from a point source. We present simulations which show that, with our existing x-ray zone plates and this system, we should be able to achieve 55 nm resolution. (c) 1999 American Vacuum Society.

  1. Parallel array of independent thermostats for column separations

    DOEpatents

    Foret, Frantisek; Karger, Barry L.

    2005-08-16

    A thermostat array including an array of two or more capillary columns (10) or two or more channels in a microfabricated device is disclosed. A heat conductive material (12) surrounds each individual column or channel in the array, each individual column or channel being thermally insulated from every other individual column or channel. One or more independently controlled heating or cooling elements (14) is positioned adjacent to individual columns or channels within the heat conductive material, each heating or cooling element being connected to a source of heating or cooling, and one or more independently controlled temperature sensing elements (16) is positioned adjacent to the individual columns or channels within the heat conductive material. Each temperature sensing element is connected to a temperature controller.

  2. A frequency and sensitivity tunable microresonator array for high-speed quantum processor readout

    SciTech Connect

    Whittaker, J. D. Swenson, L. J.; Volkmann, M. H.; Spear, P.; Altomare, F.; Berkley, A. J.; Bunyk, P.; Harris, R.; Hilton, J. P.; Hoskinson, E.; Johnson, M. W.; Ladizinsky, E.; Lanting, T.; Oh, T.; Perminov, I.; Tolkacheva, E.; Yao, J.; Bumble, B.; Day, P. K.; Eom, B. H.; and others

    2016-01-07

    Superconducting microresonators have been successfully utilized as detection elements for a wide variety of applications. With multiplexing factors exceeding 1000 detectors per transmission line, they are the most scalable low-temperature detector technology demonstrated to date. For high-throughput applications, fewer detectors can be coupled to a single wire but utilize a larger per-detector bandwidth. For all existing designs, fluctuations in fabrication tolerances result in a non-uniform shift in resonance frequency and sensitivity, which ultimately limits the efficiency of bandwidth utilization. Here, we present the design, implementation, and initial characterization of a superconducting microresonator readout integrating two tunable inductances per detector. We demonstrate that these tuning elements provide independent control of both the detector frequency and sensitivity, allowing us to maximize the transmission line bandwidth utilization. Finally, we discuss the integration of these detectors in a multilayer fabrication stack for high-speed readout of the D-Wave quantum processor, highlighting the use of control and routing circuitry composed of single-flux-quantum loops to minimize the number of control wires at the lowest temperature stage.

  3. "Multipoint Force Feedback" Leveling of Massively Parallel Tip Arrays in Scanning Probe Lithography.

    PubMed

    Noh, Hanaul; Jung, Goo-Eun; Kim, Sukhyun; Yun, Seong-Hun; Jo, Ahjin; Kahng, Se-Jong; Cho, Nam-Joon; Cho, Sang-Joon

    2015-09-16

    Nanoscale patterning with massively parallel 2D array tips is of significant interest in scanning probe lithography. A challenging task for tip-based large area nanolithography is maintaining parallel tip arrays at the same contact point with a sample substrate in order to pattern a uniform array. Here, polymer pen lithography is demonstrated with a novel leveling method to account for the magnitude and direction of the total applied force of tip arrays by a multipoint force sensing structure integrated into the tip holder. This high-precision approach results in a 0.001° slope of feature edge length variation over 1 cm wide tip arrays. The position sensitive leveling operates in a fully automated manner and is applicable to recently developed scanning probe lithography techniques of various kinds which can enable "desktop nanofabrication."

  4. Beam Space Formulation of the Maximum Signal-to-Noise Ratio Array Processor.

    DTIC Science & Technology

    1980-12-01

    To investigate the dependence of the beam space gains on the number of input beams used, the crosspower spectral matrix was simulated for a number of...environments; in the first example (figure 9) the noise field exhibited only a weak azimuthal dependence whereas in figure 10 the presence of a strong...interference at 06-1 implied a strong azimuthal dependence of the noise field. Both results showed an improvement in the beamspace array gain estimates as the

  5. Achieving supercomputer performance for neural net simulation with an array of digital signal processors

    SciTech Connect

    Muller, U.A.; Baumle, B.; Kohler, P.; Gunzinger, A.; Guggenbuhl, W.

    1992-10-01

    Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.

  6. Frequency and sensitivity tunable microresonator array for high-speed quantum processor readout

    NASA Astrophysics Data System (ADS)

    Hoskinson, Emile; Whittaker, J. D.; Swenson, L. J.; Volkmann, M. H.; Spear, P.; Altomare, F.; Berkley, A. J.; Bumble, B.; Bunyk, P.; Day, P. K.; Eom, B. H.; Harris, R.; Hilton, J. P.; Johnson, M. W.; Kleinsasser, A.; Ladizinsky, E.; Lanting, T.; Oh, T.; Perminov, I.; Tolkacheva, E.; Yao, J.

    Frequency multiplexed arrays of superconducting microresonators have been used as detectors in a variety of applications. The degree of multiplexing achievable is limited by fabrication variation causing non-uniform shifts in resonator frequencies. We have designed, implemented and characterized a superconducting microresonator readout that incorporates two tunable inductances per detector, allowing independent control of each detector frequency and sensitivity. The tunable inductances are adjusted using on-chip programmable digital-to-analog flux converters, which are programmed with a scalable addressing scheme that requires few external lines.

  7. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
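
    The per-element scoring scheme described above is, in essence, local-alignment dynamic programming; each column of the score table corresponds to the work one processor of the linear array performs for its stored element, using scores passed from its neighbor. A minimal serial sketch (illustrative scoring parameters, not the patent's exact scheme):

    ```python
    def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
        """Best local-alignment score between sequences a and b.
        Column j mirrors the processor holding b[j-1]: it combines its
        own previous score with the score received from the processor
        to its left for each new element of a streamed through."""
        prev = [0] * (len(b) + 1)
        best = 0
        for ca in a:
            curr = [0]
            for j, cb in enumerate(b, 1):
                s = match if ca == cb else mismatch
                val = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
                curr.append(val)
                best = max(best, val)
            prev = curr
        return best
    ```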

  8. A prototypic system of parallel electrophoresis in multiple capillaries coupled with microwell arrays.

    PubMed

    Su, Jing; Ren, Kangning; Dai, Wen; Zhao, Yihua; Zhou, Jianhua; Wu, Hongkai

    2011-11-01

    We present a microfluidic system that can be directly coupled with microwell array and perform parallel electrophoresis in multiple capillaries simultaneously. The system is based on an array of glass capillaries, fixed in a polydimethylsiloxane (PDMS) microfluidic scaffold, with one end open for interfacing with microwells. In this capillary array, every two adjacent capillaries act as a pair to be coupled with one microwell; samples in the microwells are introduced and separated by simply applying voltage between two electrodes that are placed at the other ends of capillaries; thus no complicated circuit design is required. We evaluate the performance of this system and perform multiple CE with direct sample introduction from microwell array. Also with this system, we demonstrate the analysis of cellular contents of cells lysed in a microwell array. Our results show that this prototypic system is a promising platform for high-throughput analysis of samples in microwell arrays.

  9. Using a Cray Y-MP as an array processor for a RISC Workstation

    NASA Technical Reports Server (NTRS)

    Lamaster, Hugh; Rogallo, Sarah J.

    1992-01-01

    As microprocessors increase in power, the economics of centralized computing have changed dramatically. At the beginning of the 1980's, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate for a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation that can have a sufficient number of arithmetic operations to amortize the cost of an RPC call. An experiment is described which demonstrates that matrix multiplication can be executed remotely on a large system faster than it executes locally on a workstation.
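
    The amortization argument can be made concrete with a back-of-the-envelope model: an n x n matrix multiply performs about 2n^3 flops but moves only about 3n^2 words, so the compute-to-transfer ratio grows with n. The sketch below uses assumed, illustrative machine and link rates, not figures from the paper.

    ```python
    def remote_speedup(n, local_gflops=0.1, remote_gflops=2.0,
                       link_mb_per_s=10.0, bytes_per_word=8):
        """Estimated speedup from shipping an n x n matrix multiply
        over RPC to a faster remote machine (all rates assumed)."""
        flops = 2.0 * n ** 3                 # multiply-add count
        words = 3.0 * n ** 2                 # two operands out, result back
        t_local = flops / (local_gflops * 1e9)
        t_remote = (flops / (remote_gflops * 1e9)
                    + words * bytes_per_word / (link_mb_per_s * 1e6))
        return t_local / t_remote
    ```

    With these assumed rates, small matrices lose (the RPC transfer dominates) while large ones win, which is the paper's point about choosing operations with enough arithmetic per byte moved.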

  10. Development of a ground signal processor for digital synthetic array radar data

    NASA Technical Reports Server (NTRS)

    Griffin, C. R.; Estes, J. M.

    1981-01-01

    A modified APQ-102 sidelooking array radar (SLAR) in a B-57 aircraft test bed is used, with other optical and infrared sensors, in remote sensing of Earth surface features for various users at NASA Johnson Space Center. The video from the radar is normally recorded on photographic film and subsequently processed photographically into high resolution radar images. Using a high speed sampling (digitizing) system, the two receiver channels of cross-and co-polarized video are recorded on wideband magnetic tape along with radar and platform parameters. These data are subsequently reformatted and processed into digital synthetic aperture radar images with the image data available on magnetic tape for subsequent analysis by investigators. The system design and results obtained are described.

  11. Breast ultrasound tomography with two parallel transducer arrays: preliminary clinical results

    NASA Astrophysics Data System (ADS)

    Huang, Lianjie; Shin, Junseob; Chen, Ting; Lin, Youzuo; Intrator, Miranda; Hanson, Kenneth; Epstein, Katherine; Sandoval, Daniel; Williamson, Michael

    2015-03-01

    Ultrasound tomography has great potential to provide quantitative estimates of the physical properties of breast tumors for accurate characterization of breast cancer. We design and manufacture a new synthetic-aperture breast ultrasound tomography system with two parallel transducer arrays. The distance between the two transducer arrays is adjustable for scanning breasts of different sizes. The ultrasound transducer arrays are translated vertically to scan the entire breast slice by slice and acquire ultrasound transmission and reflection data for whole-breast ultrasound imaging and tomographic reconstruction. We use the system to acquire patient data at the University of New Mexico Hospital for clinical studies. We present some preliminary imaging results from in vivo patient ultrasound data. Our preliminary clinical imaging results show the promise of our breast ultrasound tomography system with two parallel transducer arrays for breast cancer imaging and characterization.

  12. Series-Parallel Superconducting Quantum Interference Device Arrays Using High-TC Ion Damage Junctions

    NASA Astrophysics Data System (ADS)

    Wong, Travis; Mukhanov, Oleg

    2015-03-01

    We have fabricated several designs of three-junction series-parallel DC Superconducting Quantum Interference Device (BiSQUID) arrays in YBa2Cu3O7-x using 104 ion damage Josephson junctions on a single 1 cm2 chip. A high aspect ratio ion implantation mask (30:1 ratio) with 30 nm slits was fabricated using electron beam lithography and low pressure reactive ion etching. Samples were irradiated with 60 keV helium ions to achieve a highly uniform damaged region throughout the thickness of the YBCO thin film, as confirmed with Monte Carlo ion implantation simulations. Low frequency measurements of four different BiSQUID series-parallel SQUID array devices will be presented to investigate the effect of the BiSQUID design parameters on the linearity of the SQUID array response to magnetic fields. BiSQUID arrays could provide a promising architecture for transimpedance amplifiers with improved linearity.

  13. Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing

    PubMed Central

    2015-01-01

    A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982

  14. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
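
    The inner loop of point-scatterer rendering is a coherent sum of complex returns, one per scatterer, which is exactly the kind of dense arithmetic that vectorizes well. A minimal serial sketch of that loop (illustrative two-way phase term, not the paper's full signal model):

    ```python
    import cmath

    def render_scatterers(scatterers, wavelength):
        """Coherently sum point-scatterer returns into one complex sample.
        Each scatterer is (amplitude, range_m); the two-way phase is
        4*pi*R/lambda. This is the loop body that AVX SIMD code keeps in
        registers, amplitude * exp(-j*phase) accumulated per scatterer."""
        k = 4.0 * cmath.pi / wavelength
        return sum(amp * cmath.exp(-1j * k * rng) for amp, rng in scatterers)
    ```

    High computing intensity comes from evaluating many such samples per scatterer fetched, so the exponentials stay in vector registers rather than round-tripping through memory.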

  15. Mitigation of cache memory using an embedded hard-core PPC440 processor in a Virtex-5 Field Programmable Gate Array.

    SciTech Connect

    Learn, Mark Walter

    2010-02-01

    Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not available to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA other than TMR. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.

  16. Closely-spaced double-row microstrip RF arrays for parallel MR imaging at ultrahigh fields

    PubMed Central

    Yan, Xinqiang; Xue, Rong; Zhang, Xiaoliang

    2015-01-01

    Radiofrequency (RF) coil arrays with a high count of elements, e.g., closely-spaced multi-row arrays, exhibit superior parallel imaging performance in MRI. However, it is technically challenging and time-consuming to build multi-row arrays due to complex coupling issues. This paper presents a novel and simple method for closely-spaced multi-row RF array designs. The induced current elimination (ICE) decoupling method has shown the capability of reducing coupling between microstrip elements from different rows. In this study, its capability for decoupling array elements from the same row was investigated and validated by bench tests, with an isolation improvement from −8.9 dB to −20.7 dB. Based on this feature, a closely-spaced double-row microstrip array with 16 elements was built at 7T. S21 between any two elements of the 16-channel closely-spaced array was better than −14 dB. In addition, its feasibility and performance were validated by MRI experiments. No significant image reconstruction-related noise amplifications were observed for parallel imaging even when the reduction factor (R) reaches 4. The experimental results demonstrated that the proposed design might be a simple and efficient approach to fabricating closely-spaced multi-row RF arrays. PMID:26508810

  17. Closely-spaced double-row microstrip RF arrays for parallel MR imaging at ultrahigh fields.

    PubMed

    Yan, Xinqiang; Xue, Rong; Zhang, Xiaoliang

    2015-11-01

    Radiofrequency (RF) coil arrays with a high count of elements, e.g., closely-spaced multi-row arrays, exhibit superior parallel imaging performance in MRI. However, it is technically challenging and time-consuming to build multi-row arrays due to complex coupling issues. This paper presents a novel and simple method for closely-spaced multi-row RF array designs. The induced current elimination (ICE) decoupling method has shown the capability of reducing coupling between microstrip elements from different rows. In this study, its capability for decoupling array elements from the same row was investigated and validated by bench tests, with an isolation improvement from -8.9 dB to -20.7 dB. Based on this feature, a closely-spaced double-row microstrip array with 16 elements was built at 7T. S21 between any two elements of the 16-channel closely-spaced array was better than -14 dB. In addition, its feasibility and performance were validated by MRI experiments. No significant image reconstruction-related noise amplifications were observed for parallel imaging even when the reduction factor (R) reaches 4. The experimental results demonstrated that the proposed design might be a simple and efficient approach to fabricating closely-spaced multi-row RF arrays.

  18. Multicoil resonance-based parallel array for smart wireless power delivery.

    PubMed

    Mirbozorgi, S A; Sawan, M; Gosselin, B

    2013-01-01

    This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, implanted devices, etc. The proposed powering system is based on a 4-coil resonance-based inductive link whose resonance coil is formed by an array of several paralleled coils acting as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by each individual coil in the array. Thus, the transmitted power scales with the load of the electric/electronic system being powered, and does not divide equally over the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables superior power, size, and cost efficiency compared to other solutions, since it does not need complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and coverage of a 4 times larger area than conventional topologies, with a power transfer efficiency of 55%, are the novelties of the presented paper.

  19. A parallel-series-fed microstrip array with high efficiency and low cross-polarization

    NASA Technical Reports Server (NTRS)

    Huang, John

    1992-01-01

    The requirements of a microstrip array with a vertically polarized fan beam are addressed that correspond to its use in C-band interferometric SAR. A combination of parallel- and series-feed techniques are utilized in an array design with a three-stage parallel-fed configuration to enhance bandwidth performance. The linearly polarized traveling-wave microstrip array antenna is fed by microstrip transmission lines in two rows of 36 elements that resonate at 5.30 GHz. The transmission lines are impedance-matched at every junction for all the waves that travel toward the two ends of the array. The two measured principal-plane patterns are shown, and the measured narrow-beam pattern is found to agree with the calculated values. The VSWR bandwidths and narrow and broad beamwidths of the antenna are found to permit efficient performance. The efficiency is attributed to the parallel and series-feed configuration which allows proper impedance matching, and low cross-polarization is a result of the antiphase feed technique employed in the configuration.

  20. dc properties of series-parallel arrays of Josephson junctions in an external magnetic field

    SciTech Connect

    Lewandowski, S.J.

    1991-04-01

    A detailed dc theory of superconducting multijunction interferometers has previously been developed by several authors for the case of parallel junction arrays. The theory is now extended to cover the case of a loop containing several junctions connected in series. The problem is closely associated with high-Tc superconductors and their clusters of intrinsic Josephson junctions. These materials exhibit spontaneous interferometric effects, and there is no reason to assume that the intrinsic junctions form only parallel arrays. A simple formalism of phase states is developed in order to express the superconducting phase differences across the junctions forming a series array as functions of the phase difference across the weakest junction of the system, and to relate the differences in critical currents of the junctions to gaps in the allowed ranges of their phase functions. This formalism is used to investigate the energy states of the array, which in the case of different junctions are split and separated by energy barriers of height depending on the phase gaps. Modifications of the washboard model of a single junction are shown. Next a superconducting inductive loop containing a series array of two junctions is considered, and this model is used to demonstrate the transitions between phase states and the associated instabilities. Finally, the critical current of a parallel connection of two series arrays is analyzed and shown to be a multivalued function of the externally applied magnetic flux. The instabilities caused by the presence of intrinsic serial junctions in granular high-Tc materials are pointed out as a potential source of additional noise.
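
    For reference, the single-junction washboard model mentioned above describes a current-biased junction by the standard tilted-washboard potential (the usual RSJ-model form, stated here as background rather than taken from the paper):

    ```latex
    U(\varphi) = -E_J\left(\cos\varphi + \frac{I}{I_c}\,\varphi\right),
    \qquad E_J = \frac{\hbar I_c}{2e},
    ```

    whose local minima flatten out and disappear as I approaches I_c; the phase gaps introduced by the serial junctions modify the effective barrier heights between such minima.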

  1. Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit

    SciTech Connect

    Daily, Jeffrey A.; Lewis, Robert R.

    2011-11-30

    Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement called Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.
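
    The "minor source code changes" claim is easiest to picture with a whole-array NumPy kernel of the kind GAiN targets. The sketch below is a hypothetical example written against plain NumPy; under GAiN the intent is that the import line changes while array code like this stays the same.

    ```python
    import numpy as np  # with GAiN, swapping this import is the main change

    def jacobi_step(grid):
        """One Jacobi relaxation sweep over a 2-D array: each interior
        point becomes the average of its four neighbors. Expressed as
        whole-array slicing, so a distributed array backend can
        partition the work without changes to this code."""
        new = grid.copy()
        new[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                                  grid[1:-1, :-2] + grid[1:-1, 2:])
        return new
    ```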

  2. Performance of the UCAN2 Gyrokinetic Particle In Cell (PIC) Code on Two Massively Parallel Mainframes with Intel ``Sandy Bridge'' Processors

    NASA Astrophysics Data System (ADS)

    Leboeuf, Jean-Noel; Decyk, Viktor; Newman, David; Sanchez, Raul

    2013-10-01

    The massively parallel, 2D domain-decomposed, nonlinear, 3D, toroidal, electrostatic, gyrokinetic, Particle in Cell (PIC), Cartesian geometry UCAN2 code, with particle ions and adiabatic electrons, has been ported to two emerging mainframes. These two computers, one at NERSC in the US built by Cray named Edison and the other at the Barcelona Supercomputer Center (BSC) in Spain built by IBM named MareNostrum III (MNIII) just happen to share the same Intel ``Sandy Bridge'' processors. The successful port of UCAN2 to MNIII which came online first has enabled us to be up and running efficiently in record time on Edison. Overall, the performance of UCAN2 on Edison is superior to that on MNIII, particularly at large numbers of processors (>1024) for the same Intel IFORT compiler. This appears to be due to different MPI modules (OpenMPI on MNIII and MPICH2 on Edison) and different interconnection networks (Infiniband on MNIII and Cray's Aries on Edison) on the two mainframes. Details of these ports and comparative benchmarks are presented. Work supported by OFES, USDOE, under contract no. DE-FG02-04ER54741 with the University of Alaska at Fairbanks.

  3. Parallel force measurement with a polymeric microbeam array using an optical microscope and micromanipulator.

    PubMed

    Sasoglu, F Mert; Bohl, Andrew J; Allen, Kathleen B; Layton, Bradley E

    2009-01-01

    An image analysis method and its validation are presented for tracking the displacements of parallel mechanical force sensors. Force is measured using a combination of beam theory, optical microscopy, and image analysis. The primary instrument is a calibrated polymeric microbeam array mounted on a micromanipulator with the intended purpose of measuring traction forces on cell cultures or cell arrays. One application is the testing of hypotheses involving cellular mechanotransduction mechanisms. An Otsu-based image analysis code calculates displacement and force on cellular or other soft structures by using edge detection and image subtraction on digitally captured optical microscopy images. Forces as small as 250 ± 50 nN and as large as 25 ± 2.5 µN may be applied to, and measured upon, as few as one or as many as hundreds of structures in parallel. A validation of the method is provided by comparing results from a rigid glass surface and a compliant polymeric surface.
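
    The beam-theory step described above can be sketched with the standard Euler-Bernoulli cantilever relation F = 3*E*I*delta/L^3, I = w*t^3/12, converting an image-measured tip deflection into force. Every numerical value below is an illustrative assumption, not the paper's calibration.

```python
# Hedged sketch: image-measured deflection -> force via beam theory.
# All parameter values are assumed for illustration only.
E = 2.5e6           # Young's modulus of the polymer, Pa (assumed)
L = 100e-6          # beam length, m (assumed)
w = 10e-6           # beam width, m (assumed)
t = 20e-6           # beam thickness, m (assumed)

I = w * t**3 / 12.0                   # second moment of area, m^4

pixels = 20                           # tip displacement from image subtraction
um_per_pixel = 0.5                    # camera calibration (assumed)
delta = pixels * um_per_pixel * 1e-6  # deflection in meters

F = 3.0 * E * I * delta / L**3        # tip force, newtons
print(F)                              # ~5e-07 N (about 500 nN) here
```

    For these assumed values the result falls inside the 250 nN to 25 µN range the abstract reports for the instrument.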

  4. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    PubMed

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition, and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. Long learning time caused by their complex structure, however, has so far limited their use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition on personal devices will grow as more deep learning applications are developed. This paper presents a SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Different from conventional works, which have adopted massively-parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power, at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
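
    The stated efficiency figure can be checked from the other two numbers in the abstract with one line of arithmetic:

```python
# Quick arithmetic check: peak performance divided by peak power should
# reproduce the abstract's stated 1.93 TOPS/W.
peak_gops = 411.3          # GOPS, from the abstract
peak_power_mw = 213.1      # mW, from the abstract

tops_per_watt = (peak_gops / 1000.0) / (peak_power_mw / 1000.0)
print(round(tops_per_watt, 2))  # 1.93
```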

  5. Excitation of a Parallel Plate Waveguide by an Array of Rectangular Waveguides

    NASA Technical Reports Server (NTRS)

    Rengarajan, Sembiam

    2011-01-01

    This work addresses the problem of excitation of a parallel plate waveguide by an array of rectangular waveguides, which arises in applications such as the continuous transverse stub (CTS) antenna and dual-polarized parabolic cylindrical reflector antennas excited by a scanning line source. In order to design the junction region between the parallel plate waveguide and the linear array of rectangular waveguides, waveguide sizes have to be chosen so that the input match is adequate over the range of scan angles for both polarizations. The electromagnetic wave scattered at the junction between the parallel plate waveguide and the array of rectangular waveguides is analyzed by formulating coupled integral equations for the aperture electric field at the junction. The integral equations are solved by the method of moments (MoM). To make the computational process efficient and accurate, the method of weighted averaging was used to evaluate the rapidly oscillating integrals encountered in the moment matrix. In addition, the real-axis spectral integral is evaluated on a deformed contour for speed and accuracy. The MoM results for a large finite array have been validated by comparing its reflection coefficients with corresponding results for an infinite array generated by the commercial finite element code HFSS. Once the aperture electric field is determined by MoM, the input reflection coefficients at each waveguide port, and the coupling for each polarization over the range of useful scan angles, are easily obtained. Results for the input impedance and coupling characteristics for both the vertical and horizontal polarizations are presented over a range of scan angles. It is shown that the scan range is limited to about 35 deg for both polarizations, and therefore the optimum waveguide is a square with a side of about 0.62 free-space wavelengths.

  6. Experimental Results for a Photonic Time Reversal Processor for Adaptive Control of an Ultra Wideband Phased Array Antenna

    DTIC Science & Technology

    2008-03-01

    [Fragmentary indexing text. Recoverable content: cited references include a radar text (Boston: Artech House, 1994) and H. Zmuda, "Optical Beamforming for Phased Array Antennas" (Chapter 19); a paper on an architecture for broadband adaptive nulling with linear and conformal phased array antennas, Fiber and Integrated Optics, vol. 19, no. 2, March 2000; section heading "1 INTRODUCTION, 1.1 Photonic Processing for Microwave Phased Array ..."; keywords: Beamforming, Phased Array Antennas, Time Reversal, Ultra Wideband Radar.]

  7. Simulation verification of SNR and parallel imaging improvements by ICE-decoupled loop array in MRI.

    PubMed

    Yan, Xinqiang; Cao, Zhipeng; Zhang, Xiaoliang

    2016-04-01

    Transmit/receive L/C loop arrays with the induced current elimination (ICE) or magnetic wall decoupling method have shown high signal-to-noise ratio (SNR) and excellent parallel imaging ability for MR imaging at ultrahigh fields, e.g., 7 T. In this study, we aim to numerically analyze the performance of an eight-channel ICE-decoupled loop array at 7 T. A three-dimensional (3-D) electromagnetic (EM) and radiofrequency (RF) circuit co-simulation approach was employed. The values of all capacitors were obtained by optimizing the S-parameters of all coil elements. The EM simulation accurately modeled the coil structure, the phantom, and the excitation. All coil elements were well matched to 50 ohms, and the isolation between any two coil elements was better than -15 dB. The simulated S-parameters closely matched the experimental results, indicating that the simulation results were reliable. Compared with the conventional capacitively decoupled array, the ICE-decoupled array had higher sensitivity at the peripheral areas of the imaged subjects due to the shielding effect of the decoupling loops. The increased receive sensitivity resulted in an improvement of signal intensity and SNR for the ICE-decoupled array.

  8. Optimal expression evaluation for data parallel architectures

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1991-01-01

    A data parallel machine represents an array or other composite data structure by allocating one processor per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum cost way to evaluate an expression, for several different data parallel architectures. The algorithm applies to any architecture in which the metric describing the cost of moving an array has a property called robustness. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes.
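
    A minimal sketch of the kind of minimum-cost evaluation the abstract describes (not the paper's algorithm itself), for a 1-D mesh where moving an array between positions i and j costs |i - j|:

```python
# Hedged sketch: choose where to evaluate each intermediate operation so
# that total array-movement cost is minimized. Candidate positions are
# restricted to the operands' home positions (for the |i - j| metric an
# optimum exists at one of them).
def min_eval_cost(tree, positions):
    # Returns {p: cheapest cost to deliver the subtree's result at p}.
    if isinstance(tree, int):            # leaf: index of a home position
        home = positions[tree]
        return {p: abs(home - p) for p in positions}
    left, right = tree                   # internal node: (left, right)
    lc = min_eval_cost(left, positions)
    rc = min_eval_cost(right, positions)
    # Evaluate at q (both operands moved there), then ship result to p.
    best_at = {q: lc[q] + rc[q] for q in positions}
    return {p: min(best_at[q] + abs(q - p) for q in positions)
            for p in positions}

# Example: arrays A, B, C live at mesh positions 0, 4, 5; evaluate (A+B)+C.
homes = [0, 4, 5]
costs = min_eval_cost(((0, 1), 2), homes)
print(min(costs.values()))  # 5: evaluate near positions 4 and 5
```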

  9. Optimal expression evaluation for data parallel architectures

    NASA Technical Reports Server (NTRS)

    Gilbert, J. R.; Schreiber, R.

    1990-01-01

    A data parallel machine represents an array or other composite data structure by allocating one processor per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum cost way to evaluate an expression, for several different data parallel architectures. The algorithm applies to any architecture in which the metric describing the cost of moving an array has a property called robustness. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes.

  10. High-performance SPAD array detectors for parallel photon timing applications

    NASA Astrophysics Data System (ADS)

    Rech, I.; Cuccato, A.; Antonioli, S.; Cammi, C.; Gulinatti, A.; Ghioni, M.

    2012-02-01

    Over the past few years there has been a growing interest in monolithic arrays of single photon avalanche diodes (SPADs) for spatially resolved detection of faint ultrafast optical signals. SPADs implemented in planar technologies offer the typical advantages of microelectronic devices (small size, ruggedness, low voltage, low power, etc.). Furthermore, they have inherently higher photon detection efficiency than PMTs and are able to provide, besides sensitivity down to single photons, very high acquisition speeds. To make SPAD arrays more competitive in time-resolved applications, it is necessary to address problems such as electrical crosstalk between adjacent pixels; moreover, single-photon timing electronics with picosecond resolution have to be developed. In this paper we present a new instrument suitable for single-photon imaging applications, made up of 32 time-resolved parallel channels. The 32 × 1 pixel array that includes the SPAD detectors represents the system core, and an embedded data elaboration unit performs on-board data processing for single-photon counting applications. Photon-timing information is exported through a custom parallel cable that can be connected to an external multichannel TCSPC system.

  11. A Full Parallel Event Driven Readout Technique for Area Array SPAD FLIM Image Sensors

    PubMed Central

    Nie, Kaiming; Wang, Xinlei; Qiao, Jun; Xu, Jiangtao

    2016-01-01

    This paper presents a full parallel event driven readout method implemented in an area array single-photon avalanche diode (SPAD) image sensor for high-speed fluorescence lifetime imaging microscopy (FLIM). The sensor records and reads out only effective time and position information by adopting the full parallel event driven readout method, aiming at reducing the amount of data. The image sensor includes four 8 × 8 pixel arrays. In each array, four time-to-digital converters (TDCs) are used to quantize the arrival times of photons, and two address record modules are used to record the column and row information. In this work, Monte Carlo simulations were performed in Matlab to evaluate the pile-up effect induced by the readout method. The sensor's resolution is 16 × 16. The time resolution of the TDCs is 97.6 ps and the quantization range is 100 ns. The readout frame rate is 10 Mfps, and the maximum imaging frame rate is 100 fps. The chip's output bandwidth is 720 MHz with an average power of 15 mW. The lifetime resolvability range is 5-20 ns, and the average error of estimated fluorescence lifetimes is below 1% when employing CMM to estimate lifetimes. PMID:26828490

  12. Numerical Study of a Crossed Loop Coil Array for Parallel Magnetic Resonance Imaging

    SciTech Connect

    Hernandez, J.; Solis, S. E.; Rodriguez, A. O.

    2008-08-11

    A coil design has recently been proposed by Temnikov (Instrum Exp Tech. 2005;48:636-637), with higher experimental signal-to-noise ratio than that of the birdcage coil. It is also claimed that it is possible to individually tune it with a single chip capacitor. This coil design bears a strong resemblance to the gradiometer coil. These results motivated us to numerically simulate a three-coil array for parallel magnetic resonance imaging and in vivo magnetic resonance spectroscopy with multinuclear capability. The magnetic field was numerically simulated by solving Maxwell's equations with the finite element method. Uniformity profiles were calculated at the midsection for one single coil and showed good agreement with the experimental data. Then, two more coils were added to form two different coil arrays: coil elements were evenly distributed, separated by an angle of 30 deg. Uniformity profiles were then calculated again for all cases at the midsection. Despite the strong interaction among all coil elements, very good field uniformity can be achieved. These numerical results indicate that this coil array may be a good choice for parallel magnetic resonance imaging.

  13. Parallel and series fed microstrip array with high efficiency and low cross polarization

    NASA Technical Reports Server (NTRS)

    Huang, John (Inventor)

    1995-01-01

    A microstrip array antenna for a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications, with a physical area of 1.7 m by 0.17 m, comprises two rows of patch elements and employs a parallel feed to the left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel, with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, are terminated with half-wavelength-long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

  14. N-Body Classical Systems and Neural Networks on a 3d SIMD Massive Parallel Processor:. APE100/QUADRICS

    NASA Astrophysics Data System (ADS)

    Paolucci, P. S.

    A number of physical systems (e.g., N-body Newtonian, Coulombian, or Lennard-Jones systems) can be described by N^2 interaction terms. Completely connected neural networks are characterised by the same kind of connections: each neuron sends signals to all the other neurons via synapses. The APE100/Quadrics massive parallel architecture, with processing power in excess of 100 Gigaflops and a central memory of 8 Gigabytes, seems to have processing power and memory adequate to simulate systems formed by more than 1 billion synapses or interaction terms. On the other hand, the processing nodes of APE100/Quadrics are organised in a tridimensional cubic lattice; each processing node has a direct communication path only toward its first-neighbor nodes. Here we describe a convenient way to map systems with global connectivity onto the first-neighbor connectivity of the APE100/Quadrics architecture. Some numeric criteria, which are useful for matching SIMD tridimensional architectures with globally connected simulations, are introduced.
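
    The mapping problem the abstract addresses, global connectivity over nearest-neighbor links, has a classic illustration: the systolic ring, in which all N^2 pairwise terms are computed using only one-hop shifts. The sketch below is a plain-Python assumption of that general scheme, not the paper's APE100/Quadrics mapping.

```python
# Hedged sketch: all-pairs (N^2-term) interaction with nearest-neighbor
# communication only, by rotating a copy of the data around a ring.
def ring_all_pairs(values):
    P = len(values)                      # one "node" per value
    acc = [0.0] * P                      # per-node accumulated interaction
    travelling = list(values)            # buffer shifted around the ring
    for step in range(P):
        for i in range(P):               # each node interacts with visitor
            j = (i + step) % P           # index the visitor originated from
            if j != i:                   # skip self-interaction
                acc[i] += values[i] * travelling[i]
        # shift the travelling buffer one hop around the ring
        travelling = travelling[1:] + travelling[:1]
    return acc

# Each node i ends up with v_i * sum(v_j for j != i), using P-1 one-hop
# shifts instead of any global communication.
print(ring_all_pairs([1.0, 2.0, 3.0]))  # [5.0, 8.0, 9.0]
```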

  15. Multi-focus parallel detection of fluorescent molecules at picomolar concentration with photonic nanojets arrays

    NASA Astrophysics Data System (ADS)

    Ghenuche, Petru; de Torres, Juan; Ferrand, Patrick; Wenger, Jérôme

    2014-09-01

    Fluorescence sensing and fluorescence correlation spectroscopy (FCS) are powerful methods to detect and characterize single molecules; yet, their use has been restricted by expensive and complex optical apparatus. Here, we present a simple integrated design using a self-assembled bi-dimensional array of microspheres to realize a multi-focus parallel detection scheme for FCS. We simultaneously illuminate and collect the fluorescence from several tens of microspheres, which all generate their own photonic nanojet to efficiently excite the molecules and collect the fluorescence emission. Each photonic nanojet contributes to the global detection volume, reaching FCS detection volumes of several tens of femtoliters while preserving the fluorescence excitation and collection efficiencies. The microspheres photonic nanojets array enables FCS experiments at low picomolar concentrations with a drastic reduction in apparatus cost and alignment constraints, ideal for microfluidic chip integration.

  16. Multi-focus parallel detection of fluorescent molecules at picomolar concentration with photonic nanojets arrays

    SciTech Connect

    Ghenuche, Petru; Torres, Juan de; Ferrand, Patrick; Wenger, Jérôme

    2014-09-29

    Fluorescence sensing and fluorescence correlation spectroscopy (FCS) are powerful methods to detect and characterize single molecules; yet, their use has been restricted by expensive and complex optical apparatus. Here, we present a simple integrated design using a self-assembled bi-dimensional array of microspheres to realize a multi-focus parallel detection scheme for FCS. We simultaneously illuminate and collect the fluorescence from several tens of microspheres, which all generate their own photonic nanojet to efficiently excite the molecules and collect the fluorescence emission. Each photonic nanojet contributes to the global detection volume, reaching FCS detection volumes of several tens of femtoliters while preserving the fluorescence excitation and collection efficiencies. The microspheres photonic nanojets array enables FCS experiments at low picomolar concentrations with a drastic reduction in apparatus cost and alignment constraints, ideal for microfluidic chip integration.

  17. Controlled Growth of Parallel Oriented ZnO Nanostructural Arrays on Ga2O3 Nanowires

    DTIC Science & Technology

    2008-11-01

    [Fragmentary indexing text. Recoverable content: branched nanostructures have previously been obtained by growing ZnO, Ga2O3, SnO2, and GaAs nanorod branches symmetrically around nanowire (NW) cores; from the abstract: "Novel hierarchical ZnO-Ga2O3 nanostructures were fabricated via a two stage growth process. Nanowires of Ga2O3..."; authors: Lena Mazeina, Yoosuf N. Picard, and Sharka M. Prokes (Electronics...).]

  18. Enhancing lab-on-a-chip performance via tunable parallel liquid microlens arrays

    NASA Astrophysics Data System (ADS)

    Liu, Ye; Zeng, Xuefeng; Dong, Liang; Jiang, Hongrui

    2009-02-01

    Pathogen detection is increasingly significant not only for hospital laboratories but also for in-field use. Microfabrication technologies now make it possible to integrate optical devices for detection and microfluidic channels for fluorescein-labeled pathogen suspensions into a single chip (i.e., optofluidics), providing simple, sensitive, and inexpensive methods of pathogen detection. One interesting optofluidic component is a microlens whose optical axis is parallel to the substrate. We report an in situ formed tunable liquid microlens array and its applications in dynamic lab-on-a-chip systems, such as enhancing fluorescence emission in, and detection of, laminar fluid flows, and characterizing surface reactions. The de-ionized water microlenses are intrinsically formed via the liquid-air interfaces of liquid droplets, whose positions are precisely controlled by air/liquid injection and pinned at T-shaped junctions of octadecyltrichlorosilane (OTS) treated polymerized isobornyl acrylate (poly(IBA)) microchannels. By pneumatic manipulation inside the channel, the microlenses can be separately tuned in focal length along the microchannels parallel to the substrate. Via the tunable microlenses, excitation light is dynamically focused onto fluorescent fluidic samples, and the fluorescence emission signal for detection is significantly increased compared to the case without the microlenses, as a result of the enhancement of the fluorescence excitation. Meanwhile, controlled microfluidic interfaces are also important in lab-on-a-chip systems, and as the microlens array directly faces the cross sections of these interfaces, we have also shown its potential for studying surface reactions at such interfaces.

  19. High-throughput fabrication of micrometer-sized compound parabolic mirror arrays by using parallel laser direct-write processing

    NASA Astrophysics Data System (ADS)

    Yan, Wensheng; Cumming, Benjamin P.; Gu, Min

    2015-07-01

    Micrometer-sized parabolic mirror arrays have significant applications in both light emitting diodes and solar cells. However, low fabrication throughput has been identified as a major obstacle to large-scale application of the mirror arrays, due to the serial nature of the conventional fabrication method. Here, the mirror arrays are fabricated by parallel laser direct-write processing, which addresses this barrier. In addition, it is demonstrated that the parallel writing can fabricate complex arrays as well as simple arrays, and thus offers wider applications. Optical measurements show that each single mirror confines the full-width at half-maximum value to as small as 17.8 μm at a height of 150 μm whilst providing a transmittance of up to 68.3% at a wavelength of 633 nm, in good agreement with calculated values.

  20. Silicon-substrate microelectrode arrays for parallel recording of neural activity in peripheral and cranial nerves.

    PubMed

    Kovacs, G T; Storment, C W; Halks-Miller, M; Belczynski, C R; Della Santina, C C; Lewis, E R; Maluf, N I

    1994-06-01

    A new process for the fabrication of regeneration microelectrode arrays for peripheral and cranial nerve applications is presented. This type of array is implanted between the severed ends of nerves, the axons of which regenerate through via holes in the silicon and are thereafter held fixed with respect to the microelectrodes. The process described is designed for compatibility with industry-standard CMOS or BiCMOS processes (it does not involve high-temperature process steps or heavily doped etch-stop layers), and provides a thin membrane for the via holes, surrounded by a thick silicon supporting rim. Many basic questions remain regarding the optimum via hole and microelectrode geometries in terms of both biological and electrical performance of the implants, and therefore passive versions were fabricated as tools for addressing these issues in ongoing work. Versions of the devices were implanted in the rat peroneal nerve and in the frog auditory nerve. In both cases, regeneration was verified histologically and it was observed that the regenerated nerves had reorganized into microfascicles containing both myelinated and unmyelinated axons and corresponding to the grid pattern of the via holes. These microelectrode arrays were shown to allow the recording of action potential signals in both the peripheral and cranial nerve setting, from several microelectrodes in parallel.

  1. Trapping and Detection of Nanoparticles and Cells Using a Parallel Photonic Nanojet Array.

    PubMed

    Li, Yuchao; Xin, Hongbao; Liu, Xiaoshuai; Zhang, Yao; Lei, Hongxiang; Li, Baojun

    2016-06-28

    In advanced nanoscience, there is a strong desire to trap and detect nanoscale objects with high-throughput, single-nanoparticle resolution and high selectivity. Although emerging optical methods have enabled the selective trapping and detection of multiple micrometer-sized objects, it remains a great challenge to extend this functionality to the nanoscale. Here, we report an approach to trap and detect nanoparticles and subwavelength cells at low optical power using a parallel photonic nanojet array produced by assembling microlenses on an optical fiber probe. Benefiting from the subwavelength confinement of the photonic nanojets, tens to hundreds of nanotraps were formed in three dimensions. Backscattering signals were detected in real time with single-nanoparticle resolution and enhancement factors of 10^3-10^4. Selective trapping of nanoparticles and cells from a particle mixture or human blood solution was demonstrated using the nanojet array. The developed nanojet array is potentially a powerful tool for nanoparticle assembly, biosensing, single-cell analysis, and optical sorting.

  2. Optoelectronic parallel processing with smart pixel arrays for automated screening of cervical smear imagery

    NASA Astrophysics Data System (ADS)

    Metz, John Langdon

    2000-10-01

    This thesis investigates the use of optoelectronic parallel processing systems with smart photosensor arrays (SPAs) to examine cervical smear images. The automation of cervical smear screening seeks to reduce human workload and improve the accuracy of detecting pre-cancerous and cancerous conditions. Increasing the parallelism of image processing improves the speed and accuracy of locating regions-of-interest (ROI) from images of the cervical smear for the first stage of a two-stage screening system. The two-stage approach first detects ROI optoelectronically before classifying them using more time consuming electronic algorithms. The optoelectronic hit/miss transform (HMT) is computed using gray scale modulation spatial light modulators in an optical correlator. To further the parallelism of this system, a novel CMOS SPA computes the post processing steps required by the HMT algorithm. The SPA reduces the subsequent bandwidth passed into the second, electronic image processing stage classifying the detected ROI. Limitations in the miss operation of the HMT suggest using only the hit operation for detecting ROI. This makes possible a single SPA chip approach using only the hit operation for ROI detection, which may replace the optoelectronic correlator in the screening system. Both the HMT SPA postprocessor and the SPA ROI detector design provide compact, efficient, and low-cost optoelectronic solutions to performing ROI detection on cervical smears. Analysis of optoelectronic ROI detection with electronic ROI classification shows these systems have the potential to perform at, or above, the current error rates for manual classification of cervical smears.

  3. Exploiting Processor Groups to Extend Scalability of the GA Shared Memory Programming Model

    SciTech Connect

    Nieplocha, Jarek; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Zhang, Yeliang

    2005-05-04

    Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality, and development of multilevel parallelism based on processor groups in the context of the Global Array (GA) shared memory programming model. The main effort involves management of shared data, rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient benchmark and a molecular dynamics (MD) application are presented for a Linux cluster with Myrinet and illustrate the value of the proposed approach for improving scalability. While the original GA version of the CG benchmark lagged MPI, the processor-group version outperforms MPI in all cases, except for a few points on the smallest problem size. Similarly, the group version of the MD application improves execution time by 58% on 32 processors.

  4. A micromachined silicon parallel acoustic delay line (PADL) array for real-time photoacoustic tomography (PAT)

    NASA Astrophysics Data System (ADS)

    Cho, Young Y.; Chang, Cheng-Chung; Wang, Lihong V.; Zou, Jun

    2015-03-01

    To achieve real-time photoacoustic tomography (PAT), massive transducer arrays and data acquisition (DAQ) electronics are needed to receive the PA signals simultaneously, which results in complex and high-cost ultrasound receiver systems. To address this issue, we have developed a new PA data acquisition approach using acoustic time delay. Optical fibers were used as parallel acoustic delay lines (PADLs) to create different time delays in multiple channels of PA signals. This makes the PA signals reach a single-element transducer at different times. As a result, they can be properly received by single-channel DAQ electronics. However, due to their small diameter and fragility, using optical fibers as acoustic delay lines poses a number of challenges in the design, construction, and packaging of the PADLs, thereby limiting their performance and use in real imaging applications. In this paper, we report the development of new silicon PADLs, which are directly made from silicon wafers using advanced micromachining technologies. The silicon PADLs have very low acoustic attenuation and distortion. A linear array of 16 silicon PADLs were assembled into a handheld package with one common input port and one common output port. To demonstrate its real-time PAT capability, the silicon PADL array (with its output port interfaced with a single-element transducer) was used to receive 16 channels of PA signals simultaneously from a tissue-mimicking optical phantom sample. The reconstructed PA image matches well with the imaging target. Therefore, the silicon PADL array can provide a 16× reduction in the ultrasound DAQ channels for real-time PAT.
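
    The delay-line idea above can be sketched numerically: stagger the channels so they arrive one after another at a single digitizer, then gate them apart in time. All parameters below are illustrative assumptions, not the paper's hardware values.

```python
# Hedged sketch of delay-line multiplexing: N short PA signals are given
# staggered delays so a single-channel trace records them sequentially,
# and simple time-gating recovers each channel.
import numpy as np

n_ch, pulse_len, delay = 4, 10, 50      # samples (assumed values)
rng = np.random.default_rng(0)
pulses = rng.normal(size=(n_ch, pulse_len))

trace = np.zeros(n_ch * delay + pulse_len)
for c in range(n_ch):
    start = c * delay                   # per-channel acoustic delay
    trace[start:start + pulse_len] += pulses[c]

# Demultiplex by gating; exact because delay > pulse_len (no overlap).
recovered = np.stack([trace[c * delay: c * delay + pulse_len]
                      for c in range(n_ch)])
print(np.allclose(recovered, pulses))   # True
```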

  5. The International Conference on Vector and Parallel Computing (2nd)

    DTIC Science & Technology

    1989-01-17

    [Fragmentary table-of-contents and proceedings text. Recoverable content: session titles include "Large-Scale Computing in Reservoir Simulation" (Richard Ewing, University of Wyoming), "ParaScope: A Parallel Programming Environment," and "Current Directions..."; the fragment also mentions a tri-level parallel architecture providing a large array of simple processors for image processing and a medium-sized array of more..., and notes that the objective of reservoir simulation is to understand...]

  6. Computation and parallel implementation for early vision

    NASA Technical Reports Server (NTRS)

    Gualtieri, J. Anthony

    1990-01-01

    The problem of early vision is to transform one or more retinal illuminance images (pixel arrays) into image representations built out of primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks, including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation on SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.
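
    One of the listed primitives, edge finding, can be sketched with whole-array operations of the sort that map naturally onto SIMD array processors. This is a plain finite-difference gradient, an illustrative assumption rather than the scale-space formulation the abstract refers to.

```python
# Hedged sketch: edge finding via gradient magnitude, written with
# whole-array NumPy slicing (the SIMD-friendly style).
import numpy as np

img = np.zeros((8, 8))
img[:, 4:] = 1.0                        # vertical step edge at column 4

gx = img[:, 1:] - img[:, :-1]           # horizontal finite differences
gy = img[1:, :] - img[:-1, :]           # vertical finite differences
mag = np.hypot(gx[:-1, :], gy[:, :-1])  # gradient magnitude, common grid

edge_cols = np.nonzero(mag.max(axis=0))[0]
print(edge_cols)                        # [3]: edge between columns 3 and 4
```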

  7. Parallelization of the NAS Conjugate Gradient Benchmark Using the Global Arrays Shared Memory Programming Model

    SciTech Connect

    Zhang, Yeliang; Tipparaju, Vinod; Nieplocha, Jarek; Hariri, Salim

    2005-04-08

    The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. The Global Arrays (GA) toolkit supports a shared memory programming paradigm— even on distributed memory systems— and offers the programmer control over the distribution and locality that are important for optimizing performance on scalable architectures. In this paper, we describe and compare two different parallelization strategies of the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on a shared-memory system, in our case the SGI Altix. However, with GA these benefits are extended to distributed memory systems and demonstrated on a Linux cluster with Myrinet.
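    For reference, the kernel being parallelized is plain conjugate gradient; the GA and MPI versions differ in how the matrix-vector product and dot-product reductions are distributed across processes, not in the algorithm itself. A minimal dense NumPy sketch (the NAS benchmark itself uses a large random sparse matrix):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive-definite A. In the GA/MPI
    versions, the A @ p product and the dot products below are the
    distributed (and communicated) operations."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small SPD test system.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)   # diagonally dominant => SPD
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
assert np.allclose(A @ x, b, atol=1e-6)
```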

  8. Subwavelength microwave imaging using an array of parallel conducting wires as a lens

    NASA Astrophysics Data System (ADS)

    Belov, Pavel A.; Hao, Yang; Sudhakaran, Sunil

    2006-01-01

    An original realization of a lens capable of transmitting images with subwavelength resolution is proposed. The lens is formed by an array of parallel conducting wires and effectively operates as a telegraph: it captures the distribution of the electric field at the front interface of the lens and transmits it to the back side without distortion. This regime of operation is called canalization and is inherent to flat lenses formed by electromagnetic crystals. The theoretical estimates are supported by numerical simulations and experimental verification. A subwavelength resolution of λ/15 and an 18% bandwidth of operation are demonstrated at gigahertz frequencies. The proposed lens is capable of transporting subwavelength images without distortion over nearly unlimited distances, since the influence of losses on the lens operation is negligibly small.

  9. Modeling of the phase lag causing fluidelastic instability in a parallel triangular tube array

    NASA Astrophysics Data System (ADS)

    Khalifa, Ahmed; Weaver, David; Ziada, Samir

    2013-11-01

    Fluidelastic instability is considered a critical flow-induced vibration mechanism in tube and shell heat exchangers. It is believed that a finite time lag between tube vibration and fluid response is essential to predict the phenomenon; however, the physical nature of this time lag is not fully understood. This paper presents a fundamental study of this time delay using a parallel triangular tube array with a pitch ratio of 1.54. A computational fluid dynamics (CFD) model was developed and validated experimentally in an attempt to investigate the interaction between tube vibrations and flow perturbations at reduced velocities Ur = 1-6 and Reynolds numbers Re = 2000-12 000. The numerical predictions of the phase lag are in reasonable agreement with the experimental measurements for the range of reduced velocities Ur = 6-7. It was found that there are two propagation mechanisms: the first is associated with acoustic wave propagation at low reduced velocities, Ur < 2, and the second, at higher reduced velocities, is associated with vorticity shedding and convection. An empirical model of the two mechanisms is developed, and its phase lag predictions are in reasonable agreement with the experimental and numerical measurements. The developed phase lag model is then coupled with the semi-analytical model of Lever and Weaver to predict the fluidelastic stability threshold. Improved predictions of the stability boundaries for the parallel triangular array were achieved. In addition, the present study explains why fluidelastic instability does not occur below some threshold reduced velocity.

  10. Parallel Aligned Mesopore Arrays in Pyramidal-Shaped Gallium Nitride and Their Photocatalytic Applications.

    PubMed

    Kim, Hee Jun; Park, Joonmo; Ye, Byeong Uk; Yoo, Chul Jong; Lee, Jong-Lam; Ryu, Sang-Wan; Lee, Heon; Choi, Kyoung Jin; Baik, Jeong Min

    2016-07-20

    Parallel aligned mesopore arrays in pyramidal-shaped GaN are fabricated by using an electrochemical anodic etching technique, followed by inductively coupled plasma etching assisted by SiO2 nanosphere lithography, and used as a promising photoelectrode for solar water oxidation. The parallel alignment of pores several tens of micrometers in length is achieved by the low applied voltage and prepattern-guided anodization. The dry etching of single-layer SiO2 nanosphere-coated GaN produces a pyramidal shape of the GaN, making the pores open at both ends and shortening the escape path of gas bubbles evolved inside the pores during water oxidation. The absorption spectra show that the light absorption in the UV range is ∼93% and that there is a red shift in the absorption edge by 30 nm, compared with flat GaN. The structure also shows a remarkable 5.3-fold enhancement in photocurrent density compared with flat GaN. A further enhancement (∼40%) upon deposition of Ni was observed, attributed to the generation of an electric field that increases the charge-separation ratio.

  11. MVSP: multithreaded VLIW stream processor

    NASA Astrophysics Data System (ADS)

    Sardashti, Somayeh; Ghasemi, Hamid Reza; Fatemi, Omid

    2006-02-01

    Stream processing is a new trend in computer architecture design which fills the gap between inflexible special-purpose media architectures and programmable architectures with low computational ability for media processing. Stream processors are designed for computationally intensive media applications characterized by high data parallelism and producer-consumer locality with little global data reuse. In this paper, we propose a new stream processor, named MVSP. This processor is a programmable stream processor based on Imagine [1]. MVSP exploits the TLP, DLP, SP and ILP parallelism inherent in media applications. A full simulator of MVSP has been implemented, and several media workloads composed of EEMBC [2] benchmarks have been applied. The simulation results show performance and functional-unit utilization improvements of more than two times in comparison with the Imagine processor.

  12. The Development of Low Threshold Laser Arrays and Their Applications in Parallel Optical Datalinks

    NASA Astrophysics Data System (ADS)

    Zhao, Hanmin

    We present an analytical and experimental study of the development of ultra-low threshold InGaAs/GaAs/AlGaAs quantum well semiconductor lasers and laser arrays grown on nonplanar substrates by MOCVD. This study has resulted in the demonstration of some of the lowest threshold currents and current densities yet reported, as well as the demonstration of multichannel optical datalinks working at 1 Gbit/sec/channel. The gain properties of InGaAs/GaAs strained quantum wells and the lasing properties of InGaAs/GaAs lasers were theoretically analysed. Using the MOCVD growth technique, the growth conditions for InGaAs/GaAs quantum wells and InGaAs/GaAs broad area lasers were optimized. InGaAs/GaAs broad area laser threshold current densities as low as 56 A/cm^2 were obtained. The growth and doping properties of InGaAs/GaAs quantum wells and AlGaAs bulk layers on nonplanar substrates were studied, and the unique properties learned from this study were used to design and fabricate a new buried heterostructure InGaAs/GaAs laser for low threshold and high efficiency operation. Record low threshold currents of 0.5 mA and 0.6 mA were obtained for as-cleaved DQW lasers and SQW lasers, respectively. HR-coated SQW laser threshold currents as low as 0.15 mA were obtained, the lowest threshold current reported in a diode laser. This new technique produces high yield and high laser uniformity because of the simple growth and processing procedures involved. Highly uniform InGaAs/GaAs SQW and DQW laser arrays with sub-milliampere threshold currents were obtained. Using the above low threshold lasers, a unique three-terminal laser structure suitable for high speed, high efficiency, large signal, digital modulation was investigated. Three-terminal laser arrays were used in a wide bandwidth parallel optical datalink system. A high data transfer rate (1 Gbit/sec/channel) and low bit error rate (BER) (<10^{-13}) with large phase margin were obtained. This parallel

  13. Large-scale parallel surface functionalization of goblet-type whispering gallery mode microcavity arrays for biosensing applications.

    PubMed

    Bog, Uwe; Brinkmann, Falko; Kalt, Heinz; Koos, Christian; Mappes, Timo; Hirtz, Michael; Fuchs, Harald; Köber, Sebastian

    2014-10-15

    A novel surface functionalization technique is presented for large-scale selective molecule deposition onto whispering gallery mode microgoblet cavities. The parallel technique allows damage-free individual functionalization of the cavities, arranged on-chip in densely packed arrays. A glass slide bearing phospholipids with different functional head groups serves as the stamp pad. Coated microcavities are characterized and demonstrated as biosensors.

  14. Optical signal processing of phased array radar

    NASA Astrophysics Data System (ADS)

    Weverka, Robert T.

    This thesis develops optical processors that scale to very high processing speed. Optical signal processing is often promoted on the basis of smaller size, lower weight and lower power consumption as well as higher signal processing speed. While each of these requirements has applications, it is the ones that require processing speed beyond that available in electronics that are most compelling. Thirty years ago, optical processing was the only method fast enough to process Synthetic Aperture Radar (SAR), one of the more demanding signal processing tasks at the time. Since then, electronic processing speed has improved sufficiently to tackle that problem. We have sought out the problems that require significantly higher processing speed and developed optical processors that tackle these more difficult problems. The components that contribute to high signal processing speed are high input signal bandwidth, a large number of parallel input channels each with this high bandwidth, and a large number of parallel operations required on each input channel. Adaptive signal processing for phased array radar has all of these factors. The processors developed for this task scale well in three dimensions, which allows them to maximize parallelism for high speed. This thesis explores an example of a negative feedback adaptive phased array processor and an example of a positive feedback phased array processor. The negative feedback processor uses an array of inputs in up to two dimensions together with the time history of the signal in the third dimension to adapt the array pattern to null out incoming jammer signals. The positive feedback processor uses the incoming signals and assumptions about the radar scene to correct for position errors in a phased array. Discovery and analysis of these new processors are facilitated by an original volume holographic analysis technique developed in the thesis.
The thesis includes a new acoustooptic Bragg cell geometry developed with

  15. Cellular Array Processing Simulation

    NASA Astrophysics Data System (ADS)

    Lee, Harry C.; Preston, Earl W.

    1981-11-01

    The Cellular Array Processing Simulation (CAPS) system is a high-level image language that runs on a multiprocessor configuration. CAPS is interpretively decoded on a conventional minicomputer with all image operation instructions executed on an array processor. The synergistic environment that exists between the minicomputer and the array processor gives CAPS its high-speed throughput, while maintaining a convenient conversational user language. CAPS was designed to be both modular and table driven so that it can be easily maintained and modified. CAPS uses the image convolution operator as one of its primitives and performs this cellular operation by decomposing it into parallel image steps that are scheduled to be executed on the array processor. Among its features is the ability to observe the imagery in real time as a user's algorithm is executed. This feature reduces the need for image storage space, since it is feasible to retain only original images and produce resultant images when needed. CAPS also contains a language processor that permits users to develop re-entrant image processing subroutines or algorithms.
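    The convolution primitive that CAPS decomposes into parallel image steps can be sketched as shift-and-accumulate operations, each of which is a whole-image pass of the kind an array processor executes in one step. The periodic-boundary decomposition below is an illustrative reconstruction in NumPy, not CAPS code.

```python
import numpy as np

def conv3x3_shift_add(img, kernel):
    """3x3 convolution expressed as nine shift-and-accumulate
    whole-image steps (periodic boundaries via np.roll)."""
    out = np.zeros_like(img, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            shifted = np.roll(np.roll(img, di, axis=0), dj, axis=1)
            out += kernel[di + 1, dj + 1] * shifted
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
box = np.ones((3, 3)) / 9.0       # box-blur kernel
res = conv3x3_shift_add(img, box)
```

    Each of the nine steps is a single scaled, shifted image add, so the array processor schedules them back-to-back without any per-pixel control flow.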

  16. Comparison of 3-D synthetic aperture phased-array ultrasound imaging and parallel beamforming.

    PubMed

    Rasmussen, Morten Fischer; Jensen, Jørgen Arendt

    2014-10-01

    This paper demonstrates that synthetic aperture imaging (SAI) can be used to achieve real-time 3-D ultrasound phased-array imaging. It investigates whether SAI increases the image quality compared with the parallel beamforming (PB) technique for real-time 3-D imaging. Data are obtained using both simulations and measurements with an ultrasound research scanner and a commercially available 3.5-MHz 1024-element 2-D transducer array. To limit the probe cable thickness, 256 active elements are used in transmit and receive for both techniques. The two imaging techniques were designed for cardiac imaging, which requires sequences designed for imaging down to 15 cm of depth and a frame rate of at least 20 Hz. The imaging quality of the two techniques is investigated through simulations as a function of depth and angle. SAI improved the full-width at half-maximum (FWHM) at low steering angles by 35%, and the 20-dB cystic resolution by up to 62%. The FWHM of the measured line spread function (LSF) at 80 mm depth showed a difference of 20% in favor of SAI. SAI reduced the cyst radius at 60 mm depth by 39% in measurements. SAI improved the contrast-to-noise ratio measured on anechoic cysts embedded in a tissue-mimicking material by 29% at 70 mm depth. The estimated penetration depth on the same tissue-mimicking phantom shows that SAI increased the penetration by 24% compared with PB. Neither SAI nor PB achieved the design goal of 15 cm penetration depth. This is likely due to the limited transducer surface area and a low SNR of the experimental scanner used.
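    Both SAI and PB ultimately rest on receive delay-and-sum beamforming with per-element geometric delays. A minimal sketch with an assumed linear-array geometry, sampling rate, and a single point scatterer (not the paper's 2-D array or sequences):

```python
import numpy as np

c = 1540.0            # speed of sound in tissue (m/s)
fs = 40e6             # sampling rate (Hz), assumed
pitch = 0.3e-3        # element pitch (m), assumed
n_el = 32

elem_x = (np.arange(n_el) - (n_el - 1) / 2) * pitch

def das_focus(rf, x, z):
    """Delay-and-sum at focal point (x, z): sum each element's RF
    sample at its geometric round-trip delay."""
    dist = np.sqrt((elem_x - x) ** 2 + z ** 2)          # element -> point
    delays = np.round((z + dist) / c * fs).astype(int)  # tx + rx path
    delays = np.clip(delays, 0, rf.shape[1] - 1)
    return rf[np.arange(n_el), delays].sum()

# Synthetic RF: a point scatterer at (0, 30 mm) puts a unit echo on
# each channel at its own round-trip delay.
z0 = 30e-3
rf = np.zeros((n_el, 4096))
d0 = np.sqrt(elem_x**2 + z0**2)
idx = np.round((z0 + d0) / c * fs).astype(int)
rf[np.arange(n_el), idx] = 1.0

on_target = das_focus(rf, 0.0, z0)       # coherent sum: all 32 align
off_target = das_focus(rf, 0.0, 35e-3)   # misfocused: echoes miss
assert on_target == n_el and off_target < n_el
```

    SAI and PB differ in the transmit events feeding this sum: SAI coherently compounds many defocused emissions, while PB beamforms several receive lines per focused transmit.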

  17. Signal processor packaging design

    NASA Astrophysics Data System (ADS)

    McCarley, Paul L.; Phipps, Mickie A.

    1993-10-01

    The Signal Processor Packaging Design (SPPD) program was a technology development effort to demonstrate that a miniaturized, high throughput programmable processor could be fabricated to meet the stringent environment imposed by high speed kinetic energy guided interceptor and missile applications. This successful program culminated with the delivery of two very small processors, each about the size of a large pin grid array package. Rockwell International's Tactical Systems Division in Anaheim, California developed one of the processors, and the other was developed by Texas Instruments' (TI) Defense Systems and Electronics Group (DSEG) of Dallas, Texas. The SPPD program was sponsored by the Guided Interceptor Technology Branch of the Air Force Wright Laboratory's Armament Directorate (WL/MNSI) at Eglin AFB, Florida and funded by SDIO's Interceptor Technology Directorate (SDIO/TNC). These prototype processors were subjected to rigorous tests of their image processing capabilities, and both successfully demonstrated the ability to process 128 × 128 infrared images at a frame rate of over 100 Hz.

  18. Hardware multiplier processor

    DOEpatents

    Pierce, P.E.

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it, with or without addition or subtraction with a previously stored number. It can also, on a single read command, automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed, and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
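    The access pattern the patent describes (low-order address lines doubling as an opcode, a write that triggers a multiply or multiply-accumulate, a read that rounds and scales in one access) can be modeled behaviorally. The register map and opcodes below are illustrative assumptions, not the patent's actual address assignments.

```python
class MemoryMappedMultiplier:
    """Behavioral sketch of a memory-mapped multiplier: the low
    address bits select the operation, so a single bus write both
    delivers an operand and triggers the arithmetic."""
    MUL, MAC, MSUB = 0, 1, 2       # operation selected by address bits

    def __init__(self):
        self.operand = 0           # previously stored number
        self.product = 0

    def write(self, addr, value):
        op = addr & 0x3            # low-order address lines carry opcode
        if op == self.MUL:
            self.product = self.operand * value
        elif op == self.MAC:
            self.product += self.operand * value
        elif op == self.MSUB:
            self.product -= self.operand * value
        else:                      # op 3: just load the operand register
            self.operand = value

    def read(self, scale_bits):
        # A single read rounds and scales the stored product.
        half = 1 << (scale_bits - 1) if scale_bits else 0
        return (self.product + half) >> scale_bits

m = MemoryMappedMultiplier()
m.write(0x3, 300)      # store operand
m.write(0x0, 70)       # one write: product = 300 * 70
m.write(0x1, 10)       # one write: accumulate 300 * 10
assert m.read(0) == 24000
assert m.read(4) == 1500    # scaled by 2^-4 with rounding
```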

  19. Hardware multiplier processor

    DOEpatents

    Pierce, Paul E.

    1986-01-01

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it, with or without addition or subtraction with a previously stored number. It can also, on a single read command, automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed, and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

  20. Development of micropump-actuated negative pressure pinched injection for parallel electrophoresis on array microfluidic chip.

    PubMed

    Li, Bowei; Jiang, Lei; Xie, Hua; Gao, Yan; Qin, Jianhua; Lin, Bingcheng

    2009-09-01

    A micropump-actuated negative pressure pinched injection method is developed for parallel electrophoresis on a multi-channel LIF detection system. The system has a home-made device that individually controls 16-port solenoid valves and a high-voltage power supply. The excitation laser beam is distributed to the array of separation channels for detection. The hybrid glass-PDMS microfluidic chip comprises two common reservoirs, four separation channels coupled to their respective pneumatic micropumps, and two reference channels. Because pressure is used as the driving force, the proposed method introduces no sample bias during separation. Only one high-voltage supply is needed for separation, regardless of the number of channels, which is significant for high-throughput analysis, and the time for sample loading is shortened to 1 s. In addition, the integrated micropumps provide a versatile interface for coupling with other functional units to satisfy more complicated demands. The performance is verified by separation of a DNA marker and Hepatitis B virus DNA samples. This method is also expected to offer the throughput needed for DNA analysis in the field of disease diagnosis.

  1. Parameter allocation of parallel array bistable stochastic resonance and its application in communication systems

    NASA Astrophysics Data System (ADS)

    Liu, Jian; Wang, You-Guo; Zhai, Qi-Qing; Liu, Jin

    2016-10-01

    In this paper, we propose a parameter allocation scheme in a parallel array bistable stochastic resonance-based communication system (P-BSR-CS) to improve the performance of weak binary pulse amplitude modulated (BPAM) signal transmissions. The optimal parameter allocation policy of the P-BSR-CS is provided to minimize the bit error rate (BER) and maximize the channel capacity (CC) under the adiabatic approximation condition. On this basis, we further derive the best parameter selection theorem in realistic communication scenarios via variable transformation. Specifically, the P-BSR structure not only makes parameter selection robust (the optimal parameter pair is not fixed but can vary over quite a wide range) but also produces outstanding system performance. Theoretical analysis and simulation results indicate that in the P-BSR-CS the proposed parameter allocation scheme yields considerable performance improvement, particularly in very low signal-to-noise ratio (SNR) environments. Project supported by the National Natural Science Foundation of China (Grant No. 61179027), the Qinglan Project of Jiangsu Province of China (Grant No. QL06212006), and the University Postgraduate Research and Innovation Project of Jiangsu Province (Grant Nos. KYLX15_0829, KYLX15_0831).
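    The building block replicated in the parallel array is the overdamped bistable system driven by the weak signal plus noise. A minimal Euler-Maruyama sketch of a few such units (all parameters are illustrative, not the paper's optimal allocation):

```python
import numpy as np

def bistable_unit(signal, dt=1e-3, a=1.0, b=1.0, sigma=0.4, seed=0):
    """One array element: dx/dt = a*x - b*x**3 + s(t) + white noise,
    integrated by Euler-Maruyama. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    x = np.empty(len(signal))
    xi = -1.0                               # start in the left well
    for k, s in enumerate(signal):
        xi += (a * xi - b * xi**3 + s) * dt \
              + sigma * np.sqrt(dt) * rng.standard_normal()
        x[k] = xi
    return x

# Weak BPAM-style input: amplitude 0.25 is below the static switching
# threshold sqrt(4*a**3/(27*b)) ~ 0.385, so noise must assist the hop.
s = 0.25 * np.ones(40000)
outputs = [bistable_unit(s, seed=i) for i in range(4)]  # parallel units
# An array receiver would average sign(x) across units; with a '+' bit
# and suitable noise, the '+1' well tends to dominate at late times.
vote = np.mean([np.sign(x[-1000:].mean()) for x in outputs])
```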

  2. Inner-product array processor for retrieval of stored images represented by bipolar binary (+1,-1) pixels using partial input trinary pixels represented by (+1,0,-1)

    NASA Technical Reports Server (NTRS)

    Liu, Hua-Kuang (Inventor); Awwal, Abdul A. S. (Inventor); Karim, Mohammad A. (Inventor)

    1993-01-01

    An inner-product array processor is provided with thresholding of the inner product during each iteration to make more significant the inner product employed in estimating a vector to be used as the input vector for the next iteration. While stored vectors and estimated vectors are represented in bipolar binary (+1,-1), only those elements of an initial partial input vector that are believed to be common with those of a stored vector are represented in bipolar binary; the remaining elements of a partial input vector are set to 0. This mode of representation, in which the known elements of a partial input vector are in bipolar binary form and the remaining elements are set to 0, is referred to as trinary representation. The initial inner products corresponding to the partial input vector will then be equal to the number of known elements. Inner-product thresholding is applied to accelerate convergence and to avoid convergence to a negative inner product.
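    The iteration can be sketched with a toy associative memory: stored images are bipolar (+1,-1), the partial input is trinary with unknown pixels set to 0, and negative inner products are clipped each pass, following the thresholding idea in the abstract. The clipping rule and the block-orthogonal toy memories are illustrative choices, not the patent's exact scheme.

```python
import numpy as np

# Four stored bipolar images (64 pixels): image m is +1 on its own
# 16-pixel block and -1 elsewhere, so the images are mutually orthogonal.
stored = -np.ones((4, 64), dtype=int)
for m in range(4):
    stored[m, 16 * m : 16 * (m + 1)] = 1

def recall(partial, n_iter=5):
    """Inner-product iteration: known pixels are +/-1, unknown are 0
    (the trinary input). Negative inner products are thresholded to
    zero each pass before the next bipolar estimate is formed."""
    v = partial.astype(float)
    for _ in range(n_iter):
        ip = stored @ v                  # inner products with each memory
        ip = np.clip(ip, 0, None)        # drop negative inner products
        v = np.sign(ip @ stored)         # next bipolar estimate
    return v.astype(int)

# Partial input: first 24 pixels of stored image 0 known, the rest 0.
partial = np.zeros(64, dtype=int)
partial[:24] = stored[0, :24]
assert (recall(partial) == stored[0]).all()
```

    The initial inner product with the correct memory equals the number of known pixels (24 here), while the clipped competitors contribute nothing, so the estimate locks onto the stored image in one pass.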

  3. Tiled Multicore Processors

    NASA Astrophysics Data System (ADS)

    Taylor, Michael B.; Lee, Walter; Miller, Jason E.; Wentzlaff, David; Bratt, Ian; Greenwald, Ben; Hoffmann, Henry; Johnson, Paul R.; Kim, Jason S.; Psota, James; Saraf, Arvind; Shnidman, Nathan; Strumpen, Volker; Frank, Matthew I.; Amarasinghe, Saman; Agarwal, Anant

    For the last few decades Moore’s Law has continually provided exponential growth in the number of transistors on a single chip. This chapter describes a class of architectures, called tiled multicore architectures, that are designed to exploit massive quantities of on-chip resources in an efficient, scalable manner. Tiled multicore architectures combine each processor core with a switch to create a modular element called a tile. Tiles are replicated on a chip as needed to create multicores with any number of tiles. The Raw processor, a pioneering example of a tiled multicore processor, is examined in detail to explain the philosophy, design, and strengths of such architectures. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance. Central to achieving this goal is Raw’s ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Compared to a traditional superscalar processor, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x-9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand.

  4. Biological Information Signal Processor

    NASA Technical Reports Server (NTRS)

    Chow, Edward T.; Peterson, John C.; Yoo, Michael M.

    1993-01-01

    The Biological Information Signal Processor (BISP) is a computing system that analyzes deoxyribonucleic acid (DNA) sequence data for molecular genetic analysis. It includes coprocessors, specialized microprocessors that complement present and future computers by rapidly performing the most time-consuming DNA-sequence-analysis functions: establishing relationships (alignments) between global sequences and defining patterns in multiple sequences. It also includes state-of-the-art software and database systems on both conventional and parallel computer systems to augment the analytical abilities of the developmental coprocessors.

  5. Image coding using parallel implementations of the embedded zerotree wavelet algorithm

    NASA Astrophysics Data System (ADS)

    Creusere, Charles D.

    1996-03-01

    We explore here the implementation of Shapiro's embedded zerotree wavelet (EZW) image coding algorithm on an array of parallel processors. To this end, we first consider the problem of parallelizing the basic wavelet transform, discussing past work in this area and the compatibility of that work with the zerotree coding process. From this discussion, we present a parallel partitioning of the transform which is computationally efficient and which allows the wavelet coefficients to be coded with little or no additional inter-processor communication. The key to achieving low data dependence between the processors is to ensure that each processor contains only entire zerotrees of wavelet coefficients after the decomposition is complete. We next quantify the rate-distortion tradeoffs associated with different levels of parallelization for a few variations of the basic coding algorithm. Studying these results, we conclude that the quality of the coder decreases as the number of parallel processors used to implement it increases. Noting that the performance of the parallel algorithm might be unacceptably poor for large processor arrays, we also develop an alternate algorithm which always achieves the same rate-distortion performance as the original sequential EZW algorithm at the cost of higher complexity and reduced scalability.
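    The key to the low-communication partitioning is that each processor's sub-image is wavelet-transformed independently, so every zerotree (a coarse-scale root plus all its finer-scale descendants) lives entirely on one processor. A toy sketch with an averaging Haar transform (the EZW filters and partition sizes differ; this only illustrates the block-local decomposition):

```python
import numpy as np

def haar2d(block, levels):
    """2D averaging Haar decomposition of one processor's block:
    each level splits the low-low quadrant into averages/differences."""
    out = block.astype(float)
    n = out.shape[0]
    for _ in range(levels):
        a = out[:n, :n]
        lo = (a[:, 0::2] + a[:, 1::2]) / 2        # row pass
        hi = (a[:, 0::2] - a[:, 1::2]) / 2
        a[:, : n // 2], a[:, n // 2 : n] = lo, hi
        lo = (a[0::2, :] + a[1::2, :]) / 2        # column pass
        hi = (a[0::2, :] - a[1::2, :]) / 2
        a[: n // 2, :], a[n // 2 : n, :] = lo, hi
        n //= 2
    return out

# Partition a 16x16 image into 4 blocks; each "processor" transforms
# its own block, so no wavelet coefficients ever cross block borders.
img = np.arange(256, dtype=float).reshape(16, 16)
blocks = [img[i:i + 8, j:j + 8] for i in (0, 8) for j in (0, 8)]
coeffs = [haar2d(b, levels=3) for b in blocks]
```

    After the block-local transform, each processor can run zerotree coding on its own coefficient pyramid and only the resulting bitstreams need to be merged.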

  6. Atmospheric plasma jet array in parallel electric and gas flow fields for three-dimensional surface treatment

    NASA Astrophysics Data System (ADS)

    Cao, Z.; Walsh, J. L.; Kong, M. G.

    2009-01-01

    This letter reports on electrical and optical characteristics of a ten-channel atmospheric pressure glow discharge jet array in parallel electric and gas flow fields. Challenged with complex three-dimensional substrates, including surgical tissue forceps and a plastic plate sloped at up to 15°, the jet array is shown to achieve excellent jet-to-jet uniformity both in time and in space. Its spatial uniformity is four times better than that of a comparable single jet when both are used to treat a 15° sloped substrate. These benefits likely stem from an effective self-adjustment mechanism among individual jets, facilitated by individualized ballast and spatial redistribution of surface charges.

  7. Parallel self-mixing imaging system based on an array of vertical-cavity surface-emitting lasers

    SciTech Connect

    Tucker, John R.; Baque, Johnathon L.; Lim, Yah Leng; Zvyagin, Andrei V.; Rakic, Aleksandar D.

    2007-09-01

    In this paper we investigate the feasibility of a massively parallel self-mixing imaging system based on an array of vertical-cavity surface-emitting lasers (VCSELs) to measure surface profiles of displacement, distance, velocity, and liquid flow rate. The concept of the system is demonstrated using a prototype to measure the velocity at different radial points on a rotating disk, and the velocity profile of diluted milk in a custom built diverging-converging planar flow channel. It is envisaged that a scaled up version of the parallel self-mixing imaging system will enable real-time surface profiling, vibrometry, and flowmetry.

  8. Parallel detection of harmful algae using reverse transcription polymerase chain reaction labeling coupled with membrane-based DNA array.

    PubMed

    Zhang, Chunyun; Chen, Guofu; Ma, Chaoshuai; Wang, Yuanyuan; Zhang, Baoyu; Wang, Guangce

    2014-03-01

    Harmful algal blooms (HABs) are a global problem that can cause economic losses to the aquaculture industry and pose a potential threat to human health, so more attention must be paid to the development of effective detection methods for the causative microalgae. Traditional microscopic examination has many disadvantages: it is inefficient and inaccurate, requires specialized skill in identification, and in particular cannot resolve several morphologically similar microalgae to species level in a single parallel analysis. This study explored the feasibility of using a membrane-based DNA array for parallel detection of several microalgae, selecting five species as test cases: Heterosigma akashiwo, Chaetoceros debilis, Skeletonema costatum, Prorocentrum donghaiense, and Nitzschia closterium. Five species-specific (taxonomic) probes were designed from variable regions of the large subunit ribosomal DNA (LSU rDNA) by visualizing the alignment of the LSU rDNA of related species. The specificity of the probes was confirmed by dot blot hybridization. The membrane-based DNA array was prepared by spotting the tailed taxonomic probes onto a positively charged nylon membrane. Digoxigenin (Dig) labeling of target molecules was performed by multiplex PCR/RT-PCR using an RNA/DNA mixture of the five microalgae as template. The Dig-labeled amplification products were hybridized with the membrane-based DNA array to produce a visible hybridization signal indicating the presence of the target algae. A detection sensitivity comparison showed that RT-PCR labeling (RPL) coupled with hybridization was tenfold more sensitive than DNA PCR labeling coupled with hybridization. Finally, the effectiveness of RPL coupled with the membrane-based DNA array was validated by testing simulated and natural water samples, respectively.
    All of these results indicated that RPL coupled with membrane-based DNA array is specific, simple, and sensitive for parallel detection of microalgae which

  9. Design and implementation of a high performance network security processor

    NASA Astrophysics Data System (ADS)

    Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

    2010-03-01

    The last few years have seen significant progress in the field of application-specific processors. One example is network security processors (NSPs), which perform the various cryptographic operations specified by network security protocols and help to offload computationally intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps-rate NSP that is programmable with domain-specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.

  10. Quadrature transmit array design using single-feed circularly polarized patch antenna for parallel transmission in MR imaging

    PubMed Central

    Pang, Yong; Yu, Baiying; Vigneron, Daniel B.

    2014-01-01

    Quadrature coils are often desired in MR applications because they can improve MR sensitivity and also reduce excitation power. In this work, we propose, for the first time, a quadrature array design strategy for parallel transmission at 298 MHz using a single-feed circularly polarized (CP) patch antenna technique. Each array element is a nearly square ring microstrip antenna fed at a point on the diagonal of the antenna to generate quadrature magnetic fields. Compared with conventional quadrature coils, the single-feed structure is much simpler and more compact, making the quadrature coil array design practical. Numerical simulations demonstrate that the decoupling between elements is better than –35 dB for all the elements and the RF fields are homogeneous with deep penetration and quadrature behavior in the area of interest. Bloch equation simulation is also performed to simulate the excitation procedure using an 8-element quadrature planar patch array to demonstrate its feasibility in parallel transmission at the ultrahigh field of 7 Tesla. PMID:24649430

  11. Quadrature transmit array design using single-feed circularly polarized patch antenna for parallel transmission in MR imaging.

    PubMed

    Pang, Yong; Yu, Baiying; Vigneron, Daniel B; Zhang, Xiaoliang

    2014-02-01

    Quadrature coils are often desired in MR applications because they can improve MR sensitivity and also reduce excitation power. In this work, we propose, for the first time, a quadrature array design strategy for parallel transmission at 298 MHz using a single-feed circularly polarized (CP) patch antenna technique. Each array element is a nearly square ring microstrip antenna fed at a point on the diagonal of the antenna to generate quadrature magnetic fields. Compared with conventional quadrature coils, the single-feed structure is much simpler and more compact, making the quadrature coil array design practical. Numerical simulations demonstrate that the decoupling between elements is better than -35 dB for all the elements and the RF fields are homogeneous with deep penetration and quadrature behavior in the area of interest. Bloch equation simulation is also performed to simulate the excitation procedure using an 8-element quadrature planar patch array to demonstrate its feasibility in parallel transmission at the ultrahigh field of 7 Tesla.

  12. Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data.

    PubMed

    Liu, Yongchao; Hankeln, Thomas; Schmidt, Bertil

    2016-01-01

    Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach for building the BWT of single genome sequences in linear space, with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After obtaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has been evaluated using two sequences generated from two human genome assemblies: the Ensembl Homo sapiens assembly and the human reference genome. Our performance comparison to FMD-index and Bwt-disk reveals that on 12 CPU cores, ParaBWT runs up to 2.2× faster than FMD-index and up to 99.0× faster than Bwt-disk. BWT construction algorithms for very long genomic sequences are time consuming and (due to their incremental nature) inherently difficult to parallelize. Thus, their parallelization is challenging, and even relatively small speedups like the ones of our method over FMD-index are of high importance to research. ParaBWT is written in C++, and is freely available at http://parabwt.sourceforge.net.
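For reference, the relationship between a suffix array and the BWT that ParaBWT constructs can be shown with a naive, non-parallel sketch. ParaBWT's progressive, multi-threaded construction is far more time- and space-efficient, but it produces the same transform:

```python
def suffix_array(s):
    """Naive suffix array construction; ParaBWT replaces this with a
    progressive, parallelized algorithm, but the output is identical."""
    s += "$"  # unique sentinel, lexicographically smallest
    return sorted(range(len(s)), key=lambda i: s[i:])

def bwt_from_sa(s, sa):
    """Burrows-Wheeler transform: the last column of the sorted
    rotations, read directly off the suffix array."""
    t = s + "$"
    return "".join(t[i - 1] for i in sa)

s = "banana"
sa = suffix_array(s)
print(bwt_from_sa(s, sa))  # annb$aa
```

The naive sort is O(n² log n) in the worst case, which is exactly why linear-space progressive construction matters for whole genomes.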

  13. Signal processor chip implementation

    NASA Astrophysics Data System (ADS)

    Beraud, J. P.

    1985-03-01

    Advances in technology have now made it possible to integrate very large microprocessors on a single chip. Two basic design methodologies are available: gate array and custom design. The present paper is concerned with a signal processor (SP) chip which is based on a mixture of the two technologies. The result is a high-density chip which requires little manual effort for its production. The SP is characterized by separate instruction and data memories. The SP consists of three main parts which operate simultaneously: the sequencer, the address generator, and the computation portion. The chip comprises a library of predesigned building blocks. Attention is given to a signal processor block diagram, the basic TTL gate, a two-input master-slave latch, the physical library, aspects of logical design, the multiplier basic cell and adder line organization, and physical design methodology.

  14. Integrated RF/shim coil array for parallel reception and localized B0 shimming in the human brain.

    PubMed

    Truong, Trong-Kha; Darnell, Dean; Song, Allen W

    2014-12-01

    The purpose of this work was to develop a novel integrated radiofrequency and shim (RF/shim) coil array that can perform parallel reception and localized B0 shimming in the human brain with the same coils, thereby maximizing both the signal-to-noise ratio and shimming efficiency. A 32-channel receive-only head coil array was modified to enable both RF currents (for signal reception) and direct currents (for B0 shimming) to flow in individual coil elements. Its in vivo performance was assessed in the frontal brain region, which is affected by large susceptibility-induced B0 inhomogeneities. The coil modifications did not reduce the quality factor or signal-to-noise ratio. Axial B0 maps and echo-planar images acquired in vivo with direct currents optimized to shim specific slices showed substantially reduced B0 inhomogeneities and image distortions in the frontal brain region. The B0 root-mean-square error in the anterior half of the brain was reduced by 60.3% as compared to that obtained with second-order spherical harmonic shimming. These results demonstrate that the integrated RF/shim coil array can perform parallel reception and localized B0 shimming in the human brain and provide much more effective shimming than conventional spherical harmonic shimming alone, without taking up additional space in the magnet bore and without compromising the signal-to-noise ratio or shimming performance.

  15. Parallel nanomanufacturing via electrohydrodynamic jetting from microfabricated externally-fed emitter arrays

    NASA Astrophysics Data System (ADS)

    Ponce de Leon, Philip J.; Hill, Frances A.; Heubel, Eric V.; Velásquez-García, Luis F.

    2015-06-01

    We report the design, fabrication, and characterization of planar arrays of externally-fed silicon electrospinning emitters for high-throughput generation of polymer nanofibers. Arrays with as many as 225 emitters and with emitter density as large as 100 emitters cm-2 were characterized using a solution of dissolved PEO in water and ethanol. Devices with emitter density as high as 25 emitters cm-2 deposit uniform imprints comprising fibers with diameters on the order of a few hundred nanometers. Mass flux rates as high as 417 g hr-1 m-2 were measured, i.e., four times the reported production rate of the leading commercial free-surface electrospinning sources. Throughput increases with increasing array size at constant emitter density, suggesting the design can be scaled up with no loss of productivity. Devices with emitter density equal to 100 emitters cm-2 fail to generate fibers but uniformly generate electrosprayed droplets. For the arrays tested, the largest measured mass flux resulted from arrays with larger emitter separation operating at larger bias voltages, indicating the strong influence of electrical field enhancement on the performance of the devices. Incorporation of a ground electrode surrounding the array tips helps equalize the emitter field enhancement across the array as well as control the spread of the imprints over larger distances.

  16. High Speed Publication Subscription Brokering Through Highly Parallel Processing on Field Programmable Gate Array (FPGA)

    DTIC Science & Technology

    2010-01-01

    AFRL-RI-RS-TR-2010-29, Final Technical Report, January 2010. Reporting period: September 2007 – August 2009.

  17. Implementation and Assessment of Advanced Analog Vector-Matrix Processor

    NASA Technical Reports Server (NTRS)

    Gary, Charles K.; Bualat, Maria G.; Lum, Henry, Jr. (Technical Monitor)

    1994-01-01

    This paper discusses the design and implementation of an analog optical vector-matrix coprocessor with a throughput of 128 Mops for a personal computer. Vector-matrix calculations are inherently parallel, providing a promising domain for the use of optical calculators. However, to date, digital optical systems have proven too cumbersome to replace electronics, and analog processors have not demonstrated sufficient accuracy in large-scale systems. The goal of the work described in this paper is to demonstrate a viable optical coprocessor for linear operations. The analog optical processor presented has been integrated with a personal computer to provide full functionality and is the first demonstration of an optical linear algebra processor with a throughput greater than 100 Mops. The optical vector-matrix processor consists of a laser diode source, an acousto-optical modulator array to input the vector information, a liquid crystal spatial light modulator to input the matrix information, an avalanche photodiode array to read out the result vector of the vector-matrix multiplication, as well as transport optics and the electronics necessary to drive the optical modulators and interface to the computer. The intent of this research is to provide a low-cost, highly energy-efficient coprocessor for linear operations. Measurements of the analog accuracy of the processor performing 128 Mops are presented along with an assessment of the implications for future systems. A range of noise sources, including cross-talk, source amplitude fluctuations, shot noise at the detector, and non-linearities of the optoelectronic components, are measured and compared to determine the most significant source of error. The possibilities for reducing these sources of error are discussed. Also, the total error is compared with that expected from a statistical analysis of the individual components and their relation to the vector-matrix operation. The sufficiency of the measured accuracy of the

  18. High-efficiency ordered silicon nano-conical-frustum array solar cells by self-powered parallel electron lithography.

    PubMed

    Lu, Yuerui; Lal, Amit

    2010-11-10

    Nanostructured silicon thin film solar cells are promising, due to their strongly enhanced light trapping, high carrier collection efficiency, and potentially low cost. Ordered nanostructure arrays, with large-area controllable spacing, orientation, and size, are critical for reliable light trapping and high-efficiency solar cells. Available top-down lithography approaches to fabricate large-area ordered nanostructure arrays are challenging due to the requirement of both high lithography resolution and high throughput. Here, a novel ordered silicon nano-conical-frustum array structure, exhibiting an impressive absorbance of 99% (upper bound) over wavelengths 400-1100 nm at a thickness of only 5 μm, is realized by our recently reported technique, self-powered parallel electron lithography, which offers high throughput and sub-35-nm resolution. Moreover, high-efficiency (up to 10.8%) solar cells are demonstrated using these ordered ultrathin silicon nano-conical-frustum arrays. These fabrication techniques can also be transferred to low-cost substrate solar energy harvesting device applications.

  19. Parallel optical coherence tomography in scattering samples using a two-dimensional smart-pixel detector array

    NASA Astrophysics Data System (ADS)

    Ducros, M.; Laubscher, M.; Karamata, B.; Bourquin, S.; Lasser, T.; Salathé, R. P.

    2002-02-01

    Parallel optical coherence tomography in scattering samples is demonstrated using a 58×58 smart-pixel detector array. A femtosecond mode-locked Ti:Sapphire laser in combination with a free space Michelson interferometer was employed to achieve 4 μm longitudinal resolution and 9 μm transverse resolution on a 260×260 μm2 field of view. We imaged a resolution target covered by an intralipid solution with different scattering coefficients as well as onion cells.

  20. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
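The two-phase method claimed above can be sketched as a serial simulation. A real deployment would run each worker's loop on its own processor; the 1-D grid and interval-shaped objects below are simplifying assumptions:

```python
def populate_grid(objects, n, grid_min=0.0, grid_max=1.0):
    """Two-phase parallel grid population (serial simulation of the
    patented scheme on a 1-D grid of n portions)."""
    width = (grid_max - grid_min) / n

    def portions_for(obj):
        # objects are intervals (lo, hi) and may span several portions
        lo, hi = obj
        first = max(0, min(n - 1, int((lo - grid_min) / width)))
        last = max(0, min(n - 1, int((hi - grid_min) / width)))
        return range(first, last + 1)

    # phase 1: the objects are split into n distinct sets; each worker
    # finds the portion(s) that at least partially bound its objects
    claims = [[] for _ in range(n)]
    for worker in range(n):
        for obj in objects[worker::n]:
            for p in portions_for(obj):
                claims[p].append(obj)

    # phase 2: each worker populates its own distinct grid portion
    return claims

grid = populate_grid([(0.1, 0.2), (0.45, 0.55), (0.9, 0.95)], n=2)
# the object straddling 0.5 lands in both portions
```

The two phases mirror the patent's two assignment steps: objects are first partitioned across processors for the bounding test, then grid portions are partitioned across processors for population.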

  1. Real-Time Signal Processor for Pulsar Studies

    NASA Astrophysics Data System (ADS)

    Ramkumar, P. S.; Deshpande, A. A.

    2001-12-01

    This paper describes the design, tests and preliminary results of a real-time parallel signal processor built to aid a wide variety of pulsar observations. The signal processor reduces the distortions caused by the effects of dispersion, Faraday rotation, Doppler acceleration and parallactic angle variations, at a sustained data rate of 32 Msamples/sec. It also folds the pulses coherently over the period and integrates adjacent samples in time and frequency to enhance the signal-to-noise ratio. The resulting data are recorded for further off-line analysis of the characteristics of pulsars and the intervening medium. The signal processing for analysis of pulsar signals is quite complex, imposing the need for a high computational throughput, typically of the order of a giga-operation per second (GOPS). Conventionally, the high computational demand restricts the flexibility to handle only a few types of pulsar observations. This instrument is designed to handle a wide variety of pulsar observations with the Giant Metre Wave Radio Telescope (GMRT), and is flexible enough to be used in many other high-speed signal processing applications. The technology used includes field-programmable gate array (FPGA) based data/code routing interfaces, PC-AT based control, diagnostics and data acquisition, digital signal processor (DSP) chip based parallel processing nodes, and C language based control software and DSP-assembly programs for signal processing. The architecture and the software implementation of the parallel processor are fine-tuned to realize about 60 MOPS per DSP node and a multiple-instruction-multiple-data (MIMD) capability.
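One of the operations described, coherently folding the pulses over the period, reduces to phase-binning and averaging; the periodic signal adds coherently while noise averages down. A minimal sketch, assuming an integer sample period for simplicity (the real processor handles fractional periods):

```python
def fold_pulses(samples, period):
    """Coherent folding: co-add samples at the same pulse phase so the
    periodic signal grows while uncorrelated noise averages down."""
    profile = [0.0] * period
    counts = [0] * period
    for i, x in enumerate(samples):
        profile[i % period] += x
        counts[i % period] += 1
    return [p / c for p, c in zip(profile, counts)]

# synthetic pulsar: a unit pulse at phase 3 of an 8-sample period
signal = [1.0 if i % 8 == 3 else 0.0 for i in range(80)]
profile = fold_pulses(signal, 8)
# the folded profile has its peak at phase bin 3
```

With real noisy data, folding N pulse periods improves the signal-to-noise ratio by roughly √N.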

  2. Comparative Analysis on the Performance of a Short String of Series-Connected and Parallel-Connected Photovoltaic Array Under Partial Shading

    NASA Astrophysics Data System (ADS)

    Vijayalekshmy, S.; Rama Iyer, S.; Beevi, Bisharathu

    2015-09-01

    The output power from a photovoltaic (PV) array decreases, and the array exhibits multiple peaks, when it is subjected to partial shading (PS). The power loss in the PV array varies with the array configuration, physical location and the shading pattern. This paper compares the relative performance of a PV array consisting of a short string of three PV modules for two different configurations. The mismatch loss, shading loss, fill factor, and power loss due to failure to track the global maximum power point are analysed for a series string with bypass diodes and for a short parallel string, using a MATLAB/Simulink model. The performance of the system is investigated for three different conditions of solar insolation with the same shading pattern. Results indicate that there is considerably greater power loss due to shading in a series string during PS than in a parallel string with the same number of modules.
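The qualitative result can be reproduced with a highly idealised numeric model (constant-current modules, ideal bypass diodes, no maximum-power-point tracking error; all values are illustrative, not taken from the paper):

```python
def series_power(currents, v_module, bypass=True):
    """Series string: without bypass diodes the whole string is limited
    to the shaded module's current; with ideal bypass diodes the string
    may run at a higher current while shaded modules are bypassed."""
    if not bypass:
        return min(currents) * v_module * len(currents)
    best = 0.0
    for i_string in sorted(set(currents)):
        active = sum(1 for c in currents if c >= i_string)
        best = max(best, i_string * v_module * active)
    return best

def parallel_power(currents, v_module):
    """Parallel string: module currents simply add at the common voltage."""
    return sum(currents) * v_module

# three modules, one shaded to 30% of full current (illustrative values)
currents = [3.0, 3.0, 0.9]
p_series = series_power(currents, v_module=20.0)
p_parallel = parallel_power(currents, v_module=20.0)
```

Even in this crude model the parallel string delivers more power under partial shading, because the shaded module still contributes rather than being bypassed entirely.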

  3. Kokkos Array

    SciTech Connect

    Edwards, Harold Carter; Sunderland, Daniel

    2012-09-12

    The Kokkos Array library implements shared-memory array data structures and parallel task dispatch interfaces for data-parallel computational kernels that are performance-portable to multicore-CPU and manycore-accelerator (e.g., GPGPU) devices.

  4. Parallel Transport of Biological Cells using Individually Addressable VCSEL Arrays as Optical Tweezers

    DTIC Science & Technology

    2005-06-01

    The greatest axial restoring force of the three-dimensional trap comes from the photons at the largest incidence angle. Samples were red blood cells in Alsever's solution and a sparse concentration of 5 and 10 μm polystyrene microspheres (Bangs Laboratories Inc.) immersed in DI water.

  5. Simulations of high current wire array Z-pinches using a parallel 3D resistive MHD

    NASA Astrophysics Data System (ADS)

    Chittenden, J. P.; Jennings, C. A.; Ciardi, A.

    2006-10-01

    We present calculations of the implosion and stagnation phases of wire array Z-pinches at Sandia National Laboratory which model the full 3D plasma volume. Modelling the full volume in 3D is found to be necessary in order to accommodate all possible mechanisms for broadening the width of the imploding plasma and for modelling all modes of instability in the stagnated pinch. The width of the imploding plasma is shown to arise from the evolution of the uncorrelated modulations present on each wire in the array early in time into a globally correlated 3D instability structure. The 3D nature of the collision of two nested arrays is highlighted and the implications for radiation pulse shaping are discussed. The addition of a simple circuit model to model the Z generator allows the pinch energetics during stagnation to be treated more accurately and provides another point of comparison to experimental data. The implications of these results for improved X-ray production are discussed both for the keV range and for soft X-ray radiation sources used in inertial confinement fusion research. This work was partially supported by the U.S. Department of Energy through cooperative agreement DE-FC03-02NA00057.

  6. Field programmable gate array based parallel strapdown algorithm design for strapdown inertial navigation systems.

    PubMed

    Li, Zong-Tao; Wu, Tie-Jun; Lin, Can-Long; Ma, Long-Hua

    2011-01-01

    A new generalized optimum strapdown algorithm with coning and sculling compensation is presented, in which the position, velocity and attitude updating operations are carried out based on a single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high-frequency angular rate and acceleration rectification effects. Unlike existing algorithms, the updating rates of the coning and sculling compensations are unrelated to the number of gyro incremental angle samples and the number of accelerometer incremental velocity samples. When the output sampling rate of the inertial sensors remains constant, this algorithm allows the updating rate of the coning and sculling compensation to be increased while using more gyro incremental angle and accelerometer incremental velocity samples, in order to improve system accuracy. Then, in order to implement the new strapdown algorithm in a single FPGA chip, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the XC6VLX550T FPGA hardware platform using fighter aircraft flight data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of the algorithm to meet the real-time and high-precision requirements of the system in highly dynamic environments, relative to the existing implementation on a DSP platform.
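The coning compensation the algorithm updates at high rate is, in its classic form, an accumulation of cross products of successive gyro incremental angles. A minimal sketch of that textbook correction term (not the paper's generalized single-speed algorithm):

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def coning_correction(increments):
    """Classic coning compensation: accumulate 0.5 * (theta_prev x
    theta_curr) over the gyro incremental angles within one attitude
    update interval. Non-zero only when the rotation axis changes,
    which is exactly the coning motion the compensation targets."""
    corr = [0.0, 0.0, 0.0]
    prev = None
    for th in increments:
        if prev is not None:
            c = cross(prev, th)
            corr = [corr[i] + 0.5 * c[i] for i in range(3)]
        prev = th
    return corr

# two successive increments about different axes produce a coning term
corr = coning_correction([(0.01, 0.0, 0.0), (0.0, 0.01, 0.0)])
```

For rotation about a fixed axis the cross products vanish and the correction is zero, which is why the term only matters in highly dynamic (coning) environments.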

  7. Template-directed atomically precise self-organization of perfectly ordered parallel cerium silicide nanowire arrays on Si(110)-16 × 2 surfaces

    PubMed Central

    2013-01-01

    The perfectly ordered parallel arrays of periodic Ce silicide nanowires can self-organize with atomic precision on single-domain Si(110)-16 × 2 surfaces. The growth evolution of self-ordered parallel Ce silicide nanowire arrays is investigated over a broad range of Ce coverages on single-domain Si(110)-16 × 2 surfaces by scanning tunneling microscopy (STM). Three different types of well-ordered parallel arrays, consisting of uniformly spaced and atomically identical Ce silicide nanowires, are self-organized through the heteroepitaxial growth of Ce silicides on a long-range grating-like 16 × 2 reconstruction at various Ce coverages. Each atomically precise Ce silicide nanowire consists of a bundle of chains and rows with different atomic structures. The atomic-resolution dual-polarity STM images reveal that the interchain coupling leads to the formation of registry-aligned chain bundles within individual Ce silicide nanowires. The nanowire width and the interchain coupling can be adjusted systematically by varying the Ce coverage on the Si(110) surface. This natural template-directed self-organization of perfectly regular parallel nanowire arrays allows for the precise control of the feature size and positions within ±0.2 nm over a large area. Thus, it is a promising route to produce parallel nanowire arrays in a straightforward, low-cost, high-throughput process. PMID:24188092

  8. Optimal expression evaluation for data parallel architectures

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
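The minimum-cost evaluation idea can be sketched as dynamic programming over the expression tree: for each subexpression, track the cheapest cost of producing its result at every candidate alignment. The 1-D mesh metric and the tree encoding below are toy assumptions; the paper's algorithm handles general robust metrics on meshes and hypercubes:

```python
def min_eval_cost(tree, positions, move_cost=lambda a, b: abs(a - b)):
    """Minimum-cost evaluation of an expression tree on a data-parallel
    machine (hypothetical sketch). A tree is either a leaf giving the
    fixed alignment of an operand, or ("op", left, right). Returns a
    dict mapping each candidate alignment to the minimum cost of
    producing the tree's result there."""
    if not isinstance(tree, tuple):          # leaf: operand fixed at tree
        return {p: move_cost(tree, p) for p in positions}
    _, left, right = tree
    lcost = min_eval_cost(left, positions, move_cost)
    rcost = min_eval_cost(right, positions, move_cost)
    # a pointwise op is free once both operands are aligned at q;
    # afterwards the result may be moved from q to p
    at = {q: lcost[q] + rcost[q] for q in positions}
    return {p: min(at[q] + move_cost(q, p) for q in positions)
            for p in positions}

# (A aligned at 0) + (B aligned at 4) on a 1-D mesh of 5 positions:
# the operands must meet somewhere, costing 4 moves total wherever
# the addition is performed
costs = min_eval_cost(("add", 0, 4), range(5))
```

The per-node table has one entry per candidate alignment, so the whole computation is polynomial in the expression size and the number of alignments.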

  9. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques, presenting rendering algorithms, developed for massively parallel processors (MPPs), for polygons, spheres, and volumetric data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
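The compositing step at the heart of such MPP rendering can be illustrated with a sort-last sketch: each processor renders its share of the data into a full-size image with a depth buffer, and the final image keeps the nearest fragment per pixel. The (depth, color) pixel layout is an illustrative assumption:

```python
def composite(partials):
    """Sort-last compositing: merge per-processor depth-buffered images
    by keeping, for each pixel, the fragment nearest the viewer."""
    width = len(partials[0])
    out = [(float("inf"), None)] * width   # (depth, color) per pixel
    for image in partials:
        out = [min(o, p, key=lambda t: t[0]) for o, p in zip(out, image)]
    return [color for _, color in out]

# two processors, 4-pixel images; depth inf marks an empty pixel
p0 = [(1.0, "red"), (float("inf"), None), (3.0, "red"), (2.0, "red")]
p1 = [(2.0, "blue"), (1.5, "blue"), (float("inf"), None), (1.0, "blue")]
print(composite([p0, p1]))  # ['red', 'blue', 'red', 'blue']
```

The per-pixel merge is associative, so on a real MPP it is done pairwise in a reduction tree, which is where the communication cost mentioned above arises.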

  10. Twyman-Green-type integrated laser interferometer array for parallel MEMS testing

    NASA Astrophysics Data System (ADS)

    Oliva, M.; Michaelis, D.; Dannberg, P.; Józwik, M.; Liżewski, K.; Kujawińska, M.; Zeitner, U. D.

    2012-01-01

    In this paper the concept, design and realization of an integrated laser interferometer are presented. The integrated Twyman-Green-type interferometer is based on micro-optical diffraction gratings in the resonance domain. The fabrication process of these gratings and their assembly in the functional interferometer will be discussed in detail. The interferometer, available in array configuration of 5 × 5 channels, is part of a test set-up for a fast characterization of M(O)EMS devices at a wafer level. The first results of the interferometric measurements on an MEMS object are presented.

  11. Upset Characterization of the PowerPC405 Hard-core Processor Embedded in Virtex-II Pro Field Programmable Gate Arrays

    NASA Technical Reports Server (NTRS)

    Swift, Gary M.; Allen, Gregory S.; Farmanesh, Farhad; George, Jeffrey; Petrick, David J.; Chayab, Fayez

    2006-01-01

    Shown in this presentation are recent results for the upset susceptibility of the various types of memory elements in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively through appropriate design triplication and configuration scrubbing, these upsets of processor elements can dominate the system error rate. Data from irradiations with both protons and heavy ions are given and compared using available models.

  12. Precisely-controlled fabrication of single ZnO nanoemitter arrays and their possible application in low energy parallel electron beam exposure.

    PubMed

    He, H; She, J C; Huang, Y F; Deng, S Z; Xu, N S

    2012-03-21

    Precisely-controlled fabrication of single ZnO nanoemitter arrays and their possible application in low energy parallel electron beam exposure are reported. A well defined polymethyl methacrylate (PMMA) nanohole template was employed for local solution-phase growth of single ZnO nanoemitter arrays. Chlorine plasma etching for surface smoothing and pulsed-laser illumination in nitrogen for nitrogen doping were performed, which can significantly enhance the electron emission and improve the emitter-to-emitter uniformity in performance. Mechanisms responsible for the field emission enhancing effect are proposed. Low voltage (368 V) e-beam exposure was performed by using a ZnO nanoemitter array and a periodical hole pattern (0.72-1.26 μm in diameter) was produced on a thin (25 nm) PMMA. The work demonstrates the feasibility of utilizing single ZnO nano-field emitter arrays for low voltage parallel electron beam lithography.

  13. Scripts for Scalable Monitoring of Parallel Filesystem Infrastructure

    SciTech Connect

    Caldwell, Blake

    2014-02-27

    Scripts for scalable monitoring of parallel filesystem infrastructure provide frameworks for monitoring the health of block storage arrays and large InfiniBand fabrics. The block storage framework uses Python multiprocessing so that the number of monitored arrays scales with the number of processors in the system. This enables live monitoring of an HPC-scale filesystem with 10-50 storage arrays. For InfiniBand monitoring, the included scripts track the InfiniBand health of each host, along with visualization tools for mapping complex fabric topologies.
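The scaling approach, fanning health checks out across a worker pool sized to the available processors, might look like the following sketch. A thread pool stands in for the scripts' process-based multiprocessing, and `check_array` is a hypothetical stub for a real storage-controller query:

```python
from multiprocessing.pool import ThreadPool

def check_array(array_name):
    """Placeholder health probe: the real framework would query the
    block storage array's controller here; this stub just reports OK."""
    return (array_name, "OK")

def monitor(arrays, n_workers=4):
    """Fan health checks across a worker pool so many arrays can be
    polled concurrently (sketch of the framework's approach)."""
    with ThreadPool(n_workers) as pool:
        return dict(pool.map(check_array, arrays))

status = monitor([f"array{i:02d}" for i in range(10)])
```

Because each probe is independent and mostly I/O-bound, wall-clock time stays roughly constant as the array count grows, up to the worker count.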

  14. Design and optimization of multi-class series-parallel linear electromagnetic array artificial muscle.

    PubMed

    Li, Jing; Ji, Zhenyu; Shi, Xuetao; You, Fusheng; Fu, Feng; Liu, Ruigang; Xia, Junying; Wang, Nan; Bai, Jing; Wang, Zhanxi; Qin, Xiansheng; Dong, Xiuzhen

    2014-01-01

    Skeletal muscle, which exhibits complex structure and excellent precision, has evolved over millions of years. It offers better performance with a simpler structure than existing drive mechanisms. Artificial muscle may be designed by analyzing and imitating the properties and structure of skeletal muscle on the basis of bionics, an approach that has attracted the attention of bionics researchers; a structural model of a linear electromagnetic array artificial muscle is designed in this paper. The half sarcomere is the minimum unit of the artificial muscle, and its electromagnetic model has been built. The structural parameters of the artificial half-sarcomere actuator were optimized to achieve better movement performance. Experimental results show that the artificial half-sarcomere actuator possesses excellent motion performance, such as high response speed, large acceleration, small weight and size, and robustness, presenting a promising application prospect.

  15. High-resolution parallel-detection sensor array using piezo-phototronics effect

    DOEpatents

    Wang, Zhong L.; Pan, Caofeng

    2015-07-28

    A pressure sensor element includes a substrate, a first type of semiconductor material layer and an array of elongated light-emitting piezoelectric nanostructures extending upwardly from the first type of semiconductor material layer. A p-n junction is formed between each nanostructure and the first type semiconductor layer. An insulative resilient medium layer is infused around each of the elongated light-emitting piezoelectric nanostructures. A transparent planar electrode, disposed on the resilient medium layer, is electrically coupled to the top of each nanostructure. A voltage source is coupled to the first type of semiconductor material layer and the transparent planar electrode and applies a biasing voltage across each of the nanostructures. Each nanostructure emits light in an intensity that is proportional to an amount of compressive strain applied thereto.

  16. A periodic array of nano-scale parallel slats for high-efficiency electroosmotic pumping.

    PubMed

    Kung, Chun-Fei; Wang, Chang-Yi; Chang, Chien-Cheng

    2013-12-01

    It is known that the electroosmotic (EO) flow rate through a nano-scale channel is extremely small. A channel made of a periodic array of slats is proposed to effectively promote EO pumping, and thus greatly improve the EO flow rate. The geometrically simple array is complicated enough that four length scales are involved: the vertical period 2L, lateral period 2aL, width of the slat 2cL, and the Debye length λD. The EO pumping rate is determined by the normalized lengths a and c, or the perforation fraction of slats η=1-(c/a), and the dimensionless electrokinetic width K=L/λD. In a nano-scale channel, K is of order unity or less. EO pumping in both longitudinal and transverse directions (denoted longitudinal EO pumping (LEOP) and transverse EO pumping (TEOP), respectively) is investigated by solving the Debye-Hückel approximation and the viscous electrokinetic equation. The main findings are that (i) the LEOP rates for small K are remarkably improved (by one order of magnitude) with longer slats (a≫1) and a large perforation fraction of slats (η > 0.7); and (ii) the TEOP rates for small K can also be much improved, though less significantly, with longer slats and a large perforation fraction of slats. Nevertheless, it must be noted that in practice K cannot be made arbitrarily small, as the criterion φc≈0 for the reference potential at the channel center puts a lower bound on K; in other words, there are geometric limits for the use of the Poisson-Boltzmann equation.

  17. High-resolution parallel phase-shifting digital holography using a low-resolution phase-shifting array device based on image inpainting.

    PubMed

    Jiao, Shuming; Zou, Wenbin

    2017-02-01

    Parallel phase-shifting digital holography can record high-quality holograms efficiently from fast-moving objects in a dynamic scene. However, a phase-shifting array device with a cell size identical to image sensors is required, which imposes difficulty in practice. This Letter proposes a novel scheme to employ a low-resolution phase-shifting array device to achieve high-resolution parallel phase-shifting digital holography, based on image inpainting performed on incomplete holograms. The experimental results validate the effectiveness of the proposed scheme.

  18. Opto-electronic morphological processor

    NASA Technical Reports Server (NTRS)

    Yu, Jeffrey W. (Inventor); Chao, Tien-Hsin (Inventor); Cheng, Li J. (Inventor); Psaltis, Demetri (Inventor)

    1993-01-01

The opto-electronic morphological processor of the present invention is capable of receiving optical inputs and emitting optical outputs. The use of optics allows implementation of parallel input/output, thereby overcoming a major bottleneck in prior art image processing systems. The processor consists of three components, namely, detectors, morphological operators and modulators. The detectors and operators are fabricated on a silicon VLSI chip and implement the optical input and morphological operations. A layer of ferroelectric liquid crystals is integrated with the silicon chip to provide the optical modulation. The implementation of the image processing operators in electronics leads to a wide range of applications, and the use of optical connections allows cascadability of these parallel opto-electronic image processing components and high speed operation. Such an opto-electronic morphological processor may be used as the pre-processing stage in an image recognition system. In one example disclosed herein, the optical input/optical output morphological processor of the invention is interfaced with a binary phase-only correlator to produce an image recognition system.

  19. Photorefractive processing for large adaptive phased arrays

    NASA Astrophysics Data System (ADS)

    Weverka, Robert T.; Wagner, Kelvin; Sarto, Anthony

    1996-03-01

An adaptive null-steering phased-array optical processor that utilizes a photorefractive crystal to time integrate the adaptive weights and null out correlated jammers is described. This is a beam-steering processor in which the temporal waveform of the desired signal is known but the look direction is not. The processor computes the angle(s) of arrival of the desired signal and steers the array to look in that direction while rotating the nulls of the antenna pattern toward any narrow-band jammers that may be present. We have experimentally demonstrated a simplified version of this adaptive phased-array-radar processor that nulls out the narrow-band jammers by using feedback-correlation detection. In this processor it is assumed that we know a priori only that the signal is broadband and the jammers are narrow band. These are examples of a class of optical processors that use the angular selectivity of volume holograms to form the nulls and look directions in an adaptive phased-array-radar pattern and thereby harness the computational abilities of three-dimensional parallelism in the volume of photorefractive crystals. The development of this processor in a volume holographic system has led to a new algorithm for phased-array-radar processing that uses fewer tapped-delay lines than does the classic time-domain beam former. The optical implementation of the new algorithm has the further advantage of utilizing a single photorefractive crystal to implement as many as a million adaptive weights, allowing the radar system to scale to large size with no increase in processing hardware.
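
    The weight adaptation described above can be sketched electronically as an LMS update: the weights accumulate the time-integrated correlation of the array snapshot with the residual error, which is the operation the photorefractive crystal performs optically. All signals, directions, and the step size below are illustrative assumptions, not from the paper:

```python
# LMS adaptive nulling for an 8-element array with a known desired waveform.
import numpy as np

rng = np.random.default_rng(0)
N, T, mu = 8, 20000, 5e-4
look = np.ones(N, dtype=complex)                       # broadside look direction
jam = np.exp(1j * np.pi * np.arange(N) * np.sin(0.5))  # narrow-band jammer direction

d = rng.standard_normal(T)                             # known broadband desired waveform
j = np.exp(2j * np.pi * 0.1 * np.arange(T))            # narrow-band jammer tone
x = look[:, None] * d + jam[:, None] * j + 0.01 * rng.standard_normal((N, T))

w = np.zeros(N, dtype=complex)
for t in range(T):
    e = d[t] - np.vdot(w, x[:, t])     # residual against the known waveform
    w += mu * x[:, t] * np.conj(e)     # time-integrated correlation update

# After adaptation the response toward the jammer is driven toward a null
# while the look-direction response is preserved.
print(abs(np.vdot(w, jam)), abs(np.vdot(w, look)))
```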

  20. Platinum plasmonic nanostructure arrays for massively parallel single-molecule detection based on enhanced fluorescence measurements.

    PubMed

    Saito, Toshiro; Takahashi, Satoshi; Obara, Takayuki; Itabashi, Naoshi; Imai, Kazumichi

    2011-11-04

We fabricated platinum bowtie nanostructure arrays producing fluorescence enhancement and evaluated their performance using two-photon photoluminescence and single-molecule fluorescence measurements. A comprehensive selection of suitable materials was explored by electromagnetic simulation, and Pt was chosen as the plasmonic material for visible light excitation near 500 nm, which is preferable for multicolor dye-labeling applications like DNA sequencing. The observation of bright photoluminescence (λ = 500-600 nm) from each Pt nanostructure, induced by irradiation at 800 nm with a femtosecond laser pulse, clearly indicates that a highly enhanced local field is created near the Pt nanostructure. The attachment of a single dye molecule was attempted between the Pt triangles of each nanostructure by using selective immobilization chemistry. The fluorescence intensities of the single dye molecule localized on the nanostructures were measured. A highly enhanced fluorescence, increased by a factor of 30, was observed. The two-photon photoluminescence intensity and fluorescence intensity showed qualitatively consistent gap size dependence. However, the average fluorescence enhancement factor was rather suppressed, even in the nanostructure with the smallest gap size, compared to the large growth of photoluminescence. The variation of the position of the dye molecule attached to the nanostructure may influence the wide distribution of the fluorescence enhancement factor and cause the rather small average value of the fluorescence enhancement factor.

  1. Reconfigurable VLSI architecture for a database processor

    SciTech Connect

    Oflazer, K.

    1983-01-01

This work brings together the processing potential offered by regularly structured VLSI processing units and the architecture of a database processor, the relational associative processor (RAP). The main motivations are to integrate a RAP cell processor on a few VLSI chips and improve performance by employing procedures exploiting these VLSI chips and the system-level reconfigurability of processing resources. The resulting VLSI database processor consists of parallel processing cells that can be reconfigured into a large processor to execute the hard operations of projection and semijoin efficiently. It is shown that such a configuration can provide 2 to 3 orders of magnitude of performance improvement over previous implementations of the RAP system in the execution of such operations. 27 refs.
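
    The two "hard operations" named above, projection and semijoin, are easy to state in a few lines; a plain-Python sketch with illustrative relations (the hardware point is that the cell array evaluates these set operations in parallel):

```python
# Duplicate-eliminating projection and semijoin over relations modeled as
# lists of dicts. Relation contents are illustrative.

def projection(relation, attrs):
    """Project each row onto the named attributes, eliminating duplicates."""
    return {tuple(row[a] for a in attrs) for row in relation}

def semijoin(r, s, attr):
    """Rows of r whose join attribute value also appears in s (r semijoin s)."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]

emp = [{"name": "a", "dept": 1}, {"name": "b", "dept": 2}]
dept = [{"dept": 1, "city": "x"}]
print(projection(emp, ["dept"]))    # {(1,), (2,)}
print(semijoin(emp, dept, "dept"))  # [{'name': 'a', 'dept': 1}]
```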

  2. Online track processor for the CDF upgrade

    SciTech Connect

    E. J. Thomson et al.

    2002-07-17

    A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented.

  3. Implementing Access to Data Distributed on Many Processors

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

    A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets with the low-level implementation details being beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow for the arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of
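
    The distribution idea described above, fragmenting an array across processors behind a single point of access, can be sketched as follows. The class and its names are illustrative, not the Chapel implementation; each "processor" is modeled as a local Python list:

```python
# Block distribution of a 1-D global index space across nprocs processors,
# with a single access function hiding the fragmentation so the elements
# appear collocated.

class BlockDistributedArray:
    def __init__(self, n, nprocs):
        self.n, self.nprocs = n, nprocs
        self.block = (n + nprocs - 1) // nprocs          # ceiling division
        self.local = [[0] * min(self.block, n - p * self.block)
                      for p in range(nprocs)]            # per-processor storage

    def locate(self, i):
        """Map a global index to (owner processor, local index)."""
        return i // self.block, i % self.block

    def __getitem__(self, i):
        p, k = self.locate(i)
        return self.local[p][k]

    def __setitem__(self, i, v):
        p, k = self.locate(i)
        self.local[p][k] = v

a = BlockDistributedArray(n=10, nprocs=4)   # blocks of size 3, 3, 3, 1
a[7] = 42                                   # global access, owner resolved internally
print(a.locate(7), a[7])                    # (2, 1) 42
```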

  4. Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data

    NASA Technical Reports Server (NTRS)

    Smith, B. W.; Siegel, H. J.; Swain, P. H.

    1981-01-01

    A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.

  5. Parallel asynchronous systems and image processing algorithms

    NASA Technical Reports Server (NTRS)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  6. NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

    PubMed

    Cheung, Kit; Schultz, Simon R; Luk, Wayne

    2015-01-01

    NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
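
    A minimal sketch of one of the neuron models named above, leaky integrate-and-fire, stepped for a handful of neurons. The parameter values are generic textbook choices, not NeuroFlow defaults:

```python
# Leaky integrate-and-fire update: leak toward rest, add synaptic drive,
# emit a spike and reset when the threshold is crossed.
import numpy as np

def lif_step(v, i_syn, dt=1.0, tau=20.0, v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0):
    """Advance membrane potentials one step; return (new v, boolean spike mask)."""
    v = v + dt / tau * (v_rest - v) + i_syn
    spiked = v >= v_thresh
    return np.where(spiked, v_reset, v), spiked

v = np.full(4, -65.0)                      # four neurons starting at rest
drive = np.array([0.0, 1.0, 2.0, 4.0])     # constant input current per neuron
counts = np.zeros(4, dtype=int)
for _ in range(10):
    v, spiked = lif_step(v, drive)
    counts += spiked
print(counts)                              # stronger drive -> more spikes
```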

  7. NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

    PubMed Central

    Cheung, Kit; Schultz, Simon R.; Luk, Wayne

    2016-01-01

    NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542

  8. Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Toomarian, Benny N.

    2000-01-01

    There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years, and this trend may continue for the near future. However, it is well known that there are major obstacles, i.e., the physical limitation of feature size reduction and the ever increasing cost of foundries, that would prevent the long term continuation of this trend. This has motivated the exploration of some fundamentally new technologies that are not dependent on the conventional feature size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum dot-based computing, DNA based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum dot-based computing using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum dot-based computing and memory in general, and QCA specifically, are intriguing to NASA due to their high packing density (10(exp 11) - 10(exp 12) per square cm), low power consumption (no transfer of current), and potentially higher radiation tolerance. Under the Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for co-planar (i.e., single layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor. Building upon these circuits, we have developed novel algorithms and QCA

  9. Calculating electronic tunnel currents in networks of disordered irregularly shaped nanoparticles by mapping networks to arrays of parallel nonlinear resistors

    SciTech Connect

    Aghili Yajadda, Mir Massoud

    2014-10-21

We have shown both theoretically and experimentally that tunnel currents in networks of disordered irregularly shaped nanoparticles (NPs) can be calculated by considering the networks as arrays of parallel nonlinear resistors. Each resistor is described by a one-dimensional or two-dimensional array of equal-size nanoparticles, in which the tunnel junction gaps between nanoparticles are assumed to be equal. The number of tunnel junctions between the two contact electrodes and the tunnel junction gaps between nanoparticles are found to be functions of Coulomb blockade energies. In addition, the tunnel barriers between nanoparticles were considered to be tilted at high voltages. Furthermore, the role of the thermal expansion coefficient of the tunnel junction gaps on the tunnel current is taken into account. The model calculations fit very well to the experimental data of a network of disordered gold nanoparticles, a forest of multi-wall carbon nanotubes, and a network of few-layer graphene nanoplates over a wide temperature range (5-300 K) at low and high DC bias voltages (0.001 mV–50 V). Our investigations indicate that, although electron cotunneling in networks of disordered irregularly shaped NPs may occur, non-Arrhenius behavior at low temperatures cannot be described by the cotunneling model, due to the size distribution in the networks and the irregular shape of the nanoparticles. Non-Arrhenius behavior of the samples in the zero bias voltage limit was attributed to disorder in the samples. Unlike the electron cotunneling model, we found that the crossover from Arrhenius to non-Arrhenius behavior occurs at two temperatures, one at a high temperature and the other at a low temperature.

  10. Massively Parallel Implementation of Explicitly Correlated Coupled-Cluster Singles and Doubles Using TiledArray Framework.

    PubMed

    Peng, Chong; Calvin, Justus A; Pavošević, Fabijan; Zhang, Jinmei; Valeev, Edward F

    2016-12-29

A new distributed-memory massively parallel implementation of standard and explicitly correlated (F12) coupled-cluster singles and doubles (CCSD) with canonical O(N^6) computational complexity is described. The implementation is based on the TiledArray tensor framework. Novel features of the implementation include (a) all data greater than O(N) is distributed in memory and (b) the mixed use of density fitting and integral-driven formulations that optionally allows one to avoid storage of tensors with three and four unoccupied indices. Excellent strong scaling is demonstrated on a multicore shared-memory computer, a commodity distributed-memory computer, and a national-scale supercomputer. The performance on a shared-memory computer is competitive with the popular CCSD implementations in ORCA and Psi4. Moreover, the CCSD performance on a commodity-size cluster significantly improves on the state-of-the-art package NWChem. The large-scale parallel explicitly correlated coupled-cluster implementation makes accurate estimation of the coupled-cluster basis set limit routine for molecules with 20 or more atoms. Thus, it can provide valuable benchmarks for the emerging reduced-scaling coupled-cluster approaches. The new implementation allowed us to revisit the basis set limit for the CCSD contribution to the binding energy of the π-stacked uracil dimer, a challenging paradigm of π-stacking interactions from the S66 benchmark database. The revised value for the CCSD correlation binding energy obtained with the help of quadruple-ζ CCSD computations, -8.30 ± 0.02 kcal/mol, is significantly different from the S66 reference value, -8.50 kcal/mol, as well as from other CBS limit estimates in the recent literature.

  11. Massive parallel insertion site sequencing of an arrayed Sinorhizobium meliloti signature-tagged mini-Tn 5 transposon mutant library.

    PubMed

    Serrania, Javier; Johner, Tobias; Rupp, Oliver; Goesmann, Alexander; Becker, Anke

    2017-02-21

    Transposon mutagenesis in conjunction with identification of genomic transposon insertion sites is a powerful tool for gene function studies. We have implemented a protocol for parallel determination of transposon insertion sites by Illumina sequencing involving a hierarchical barcoding method that allowed for tracking back insertion sites to individual clones of an arrayed signature-tagged transposon mutant library. This protocol was applied to further characterize a signature-tagged mini-Tn 5 mutant library comprising about 12,000 mutants of the symbiotic nitrogen-fixing alphaproteobacterium Sinorhizobium meliloti (Pobigaylo et al., 2006; Appl. Environ. Microbiol. 72, 4329-4337). Previously, insertion sites have been determined for 5000 mutants of this library. Combining an adapter-free, inverse PCR method for sequencing library preparation with next generation sequencing, we identified 4473 novel insertion sites, increasing the total number of transposon mutants with known insertion site to 9562. The number of protein-coding genes that were hit at least once by a transposon increased by 1231 to a total number of 3673 disrupted genes, which represents 59% of the predicted protein-coding genes in S. meliloti.

  12. Infrared laser transillumination CT imaging system using parallel fiber arrays and optical switches for finger joint imaging

    NASA Astrophysics Data System (ADS)

    Sasaki, Yoshiaki; Emori, Ryota; Inage, Hiroki; Goto, Masaki; Takahashi, Ryo; Yuasa, Tetsuya; Taniguchi, Hiroshi; Devaraj, Balasigamani; Akatsuka, Takao

    2004-05-01

The heterodyne detection technique, on which the coherent detection imaging (CDI) method is founded, can discriminate and select the very weak, highly directional, forward-scattered, coherence-retaining photons that emerge from scattering media in spite of their complex and highly scattering nature. This property enables tomographic images to be reconstructed using the same reconstruction technique as X-ray CT, i.e., the filtered backprojection method. Our group previously developed a transillumination laser CT imaging method based on the CDI method in the visible and near-infrared regions and reconstruction from projections, and reported a variety of in vitro and in vivo tomographic images of biological objects to demonstrate its effectiveness for biomedical use. Since the previous system was not optimized, it took several hours to obtain a single image. For practical use, we developed a prototype CDI-based imaging system using parallel fiber arrays and optical switches to reduce the measurement time significantly. Here, we describe a prototype fiber-optic transillumination laser CT imaging system based on optical heterodyne detection for early diagnosis of rheumatoid arthritis (RA), demonstrating tomographic imaging of an acrylic phantom as well as the fundamental imaging properties. We expect that further refinements of the fiber-optic-based laser CT imaging system could lead to a novel and practical diagnostic tool for rheumatoid arthritis and other joint- and bone-related diseases in the human finger.

  13. Detection and Classification of Low Probability of Intercept Radar Signals Using Parallel Filter Arrays and Higher Order Statistics

    NASA Astrophysics Data System (ADS)

    Taboada, Fernando L.

    2002-09-01

Low probability of intercept (LPI) is that property of an emitter that, because of its low power, wide bandwidth, frequency variability, or other design attributes, makes it difficult to detect or identify by means of passive intercept devices such as radar warning, electronic support and electronic intelligence receivers. In order to detect LPI radar waveforms, new signal processing techniques are required. This thesis first develops a MATLAB toolbox to generate important types of LPI waveforms based on frequency and phase modulation. The power spectral density and the periodic ambiguity function are examined for each waveform. These signals are then used to test a novel signal processing technique that detects the waveform parameters and classifies the intercepted signal in various degrees of noise. The technique is based on the use of parallel filter (sub-band) arrays and higher order statistics (a third-order cumulant estimator). Each sub-band signal is treated individually and is followed by the third-order estimator in order to suppress any symmetrical noise that might be present. The significance of this technique is that it separates the LPI waveforms into small frequency bands, providing a detailed time-frequency description of the unknown signal. Finally, the resulting output matrix is processed by a feature extraction routine to detect the waveform parameters. Identification of the signal is based on the modulation parameters detected.
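
    The detection idea above, sub-band decomposition followed by a third-order statistic that suppresses symmetric noise, can be sketched as follows. The band count, test signal, and noise level are illustrative assumptions; a skewed (asymmetric) signal is used so its third moment is nonzero, while the Gaussian noise's third-order cumulant averages toward zero:

```python
# Split a signal into FFT sub-bands and score each band by |E[x_b^3]|,
# a slice of the third-order cumulant that vanishes for symmetric noise.
import numpy as np

def subband_third_order(x, n_bands):
    """Return the per-band third-moment magnitude after sub-band isolation."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Xb = np.zeros_like(X)
        Xb[lo:hi] = X[lo:hi]                 # isolate one sub-band
        xb = np.fft.irfft(Xb, n=len(x))
        xb -= xb.mean()
        scores.append(abs(np.mean(xb ** 3)))  # third-order estimator
    return np.array(scores)

rng = np.random.default_rng(1)
t = np.arange(4096)
s = (0.5 + 0.5 * np.cos(2 * np.pi * 0.02 * t)) ** 2   # skewed low-band signal
x = s + 0.3 * rng.standard_normal(4096)               # symmetric broadband noise
scores = subband_third_order(x, n_bands=4)
print(scores.argmax())   # the band containing the signal stands out
```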

  14. SPECT reconstruction using a backpropagation neural network implemented on a massively parallel SIMD computer

    SciTech Connect

    Kerr, J.P.; Bartlett, E.B.

    1992-12-31

In this paper, the feasibility of reconstructing a single photon emission computed tomography (SPECT) image via the parallel implementation of a backpropagation neural network is shown. The MasPar MP-1 is a single instruction multiple data (SIMD) massively parallel machine. It is composed of a 128 x 128 array of 4-bit processors. The neural network is distributed on the array by dedicating a processor to each node and each interconnection of the network. An 8 x 8 SPECT image slice section is projected into eight planes. It is shown that, based on the projections, the neural network can produce the original SPECT slice image exactly. Likewise, when trained on two parallel slices, separated by one slice, the neural network is able to reproduce the center, untrained image to an RMS error of 0.001928.

  15. FFT Computation with Systolic Arrays, A New Architecture

    NASA Technical Reports Server (NTRS)

    Boriakoff, Valentin

    1994-01-01

The use of the Cooley-Tukey algorithm for computing the 1-D FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
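
    The decimation-in-frequency recursion underlying the architecture can be sketched in scalar form; this illustrates only the butterflies each stage performs, not the systolic mapping itself:

```python
# Radix-2 decimation-in-frequency (DIF) Cooley-Tukey FFT: each stage combines
# the two halves of the input (the butterflies), applies twiddle factors to
# the difference branch, and recurses; even/odd outputs are interleaved.
import cmath

def fft_dif(x):
    n = len(x)                      # n must be a power of two
    if n == 1:
        return list(x)
    half = n // 2
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(half)]
    top = [x[k] + x[k + half] for k in range(half)]          # sum branch
    bot = [(x[k] - x[k + half]) * w[k] for k in range(half)]  # twiddled difference
    even = fft_dif(top)             # even-indexed spectrum samples
    odd = fft_dif(bot)              # odd-indexed spectrum samples
    out = [0] * n
    out[0::2], out[1::2] = even, odd
    return out

print(fft_dif([1, 0, 0, 0]))   # impulse input -> flat spectrum
```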

  16. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    SciTech Connect

    Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee; Buckner, Mark A

    2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to a 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. The third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5 x 5 cm^2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core

  17. Parallel processing ITS

    SciTech Connect

    Fan, W.C.; Halbleib, J.A. Sr.

    1996-09-01

This report provides a users' guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.
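
    The master/slave model with message passing can be illustrated with a toy Monte Carlo run, using a process pool in place of a workstation network; the problem (estimating π), batch sizes, and seeding scheme are illustrative, not ITS itself:

```python
# Master/slave Monte Carlo: the master hands out batches with independent
# random seeds, slaves run their histories, and the master combines tallies.
import random
from multiprocessing import Pool

def slave(args):
    seed, n_histories = args
    rng = random.Random(seed)   # per-slave stream avoids correlated histories
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(n_histories))
    return inside

if __name__ == "__main__":
    batches = [(seed, 100_000) for seed in range(8)]   # 8 slave tasks
    with Pool(4) as pool:
        tallies = pool.map(slave, batches)             # master gathers results
    pi_est = 4.0 * sum(tallies) / (8 * 100_000)
    print(pi_est)
```

    Seeding each slave independently is the simplest answer to the random-number-generation issue the report mentions; production codes use dedicated parallel RNG streams.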

  18. Development of a parallel detection and processing system using a multidetector array for wave field restoration in scanning transmission electron microscopy.

    PubMed

    Taya, Masaki; Matsutani, Takaomi; Ikuta, Takashi; Saito, Hidekazu; Ogai, Keiko; Harada, Yoshihito; Tanaka, Takeo; Takai, Yoshizo

    2007-08-01

A parallel image detection and image processing system for scanning transmission electron microscopy was developed using a multidetector array consisting of a multianode photomultiplier tube arranged in an 8 x 8 square array. The system enables 64 images to be taken simultaneously from different scattering directions with a scanning time of 2.6 s. Using the 64 images, phase and amplitude contrast images of gold particles on an amorphous carbon thin film could be separately reconstructed by applying respective 8-shaped bandpass Fourier filters to each image and multiplying by the phase and amplitude reconstructing factors.

  19. Parallel algorithms and architecture for computation of manipulator forward dynamics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n exp 2), and the O(n exp 3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class NC and that the time and processor bounds are of O((log n) exp 2) and O(n exp 4), respectively. However, the fastest stable parallel algorithms achieve a computation time of O(n) and can be derived by parallelization of the O(n exp 3) serial algorithms. Parallel computation of the O(n exp 3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.

  20. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one-to-one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be no regular global structure to the grids constructed, there will still be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  1. Optimal processor assignment for pipeline computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath

    1991-01-01

    The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series-parallel precedence graph with n constituent tasks, an O(np^2) algorithm is provided that finds the optimal assignment for the response time optimization problem; the assignment optimizing throughput under a response time constraint can be found in O(np^2 log p) time. Special cases of linear, independent, and tree graphs are also considered.
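
The response-time optimization has the flavor of a dynamic program over (task, processors remaining). The sketch below assumes the simplest case, a linear chain of tasks with a made-up timing table, not the paper's general series-parallel graphs; the throughput requirement appears as a cap on any single stage's time:

```python
def assign_processors(times, total_procs, max_stage_time):
    """Minimize pipeline response time (sum of stage times) subject to a
    throughput constraint (no stage slower than max_stage_time), for a
    chain of tasks.

    times[i][p-1] = measured response of task i on p processors.
    Dynamic program over (task, processors remaining): O(n * P^2).
    """
    n = len(times)
    INF = float("inf")
    # best[i][p] = min total time for tasks i.. with p processors available
    best = [[INF] * (total_procs + 1) for _ in range(n + 1)]
    choice = [[0] * (total_procs + 1) for _ in range(n)]
    for p in range(total_procs + 1):
        best[n][p] = 0.0
    for i in range(n - 1, -1, -1):
        for p in range(1, total_procs + 1):
            for q in range(1, min(p, len(times[i])) + 1):
                t = times[i][q - 1]
                if t <= max_stage_time and t + best[i + 1][p - q] < best[i][p]:
                    best[i][p] = t + best[i + 1][p - q]
                    choice[i][p] = q
    # Recover the per-task processor counts.
    alloc, p = [], total_procs
    for i in range(n):
        q = choice[i][p]
        alloc.append(q)
        p -= q
    return best[0][total_procs], alloc

# Hypothetical timing table: task i on 1, 2, or 3 processors.
times = [[9.0, 5.0, 4.0],
         [6.0, 3.5, 3.0],
         [4.0, 2.5, 2.0]]
resp, alloc = assign_processors(times, 6, 6.0)
```

With 6 processors and a 6.0 stage cap, the optimum gives two processors to each task for a total response of 11.0.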

  2. Sandia secure processor : a native Java processor.

    SciTech Connect

    Wickstrom, Gregory Lloyd; Gale, Jason Carl; Ma, Kwok Kee

    2003-08-01

    The Sandia Secure Processor (SSP) is a new native Java processor that has been specifically designed for embedded applications. The SSP's design is a system composed of a core Java processor that directly executes Java bytecodes, on-chip intelligent IO modules, and a suite of software tools for simulation and compiling executable binary files. The SSP is unique in that it provides a way to control real-time IO modules for embedded applications. The system software for the SSP is a 'class loader' that takes Java .class files (created with your favorite Java compiler), links them together, and compiles a binary. The complete SSP system provides very powerful functionality with very light hardware requirements with the potential to be used in a wide variety of small-system embedded applications. This paper gives a detailed description of the Sandia Secure Processor and its unique features.

  3. Graph-Based Dynamic Assignment Of Multiple Processors

    NASA Technical Reports Server (NTRS)

    Hayes, Paul J.; Andrews, Asa M.

    1994-01-01

    Algorithm-to-architecture mapping model (ATAMM) is strategy minimizing time needed to periodically execute graphically described, data-driven application algorithm on multiple data processors. Implemented as operating system managing flow of data and dynamically assigning nodes of graph to processors. Predicts throughput versus number of processors available to execute given application algorithm. Includes rules ensuring application algorithm represented by graph executed periodically without deadlock and in shortest possible repetition time. ATAMM proves useful in maximizing effectiveness of parallel computing systems.

  4. Transitive closure on the imagine stream processor

    SciTech Connect

    Griem, Gorden; Oliker, Leonid

    2003-11-11

    The increasing gap between processor and memory speeds is a well-known problem in modern computer architecture. The Imagine system is designed to address the processor-memory gap through streaming technology. Stream processors are best-suited for computationally intensive applications characterized by high data parallelism and producer-consumer locality with minimal data dependencies. This work examines an efficient streaming implementation of the computationally intensive Transitive Closure (TC) algorithm on the Imagine platform. We develop a tiled TC algorithm specifically for the Imagine environment, which efficiently reuses streams to minimize expensive off-chip data transfers. The implementation requires complex stream programming since the memory hierarchy and cluster organization of the underlying architecture are exposed to the Imagine programmer. Results demonstrate that limited performance of TC is achieved primarily due to the complicated data-dependencies of the blocked algorithm. This work is an ongoing effort to identify classes of scientific problems well-suited for streaming processors.
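
A blocked boolean transitive closure of the kind described can be sketched with the Floyd-Warshall recurrence processed in column strips; the strip width standing in for Imagine's on-chip stream capacity is arbitrary here:

```python
import numpy as np

def transitive_closure_blocked(adj, block=2):
    """Boolean transitive closure via the Floyd-Warshall recurrence,
    updated in column strips so each strip could stay resident in
    on-chip memory (the reuse idea behind the tiled streaming version).
    """
    n = adj.shape[0]
    reach = adj | np.eye(n, dtype=bool)  # every node reaches itself
    for k in range(n):
        col = reach[:, k]  # who currently reaches pivot k
        for j0 in range(0, n, block):
            strip = reach[:, j0:j0 + block]
            # reach[i][j] |= reach[i][k] and reach[k][j], strip-wise
            strip |= np.outer(col, reach[k, j0:j0 + block])
    return reach

# Directed path 0 -> 1 -> 2 -> 3: closure connects 0 to every node.
A = np.zeros((4, 4), dtype=bool)
for i in range(3):
    A[i, i + 1] = True
R = transitive_closure_blocked(A)
```

The data dependence on the pivot column `col` at every step k is what makes blocking awkward, consistent with the limited performance the abstract reports for the blocked algorithm.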

  5. Development of a MEMS electrostatic condenser lens array for nc-Si surface electron emitters of the Massive Parallel Electron Beam Direct-Write system

    NASA Astrophysics Data System (ADS)

    Kojima, A.; Ikegami, N.; Yoshida, T.; Miyaguchi, H.; Muroyama, M.; Yoshida, S.; Totsu, K.; Koshida, N.; Esashi, M.

    2016-03-01

    Developments of a Micro Electro-Mechanical System (MEMS) electrostatic Condenser Lens Array (CLA) for a Massively Parallel Electron Beam Direct Write (MPEBDW) lithography system are described. The CLA converges parallel electron beams for fine patterning. The structure of the CLA was designed on the basis of analysis by finite element method (FEM) simulation. The lens was fabricated with precise machining and assembled with a nanocrystalline silicon (nc-Si) electron emitter array as the electron source of the MPEBDW. The nc-Si electron emitter has the advantage that a vertically emitted surface electron beam can be obtained without any extractor electrodes. FEM simulation of the electron optics characteristics showed that the size of the electron beam emitted from the electron emitter was reduced to 15% in the radial direction, and the divergence angle was reduced to 1/18.

  6. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    SciTech Connect

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop system is under construction. 10 refs., 7 figs.

  7. Silicon on-chip 1D photonic crystal nanobeam bandstop filters for the parallel multiplexing of ultra-compact integrated sensor array.

    PubMed

    Yang, Daquan; Wang, Chuan; Ji, Yuefeng

    2016-07-25

    We propose a novel multiplexed ultra-compact high-sensitivity one-dimensional (1D) photonic crystal (PC) nanobeam cavity sensor array on a monolithic silicon chip, referred to as Parallel Integrated 1D PC Nanobeam Cavity Sensor Array (PI-1DPC-NCSA). The performance of the device is investigated numerically with the three-dimensional finite-difference time-domain (3D-FDTD) technique. The PI-1DPC-NCSA consists of multiple parallel-connected channels of integrated 1D PC nanobeam cavities/waveguides with gap separations. On each channel, by connecting two additional 1D PC nanobeam bandstop filters (1DPC-NBFs) to a 1D PC nanobeam cavity sensor (1DPC-NCS) in series, a transmission spectrum with a single targeted resonance is achieved for the purpose of multiplexed sensing applications. While the other spurious resonances are filtered out by the stop-band of the 1DPC-NBF, multiple 1DPC-NCSs at different resonances can be connected in parallel without spectrum overlap. Furthermore, in order for all 1DPC-NCSs to be integrated into microarrays and to be interrogated simultaneously with a single input/output port, all channels are then connected in parallel by using a 1 × n taper-type equal power splitter and an n × 1 S-type power combiner in the input port and output port, respectively (n is the channel number). The concept model of the PI-1DPC-NCSA is demonstrated with a 3-parallel-channel 1DPC-NCS array containing series-connected 1DPC-NBFs. Bulk refractive index sensitivities as high as 112.6 nm/RIU, 121.7 nm/RIU, and 148.5 nm/RIU are obtained (RIU = refractive index unit). In particular, the footprint of the 3-parallel-channel PI-1DPC-NCSA is 4.5 μm × 50 μm (width × length), decreased by more than three orders of magnitude compared to 2D PC integrated sensor arrays. Thus, this is a promising platform for realizing ultra-compact lab-on-a-chip applications with high integration density and high parallel-multiplexing capabilities.

  8. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    SciTech Connect

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard, called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a program to provide access to the co-array model through the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing.
A separate collaborative project with LANL and PNNL showed how to extend the

  9. Magnetic arrays

    SciTech Connect

    Trumper, David L.; Kim, Won-jong; Williams, Mark E.

    1997-05-20

    Electromagnet arrays which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness.

  10. Magnetic arrays

    DOEpatents

    Trumper, D.L.; Kim, W.; Williams, M.E.

    1997-05-20

    Electromagnet arrays are disclosed which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness. 12 figs.

  11. Unstructured Adaptive Grid Computations on an Array of SMPs

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

    1996-01-01

    Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We have presented such a dynamic load balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits the 'array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantages over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of the array-of-SMPs architecture.

  12. Periodic parallel array of nanopillars and nanoholes resulting from colloidal stripes patterned by geometrically confined evaporative self-assembly for unique anisotropic wetting.

    PubMed

    Li, Xiangmeng; Wang, Chunhui; Shao, Jinyou; Ding, Yucheng; Tian, Hongmiao; Li, Xiangming; Wang, Li

    2014-11-26

    In this paper we present an economical process for creating anisotropic microtextures based on periodic parallel stripes of monolayer silica nanoparticles (NPs) patterned by geometrically confined evaporative self-assembly (GCESA). In the GCESA process, a straight meniscus of a colloidal dispersion is initially formed in an open enclosure, which is composed of two parallel plates bounded by a U-shaped spacer sidewall on three sides, with an evaporating outlet on the fourth side. Lateral evaporation of the colloidal dispersion leads to periodic "stick-slip" receding of the meniscus (the evaporative front), as triggered by the "coffee-ring" effect, promoting the assembly of silica NPs into periodic parallel stripes. The morphology of the stripes can be well controlled by tailoring process variables such as substrate wettability, NP concentration, temperature, and gap height. Furthermore, arrayed patterns of nanopillars or nanoholes are generated on a silicon wafer using the as-prepared colloidal stripes as an etching mask or template. Such arrayed patterns reveal unique anisotropic wetting properties, with a large contact angle hysteresis viewed from both the parallel and perpendicular directions in addition to a large wetting anisotropy.

  13. Processor-Group Aware Runtime Support for Shared-and Global-Address Space Models

    SciTech Connect

    Krishnan, Manoj Kumar; Tipparaju, Vinod; Palmer, Bruce; Nieplocha, Jarek

    2004-12-07

    Exploiting multilevel parallelism using processor groups is becoming increasingly important for programming on high-end systems. This paper describes group-aware run-time support for shared-/global-address space programming models. The current effort has been undertaken in the context of the Aggregate Remote Memory Copy Interface (ARMCI) [5], a portable runtime system used as a communication layer for Global Arrays [6], Co-Array Fortran (CAF) [9], GPSHMEM [10], Co-Array Python [11], and also end-user applications. The paper describes the management of shared memory, integration of shared memory communication and RDMA on clusters with SMP nodes, and registration. These are all required for efficient multi-method and multi-protocol communication on modern systems. Focus is placed on techniques for supporting process groups while maximizing communication performance and efficiently managing global memory system-wide.

  14. ALMA Correlator Real-Time Data Processor

    NASA Astrophysics Data System (ADS)

    Pisano, J.; Amestica, R.; Perez, J.

    2005-10-01

    The design of a real-time Linux application utilizing the Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams, each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non-real-time external computers. The designed computer system, the Correlator Data Processor (CDP), consists of a cluster of 17 SMP computers: 16 compute nodes plus a master controller node, all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals, providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing the ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation.
The software is being developed in tandem

  15. Implementation of a spiral CT backprojection algorithm on the Cell Broadband Engine processor

    NASA Astrophysics Data System (ADS)

    Bockenbach, Olivier; Goddard, Iain; Schuberth, Sebastian; Seebass, Martin

    2006-03-01

    Over the last few decades, the medical imaging community has passionately debated different approaches to implementing reconstruction algorithms for spiral CT. Numerous alternatives have been proposed. Whether approximate, exact, or iterative, these implementations generally include a backprojection step. Specialized compute platforms have been designed to perform this compute-intensive algorithm within a timeframe compatible with hospital-workflow requirements. Solving the performance problem in a cost-effective way has driven designers to use a combination of digital signal processor (DSP) chips, general-purpose processors, application-specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs). The Cell processor by IBM offers an interesting alternative for implementing the backprojection, especially since it offers a good level of parallelism and vast I/O capabilities. In this paper, we consider the implementation of a straight backprojection algorithm on the Cell processor to design a cost-effective system that matches the performance requirements of clinically deployed systems. The effects on performance of system parameters such as pitch and detector size are also analyzed to determine the ideal system size for modern CT scanners.
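
A straight (unfiltered) backprojection of the kind being ported can be sketched for the simpler parallel-beam 2-D case; the grid size, detector count, and phantom here are illustrative, not the spiral-CT geometry of a clinical scanner:

```python
import numpy as np

def backproject(sinogram, angles, size):
    """Unfiltered parallel-beam backprojection onto a size x size grid.

    For each view, every pixel accumulates the detector sample its
    center projects onto; this per-pixel inner loop is the
    data-parallel kernel that maps naturally onto SIMD cores.
    """
    n_det = sinogram.shape[1]
    image = np.zeros((size, size))
    # Pixel-center coordinates, origin at the grid center.
    xs = np.arange(size) - (size - 1) / 2.0
    X, Y = np.meshgrid(xs, xs)
    for view, theta in zip(sinogram, angles):
        # Signed distance of each pixel from the ray through the origin.
        t = X * np.cos(theta) + Y * np.sin(theta)
        idx = np.clip(np.round(t + (n_det - 1) / 2.0).astype(int),
                      0, n_det - 1)
        image += view[idx]
    return image / len(angles)

# A point at the center projects onto the central detector at every
# angle, so its backprojection should peak at the image center.
angles = np.linspace(0.0, np.pi, 90, endpoint=False)
sino = np.zeros((90, 65))
sino[:, 32] = 1.0
img = backproject(sino, angles, 65)
```

Filtering the sinogram before this step (filtered backprojection) sharpens the characteristic 1/r blur that the unfiltered version produces.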

  16. Rapid geodesic mapping of brain functional connectivity: implementation of a dedicated co-processor in a field-programmable gate array (FPGA) and application to resting state functional MRI.

    PubMed

    Minati, Ludovico; Cercignani, Mara; Chan, Dennis

    2013-10-01

    Graph theory-based analyses of brain network topology can be used to model the spatiotemporal correlations in neural activity detected through fMRI, and such approaches have wide-ranging potential, from detection of alterations in preclinical Alzheimer's disease through to command identification in brain-machine interfaces. However, due to prohibitive computational costs, graph-based analyses to date have principally focused on measuring connection density rather than mapping the topological architecture in full by exhaustive shortest-path determination. This paper outlines a solution to this problem through parallel implementation of Dijkstra's algorithm in programmable logic. The processor design is optimized for large, sparse graphs and provided in full as synthesizable VHDL code. An acceleration factor between 15 and 18 is obtained on a representative resting-state fMRI dataset, and maps of Euclidean path length reveal the anticipated heterogeneous cortical involvement in long-range integrative processing. These results enable high-resolution geodesic connectivity mapping for resting-state fMRI in patient populations and real-time geodesic mapping to support identification of imagined actions for fMRI-based brain-machine interfaces.
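
The shortest-path core that the FPGA design parallelizes is Dijkstra's algorithm run once per source node; a minimal CPU reference version over a sparse adjacency list, with made-up weights, looks like this:

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths on a sparse weighted graph.

    adj: {node: [(neighbor, weight), ...]}.  Exhaustive geodesic
    mapping repeats this search from every node; the searches are
    independent, which is what makes hardware parallelism pay off.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already settled cheaper
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {0: [(1, 1.0), (2, 4.0)],
       1: [(2, 1.5), (3, 5.0)],
       2: [(3, 1.0)],
       3: []}
dist = dijkstra(adj, 0)
```

For connectivity graphs, edge weights are typically derived from correlation strength (stronger correlation means shorter edge), so path length measures integrative distance rather than anatomical distance.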

  17. Airfoil-based electromagnetic energy harvester containing parallel array motion between moving coil and multi-pole magnets towards enhanced power density.

    PubMed

    Leung, Chung Ming; Wang, Ya; Chen, Wusi

    2016-11-01

    In this letter, an airfoil-based electromagnetic energy harvester containing parallel array motion between a moving coil and trajectory-matching multi-pole magnets was investigated. The magnets were aligned in an alternately magnetized formation of 6 magnets to explore enhanced power density. In particular, the magnet array was positioned parallel to the trajectory of the tip coil within its tip deflection span. Finite element simulations of the magnetic flux density and induced voltages at an open circuit condition were studied to find the maximum number of alternately magnetized magnets required for the proposed energy harvester. Experimental results showed that the energy harvester with a pair of 6 alternately magnetized linear magnet arrays was able to generate an induced voltage (Vo) of 20 V under an open circuit condition and 475 mW under a 30 Ω optimal resistance load, operating with a wind speed (U) of 7 m/s and a natural bending frequency of 3.54 Hz. Compared to a traditional electromagnetic energy harvester with a single magnet moving through a coil, the proposed energy harvester, containing multi-pole magnets and parallel array motion, enables the moving coil to accumulate a stronger magnetic flux in each period of the swinging motion. In addition, compared to an airfoil-based piezoelectric energy harvester of the same size, our proposed electromagnetic energy harvester generates 11 times more power output, which makes it more suitable for high-power-density energy harvesting applications in regions with low environmental frequency.

  18. Airfoil-based electromagnetic energy harvester containing parallel array motion between moving coil and multi-pole magnets towards enhanced power density

    NASA Astrophysics Data System (ADS)

    Leung, Chung Ming; Wang, Ya; Chen, Wusi

    2016-11-01

    In this letter, an airfoil-based electromagnetic energy harvester containing parallel array motion between a moving coil and trajectory-matching multi-pole magnets was investigated. The magnets were aligned in an alternately magnetized formation of 6 magnets to explore enhanced power density. In particular, the magnet array was positioned parallel to the trajectory of the tip coil within its tip deflection span. Finite element simulations of the magnetic flux density and induced voltages at an open circuit condition were studied to find the maximum number of alternately magnetized magnets required for the proposed energy harvester. Experimental results showed that the energy harvester with a pair of 6 alternately magnetized linear magnet arrays was able to generate an induced voltage (Vo) of 20 V under an open circuit condition and 475 mW under a 30 Ω optimal resistance load, operating with a wind speed (U) of 7 m/s and a natural bending frequency of 3.54 Hz. Compared to a traditional electromagnetic energy harvester with a single magnet moving through a coil, the proposed energy harvester, containing multi-pole magnets and parallel array motion, enables the moving coil to accumulate a stronger magnetic flux in each period of the swinging motion. In addition, compared to an airfoil-based piezoelectric energy harvester of the same size, our proposed electromagnetic energy harvester generates 11 times more power output, which makes it more suitable for high-power-density energy harvesting applications in regions with low environmental frequency.

  19. Doppler-free, multiwavelength acousto-optic deflector for two-photon addressing arrays of Rb atoms in a quantum information processor

    NASA Astrophysics Data System (ADS)

    Kim, Sangtaek; McLeod, Robert R.; Saffman, M.; Wagner, Kelvin H.

    2008-04-01

    We demonstrate a dual wavelength acousto-optic deflector (AOD) designed to deflect two wavelengths to the same angles by driving with two RF frequencies. The AOD is designed as a beam scanner to address two-photon transitions in a two-dimensional array of trapped neutral Rb87 atoms in a quantum computer. Momentum space is used to design AODs that have the same diffraction angles for two wavelengths (780 and 480 nm) and have nonoverlapping Bragg-matched frequency response at these wavelengths, so that there will be no cross talk when proportional frequencies are applied to diffract the two wavelengths. The appropriate crystal orientation, crystal shape, transducer size, and transducer height are determined for an AOD made with a tellurium dioxide crystal (TeO2). The designed and fabricated AOD has more than 100 resolvable spots, widely separated band shapes for the two wavelengths within an overall octave bandwidth, spatially overlapping diffraction angles for both wavelengths (780 and 480 nm), and a 4 μs or less access time. Cascaded AODs in which the first device upshifts and the second downshifts allow Doppler-free scanning as required for addressing the narrow atomic resonance without detuning. We experimentally show the diffraction-limited Doppler-free scanning performance and spatial resolution of the designed AOD.

  20. Broadcasting collective operation contributions throughout a parallel computer

    DOEpatents

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
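
The claimed two-phase scheme (intra-node exchange first, then serial inter-node transmission) can be illustrated with a plain sequential simulation; the node and processor counts are arbitrary, and real hardware would overlap the phases rather than run them as nested loops:

```python
def broadcast_contributions(nodes):
    """Simulate the two-phase broadcast of per-processor contributions.

    nodes[i][j] = contribution of processor j on compute node i.
    Returns each node's final view of all contributions.
    """
    # Phase 1 (intra-node): each processor shares its contribution with
    # the other processors on its node, so every node gathers its own set.
    gathered = [list(procs) for procs in nodes]
    # Phase 2 (inter-node): processors take turns on the designated
    # network link, each forwarding one contribution to every other node.
    views = [list(g) for g in gathered]  # start from the local values
    for src, contributions in enumerate(gathered):
        for c in contributions:
            for dst in range(len(nodes)):
                if dst != src:
                    views[dst].append(c)
    return [sorted(v) for v in views]

nodes = [[1, 2], [3, 4], [5, 6]]
views = broadcast_contributions(nodes)
```

After both phases, every node holds every processor's contribution, which is the invariant the claimed method establishes.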

  1. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. Algorithms that may lead to their effective parallelization were investigated, and both the forward and backward chained control paradigms were considered in the course of this work. The best computer architecture for the developed and investigated algorithms was also researched. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward-chained rule-based reasoning system, and Datapac, a parallel forward-chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying future to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines; the Multimax has all its processors hung off a common bus. All are shared-memory machines, but they have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10-processor Encore and on the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
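
Multilisp's future construct maps closely onto executor futures in other languages: spawning a call as a parallel task returns a placeholder, and touching the value later forces it. A Python sketch, where thread-pool futures stand in for Multilisp tasks and the naive Fibonacci is just a stand-in workload:

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """Deliberately slow doubly-recursive Fibonacci, as a workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# (future (fib 20)) in Multilisp spawns the call as a task parallel to
# the spawning task; f.result() is the later "touch" that forces it.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fib, n) for n in (18, 19, 20)]  # spawn
    results = [f.result() for f in futures]                # force
```

In a shared-memory Lisp the placeholder is transparent (any use of the value forces it), whereas here the force is an explicit result() call.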

  2. Adjunct processors in embedded medical imaging systems

    NASA Astrophysics Data System (ADS)

    Trepanier, Marc; Goddard, Iain

    2002-05-01

    Adjunct processors have traditionally been used for certain tasks in medical imaging systems. Often based on application-specific integrated circuits (ASICs), these processors formed X-ray image-processing pipelines or constituted the backprojectors in computed tomography (CT) systems. We examine appropriate functions to perform with adjunct processing and draw some conclusions about system design trade-offs. These trade-offs have traditionally focused on the required performance and flexibility of individual system components, with increasing emphasis on time-to-market impact. Typically, front-end processing close to the sensor has the most intensive processing requirements. However, the performance capabilities of each level are dynamic, and the system architect must keep abreast of the current capabilities of all options to remain competitive. Designers are searching for the most efficient implementation of their particular system requirements. We cite algorithm characteristics that point to effective solutions by adjunct processors. We have developed a field-programmable gate array (FPGA) adjunct-processor solution for a cone-beam reconstruction (CBR) algorithm that offers significant performance improvements over a general-purpose processor implementation. The same hardware could efficiently perform other image-processing functions such as two-dimensional (2D) convolution. The potential performance, price, operating-power, and flexibility advantages of an FPGA adjunct processor over ASIC, DSP, or general-purpose processing solutions are compelling.

  3. Introduction to Parallel Computing

    DTIC Science & Technology

    1992-05-01

    Topology: 2D mesh of node boards, each node board has 1 application processor. C, Ada, C++, data-parallel FORTRAN, FORTRAN-90 (late 1992). Development Tools: ... parallel machines become the wave of the present, tools are increasingly needed to assist programmers in creating parallel tasks and coordinating their activities. Linda was designed to be such a tool. Linda was designed with three important goals in mind: to be portable, efficient, and easy to use

  4. Trajectory optimization using parallel shooting method on parallel computer

    SciTech Connect

    Wirthman, D.J.; Park, S.Y.; Vadali, S.R.

    1995-03-01

    The efficiency of a parallel shooting method on a parallel computer for solving a variety of optimal control guidance problems is studied. Several examples are considered to demonstrate that a speedup of nearly 7 to 1 is achieved with the use of 16 processors. It is suggested that further improvements in performance can be achieved by parallelizing in the state domain. 10 refs.
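
    As a quick check of the reported numbers, parallel efficiency and the serial fraction implied by Amdahl's law follow directly from the quoted speedup; this sketch simply evaluates those two formulas:

```python
# Back-of-envelope check (illustrative): speedup S on n processors gives
# parallel efficiency S/n, and inverting Amdahl's law S = 1/(f + (1-f)/n)
# estimates the implied serial fraction f of the workload.
def efficiency(S, n):
    return S / n

def implied_serial_fraction(S, n):
    return (n / S - 1) / (n - 1)

S, n = 7.0, 16          # the abstract's "nearly 7 to 1" on 16 processors
print(round(efficiency(S, n), 2))              # about 0.44
print(round(implied_serial_fraction(S, n), 3)) # about 0.086
```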

  5. SPROC: A multiple-processor DSP IC

    NASA Technical Reports Server (NTRS)

    Davis, R.

    1991-01-01

    A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.

  6. Scioto: A Framework for Global-View Task Parallelism

    SciTech Connect

    Dinan, James S.; Krishnamoorthy, Sriram; Larkins, D. B.; Nieplocha, Jaroslaw; Sadayappan, Ponnuswamy

    2008-09-09

    We introduce Scioto, Shared Collections of Task Objects, a framework for supporting task parallelism in one-sided and global-view parallel programming models. Scioto provides lightweight, locality-aware dynamic load balancing and interoperates with existing parallel models including MPI, SHMEM, CAF, and Global Arrays. Through task parallelism, the Scioto framework provides a solution for overcoming load imbalance and heterogeneity, as well as for dynamic mapping of computation onto emerging multicore architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the Unbalanced Tree Search (UTS) benchmark and two quantum chemistry codes: the closed-shell Self-Consistent Field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.

  7. Reconfigurable computer array: The bridge between high speed sensors and low speed computing

    SciTech Connect

    Robinson, S.H.; Caffrey, M.P.; Dunham, M.E.

    1998-06-16

    A universal limitation of RF and imaging front-end sensors is that they easily produce data at a higher rate than any general-purpose computer can continuously handle. Therefore, Los Alamos National Laboratory has developed a custom Reconfigurable Computing Array board to support a large variety of processing applications including wideband RF signals, LIDAR, and multi-dimensional imaging. The board's design exploits three key features to achieve its performance. First, there are large banks of fast memory dedicated to each reconfigurable processor and also shared between pairs of processors. Second, there are dedicated data paths between processors, and from a processor to flexible I/O interfaces. Third, the design provides the ability to link multiple boards into a serial and/or parallel structure.

  8. NOCA-1 functions with γ-tubulin and in parallel to Patronin to assemble non-centrosomal microtubule arrays in C. elegans

    PubMed Central

    Wang, Shaohe; Wu, Di; Quintin, Sophie; Green, Rebecca A; Cheerambathur, Dhanya K; Ochoa, Stacy D; Desai, Arshad; Oegema, Karen

    2015-01-01

    Non-centrosomal microtubule arrays assemble in differentiated tissues to perform mechanical and transport-based functions. In this study, we identify Caenorhabditis elegans NOCA-1 as a protein with homology to vertebrate ninein. NOCA-1 contributes to the assembly of non-centrosomal microtubule arrays in multiple tissues. In the larval epidermis, NOCA-1 functions redundantly with the minus end protection factor Patronin/PTRN-1 to assemble a circumferential microtubule array essential for worm growth and morphogenesis. Controlled degradation of a γ-tubulin complex subunit in this tissue revealed that γ-tubulin acts with NOCA-1 in parallel to Patronin/PTRN-1. In the germline, NOCA-1 and γ-tubulin co-localize at the cell surface, and inhibiting either leads to a microtubule assembly defect. γ-tubulin targets independently of NOCA-1, but NOCA-1 targeting requires γ-tubulin when a non-essential putatively palmitoylated cysteine is mutated. These results show that NOCA-1 acts with γ-tubulin to assemble non-centrosomal arrays in multiple tissues and highlight functional overlap between the ninein and Patronin protein families. DOI: http://dx.doi.org/10.7554/eLife.08649.001 PMID:26371552

  9. Architecture and data processing alternatives for the TSE computer. Volume 3: Execution of a parallel counting algorithm using array logic (Tse) devices

    NASA Technical Reports Server (NTRS)

    Metcalfe, A. G.; Bodenheimer, R. E.

    1976-01-01

    A parallel algorithm for counting the number of logic-1 elements in a binary array or image, developed during preliminary investigation of the Tse concept, is described. The counting algorithm is implemented using a basic combinational structure. Modifications which improve the efficiency of the basic structure are also presented. A programmable Tse computer structure is proposed, along with a hardware control unit, Tse instruction set, and software program for execution of the counting algorithm. Finally, a comparison is made between the different structures in terms of their more important characteristics.
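
    Stripped of the Tse-specific hardware, the counting idea is a logarithmic-depth reduction: adjacent elements are summed pairwise, level by level, with each level's additions done in parallel. A minimal software sketch (illustrative only; real Tse devices would realize this as combinational array logic):

```python
# Tree (logarithmic-depth) reduction: count the 1s in a binary array by
# combining adjacent pairs at each level, the software analogue of the
# combinational counting structure described above.
def tree_count(bits):
    vals = list(bits)
    while len(vals) > 1:
        if len(vals) % 2:            # pad odd-length levels with a zero
            vals.append(0)
        # each level combines adjacent pairs; in hardware all pairs at a
        # level are summed simultaneously
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0

assert tree_count([1, 0, 1, 1, 0, 1]) == 4
```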

  10. Switch for serial or parallel communication networks

    DOEpatents

    Crosette, Dario B.

    1994-01-01

    A communication switch apparatus, and a method for use in a geographically extensive serial, parallel, or hybrid communication network linking a multi-processor or parallel processing system, has a very low software processing overhead in order to accommodate random bursts of high-density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.

  11. Switch for serial or parallel communication networks

    DOEpatents

    Crosette, D.B.

    1994-07-19

    A communication switch apparatus, and a method for use in a geographically extensive serial, parallel, or hybrid communication network linking a multi-processor or parallel processing system, has a very low software processing overhead in order to accommodate random bursts of high-density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.

  12. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Among the highly parallel computing architectures required for advanced scientific computation, those designated 'MIMD' and 'SIMD' have yielded the best results to date. The present evaluation of the development status of such architectures shows neither to have attained a decisive advantage in the treatment of most near-homogeneous problems; for problems involving numerous dissimilar parts, however, such currently speculative architectures as 'neural networks' or 'data flow' machines may be required. Data flow computers are the most practical form of MIMD fine-grained parallel computers yet conceived; they automatically solve the problem of assigning virtual processors to the real processors in the machine.

  13. Programmable DNA-Mediated Multitasking Processor.

    PubMed

    Shu, Jian-Jun; Wang, Qi-Wen; Yong, Kian-Yan; Shao, Fangwei; Lee, Kee Jin

    2015-04-30

    Because of DNA's appealing features as a material, including its minuscule size, defined structural repeats, and rigidity, programmable DNA-mediated processing is a promising computing paradigm, which employs DNAs as information-storing and -processing substrates to tackle computational problems. The massive parallelism of DNA hybridization exhibits transcendent potential to improve multitasking capabilities and yield a tremendous speed-up over conventional electronic processors with their stepwise signal cascades. As an example of this multitasking capability, we present an in vitro programmable DNA-mediated optimal-route-planning processor as a functional unit embedded in contemporary navigation systems. The novel programmable DNA-mediated processor has several advantages over existing silicon-mediated methods, such as massive data storage and simultaneous processing using far less material than conventional silicon devices.

  14. Control structures for high speed processors

    NASA Technical Reports Server (NTRS)

    Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

    1982-01-01

    A special processor was designed to function as a Reed-Solomon decoder with a throughput data rate in the MHz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high-level register transfer language (RTL). The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes, which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special-purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. It can be compared with 800 kilobits/second in a recently proposed very-large-scale-integration design of a Reed-Solomon encoder.

  15. 3081/E processor

    SciTech Connect

    Kunz, P.F.; Gravina, M.; Oxoby, G.; Rankin, P.; Trang, Q.; Ferran, P.M.; Fucci, A.; Hinton, R.; Jacobs, D.; Martin, B.

    1984-04-01

    The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future.

  16. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time-sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time-sharing environment. Time quantums are adjusted according to priority queues and a system of fair-share accounting. The initial platform for this software is the 128-processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.
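
    The mechanism described can be sketched as a round-robin loop over gangs: wake a whole gang for one quantum, then put it back to sleep so the next gang can run. The names and fixed quantum below are illustrative; the real scheduler additionally weights quanta by priority queues and fair-share accounting:

```python
# Minimal gang-scheduling sketch (illustrative only): all processors of a
# gang run together for one time quantum, then the gang sleeps while the
# next gang is awakened.
from collections import deque

def gang_schedule(gangs, quantum):
    """gangs: {name: remaining_work}; quantum: time slice per turn.
    Returns the order in which gangs were awakened."""
    queue = deque(gangs.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()
        trace.append(name)                 # wake the whole gang at once
        remaining -= quantum
        if remaining > 0:                  # not finished: sleep and requeue
            queue.append((name, remaining))
    return trace

assert gang_schedule({"A": 2, "B": 1}, quantum=1) == ["A", "B", "A"]
```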

  17. Comparison of simulated parallel transmit body arrays at 3 T using excitation uniformity, global SAR, local SAR and power efficiency metrics

    PubMed Central

    Guérin, Bastien; Gebhardt, Matthias; Serano, Peter; Adalsteinsson, Elfar; Hamm, Michael; Pfeuffer, Josef; Nistler, Juergen; Wald, Lawrence L.

    2014-01-01

    Purpose We compare the performance of 8 parallel transmit (pTx) body arrays with up to 32 channels and a standard birdcage design. Excitation uniformity, local SAR, global SAR and power metrics are analyzed in the torso at 3 T for RF-shimming and 2-spoke excitations. Methods We used a fast co-simulation strategy for field calculation in the presence of coupling between transmit channels. We designed spoke pulses using magnitude least squares (MLS) optimization with explicit constraint of SAR and power and compared the performance of the different pTx coils using the L-curve method. Results PTx arrays outperformed the conventional birdcage coil in all metrics except peak and average power efficiency. The presence of coupling exacerbated this power efficiency problem. At constant excitation fidelity, the pTx array with 24 channels arranged in 3 z-rows could decrease local SAR more than 4-fold (2-fold) for RF-shimming (2-spoke) compared to the birdcage coil for pulses of equal duration. Multi-row pTx coils had a marked performance advantage compared to single row designs, especially for coronal imaging. Conclusion PTx coils can simultaneously improve the excitation uniformity and reduce SAR compared to a birdcage coil when SAR metrics are explicitly constrained in the pulse design. PMID:24752979

  18. A rapidly modulated multifocal detection scheme for parallel acquisition of Raman spectra from a 2-D focal array.

    PubMed

    Kong, Lingbo; Chan, James

    2014-07-01

    We report the development of a rapidly modulated multifocal detection scheme that enables full Raman spectra (~500-2000 cm^-1) from a 2-D focal array to be acquired simultaneously. A spatial light modulator splits a laser beam to generate an m × n multifocal array. Raman signals generated within each focus are projected simultaneously into a spectrometer and imaged onto a TE-cooled CCD camera. A shuttering system using different masks is constructed to collect the superimposed Raman spectra of different multifocal patterns. The individual Raman spectrum from each focus is then retrieved from the superimposed spectra with no crosstalk using a postacquisition data processing algorithm. This system is expected to significantly improve the speed of current Raman-based instruments such as laser tweezers Raman spectroscopy and hyperspectral Raman imaging.
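
    The retrieval step can be illustrated with simple mask arithmetic (a sketch under assumed mask patterns, not the paper's exact algorithm): if pattern i blocks only focus i, each masked measurement is the total signal minus that focus, so every per-focus spectrum is recoverable without crosstalk:

```python
# Illustrative demultiplexing sketch: recover per-focus Raman spectra from
# superimposed measurements, assuming mask pattern i blocks only focus i.
def demultiplex(measurements):
    """measurements[i][w]: intensity in wavelength bin w recorded with
    focus i blocked. Returns the individual spectrum of each focus."""
    n = len(measurements)            # number of foci / mask patterns
    n_bins = len(measurements[0])
    spectra = []
    for i in range(n):
        row = []
        for w in range(n_bins):
            # each focus appears in (n-1) of the n measurements
            total = sum(m[w] for m in measurements) / (n - 1)
            row.append(total - measurements[i][w])
        spectra.append(row)
    return spectra

# two foci with toy 3-bin spectra: blocking focus 0 records focus 1's
# spectrum, and vice versa
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
assert demultiplex([x[1], x[0]]) == x
```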

  19. Detection and Classification of Low Probability of Intercept Radar Signals Using Parallel Filter Arrays and Higher Order Statistics

    DTIC Science & Technology

    2002-09-01

    Resulting plots for different LPI radar signals: (1) FMCW. Table 9 shows an FMCW signal with carrier frequency equal to 1 kHz, sampling frequency equal to ... Master's Thesis ... Detection and Classification of LPI Radar Signals using Parallel Filter ... In order to detect LPI radar waveforms, new signal processing techniques are required. This thesis first develops a MATLAB® toolbox to generate

  20. Configurable Multi-Purpose Processor

    NASA Technical Reports Server (NTRS)

    Valencia, J. Emilio; Forney, Christopher; Morrison, Robert; Birr, Richard

    2010-01-01

    Advancements in technology have allowed the miniaturization of systems used in aerospace vehicles. This technology is driven by the need for next-generation systems that provide reliable, responsive, and cost-effective range operations while providing increased capabilities such as simultaneous mission support, increased launch trajectories, improved launch and landing opportunities, etc. Leveraging the newest technologies, the command and telemetry processor (CTP) concept provides a compact, flexible, and integrated solution for flight command and telemetry systems and range systems. The CTP is a relatively small circuit board that serves as a processing platform for high-dynamic, high-vibration environments. The CTP can be reconfigured and reprogrammed, allowing it to be adapted for many different applications. The design is centered around a configurable field-programmable gate array (FPGA) device that contains numerous logic cells that can be used to implement traditional integrated circuits. The FPGA contains two PowerPC processors that run the VxWorks real-time operating system and execute software programs specific to each application. The CTP was designed and developed specifically to provide telemetry functions; namely, the command processing, telemetry processing, and GPS metric tracking of a flight vehicle. However, it can be used as a general-purpose processor board to perform numerous functions implemented in either hardware or software using the FPGA's processors and/or logic cells. Functionally, the CTP was designed for range safety applications where it would ultimately become part of a vehicle's flight termination system. Consequently, the major functions of the CTP are to perform the forward-link command processing, GPS metric tracking, return-link telemetry data processing, error detection and correction, data encryption/decryption, and initiation of flight termination action commands. Also, the CTP had to be designed to survive and

  1. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
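
    A minimal sketch of the d-edge record (field names are assumed for illustration; the patent gives only the description above):

```python
# Illustrative d-edge record: each parallel processor owns one directed
# edge tying a vertex pair to the single face it bounds, so processors can
# operate on their component of the model without intercommunication.
from dataclasses import dataclass

@dataclass(frozen=True)
class DEdge:
    tail: tuple          # vertex at the edge's start
    head: tuple          # vertex at the edge's end
    face: str            # label of the one face this directed edge bounds

# a unit-square face "F" seen as four d-edges, one per (hypothetical) processor
square = [
    DEdge((0, 0), (1, 0), "F"),
    DEdge((1, 0), (1, 1), "F"),
    DEdge((1, 1), (0, 1), "F"),
    DEdge((0, 1), (0, 0), "F"),
]
# each processor can answer local queries, e.g. reverse its own d-edge
flipped = [DEdge(e.head, e.tail, e.face) for e in square]
assert flipped[0].tail == (1, 0) and flipped[0].head == (0, 0)
```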

  2. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  3. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    PubMed

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity over state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.

  4. Detector defect correction of medical images on graphics processors

    NASA Astrophysics Data System (ADS)

    Membarth, Richard; Hannig, Frank; Teich, Jürgen; Litz, Gerhard; Hornegger, Heinz

    2011-03-01

    The ever-increasing complexity and power dissipation of computer architectures in the last decade blazed the trail for more power-efficient parallel architectures. Hence, architectures like field-programmable gate arrays (FPGAs) and, in particular, graphics cards attained great interest and are consequently adopted for parallel execution of many number-crunching loop programs from fields like image processing or linear algebra. However, there is little effort to deploy barely computational but memory-intensive applications to graphics hardware. This paper considers a memory-intensive detector defect correction pipeline for medical imaging with strict latency requirements. The image pipeline compensates for different effects caused by the detector during exposure of X-ray images and calculates parameters to control the subsequent dosage. So far, dedicated hardware setups with special processors like DSPs were used for such critical processing. We show that this is today feasible with commodity graphics hardware. Using CUDA as the programming model, it is demonstrated that the detector defect correction pipeline, consisting of more than ten algorithms, is significantly accelerated and that a speedup of 20x can be achieved on NVIDIA's Quadro FX 5800 compared to our reference implementation. For deployment in a streaming application with steadily arriving new data, it is shown that the memory transfer overhead of successive images to the graphics card memory is reduced by 83% using double buffering.

  5. Multiple Embedded Processors for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
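
    The prototype's error-detection scheme is dual modular redundancy: run the same computation on both embedded cores and compare the results. A toy sketch of the comparator logic (the planned four-processor, two-comparator version could also correct errors by voting):

```python
# Dual modular redundancy (DMR) comparator sketch: two cores compute the
# same result; a mismatch signals a single-event upset. DMR detects but
# cannot correct the error -- correction needs more replicas (voting).
def compare(outputs):
    """outputs: the two cores' results for the same computation."""
    a, b = outputs
    return {"mismatch": a != b, "value": a if a == b else None}

assert compare([42, 42]) == {"mismatch": False, "value": 42}
assert compare([42, 43]) == {"mismatch": True, "value": None}
```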

  6. Issue Mechanism for Embedded Simultaneous Multithreading Processor

    NASA Astrophysics Data System (ADS)

    Zang, Chengjie; Imai, Shigeki; Frank, Steven; Kimura, Shinji

    Simultaneous multithreading (SMT) technology enhances instruction throughput by issuing multiple instructions from multiple threads within one clock cycle. With an in-order pipeline per thread, SMT processors can achieve issued-instruction counts close to, or even surpassing, those obtained with an out-of-order pipeline. In this work, we show an efficient issue logic for predicated instruction sequences carrying a parallel flag in each instruction, where predicate-register-based issue control is adopted and continuous instructions with a parallel flag of '0' are executed in parallel. The flag is pre-defined by a compiler. Instructions from different threads are issued in round-robin order. We also introduce an instruction-queue skip mechanism for a thread whose queue is empty. Using this issue logic, we designed a 6-thread, 7-stage, in-order pipeline processor. Based on this processor, we compare the round-robin issue policy (RR(T1-Tn)) with other policies: thread one always has the highest priority (PR(T1)), and thread one through thread n has the highest priority in turn (PR(T1-Tn)). The results show that the RR(T1-Tn) policy outperforms the others and that PR(T1-Tn) is almost the same as RR(T1-Tn) from the point of view of instructions issued per cycle.
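
    The issue policy can be sketched as follows (semantics assumed from the description: a group is one instruction plus the following instructions whose parallel flag is 0, thread queues are visited round-robin, and empty queues are skipped):

```python
# Round-robin SMT issue sketch with parallel flags and queue skipping.
from collections import deque

def issue_cycle(queues, start):
    """queues: per-thread deques of (name, parallel_flag). Issues from the
    first non-empty queue at/after `start` (round-robin) and returns the
    group of instructions issued this cycle."""
    n = len(queues)
    for off in range(n):
        q = queues[(start + off) % n]
        if not q:
            continue                      # instruction-queue skip mechanism
        group = [q.popleft()]
        while q and q[0][1] == 0:         # flag 0: issue in parallel
            group.append(q.popleft())
        return group
    return []                             # all queues empty

t0 = deque([("a0", 1), ("a1", 0), ("a2", 1)])
t1 = deque()                              # empty thread: skipped
t2 = deque([("c0", 1)])
assert [i[0] for i in issue_cycle([t0, t1, t2], start=0)] == ["a0", "a1"]
assert [i[0] for i in issue_cycle([t0, t1, t2], start=1)] == ["c0"]
```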

  7. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384-processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
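
    Vector quantization, one of the two ingredients named above, replaces each image block by the index of its nearest codebook vector, so only indices need be stored. This toy sketch shows the lossy encode/decode step (the codebook and blocks are made up for illustration):

```python
# Toy vector-quantization sketch: map each block to its nearest codebook
# vector's index (encode), then look indices back up (lossy decode).
def nearest(block, codebook):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(block, codebook[i]))

def vq_encode(blocks, codebook):
    return [nearest(b, codebook) for b in blocks]

def vq_decode(indices, codebook):
    return [codebook[i] for i in indices]

codebook = [(0, 0), (10, 10)]             # toy 2-D codebook
blocks = [(1, 2), (9, 9), (0, 1)]
idx = vq_encode(blocks, codebook)
assert idx == [0, 1, 0]
assert vq_decode(idx, codebook) == [(0, 0), (10, 10), (0, 0)]
```

On the MPP, the per-block nearest-neighbor searches are independent, which is what makes the encode step massively parallelizable.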

  8. Buffered coscheduling for parallel programming and enhanced fault tolerance

    SciTech Connect

    Petrini, Fabrizio; Feng, Wu-chun

    2006-01-31

    A computer-implemented method schedules processor jobs on a network of parallel-machine processors or distributed-system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval, so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel-machine processors or distributed-system processors.
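
    The strobe-time global exchange can be sketched as follows (illustrative only; the patent describes a hardware/software coscheduling protocol, not this toy function):

```python
# Buffered-coscheduling sketch: control messages accumulate in per-processor
# buffers during a time interval; at the strobe, a global exchange tells
# every processor how many jobs it will receive in the next interval.
def strobe_exchange(buffers):
    """buffers[src] = destination processor ids queued during the interval.
    Returns the incoming-job count per processor for the next interval."""
    n = len(buffers)
    incoming = [0] * n
    for dests in buffers:                 # global exchange of control info
        for d in dests:
            incoming[d] += 1
    return incoming

# 3 processors; P0 queued jobs for P1 and P2, P1 for P2, P2 sent nothing
assert strobe_exchange([[1, 2], [2], []]) == [0, 1, 2]
```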

  9. Hypercluster - Parallel processing for computational mechanics

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1988-01-01

    An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.

  10. Holographic Routing Network For Parallel Processing Machines

    NASA Astrophysics Data System (ADS)

    Maniloff, Eric S.; Johnson, Kristina M.; Reif, John H.

    1989-10-01

    Dynamic holographic architectures for connecting processors in parallel computers have been generally limited by the response time of the holographic recording media. In this paper we present a different approach to dynamic optical interconnects involving spatial light modulators (SLMs) and volume holograms. Multiple-exposure holograms are stored in a volume recording medium, which associates the address of a destination processor encoded on a spatial light modulator with a distinct reference beam. A destination address programmed on the spatial light modulator is then holographically steered to the correct destination processor. We present the design and experimental results of a holographic router for connecting four originator processors to four destination processors.

  11. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors
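
    The processor-partitioning step ("processors are divided out among the newly generated tasks") can be sketched as an even split of an abstract processor array; the function name and representation are illustrative, not part of the PM runtime:

```python
def divide_processors(procs, n_tasks):
    """Split a list of abstract processor ids as evenly as possible among
    tasks, the first (len(procs) % n_tasks) tasks receiving one extra
    processor -- a sketch of nested-parallelism processor division."""
    q, r = divmod(len(procs), n_tasks)
    groups, start = [], 0
    for t in range(n_tasks):
        size = q + (1 if t < r else 0)
        groups.append(procs[start:start + size])
        start += size
    return groups

# Eight abstract processors divided among three newly generated tasks.
groups = divide_processors(list(range(8)), 3)   # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

    When the task count exceeds the processor count, the complementary case in the paper, tasks would instead share processors via cooperative multi-tasking or work stealing.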

  12. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
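
    Speed-up and parallel efficiency as used in studies like this one are simple ratios; a minimal computation with hypothetical timings (not figures from the paper):

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: achieved speedup divided by processor count;
    1.0 means perfect scaling, values near 0 mean wasted processors."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical run: 100 s serially, 8 s on 16 slave processors.
s = speedup(100.0, 8.0)         # 12.5x
e = efficiency(100.0, 8.0, 16)  # 0.78125
```

    The paper's observation that branch swapping stops speeding up beyond 16 slaves corresponds to efficiency falling as n_procs grows while t_parallel stays flat.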

  13. Efficiency of parallel direct optimization.

    PubMed

    Janies, D A; Wheeler, W C

    2001-03-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.

  14. Soft-core processor study for node-based architectures.

    SciTech Connect

    Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James; Gallegos, Daniel E.; Learn, Mark Walter

    2008-09-01

    Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA)-based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA-based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.

  15. Long-term, multisite, parallel, in-cell recording and stimulation by an array of extracellular microelectrodes.

    PubMed

    Hai, Aviad; Shappir, Joseph; Spira, Micha E

    2010-07-01

    Here we report on the development of a novel neuroelectronic interface consisting of an array of noninvasive gold-mushroom-shaped microelectrodes (gMµEs) that practically provide intracellular recordings and stimulation of many individual neurons, while the electrodes maintain an extracellular position. The development of this interface allows simultaneous, multisite, long-term recordings of action potentials and subthreshold potentials with quality and signal-to-noise ratio matching those of conventional intracellular sharp glass microelectrodes or patch electrodes. We refer to the novel approach as "in-cell recording and stimulation by extracellular electrodes" to differentiate it from the classical intracellular recording and stimulation methods. This novel technique is expected to revolutionize the analysis of neuronal networks in relation to learning and information storage, and can be used to develop novel drugs as well as high-fidelity neural prosthetics and brain-machine systems.

  16. Stochastic propagation of an array of parallel cracks: Exploratory work on matrix fatigue damage in composite laminates

    SciTech Connect

    Williford, R.E.

    1989-09-01

    Transverse cracking of polymeric matrix materials is an important fatigue damage mechanism in continuous-fiber composite laminates. The propagation of an array of these cracks is a stochastic problem usually treated by Monte Carlo methods. However, this exploratory work proposes an alternative approach wherein the Monte Carlo method is replaced by a more closed-form recursion relation based on "fractional Brownian motion." A fractal scaling equation is also proposed as a substitute for the more empirical Paris equation describing individual crack growth in this approach. Preliminary calculations indicate that the new recursion relation is capable of reproducing the primary features of transverse matrix fatigue cracking behavior. Although not yet fully tested or verified, this recursion relation may eventually be useful for real-time applications such as monitoring damage in aircraft structures.

  17. The Alaska SAR processor

    NASA Technical Reports Server (NTRS)

    Carande, R. E.; Charny, B.

    1988-01-01

    The Alaska SAR processor was designed to process, from raw SAR data, over 200 frames (100 km x 100 km, Seasat-like) per day at a ground resolution of 30 m x 30 m from ERS-1, J-ERS-1, and Radarsat. The near-real-time processor is a set of custom hardware modules operating in a pipelined architecture, controlled by a general purpose computer. Input to the processor is provided from a high density digital cassette recording of the raw data stream as received by the ground station. Two-pass processing is performed. During the first pass, clutter-lock and auto-focus measurements are made. The second pass uses the results to accomplish final image formation, which is recorded on a high density digital cassette. The processing algorithm uses fast correlation techniques for range and azimuth compression. Radiometric compensation, interpolation, and deskewing are also performed by the processor. The standard product of the ASP is a high resolution four-look image, with a low resolution (100 to 200 m) many-look image provided simultaneously.
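
    Range compression of this kind is a correlation of the received echo with a replica of the transmitted pulse; the ASP uses fast (FFT-based) correlation, but a direct sketch with toy data shows the idea (all values illustrative):

```python
def matched_filter(signal, replica):
    """Range compression sketch: direct O(N*M) cross-correlation of the
    echo with the transmitted-pulse replica.  The correlation peaks at the
    lag where the replica lines up with the target return."""
    n, m = len(signal), len(replica)
    out = []
    for lag in range(n - m + 1):
        acc = 0.0
        for k in range(m):
            acc += signal[lag + k] * replica[k]
        out.append(acc)
    return out

# Toy echo: the reference pulse embedded at offset 3 in an otherwise zero trace.
replica = [1.0, -1.0, 1.0]
echo = [0.0, 0.0, 0.0] + replica + [0.0, 0.0]
resp = matched_filter(echo, replica)
peak = max(range(len(resp)), key=lambda i: resp[i])   # peak at lag 3
```

    Azimuth compression works the same way along the other image dimension, with the replica derived from the Doppler history estimated during the clutter-lock/auto-focus pass.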

  18. Fabrication and Evaluation of a Micro(Bio)Sensor Array Chip for Multiple Parallel Measurements of Important Cell Biomarkers

    PubMed Central

    Pemberton, Roy M.; Cox, Timothy; Tuffin, Rachel; Drago, Guido A.; Griffiths, John; Pittson, Robin; Johnson, Graham; Xu, Jinsheng; Sage, Ian C.; Davies, Rhodri; Jackson, Simon K.; Kenna, Gerry; Luxton, Richard; Hart, John P.

    2014-01-01

    This report describes the design and development of an integrated electrochemical cell culture monitoring system, based on enzyme-biosensors and chemical sensors, for monitoring indicators of mammalian cell metabolic status. MEMS technology was used to fabricate a microwell-format silicon platform including a thermometer, onto which chemical sensors (pH, O2) and screen-printed biosensors (glucose, lactate), were grafted/deposited. Microwells were formed over the fabricated sensors to give 5-well sensor strips which were interfaced with a multipotentiostat via a bespoke connector box interface. The operation of each sensor/biosensor type was examined individually, and examples of operating devices in five microwells in parallel, in either potentiometric (pH sensing) or amperometric (glucose biosensing) mode are shown. The performance characteristics of the sensors/biosensors indicate that the system could readily be applied to cell culture/toxicity studies. PMID:25360580

  19. Parallel Architecture For Robotics Computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1990-01-01

    Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.

  20. Development of a prototype PET scanner with depth-of-interaction measurement using solid-state photomultiplier arrays and parallel readout electronics.

    PubMed

    Shao, Yiping; Sun, Xishan; Lan, Kejian A; Bircher, Chad; Lou, Kai; Deng, Zhi

    2014-03-07

    In this study, we developed a prototype animal PET by applying several novel technologies to use solid-state photomultiplier (SSPM) arrays to measure the depth of interaction (DOI) and improve imaging performance. Each PET detector has an 8 × 8 array of about 1.9 × 1.9 × 30.0 mm(3) lutetium-yttrium-oxyorthosilicate scintillators, with each end optically connected to an SSPM array (16 channels in a 4 × 4 matrix) through a light guide to enable continuous DOI measurement. Each SSPM has an active area of about 3 × 3 mm(2), and its output is read by a custom-developed application-specific integrated circuit to directly convert analogue signals to digital timing pulses that encode the interaction information. These pulses are transferred to and are decoded by a field-programmable gate array-based time-to-digital convertor for coincident event selection and data acquisition. The independent readout of each SSPM and the parallel signal process can significantly improve the signal-to-noise ratio and enable the use of flexible algorithms for different data processes. The prototype PET consists of two rotating detector panels on a portable gantry with four detectors in each panel to provide 16 mm axial and variable transaxial field-of-view (FOV) sizes. List-mode ordered subset expectation maximization image reconstruction was implemented. The measured mean energy, coincidence timing and DOI resolution for a crystal were about 17.6%, 2.8 ns and 5.6 mm, respectively. The measured transaxial resolutions at the center of the FOV were 2.0 mm and 2.3 mm for images reconstructed with and without DOI, respectively. In addition, the resolutions across the FOV with DOI were substantially better than those without DOI. The quality of PET images of both a hot-rod phantom and mouse acquired with DOI was much higher than that of images obtained without DOI. This study demonstrates that SSPM arrays and advanced readout/processing electronics can be used to develop a practical DOI
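
    With each crystal read out at both ends, DOI is commonly estimated from the ratio of the two end signals; a sketch assuming a simple linear calibration (an illustrative estimator, not necessarily the authors' calibration):

```python
def doi_estimate(signal_a, signal_b, crystal_length_mm=30.0):
    """Depth of interaction from dual-ended readout: the fraction of the
    total light collected at end A varies with interaction depth along the
    30 mm crystal; a linear depth-vs-ratio calibration is assumed here."""
    ratio = signal_a / (signal_a + signal_b)
    return ratio * crystal_length_mm

# Equal signals at both ends place the interaction mid-crystal (15 mm).
depth_mid = doi_estimate(1.0, 1.0)
# A stronger signal at end A places it closer to that end.
depth_near_a = doi_estimate(3.0, 1.0)
```

    A real calibration maps the measured ratio to depth empirically, since light collection along a scintillator is not exactly linear; the 5.6 mm DOI resolution quoted above reflects the spread of such estimates.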

  1. Development of a prototype PET scanner with depth-of-interaction measurement using solid-state photomultiplier arrays and parallel readout electronics

    PubMed Central

    Shao, Yiping; Sun, Xishan; Lan, Kejian A.; Bircher, Chad; Lou, Kai; Deng, Zhi

    2014-01-01

    In this study, we developed a prototype animal PET by applying several novel technologies to use solid-state photomultiplier (SSPM) arrays for measuring the depth-of-interaction (DOI) and improving imaging performance. Each PET detector has an 8×8 array of about 1.9×1.9×30.0 mm³ lutetium-yttrium-oxyorthosilicate (LYSO) scintillators, with each end optically connected to an SSPM array (16-channel in a 4×4 matrix) through a light guide to enable continuous DOI measurement. Each SSPM has an active area of about 3×3 mm², and its output is read by a custom-developed application-specific-integrated-circuit (ASIC) to directly convert analog signals to digital timing pulses that encode the interaction information. These pulses are transferred to and are decoded by a field-programmable-gate-array (FPGA) based time-to-digital convertor for coincident event selection and data acquisition. The independent readout of each SSPM and the parallel signal process can significantly improve the signal-to-noise ratio and enable the use of flexible algorithms for different data processes. The prototype PET consists of two rotating detector panels on a portable gantry with four detectors in each panel to provide 16 mm axial and variable transaxial field-of-view (FOV) sizes. List-mode ordered-subset-expectation-maximization image reconstruction was implemented. The measured mean energy, coincidence timing, and DOI resolution for a crystal were about 17.6%, 2.8 ns, and 5.6 mm, respectively. The measured transaxial resolutions at the center of the FOV were 2.0 mm and 2.3 mm for images reconstructed with and without DOI, respectively. In addition, the resolutions across the FOV with DOI were substantially better than those without DOI. The quality of PET images of both a hot-rod phantom and mouse acquired with DOI was much higher than that of images obtained without DOI. This study demonstrates that SSPM arrays and advanced readout/processing electronics can be used to develop a practical

  2. Rapid, Single-Molecule Assays in Nano/Micro-Fluidic Chips with Arrays of Closely Spaced Parallel Channels Fabricated by Femtosecond Laser Machining

    PubMed Central

    Canfield, Brian K.; King, Jason K.; Robinson, William N.; Hofmeister, William H.; Davis, Lloyd M.

    2014-01-01

    Cost-effective pharmaceutical drug discovery depends on increasing assay throughput while reducing reagent needs. To this end, we are developing an ultrasensitive, fluorescence-based platform that incorporates a nano/micro-fluidic chip with an array of closely spaced channels for parallelized optical readout of single-molecule assays. Here we describe the use of direct femtosecond laser machining to fabricate several hundred closely spaced channels on the surfaces of fused silica substrates. The channels are sealed by bonding to a microscope cover slip spin-coated with a thin film of poly(dimethylsiloxane). Single-molecule detection experiments are conducted using a custom-built, wide-field microscope. The array of channels is epi-illuminated by a line-generating red diode laser, resulting in a line focus just a few microns thick across a 500 micron field of view. A dilute aqueous solution of fluorescently labeled biomolecules is loaded into the device and fluorescence is detected with an electron-multiplying CCD camera, allowing acquisition rates up to 7 kHz for each microchannel. Matched digital filtering based on experimental parameters is used to perform an initial, rapid assessment of detected fluorescence. More detailed analysis is obtained through fluorescence correlation spectroscopy. Simulated fluorescence data is shown to agree well with experimental values. PMID:25140634

  3. Generating local addresses and communication sets for data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Long, Fred J. E.; Schreiber, Robert; Teng, Shang-Hua

    1993-01-01

    Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution, and a computation involving the regular section A, the local memory access sequence for any processor is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
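
    The block-cyclic result can be illustrated directly: under a cyclic(k) distribution, a processor's local addresses for a regular section repeat with a short period, which is why a finite state machine of at most k states suffices. A sketch with illustrative parameters (not the paper's notation or code):

```python
def owner(i, k, p):
    """Processor owning global template index i under cyclic(k) over p procs."""
    return (i // k) % p

def local_address(i, k, p):
    """Local memory address of global index i on its owning processor:
    k elements per already-completed local block, plus the in-block offset."""
    return (i // (k * p)) * k + i % k

def local_access_sequence(lo, hi, st, k, p, m):
    """Local addresses processor m touches for the regular section lo:hi:st."""
    return [local_address(i, k, p)
            for i in range(lo, hi, st) if owner(i, k, p) == m]

# Example: cyclic(4) over 3 processors, section 0:60:5, processor 0.
seq = local_access_sequence(0, 60, 5, 4, 3, 0)       # [0, 7, 9, 18]
deltas = [b - a for a, b in zip(seq, seq[1:])]        # address strides
```

    The paper's theorem says the stride sequence cycles through at most k distinct values, so a table of those deltas (the state machine) replaces per-element ownership tests at run time.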

  4. Higher Q factor and higher extinction ratio with lower detection limit photonic crystal-parallel-integrated sensor array for on-chip optical multiplexing sensing.

    PubMed

    Zhou, Jian; Huang, Lijun; Fu, Zhongyuan; Sun, Fujun; Tian, Huiping

    2016-12-10

    We introduce an alternative method to establish a nanoscale sensor array based on a photonic crystal (PhC) slab, which is referred to as a 1×4 monolithic PhC parallel-integrated sensor array (PhC-PISA). To realize this function, four lattice-shifted resonant cavities are butt-coupled to four output waveguide branches, respectively. By shifting the first to the two closest neighboring holes around the defect, a high Q factor over 1.5×10⁴ has been obtained. Owing to the slightly different cavity spacing, each PhC resonator shows an independent resonant peak shift as the refractive index changes surrounding the resonant cavity. The specific single peak with a well-defined extinction ratio exceeds 25 dB. By applying the finite-difference time-domain (FDTD) method, we demonstrate that the sensitivities of each sensor in the PhC-PISA, S1=60.500 nm/RIU, S2=59.623 nm/RIU, S3=62.500 nm/RIU, and S4=51.142 nm/RIU (refractive index unit), are achieved, respectively. In addition, negligible crosstalk and a detection limit as small as 1×10⁻⁴ RIU have been observed. The proposed sensor array as a desirable platform has great potential to realize optical multiplexing sensing and high-density monolithic integration.
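
    The reported detection limit follows from the standard relation between the smallest resolvable peak shift and the sensitivity, DL = Δλ_min / S; a one-line check against the paper's S1 (the ~6 pm resolution figure is back-computed here for illustration, not stated in the abstract):

```python
def detection_limit(resolvable_shift_nm, sensitivity_nm_per_riu):
    """Smallest detectable refractive-index change: the minimum resolvable
    resonance-peak shift divided by the sensor's sensitivity."""
    return resolvable_shift_nm / sensitivity_nm_per_riu

# With S1 = 60.5 nm/RIU, resolving a ~6.05 pm peak shift corresponds to
# a detection limit of about 1e-4 RIU, matching the reported value.
dl = detection_limit(0.00605, 60.5)
```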

  5. Parallelism in computational chemistry: Applications in quantum and statistical mechanics

    NASA Astrophysics Data System (ADS)

    Clementi, E.; Corongiu, G.; Detrich, J. H.; Kahnmohammadbaigi, H.; Chin, S.; Domingo, L.; Laaksonen, A.; Nguyen, N. L.

    1985-08-01

    Often very fundamental biochemical and biophysical problems defy simulation because of limitations in today's computers. We present and discuss a distributed system composed of two IBM-4341s and one IBM-4381, as front-end processors, and ten FPS-164 attached array processors. This parallel system, called LCAP, has presently a peak performance of about 120 MFlops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system; a description of the modifications is given. Three application programs have migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis Monte Carlo, and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. As examples and tests of these applications, we report on a study of proton tunneling in DNA base pairs, very relevant to spontaneous mutations in genetics. As a second example, we present a Monte Carlo study of liquid water at room temperature where not only two- and three-body interactions are considered but, for the first time, four-body interactions are also included. Finally, we briefly summarize a molecular dynamics study where two- and three-body interactions have been considered. These examples, and very positive performance comparisons with today's supercomputers, allow us to conclude that parallel computers and programming of the type we have considered represent a pragmatic answer to many computer-intensive problems.

  6. Hybrid Electro-Optic Processor

    DTIC Science & Technology

    1991-07-01

    This report describes the design of a hybrid electro-optic processor to perform adaptive interference cancellation in radar systems. The processor is...modulator is reported. Included in this report is a discussion of the design, partial fabrication in the laboratory, and partial testing of the hybrid electro-optic processor. A follow-on effort is planned to complete the construction and testing of the processor. The work described in this report is the

  7. Processor register error correction management

    SciTech Connect

    Bose, Pradip; Cher, Chen-Yong; Gupta, Meeta S.

    2016-12-27

    Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.

  8. RISC Processors and High Performance Computing

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    In this tutorial, we will discuss the top five current RISC microprocessors: the IBM Power2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance-per-dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.

  9. Accelerating parallel transmit array B1 mapping in high field MRI with slice undersampling and interpolation by kriging.

    PubMed

    Ferrand, Guillaume; Luong, Michel; Cloos, Martijn A; Amadon, Alexis; Wackernagel, Hans

    2014-08-01

    Transmit arrays have been developed to mitigate the RF field inhomogeneity commonly observed in high field magnetic resonance imaging (MRI), typically above 3T. To this end, the knowledge of the RF complex-valued B1 transmit-sensitivities of each independent radiating element has become essential. This paper details a method to speed up a currently available B1-calibration method. The principle relies on slice undersampling, slice and channel interleaving, and kriging, an interpolation method developed in geostatistics and applicable in many domains. It has been demonstrated that, under certain conditions, kriging gives the best estimator of a field in a region of interest. The resulting accelerated sequence allows mapping a complete set of eight volumetric field maps of the human head in about 1 min. For validation, the accuracy of kriging is first evaluated against a well-known interpolation technique based on the Fourier transform and against a B1-map interpolation method presented in the literature. This analysis is carried out on simulated and decimated experimental B1 maps. Finally, the accelerated sequence is compared to the standard sequence on a phantom and a volunteer. The new sequence provides B1 maps three times faster, with a loss of accuracy limited to about 5%.
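
    Ordinary kriging estimates the field at an unsampled point as a weighted sum of samples, with weights minimizing estimation variance subject to summing to one. A self-contained 1-D sketch (the Gaussian covariance model and all numbers are illustrative assumptions, not the paper's B1-map settings):

```python
import math

def cov(d, sill=1.0, rng=2.0):
    """Gaussian covariance model -- an assumed variogram for illustration."""
    return sill * math.exp(-(d / rng) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(xs, zs, x0):
    """Estimate the field at x0: solve the kriging system (sample-to-sample
    covariances plus a Lagrange row forcing the weights to sum to one),
    then take the weighted sum of the sample values."""
    n = len(xs)
    A = [[cov(abs(xs[i] - xs[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [cov(abs(xs[i] - x0)) for i in range(n)] + [1.0]
    w = solve(A, b)[:n]
    return sum(wi * zi for wi, zi in zip(w, zs))

xs, zs = [0.0, 1.0, 3.0], [2.0, 3.0, 1.0]
est = ordinary_kriging(xs, zs, 2.0)   # interpolated value between samples
```

    A useful sanity check is exactness: with no nugget effect, predicting at a sampled location returns the sample value itself.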

  10. Hybrid photomultiplier tube and photodiode parallel detection array for wideband optical spectroscopy of the breast guided by magnetic resonance imaging

    NASA Astrophysics Data System (ADS)

    El-Ghussein, Fadi; Mastanduno, Michael A.; Jiang, Shudong; Pogue, Brian W.; Paulsen, Keith D.

    2014-01-01

    A new optical parallel detection system of hybrid frequency and continuous-wave domains was developed to improve the data quality and accuracy in recovery of all breast optical properties. This new system was deployed in a previously existing system for magnetic resonance imaging (MRI)-guided spectroscopy, and allows incorporation of additional near-infrared wavelengths beyond 850 nm, with interlaced channels of photomultiplier tubes (PMTs) and silicon photodiodes (PDs). The acquisition time for obtaining frequency-domain data at six wavelengths (660, 735, 785, 808, 826, and 849 nm) and continuous-wave data at three wavelengths (903, 912, and 948 nm) is 12 min. The dynamic ranges of the detected signal are 10⁵ and 10⁶ for the PMT and PD detectors, respectively. Compared to the previous detection system, the signal-to-noise ratio of frequency-domain detection was improved by nearly 10³ through the addition of an RF amplifier and the utilization of programmable gain. The current system is being utilized in a clinical trial imaging suspected breast cancer tumors as detected by contrast MRI scans.

  11. Hybrid photomultiplier tube and photodiode parallel detection array for wideband optical spectroscopy of the breast guided by magnetic resonance imaging

    PubMed Central

    Mastanduno, Michael A.; Jiang, Shudong; Pogue, Brian W.; Paulsen, Keith D.

    2013-01-01

    A new optical parallel detection system of hybrid frequency and continuous-wave domains was developed to improve the data quality and accuracy in recovery of all breast optical properties. This new system was deployed in a previously existing system for magnetic resonance imaging (MRI)-guided spectroscopy, and allows incorporation of additional near-infrared wavelengths beyond 850 nm, with interlaced channels of photomultiplier tubes (PMTs) and silicon photodiodes (PDs). The acquisition time for obtaining frequency-domain data at six wavelengths (660, 735, 785, 808, 826, and 849 nm) and continuous-wave data at three wavelengths (903, 912, and 948 nm) is 12 min. The dynamic ranges of the detected signal are 10⁵ and 10⁶ for the PMT and PD detectors, respectively. Compared to the previous detection system, the signal-to-noise ratio of frequency-domain detection was improved by nearly 10³ through the addition of an RF amplifier and the utilization of programmable gain. The current system is being utilized in a clinical trial imaging suspected breast cancer tumors as detected by contrast MRI scans. PMID:23979460

  12. Microprogrammable high-speed bit slice image processor

    SciTech Connect

    Thomas, P.E.; Glass, R.D.

    1981-01-01

    The processor's basic architecture is dynamically alterable into either a serial or pipelined configuration, achieving higher speed than either architecture alone could provide. The speed is enhanced further by the availability of eight parallel paths, allowing a maximum throughput in excess of 40 million operations per second. The algorithms implemented include Sobel edge detection, shape/connectivity, Laplacian, histogram flattening and compression, a sophisticated peak detection scheme, and a destreaking function. Being microprogrammable, the processor will allow implementation of additional algorithms for alternative applications. The ensuing discussion develops the overall architecture from a functional point of view, illustrating the parallelism in the design which allowed efficient implementation of this general class of algorithms.
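
    Of the listed algorithms, the Sobel edge operator is the simplest to sketch: two 3×3 convolutions whose absolute responses are summed (a plain Python illustration of the operator, not the bit-slice implementation):

```python
def sobel(img):
    """Sobel edge magnitude (|Gx| + |Gy|) for a 2-D grayscale image,
    computed on interior pixels only; borders are left at zero."""
    KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

# A vertical step edge: columns 0-1 dark, columns 2-3 bright.
img = [[0, 0, 9, 9] for _ in range(4)]
edges = sobel(img)   # strong response on the interior pixels at the step
```

    Each output pixel depends only on its 3×3 neighborhood, which is exactly the locality that lets the eight parallel paths of such a processor work on disjoint image strips at once.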

  13. MAP3D: a media processor approach for high-end 3D graphics

    NASA Astrophysics Data System (ADS)

    Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

    1999-12-01

    Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given application. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.

  14. Highly parallel computer architecture for robotic computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Anta K. (Inventor)

    1991-01-01

    In a computer having a large number of single instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.

  15. Problem size, parallel architecture and optimal speedup

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Willard, Frank H.

    1987-01-01

    The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n² grid points which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors to assign to the solution is determined, and the analysis identifies (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
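    The trade-off described above can be illustrated with a toy cost model (the coefficients below are hypothetical, not taken from the paper): per-iteration time is the sum of a compute term that shrinks with p, a boundary-exchange term that shrinks with √p, and a synchronization term that grows with log p, so the minimizing p grows with the grid size n.

```python
# Illustrative cost model for an n x n grid split over p processors.
# The coefficients a, b, c are hypothetical and only for illustration:
#   T(p) = a*n^2/p  (compute)  +  b*n/sqrt(p)  (boundary exchange)
#          +  c*log2(p)  (synchronization)
import math

def iteration_time(p, n, a=1e-7, b=1e-6, c=1e-5):
    return a * n * n / p + b * n / math.sqrt(p) + c * math.log2(p)

def optimal_processors(n, p_max=4096):
    # search over power-of-two machine sizes for the fastest configuration
    candidates = [2 ** k for k in range(int(math.log2(p_max)) + 1)]
    return min(candidates, key=lambda p: iteration_time(p, n))
```

Under this model a larger grid justifies more processors, matching the paper's observation that there is a smallest grid size which fully benefits from all available processors.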

  16. QSpike tools: a generic framework for parallel batch preprocessing of extracellular neuronal signals recorded by substrate microelectrode arrays

    PubMed Central

    Mahmud, Mufti; Pulizzi, Rocco; Vasilaki, Eleni; Giugliano, Michele

    2014-01-01

    Micro-Electrode Arrays (MEAs) have emerged as a mature technique to investigate brain (dys)functions in vivo and in in vitro animal models. Often referred to as “smart” Petri dishes, MEAs have demonstrated great potential particularly for medium-throughput studies in vitro, both in academic and pharmaceutical industrial contexts. Enabling rapid comparison of ionic/pharmacological/genetic manipulations with control conditions, MEAs are employed to screen compounds by monitoring non-invasively the spontaneous and evoked neuronal electrical activity in longitudinal studies, with relatively inexpensive equipment. However, in order to acquire sufficient statistical significance, recordings last up to tens of minutes and generate large amounts of raw data (e.g., 60 channels/MEA, 16-bit A/D conversion, 20 kHz sampling rate: approximately 8 GB per MEA per hour, uncompressed). Thus, when the experimental conditions to be tested are numerous, the availability of fast, standardized, and automated signal preprocessing becomes pivotal for any subsequent analysis and data archiving. To this aim, we developed an in-house cloud-computing system, named QSpike Tools, where CPU-intensive operations required for preprocessing of each recorded channel (e.g., filtering, multi-unit activity detection, spike-sorting, etc.) are decomposed and batch-queued to a multi-core architecture or to a computer cluster. With the commercial availability of new and inexpensive high-density MEAs, we believe that disseminating QSpike Tools might facilitate its wide adoption and customization, and inspire the creation of community-supported cloud-computing facilities for MEA users. PMID:24678297
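    The per-channel batch idea can be sketched as follows, assuming a toy moving-average filter and threshold detector (the real QSpike Tools pipeline, its channel layout, and these function names are illustrative, not the actual implementation):

```python
# Sketch: fan per-channel preprocessing out to worker cores, in the
# spirit of batch-queueing each recorded channel independently.
# The moving-average "filter" and threshold detector are toy stand-ins.
from multiprocessing import Pool

def preprocess_channel(samples, win=3, threshold=2.0):
    events = []
    for i, s in enumerate(samples):
        window = samples[max(0, i - win):i + 1]
        baseline = sum(window) / len(window)      # crude local baseline
        if abs(s - baseline) > threshold:          # threshold crossing
            events.append(i)
    return events

def preprocess_recording(channels, workers=4):
    # each channel is an independent job, so a plain pool map suffices
    with Pool(workers) as pool:
        return pool.map(preprocess_channel, channels)
```

Because channels are independent, the speedup is close to the worker count until I/O dominates, which mirrors why decomposing by channel is a natural parallelization for MEA recordings.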

  17. Algorithmic commonalities in the parallel environment

    NASA Technical Reports Server (NTRS)

    Mcanulty, Michael A.; Wainer, Michael S.

    1987-01-01

    The ultimate aim of this project was to analyze procedures from substantially different application areas to discover what is either common or peculiar in the process of conversion to the Massively Parallel Processor (MPP). Three areas were identified: molecular dynamic simulation, production systems (rule systems), and various graphics and vision algorithms. To date, only selected graphics procedures have been investigated. They are the most readily available, and produce the most visible results. These include simple polygon patch rendering, raycasting against a constructive solid geometric model, and stochastic or fractal based textured surface algorithms. Only the simplest of conversion strategies, mapping a major loop to the array, has been investigated so far. It is not entirely satisfactory.

  18. The U.S. Sarsat geosynchronous experiment - Ground processor description and test results

    NASA Technical Reports Server (NTRS)

    Flikkema, P. G.; Davisson, L. D.

    1988-01-01

    The development of a specialized digital signal processor, the Geosynchronous Signal Processor (GSP), for short beacon burst signal detection and demodulation is described. The processing is based on fast Fourier transform (FFT) techniques for detection and on message integration over the respective message bursts for demodulation. The GSP is based on array processor technology; it is designed to yield an ultimate capacity of 50-75 simultaneous beacon transmissions within the nominal 20 kHz bandwidth centered at 406.025 MHz.
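    The FFT-based search for simultaneous carriers can be sketched as follows (a naive O(n²) DFT is used for clarity, and the bin spacing, threshold, and function names are illustrative assumptions, not the GSP design):

```python
# Toy spectral burst detector: compute DFT magnitudes and report the
# positive-frequency bins that exceed a threshold. A real system would
# use an FFT; the naive O(n^2) DFT here keeps the sketch self-contained.
import cmath

def dft_magnitudes(samples):
    n = len(samples)
    return [abs(sum(samples[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for k in range(n)))
            for j in range(n)]

def detect_carriers(samples, threshold):
    # keep only bins below Nyquist (non-negative frequencies)
    mags = dft_magnitudes(samples)
    return [j for j in range(len(samples) // 2) if mags[j] > threshold]
```

Each occupied bin corresponds to one beacon carrier within the band, which is how a single transform can separate tens of simultaneous transmissions.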

  19. A programmable systolic trigger processor for FERA-bus data

    NASA Astrophysics Data System (ADS)

    Appelquist, G.; Hovander, B.; Sellden, B.; Bohm, C.

    1992-09-01

    A generic CAMAC based trigger processor module for fast processing of large amounts of Analog to Digital Converter (ADC) data was designed. This module was realized using complex programmable gate arrays. The gate arrays were connected to memories and multipliers in such a way that different gate array configurations can cover a wide range of module applications. Using this module, it is possible to construct complex trigger processors. The module uses both the fast ECL FERA bus and the CAMAC bus for inputs and outputs. The latter is used for set up and control but may also be used for data output. Large numbers of ADCs can be served by a hierarchical arrangement of trigger processor modules which process ADC data with pipelined arithmetic and produce the final result at the apex of the pyramid. The trigger decision is transmitted to the data acquisition system via a logic signal while numeric results may be extracted by the CAMAC controller. The trigger processor was developed for the proposed neutral particle search. It was designed to serve as a second level trigger processor. It was required to correct all ADC raw data for efficiency and pedestal, calculate the total calorimeter energy, obtain the optimal time of flight data, and calculate the particle mass. A suitable mass cut would then deliver the trigger decision.

  20. Fault detection and bypass in a sequence information signal processor

    NASA Technical Reports Server (NTRS)

    Peterson, John C. (Inventor); Chow, Edward T. (Inventor)

    1992-01-01

    The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.
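    A behavioral sketch of the comparator/encoder/bypass idea, assuming boolean per-element match flags (this is a hypothetical simulation of the behavior described, not the patented circuit):

```python
# Behavioral model: each processor element's scan register produces a
# flag saying whether its input from the previous element matched the
# expected scan pattern. The comparator ORs the fault indications; the
# encoder reports the first faulty index so a multiplexer could bypass it.
def check_scan(flags):
    # flags[i] is True when element i saw the expected input
    fault = any(not ok for ok in flags)                      # comparator
    index = next((i for i, ok in enumerate(flags) if not ok), None)  # encoder
    return fault, index

def bypass(elements, faulty_index):
    # route around the faulty element, as the bypass multiplexer would
    return [e for i, e in enumerate(elements) if i != faulty_index]
```

Feeding a scan pattern that exercises every element guarantees, as the abstract notes, that no fault can remain undetected.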

  1. Optical backplane interconnect switch for data processors and computers

    NASA Technical Reports Server (NTRS)

    Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

    1989-01-01

    An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

  2. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining

  3. Scaling and performance of a 3-D radiation hydrodynamics code on message-passing parallel computers: final report

    SciTech Connect

    Hayes, J C; Norman, M

    1999-10-28

    This report details an investigation into the efficacy of two approaches to solving the radiation diffusion equation within a radiation hydrodynamic simulation. Because leading-edge scientific computing platforms have evolved from large single-node vector processors to parallel aggregates containing tens to thousands of individual CPU's, the ability of an algorithm to maintain high compute efficiency when distributed over a large array of nodes is critically important. The viability of an algorithm thus hinges upon the tripartite question of numerical accuracy, total time to solution, and parallel efficiency.

  4. Parallel VLSI architecture emulation and the organization of APSA/MPP

    NASA Technical Reports Server (NTRS)

    O'Donnell, John T.

    1987-01-01

    The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a VAX. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.

  5. Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

    SciTech Connect

    Liao, C; Quinlan, D J; Willcock, J J; Panas, T

    2008-12-12

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.

  6. An Optical Tomography System Using a Digital Signal Processor

    PubMed Central

    Rahim, Ruzairi Abdul; Thiam, Chiam Kok; Fazalul Rahiman, Mohd Hafiz

    2008-01-01

    The use of a personal computer together with a Data Acquisition System (DAQ) as the processing tool in optical tomography systems has been the norm ever since the beginning of process tomography. However, advancements in silicon fabrication technology allow nowadays the fabrication of powerful Digital Signal Processors (DSP) at a reasonable cost. This allows this technology to be used in an optical tomography system since data acquisition and processing can be performed within the DSP. Thus, the dependency on a personal computer and a DAQ to sample and process the external signals can be reduced or even eliminated. The DSP system was customized to control the data acquisition process of 16×16 optical sensor array, arranged in parallel beam projection. The data collected was used to reconstruct the cross sectional image of the pipeline conveyor. For image display purposes, the reconstructed image was sent to a personal computer via serial communication. This allows the use of a laptop to display the tomogram image besides performing any other offline analysis. PMID:27879811

  7. An Optical Tomography System Using a Digital Signal Processor.

    PubMed

    Rahim, Ruzairi Abdul; Thiam, Chiam Kok; Rahiman, Mohd Hafiz Fazalul

    2008-03-27

    The use of a personal computer together with a Data Acquisition System (DAQ) as the processing tool in optical tomography systems has been the norm ever since the beginning of process tomography. However, advancements in silicon fabrication technology allow nowadays the fabrication of powerful Digital Signal Processors (DSP) at a reasonable cost. This allows this technology to be used in an optical tomography system since data acquisition and processing can be performed within the DSP. Thus, the dependency on a personal computer and a DAQ to sample and process the external signals can be reduced or even eliminated. The DSP system was customized to control the data acquisition process of 16x16 optical sensor array, arranged in parallel beam projection. The data collected was used to reconstruct the cross sectional image of the pipeline conveyor. For image display purposes, the reconstructed image was sent to a personal computer via serial communication. This allows the use of a laptop to display the tomogram image besides performing any other offline analysis.

  8. Generic-type hierarchical multi digital signal processor system for hard-field tomography.

    PubMed

    Garcia Castillo, Sergio; Ozanyan, Krikor B

    2007-05-01

    This article introduces the design and implementation of a hierarchical multi digital signal processor system aimed to perform parallel multichannel measurements and data processing of the type widely used in hard-field tomography. Details are presented of a complete tomography system with modular and expandable architecture, capable of accommodating a variety of data processing modalities, configured by software. The configuration of the acquisition and processing circuits and the management of the data flow allow a data frame rate of up to 250 kHz. Results of a case study, guided path tomography for temperature mapping, are shown as a direct demonstration of the system's capabilities. Digital lock-in detection is employed for data processing to extract the information from ac measurements of the temperature-induced resistance changes in an array of 32 noninteracting transducers, which is further exported for visualization.
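    Digital lock-in detection, as employed above, multiplies the sampled signal by quadrature references at the excitation frequency and low-pass filters the products to recover amplitude and phase. A minimal sketch follows, where averaging over an integer number of periods stands in for the low-pass stage (the article's DSP implementation is not reproduced here):

```python
# Minimal digital lock-in amplifier sketch: mix the signal with in-phase
# and quadrature references at f_ref, then average (a crude low-pass).
import math

def lock_in(samples, f_ref, f_s):
    n = len(samples)
    i_sum = sum(s * math.cos(2 * math.pi * f_ref * k / f_s)
                for k, s in enumerate(samples))
    q_sum = sum(s * math.sin(2 * math.pi * f_ref * k / f_s)
                for k, s in enumerate(samples))
    i_avg, q_avg = 2 * i_sum / n, 2 * q_sum / n   # demodulated I/Q
    amplitude = math.hypot(i_avg, q_avg)
    phase = math.atan2(q_avg, i_avg)
    return amplitude, phase
```

Because the mixing rejects components at other frequencies, this is well suited to extracting small resistance changes from noisy AC measurements, as in the guided-path tomography case study.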

  9. Distributed processor allocation for launching applications in a massively connected processors complex

    DOEpatents

    Pedretti, Kevin

    2008-11-18

    A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.
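    The shared-database idea can be sketched as follows (class and method names are hypothetical; the patent's communication path and fault handling are not modeled):

```python
# Toy model: several allocator instances share one table of free
# compute nodes, so any allocator can serve a launch request.
class SharedDatabase:
    def __init__(self, nodes):
        self.free = set(nodes)          # nodes available for allocation

class ComputeProcessorAllocator:
    def __init__(self, db):
        self.db = db                    # common database of free nodes

    def allocate(self, count):
        if len(self.db.free) < count:
            return None                 # not enough free processors
        return {self.db.free.pop() for _ in range(count)}

    def release(self, nodes):
        self.db.free |= set(nodes)      # return nodes to the shared pool
```

Because both allocators consult the same table, a grant by one is immediately visible to the other, which is the point of distributing allocators over a subset of processors while sharing the allocation state.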

  10. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
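    The decomposition idea can be illustrated with a block-partitioned sieve in which each "processor" sieves its own subrange using the base primes up to √N (workers are simulated sequentially here; the hypercube communication and the paper's scattered decomposition are not modeled):

```python
# Block-decomposed Sieve of Eratosthenes: each worker owns a contiguous
# subrange and crosses out multiples of the shared base primes.
import math

def base_primes(limit):
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(math.isqrt(limit)) + 1):
        if flags[i]:
            flags[i * i::i] = [False] * len(flags[i * i::i])
    return [i for i, f in enumerate(flags) if f]

def sieve_block(lo, hi, primes):
    flags = [True] * (hi - lo)
    for p in primes:
        # first multiple of p in [lo, hi), never crossing out p itself
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, f in enumerate(flags) if f and lo + i > 1]

def parallel_sieve(n, workers=4):
    primes = base_primes(math.isqrt(n))
    step = (n + workers) // workers
    blocks = [sieve_block(w * step, min((w + 1) * step, n + 1), primes)
              for w in range(workers)]
    return [p for block in blocks for p in block]
```

Each block depends only on the base primes, so for fixed problem size per processor the work stays balanced as the machine grows, which is the regime where the paper reports speedups approaching the processor count.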

  11. Scaling and Graphical Transport-Map Analysis of Ambipolar Schottky-Barrier Thin-Film Transistors Based on a Parallel Array of Si Nanowires.

    PubMed

    Jeon, Dae-Young; Pregl, Sebastian; Park, So Jeong; Baraban, Larysa; Cuniberti, Gianaurelio; Mikolajick, Thomas; Weber, Walter M

    2015-07-08

    Si nanowire (Si-NW) based thin-film transistors (TFTs) have been considered a promising candidate for next-generation flexible and wearable electronics as well as high-performance sensor applications. Here, we have fabricated ambipolar Schottky-barrier (SB) TFTs consisting of a parallel array of Si-NWs and performed an in-depth study of their electrical performance and operation mechanism through several electrical parameters extracted from a channel-length-scaling-based method. In particular, the newly suggested current-voltage (I-V) contour map clearly elucidates the unique operation mechanism of the ambipolar SB-TFTs, governed by the Schottky junction between NiSi2 and the Si-NW. Further, it reveals, for the first time in SB-based FETs, the important internal electrostatic coupling between the channel and the externally applied voltages. This work provides helpful information for the realization of practical circuits with ambipolar SB-TFTs that can be transferred to different substrate technologies and applications.

  12. Dielectrophoresis-assisted massively parallel cell pairing and fusion based on field constriction created by a micro-orifice array sheet.

    PubMed

    Kimura, Yuji; Gel, Murat; Techaumnat, Boonchai; Oana, Hidehiro; Kotera, Hidetoshi; Washizu, Masao

    2011-09-01

    In this paper, we present a novel electrofusion device that enables massive parallelism, using an electrically insulating sheet having a two-dimensional micro-orifice array. The sheet is sandwiched by a pair of micro-chambers with immersed electrodes, and each chamber is filled with the suspensions of the two types of cells to be fused. Dielectrophoresis, assisted by sedimentation, is used to position the cells in the upper chamber down onto the orifices, then the device is flipped over to position the cells on the other side, so that cell pairs making contact in the orifice are formed. When a pulse voltage is applied to the electrodes, most voltage drop occurs around the orifice and impressed on the cell membrane in the orifice. This makes possible the application of size-independent voltage to fuse two cells in contact at all orifices exclusively in 1:1 manner. In the experiment, cytoplasm of one of the cells is stained with a fluorescence dye, and the transfer of the fluorescence to the other cell is used as the indication of fusion events. The two-dimensional orifice arrangement at the pitch of 50 μm realizes simultaneous fusion of 6 × 10³ cells on a 4 mm diameter chip, and the fusion yield of 78-90% is achieved for various sizes and types of cells.

  13. Parallel Monte Carlo simulation of multilattice thin film growth

    NASA Astrophysics Data System (ADS)

    Shu, J. W.; Lu, Qin; Wong, Wai-on; Huang, Han-chen

    2001-07-01

    This paper describes a new parallel algorithm for the multi-lattice Monte Carlo atomistic simulator for thin film deposition (ADEPT), implemented on a parallel computer using the PVM (Parallel Virtual Machine) message passing library. The parallel algorithm is based on domain decomposition with overlapping and asynchronous communication. Multiple lattices are represented by a single reference lattice through one-to-one mappings, with resulting computational demands being comparable to those in the single-lattice Monte Carlo model. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. Results show that the algorithm is highly efficient with a large number of processors. The algorithm was implemented on a parallel machine with 50 processors, and it is suitable for parallel Monte Carlo simulation of thin film growth with either a distributed-memory parallel computer or a shared-memory machine with message passing libraries. In this paper, the significant communication time in parallel MC simulation of thin film growth is effectively reduced by adopting domain decomposition with overlapping between sub-domains and asynchronous communication among processors. The overhead of communication does not increase appreciably, and speedup rises as the number of processors increases. A near-linear increase in computing speed was achieved as the number of processors increased, and there is no theoretical limit on the number of processors to be used. The techniques developed in this work are also suitable for the implementation of the Monte Carlo code on other parallel systems.
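    The overlap idea can be sketched in one dimension: each subdomain carries one ghost cell per interior boundary, so a nearest-neighbor update needs no mid-sweep communication (the paper's simulator is 3-D and asynchronous over PVM; this sequential sketch shows only the decomposition-with-overlap concept):

```python
# 1-D domain decomposition with one-cell overlap (ghost cells).
def split_with_ghosts(data, parts):
    n = len(data)
    step = n // parts
    domains = []
    for w in range(parts):
        lo = w * step
        hi = (w + 1) * step if w < parts - 1 else n
        g_lo = max(0, lo - 1)           # one ghost cell on each side
        g_hi = min(n, hi + 1)
        domains.append((lo, hi, data[g_lo:g_hi]))
    return domains

def smooth(data, parts):
    # each "worker" averages over its own cells, reading ghosts as needed
    n = len(data)
    out = [0.0] * n
    for lo, hi, local in split_with_ghosts(data, parts):
        g_lo = max(0, lo - 1)
        for i in range(lo, hi):
            j = i - g_lo
            left = local[j - 1] if i > 0 else local[j]
            right = local[j + 1] if i < n - 1 else local[j]
            out[i] = (left + local[j] + right) / 3.0
    return out
```

The decomposed result matches the single-domain result exactly, which is the correctness property the overlapping regions are designed to preserve while hiding communication latency.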

  14. Waste from food processors

    SciTech Connect

    Sheehan, K.

    1993-12-01

    Food processing companies, by nature of the commodities they deal in and the products they provide, generate a much higher percentage of biodegradable, organic wastes than they do nonorganic wastes. The high percentage of food materials, and to a lesser extent, paper, found in a food processor's waste stream makes composting a highly cost-effective way to manage the wastes. This is the last in a series of articles that discussed solid waste management in various public arenas. Each segment highlighted particulars -- the waste stream; how the waste is handled; waste reduction and recovery programs; and the direction of future waste management -- that are specific to that area.

  15. Enhancing Scalability of Parallel Structured AMR Calculations

    SciTech Connect

    Wissink, A M; Hysom, D; Hornung, R D

    2003-02-10

    This paper discusses parallel scaling performance of large scale parallel structured adaptive mesh refinement (SAMR) calculations in SAMRAI. Previous work revealed that poor scaling qualities in the adaptive gridding operations in SAMR calculations cause them to become dominant for cases run on up to 512 processors. This work describes algorithms we have developed to enhance the efficiency of the adaptive gridding operations. Performance of the algorithms is evaluated for two adaptive benchmarks run on up to 512 processors of an IBM SP system.

  16. Fast Parallel Computation Of Multibody Dynamics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Kwan, Gregory L.; Bagherzadeh, Nader

    1996-01-01

    Constraint-force algorithm fast, efficient, parallel-computation algorithm for solving forward dynamics problem of multibody system like robot arm or vehicle. Solves problem in minimum time proportional to log(N) by use of optimal number of processors proportional to N, where N is number of dynamical degrees of freedom: in this sense, constraint-force algorithm both time-optimal and processor-optimal parallel-processing algorithm.
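    The O(log N) time bound rests on tree-structured combination: with roughly one processor per pair, each round halves the number of partial results, so only ⌈log₂ N⌉ rounds are needed. A sketch counting rounds for a simple reduction follows (illustrating only the parallel-time argument, not the actual constraint-force recurrences):

```python
# Tree reduction: pair up values each round; the number of rounds is
# ceil(log2(N)), which is the source of the O(log N) parallel time bound.
def tree_reduce(values, combine):
    rounds = 0
    while len(values) > 1:
        paired = [combine(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:              # odd element carries to next round
            paired.append(values[-1])
        values = paired
        rounds += 1
    return values[0], rounds
```

With 8 values, three rounds suffice (8 → 4 → 2 → 1), versus seven sequential combines, and the gap widens as N grows.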

  17. Parallel design patterns for a low-power, software-defined compressed video encoder

    NASA Astrophysics Data System (ADS)

    Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

    2011-06-01

    Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real-time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memory-network processor device.

  18. Parallel Modem Architectures for High-Data-Rate Space Modems

    NASA Astrophysics Data System (ADS)

    Satorius, E.

    2014-08-01

    Existing software-defined radios (SDRs) for space are limited in data volume by several factors, including bandwidth, space-qualified analog-to-digital converter (ADC) technology, and processor throughput, e.g., the throughput of a space-qualified field-programmable gate array (FPGA). In an attempt to further improve the throughput of space-based SDRs and to fully exploit the newer and more capable space-qualified technology (ADCs, FPGAs), we are evaluating parallel transmitter/receiver architectures for space SDRs. These architectures would improve data volume for both deep-space and particularly proximity (e.g., relay) links. In this article, designs for FPGA implementation of a high-rate parallel modem are presented as well as both fixed- and floating-point simulated performance results based on a functional design that is suitable for FPGA implementation.

  19. All-optical digital processor based on harmonic generation phenomena

    NASA Astrophysics Data System (ADS)

    Shcherbakov, Alexandre S.; Rakovsky, Vsevolod Y.

    1990-07-01

    Digital optical processors are designed to combine the ultra-parallel data processing capabilities of optical systems with the high accuracy of performed computations. The ultimate limit of the processing rate can be anticipated from all-optical parallel architectures based on networks of logic gates using materials exhibiting strong electronic nonlinearities with response times of less than 10^-12 seconds.

  20. Parallel algorithms for interactive manipulation of digital terrain models

    NASA Technical Reports Server (NTRS)

    Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

    1988-01-01

    Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid-connected architectures provide a natural mapping for grid-based terrain models. Presented here are algorithms for data movement on the Massively Parallel Processor (MPP) in support of pan and zoom functions over large data grids. This is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.
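
    The packing idea above can be illustrated in miniature: a window of a terrain grid larger than the processor array is packed so that each cell of a P x P "processor array" holds several grid cells in its local memory. The array size and the cyclic packing scheme below are assumptions for illustration, not the MPP's actual layout.

```python
# Toy sketch: select a window of a large terrain grid, then pack it
# cyclically onto a small P x P processor array, so cell (i, j) of the
# window lands on "processor" (i % P, j % P) in local slot (i//P, j//P).
import numpy as np

P = 4                                         # processor array is P x P
terrain = np.arange(16 * 16).reshape(16, 16)  # data grid exceeds P x P

def select_window(grid, row, col, size):
    """Interactively chosen window of the full grid (pan/zoom target)."""
    return grid[row:row + size, col:col + size]

def pack(window, p):
    """Cyclic packing of an n x n window onto a p x p array."""
    n = window.shape[0]
    k = n // p                                # local memory per processor
    packed = np.empty((p, p, k, k), dtype=window.dtype)
    for i in range(n):
        for j in range(n):
            packed[i % p, j % p, i // p, j // p] = window[i, j]
    return packed

win = select_window(terrain, 4, 4, 8)         # 8x8 window on a 4x4 array
packed = pack(win, P)
# processor (0, 0) holds every P-th row and column of the window
print(packed[0, 0])
```

    With this layout, a pan of the window becomes a bulk re-pack, which is why the paper finds data movement dominating the arithmetic.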

  1. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.

  2. 3D-Flow processor for a programmable Level-1 trigger (feasibility study)

    SciTech Connect

    Crosetto, D.

    1992-10-01

    A feasibility study has been made to use the 3D-Flow processor in a pipelined programmable parallel processing architecture to identify particles such as electrons, jets, muons, etc., in high-energy physics experiments.

  3. Efficacy of Code Optimization on Cache-based Processors

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system.
It can be argued that although some of the important
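
    In NumPy terms (arrays are C-order, i.e., row-major), the stride advice above can be made concrete: the two loops below compute the same sum, but only the first walks memory with unit stride. This is a toy illustration, not from the paper.

```python
# Row-major layout: moving along the last axis is unit stride
# (8 bytes for float64); moving along the first axis jumps a full row.
import numpy as np

a = np.arange(16.0).reshape(4, 4)
print(a.strides)   # (32, 8): 32-byte row stride, 8-byte element stride

def sum_rowwise(m):
    """Unit-stride inner loop: each fetched cache line is fully used."""
    total = 0.0
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            total += m[i, j]
    return total

def sum_colwise(m):
    """Row-length inner stride: one useful element per cache line."""
    total = 0.0
    for j in range(m.shape[1]):
        for i in range(m.shape[0]):
            total += m[i, j]
    return total
```

    Both functions return the same value; on a cache-based machine with a large array, only the row-wise order keeps every loaded cache line useful.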

  4. Linear array implementation of the EM algorithm for PET image reconstruction

    SciTech Connect

    Rajan, K.; Patnaik, L.M.; Ramakrishna, J.

    1995-08-01

    The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution back projection algorithms. However, the PET image reconstruction based on the EM algorithm is computationally burdensome for today's single processor systems. In addition, a large memory is required for the storage of the image, projection data, and the probability matrix. Since the computations are easily divided into tasks executable in parallel, multiprocessor configurations are the ideal choice for fast execution of the EM algorithms. In this study, the authors attempt to overcome these two problems by parallelizing the EM algorithm on a multiprocessor system. The parallel EM algorithm has been implemented on a linear array topology using commercially available fast floating-point digital signal processor (DSP) chips as the processing elements (PEs). The performance of the EM algorithm on a 386/387 machine, IBM 6000 RISC workstation, and on the linear array system is discussed and compared. The results show that the computational speed performance of a linear array using 8 DSP chips as PEs executing the EM image reconstruction algorithm is about 15.5 times better than that of the IBM 6000 RISC workstation. The novelty of the scheme is its simplicity. The linear array topology is expandable with a larger number of PEs. The architecture is not dependent on the DSP chip chosen, and the substitution of the latest DSP chip is straightforward and could yield better speed performance.
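
    The EM (MLEM) update parallelizes naturally because the backprojection is a sum over projection rows: each PE can own a block of rows and compute a partial sum. The sketch below shows one such iteration; the matrix sizes, block split, and data are toy assumptions, not the paper's system.

```python
# MLEM update x <- x * (A^T (y / (A x))) / (A^T 1), with the
# backprojection split across n_pe "processing elements".
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 4))                 # toy system (probability) matrix
x_true = np.array([1.0, 2.0, 3.0, 4.0])
y = A @ x_true                         # noiseless projection data

def mlem_step(x, A, y, n_pe=4):
    """One EM iteration; each PE backprojects its own row block, and
    the partial sums are combined (the inter-PE communication step)."""
    ratio = y / (A @ x)                # forward-project, compare to data
    blocks = np.array_split(np.arange(len(y)), n_pe)
    backproj = sum(A[b].T @ ratio[b] for b in blocks)   # per-PE work
    return x * backproj / A.sum(axis=0)

x = np.ones(4)                         # uniform initial image
for _ in range(500):
    x = mlem_step(x, A, y)
print(np.round(x, 2))
```

    The split changes nothing numerically: the n_pe partial backprojections sum to exactly the serial backprojection, which is what makes the linear-array mapping attractive.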

  5. Reconfigurable data path processor

    NASA Technical Reports Server (NTRS)

    Donohoe, Gregory (Inventor)

    2005-01-01

    A reconfigurable data path processor comprises a plurality of independent processing elements, each advantageously comprising an identical architecture. Each processing element comprises a plurality of data processing means for generating a potential output, and each processor is also capable of passing an input through as a potential output with little or no processing. Each processing element comprises a conditional multiplexer having a first conditional multiplexer input, a second conditional multiplexer input, and a conditional multiplexer output. A first potential output value is transmitted to the first conditional multiplexer input, and a second potential output value is transmitted to the second conditional multiplexer input. The conditional multiplexer couples either the first conditional multiplexer input or the second conditional multiplexer input to the conditional multiplexer output, according to an output control command. The output control command is generated by processing a set of arithmetic status bits through a logical mask. The conditional multiplexer output is coupled to a first processing element output. A first set of arithmetic status bits is generated according to the processing of the first processable value, and a second set may be generated from a second processing operation. An arithmetic-status-bit multiplexer selects the desired set of arithmetic status bits from among the first and second sets of arithmetic status bits. The conditional multiplexer evaluates the selected arithmetic status bits according to a logical mask defining an algorithm for evaluating the arithmetic status bits.
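
    The select logic described in the patent text can be modeled in a few lines: status bits pass through a logical mask to produce the control command that picks one of two candidate outputs. The bit layout below is an assumed example, not the patent's actual encoding.

```python
# Toy model of the conditional-output selection: mask the arithmetic
# status word and reduce it to a single control bit.
def conditional_mux(out_a, out_b, status_bits, mask):
    """Select out_a if any masked status bit is set, else out_b."""
    control = (status_bits & mask) != 0   # logical mask -> control command
    return out_a if control else out_b

# Assumed status word layout: bit0 = zero, bit1 = carry, bit2 = negative.
ZERO, CARRY, NEG = 0b001, 0b010, 0b100

result = conditional_mux(10, 20, status_bits=CARRY | NEG, mask=CARRY)
print(result)   # carry bit is set, so the first candidate is chosen
```

    Changing the mask reprograms the selection criterion without touching the datapath, which is the point of driving the multiplexer from masked status bits.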

  6. Novel processor architecture for onboard infrared sensors

    NASA Astrophysics Data System (ADS)

    Hihara, Hiroki; Iwasaki, Akira; Tamagawa, Nobuo; Kuribayashi, Mitsunobu; Hashimoto, Masanori; Mitsuyama, Yukio; Ochi, Hiroyuki; Onodera, Hidetoshi; Kanbara, Hiroyuki; Wakabayashi, Kazutoshi; Tada, Munehiro

    2016-09-01

    Infrared sensor systems are a major concern for inter-planetary missions that investigate the nature and formation processes of planets and asteroids. An infrared sensor system requires signal preprocessing functions that compensate the intensity of infrared image sensors to obtain high-quality data and a high compression ratio through the limited capacity of transmission channels to ground stations. For those implementations, combinations of Field Programmable Gate Arrays (FPGAs) and microprocessors are employed by AKATSUKI, the Venus Climate Orbiter, and HAYABUSA2, the asteroid probe. On the other hand, much smaller size and lower power consumption are demanded for future missions to accommodate more sensors. To fulfill this future demand, we developed a novel processor architecture which consists of reconfigurable cluster cores and programmable-logic cells with complementary atom switches. The complementary atom switches enable hardware programming without configuration memories, so soft errors on logic-circuit connections are completely eliminated. This is a noteworthy advantage for space applications that cannot be found in conventional re-writable FPGAs. Because the configuration memories are eliminated, almost one-tenth the power consumption of conventional re-writable FPGAs is expected. The proposed processor architecture can be reconfigured by behavioral synthesis from a higher-level language specification. Consequently, compensation functions are implemented in a single chip without the program memories that accompany conventional microprocessors, while maintaining comparable performance. This enables us to embed a processor element on each infrared signal detector output channel.

  7. Software-Reconfigurable Processors for Spacecraft

    NASA Technical Reports Server (NTRS)

    Farrington, Allen; Gray, Andrew; Bell, Bryan; Stanton, Valerie; Chong, Yong; Peters, Kenneth; Lee, Clement; Srinivasan, Jeffrey

    2005-01-01

    A report presents an overview of an architecture for a software-reconfigurable network data processor for a spacecraft engaged in scientific exploration. When executed on suitable electronic hardware, the software performs the functions of a physical layer (in effect, acts as a software radio in that it performs modulation, demodulation, pulse-shaping, error correction, coding, and decoding), a data-link layer, a network layer, a transport layer, and application-layer processing of scientific data. The software-reconfigurable network processor is undergoing development to enable rapid prototyping and rapid implementation of communication, navigation, and scientific signal-processing functions; to provide a long-lived communication infrastructure; and to provide greatly improved scientific-instrumentation and scientific-data-processing functions by enabling science-driven in-flight reconfiguration of computing resources devoted to these functions. This development is an extension of terrestrial radio and network developments (e.g., in the cellular-telephone industry) implemented in software running on such hardware as field-programmable gate arrays, digital signal processors, traditional digital circuits, and mixed-signal application-specific integrated circuits (ASICs).

  8. Parallel unstructured grid generation

    NASA Technical Reports Server (NTRS)

    Loehner, Rainald; Camberos, Jose; Merriam, Marshal

    1991-01-01

    A parallel unstructured grid generation algorithm is presented and implemented on the Hypercube. Different processor hierarchies are discussed, and the appropriate hierarchies for mesh generation and mesh smoothing are selected. A domain-splitting algorithm for unstructured grids which tries to minimize the surface-to-volume ratio of each subdomain is described. This splitting algorithm is employed both for grid generation and grid smoothing. Results obtained on the Hypercube demonstrate the effectiveness of the algorithms developed.

  9. Parallel Genetic Algorithm for Alpha Spectra Fitting

    NASA Astrophysics Data System (ADS)

    García-Orellana, Carlos J.; Rubio-Montero, Pilar; González-Velasco, Horacio

    2005-01-01

    We present a performance study of alpha-particle spectra fitting using a parallel Genetic Algorithm (GA). The method uses a two-step approach: in the first step we run the parallel GA to find an initial solution for the second step, in which we use the Levenberg-Marquardt (LM) method for a precise final fit. GA is a resource-demanding method, so we use a Beowulf cluster for parallel simulation. The relationship between simulation time (and parallel efficiency) and the number of processors is studied using several alpha spectra, with the aim of obtaining a method to estimate the optimal number of processors to use in a simulation.
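
    The two-step scheme can be sketched on a one-parameter toy problem: a small GA locates a spectral peak, then a local refinement polishes the estimate. The paper uses Levenberg-Marquardt for the second step; plain gradient descent stands in for it here, and the spectrum, population sizes, and mutation scale are all invented.

```python
# Step 1: genetic search for a rough peak position (in the paper, the
# population evaluation is what runs in parallel on the cluster).
# Step 2: local refinement from the GA seed.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.exp(-0.5 * ((x - 4.2) / 0.5) ** 2)   # toy "spectrum", peak at 4.2

def sse(mu):
    """Sum of squared residuals for a unit-height Gaussian at mu."""
    return np.sum((y - np.exp(-0.5 * ((x - mu) / 0.5) ** 2)) ** 2)

pop = rng.uniform(0, 10, size=32)           # initial population
for _ in range(30):
    fit = np.array([sse(m) for m in pop])
    parents = pop[np.argsort(fit)[:8]]      # selection: keep the best 8
    children = rng.choice(parents, 24) + rng.normal(0, 0.3, 24)  # mutate
    pop = np.concatenate([parents, children])
mu = pop[np.argmin([sse(m) for m in pop])]  # GA's rough answer

for _ in range(100):                        # local polish (LM stand-in)
    g = (sse(mu + 1e-4) - sse(mu - 1e-4)) / 2e-4   # numeric gradient
    mu -= 0.01 * g
print(round(float(mu), 2))
```

    The division of labor mirrors the paper: the global, embarrassingly parallel search only needs to land inside the basin of attraction of the fast local method.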

  10. Parallel machine architecture for production rule systems

    DOEpatents

    Allen, Jr., John D.; Butler, Philip L.

    1989-01-01

    A parallel processing system for production rule programs utilizes a host processor for storing production rule right-hand sides (RHS) and a plurality of rule processors for storing left-hand sides (LHS). The rule processors operate in parallel in the Recognize phase of the system's Recognize-Act cycle to match their respective LHSs against a stored list of working memory elements (WMEs) in order to find a self-consistent set of WMEs. The list of WMEs is dynamically varied during the Act phase of the system, in which the host executes or fires rule RHSs for those rules for which a self-consistent set has been found by the rule processors. The host transmits instructions for creating or deleting working memory elements as dictated by the rule firings until the rule processors are unable to find any further self-consistent working memory element sets, at which time the production rule system is halted.
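
    A toy Recognize-Act loop makes the division of labor concrete: LHS matching is a data-parallel map over rules (the patent puts each LHS on its own rule processor), and the "host" then fires one RHS, which edits working memory before the cycle repeats. The rule and working-memory encoding below are invented examples.

```python
# Minimal recognize-act cycle over a set-of-tuples working memory.
wm = {("count", 0)}   # working memory elements (WMEs)

rules = [
    # (name, LHS predicate over WM, RHS returning (adds, deletes))
    ("increment",
     lambda m: any(k == "count" and v < 3 for k, v in m),
     lambda m: ({("count", v + 1) for k, v in m if k == "count"},
                {("count", v) for k, v in m if k == "count"})),
]

fired = []
while True:
    # Recognize: every "rule processor" matches its LHS independently.
    matches = [r for r in rules if r[1](wm)]
    if not matches:
        break                        # no self-consistent set: halt
    name, _, rhs = matches[0]        # Act: host fires one rule's RHS
    adds, dels = rhs(wm)
    wm = (wm - dels) | adds          # create/delete WMEs as dictated
    fired.append(name)
print(wm, fired)
```

    Only the Recognize step parallelizes; the Act step is serialized through the host, exactly as in the patented architecture.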

  11. FY 2006 Accomplishment Colony - "Services and Interfaces to Support Large Numbers of Processors"

    SciTech Connect

    Jones, T; Kale, L; Moreira, J; Mendes, C; Chakravorty, S; Tauferner, A; Inglett, T

    2006-06-30

    The Colony Project is developing operating system and runtime system technology to enable efficient general purpose environments on tens of thousands of processors. To accomplish this, we are investigating memory management techniques, fault management strategies, and parallel resource management schemes. Recent results show promising findings for scalable strategies based on processor virtualization, in-memory checkpointing, and parallel aware modifications to full featured operating systems.

  12. Fast Parallel Computation Of Manipulator Inverse Dynamics

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1991-01-01

    Method for fast parallel computation of inverse dynamics problem, essential for real-time dynamic control and simulation of robot manipulators, undergoing development. Enables exploitation of high degree of parallelism and achievement of significant computational efficiency, while minimizing various communication and synchronization overheads as well as complexity of required computer architecture. Universal real-time robotic controller and simulator (URRCS) consists of internal host processor and several SIMD processors with ring topology. Architecture modular and expandable: more SIMD processors added to match size of problem. Processors operate asynchronously and in MIMD fashion.

  13. Optimal parallel evaluation of AND trees

    NASA Technical Reports Server (NTRS)

    Wah, Benjamin W.; Li, Guo-Jie

    1990-01-01

    A quantitative analysis based on both preemptive and nonpreemptive critical-path scheduling algorithms is presently conducted for the optimal degree of parallelism required in evaluating a given AND tree. The optimal degree of parallelism is found to depend on problem complexity, precedence-graph shape, and task-time distribution along each path. In addition to demonstrating the optimality of the preemptive critical-path scheduling algorithm for evaluating an arbitrary AND tree on a fixed number of processors, the possibility of efficiently ascertaining tight bounds on the number of processors for optimal processor-time efficiency is illustrated.
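
    The quantity being reasoned about above is the critical-path length of an AND tree: a node can start only after all of its children finish, so the longest root-to-leaf sum of task times lower-bounds the parallel evaluation time on any number of processors. The tree shape and task times below are made-up examples.

```python
# Critical-path length of an AND tree, tree = (task_time, [subtrees]).
def critical_path(tree):
    """Longest root-to-leaf sum of task times, i.e. the minimum
    possible makespan with unlimited processors."""
    time, children = tree
    return time + max((critical_path(c) for c in children), default=0)

def total_work(tree):
    """Sum of all task times: the one-processor evaluation time."""
    time, children = tree
    return time + sum(total_work(c) for c in children)

and_tree = (1, [(2, [(4, []), (1, [])]),   # left branch:  1 + 2 + 4 = 7
                (3, [(2, [])])])           # right branch: 1 + 3 + 2 = 6
print(critical_path(and_tree), total_work(and_tree))
```

    The ratio total_work / critical_path (here 13/7) caps the useful degree of parallelism, which is why the optimal processor count depends on tree shape and task-time distribution along each path.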

  14. CoNNeCT Baseband Processor Module

    NASA Technical Reports Server (NTRS)

    Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.

    2011-01-01

    A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.

  15. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth
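
    The coarse-grain idea above, replicated protocol processors striping one application's stream over several physical channels, can be modeled in miniature: tag packets with sequence numbers, deal them out round-robin, and reorder at the receiver. The channel count and framing are assumptions for illustration.

```python
# Toy round-robin striping over n parallel channels with sequence-number
# reassembly at the receiver.
def stripe(packets, n):
    """Sender side: tag each packet, deal packets out over n channels."""
    channels = [[] for _ in range(n)]
    for seq, p in enumerate(packets):
        channels[seq % n].append((seq, p))
    return channels

def merge(channels):
    """Receiver side: channels may drain at different rates, so restore
    delivery order by sequence number before handing data up the stack."""
    arrived = [frame for ch in channels for frame in ch]
    return [p for _, p in sorted(arrived)]

msgs = [f"pkt{i}" for i in range(7)]
chans = stripe(msgs, 3)
print(merge(chans) == msgs)
```

    Each channel's traffic can be driven by an independent protocol processor; the only shared state is the sequence numbering, which is what keeps the scale-up near linear for small n.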

  16. Japanese document recognition and retrieval system using programmable SIMD processor

    NASA Astrophysics Data System (ADS)

    Miyahara, Sueharu; Suzuki, Akira; Tada, Shunkichi; Kawatani, Takahiko

    1991-02-01

    This paper describes a new, efficient information-filing system for large numbers of documents. The system is designed to recognize Japanese characters and make full-text searches across a document database. Key components of the system are a small, fully programmable parallel processor for both recognition and retrieval, an image scanner for document input, and a personal computer as the operator console. The processor is built on a bit-serial single-instruction multiple-data-stream (SIMD) architecture, and all components, including the 256 processor elements and 11 MB of RAM, are integrated on one board. The recognition process divides a document into text lines, isolates each character, extracts character pattern features, and then identifies character categories. The entire process is performed by a single micro-program package down-loaded from the console. The recognition accuracy is more than 99.0 percent for printed Japanese characters, at a speed of more than 14 characters per second. The processor can also be used for high-speed information retrieval by changing the down-loaded microprogram package. The retrieval process can obtain sentences that include the same information as an inquiry text from the database previously created through character recognition. Retrieval is very fast, with 20 million individual Japanese characters being examined each second when the database is stored in the processor's IC memory. It was confirmed that a high-performance but flexible and cost-effective document-information-processing system

  17. Parallel architectures for vision

    SciTech Connect

    Maresca, M.; Lavin, M.A.; Li, H.

    1988-08-01

    Vision computing involves the execution of a large number of operations on large sets of structured data. Sequential computers cannot achieve the speed required by most of the current applications and therefore parallel architectural solutions have to be explored. In this paper the authors examine the options that drive the design of a vision oriented computer, starting with the analysis of the basic vision computation and communication requirements. They briefly review the classical taxonomy for parallel computers, based on the multiplicity of the instruction and data stream, and apply a recently proposed criterion, the degree of autonomy of each processor, to further classify fine-grain SIMD massively parallel computers. They identify three types of processor autonomy, namely operation autonomy, addressing autonomy, and connection autonomy. For each type they give the basic definitions and show some examples. They focus on the concept of connection autonomy, which they believe is a key point in the development of massively parallel architectures for vision. They show two examples of parallel computers featuring different types of connection autonomy - the Connection Machine and the Polymorphic-Torus - and compare their cost and benefit.

  18. Interactive digital signal processor

    NASA Technical Reports Server (NTRS)

    Mish, W. H.; Wenger, R. M.; Behannon, K. W.; Byrnes, J. B.

    1982-01-01

    The Interactive Digital Signal Processor (IDSP) is examined. It consists of a set of time series analysis operators, each of which operates on an input file to produce an output file. The operators can be executed in any order that makes sense and recursively, if desired. The operators are the various algorithms used in digital time series analysis work. User-written operators can be easily interfaced to the system. The system can be operated both interactively and in batch mode. In IDSP a file can consist of up to n (currently n=8) simultaneous time series. IDSP currently includes over thirty standard operators that range from Fourier transform operations, design and application of digital filters, and eigenvalue analysis, to operators that provide graphical output, allow batch operation, editing and display information.
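
    The operator-chaining model above, where each operator maps an input file of time series to an output file so operators compose in any sensible order, can be sketched as follows. The operator names are invented stand-ins for IDSP's real set.

```python
# Toy IDSP-style pipeline: a "file" is a list of time series, and every
# operator is a function from file to file, so chains compose freely.
def detrend(series_file):
    """Remove each series' mean (a typical pre-analysis operator)."""
    return [[v - sum(s) / len(s) for v in s] for s in series_file]

def scale(factor):
    """Operator factory: returns an operator scaling every sample."""
    def op(series_file):
        return [[v * factor for v in s] for s in series_file]
    return op

def run_pipeline(series_file, operators):
    for op in operators:        # operators execute in the chosen order
        series_file = op(series_file)
    return series_file

f = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]   # a file of 2 time series
out = run_pipeline(f, [detrend, scale(2.0)])
print(out)
```

    Because every operator has the same file-to-file signature, user-written operators slot into the chain with no changes to the driver, matching the extensibility claim above.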

  19. Reconfigurable pipelined processor

    SciTech Connect

    Saccardi, R.J.

    1989-09-19

    This patent describes a reconfigurable pipelined processor for processing data. It comprises: a plurality of memory devices for storing bits of data; a plurality of arithmetic units for performing arithmetic functions with the data; cross bar means for connecting the memory devices with the arithmetic units for transferring data therebetween; at least one counter connected with the cross bar means for providing a source of addresses to the memory devices; at least one variable tick delay device connected with each of the memory devices and arithmetic units; and means for providing control bits to the variable tick delay device for variably controlling the input and output operations thereof to selectively delay the memory devices and arithmetic units to align the data for processing in a selected sequence.

  20. Some Ideas about Idea Processors.

    ERIC Educational Resources Information Center

    Dobrin, David N.

    Idea processors are computer programs that can aid the user in creating outlines by allowing the user to move, reorder, renumber, expand upon, or delete entries with a push of a button. The question is whether these programs are useful and should be offered to students. Theoretically, an idea processor prioritizes ideas by placing them in a…

  1. Never Trust Your Word Processor

    ERIC Educational Resources Information Center

    Linke, Dirk

    2009-01-01

    In this article, the author talks about the auto correction mode of word processors that leads to a number of problems and describes an example in biochemistry exams that shows how word processors can lead to mistakes in databases and in papers. The author contends that, where this system is applied, spell checking should not be left to a word…

  2. Implementing clips on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.

  3. A parallel algorithm for channel routing on a hypercube

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall; Banerjee, Prithviraj

    1987-01-01

    A new parallel simulated annealing algorithm for channel routing on a P processor hypercube is presented. The basic idea used is to partition a set of tracks equally among processors in the hypercube. In parallel, P/2 pairs of processors perform displacements and exchanges of nets between tracks, compute the changes in cost functions, and accept moves using a parallel annealing criteria. Through the use of a unique distributed data structure, it is possible to minimize message traffic and add versatility and efficiency in a parallel routing tool. The algorithm has been implemented and is being tested on some of the popular channel problems from the literature.
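
    A minimal serial sketch of the annealing move the paper parallelizes: nets (intervals) are displaced between tracks, and a move is kept if it lowers the overlap cost, or with Boltzmann probability otherwise. The paper runs P/2 such moves simultaneously on hypercube processor pairs; the nets, track count, and cooling schedule below are invented.

```python
# Simulated annealing for a toy channel-routing instance: minimize the
# number of overlapping net pairs forced onto the same track.
import math
import random

random.seed(2)
nets = [(0, 3), (2, 5), (4, 7), (1, 2), (6, 8)]   # (left, right) spans
track = [0, 0, 0, 1, 1]                           # initial assignment

def overlaps(a, b):
    return nets[a][0] < nets[b][1] and nets[b][0] < nets[a][1]

def cost(assign):
    """Count pairs of overlapping nets placed on the same track."""
    return sum(overlaps(i, j)
               for i in range(len(nets))
               for j in range(i + 1, len(nets))
               if assign[i] == assign[j])

T = 2.0
for _ in range(500):
    i = random.randrange(len(nets))
    new = track[:]
    new[i] = random.randrange(3)                  # displace net i
    delta = cost(new) - cost(track)
    if delta <= 0 or random.random() < math.exp(-delta / T):
        track = new                               # annealing acceptance
    T *= 0.99                                     # cooling schedule
print(cost(track))
```

    In the parallel version, disjoint pairs of tracks can be annealed concurrently because a displacement between one track pair leaves the cost contribution of the other pairs unchanged.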

  4. Design of free space interconnected signal processor

    NASA Astrophysics Data System (ADS)

    Murdocca, Miles; Stone, Thomas

    1993-12-01

    Progress is described on a collaborative effort between the Photonics Center at Rome Laboratory (RL), Griffiss AFB and Rutgers University, through the RL Expert Science and Engineering (ES&E) program. The goal of the effort is to develop a prototype random access memory (RAM) that can be used in a signal processor for a computing model that consists of cascaded arrays of optical logic gates interconnected in free space with regular patterns. The effort involved the optical and architectural development of a cascadable optical logic system in which microlaser pumped S-SEED devices serve as logic gates. At the completion of the contract, two gate-level layouts of the module were completed which were created in collaboration with RL personnel. The basic layout of the optical system has been developed, and key components have been tested. The delayed delivery of microlaser arrays precluded completion of the processor during the contract period, but preliminary testing was made possible through the use of other microlaser devices.

  5. Debugging Parallel Programs with Instant Replay.

    DTIC Science & Technology

    1986-09-01

    Two executions of the same parallel program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed Instant Replay. We describe an implementation of Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs.

  6. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  7. Matrix preconditioning: a robust operation for optical linear algebra processors.

    PubMed

    Ghosh, A; Paparao, P

    1987-07-15

    Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
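
    As a concrete illustration of the preprocessing step (a sketch only, using simple Jacobi diagonal scaling rather than the paper's optical implementation):

```python
import numpy as np

def jacobi_precondition(A, b):
    # Scale each row by the inverse diagonal: solve D^-1 A x = D^-1 b.
    # The solution x is unchanged, but the condition number of the
    # system matrix usually drops, speeding up gradient-type solvers.
    d = np.diag(A)
    return A / d[:, None], b / d

A = np.array([[100.0, 1.0],
              [1.0,   1.0]])
b = np.array([1.0, 1.0])
Ap, bp = jacobi_precondition(A, b)
# np.linalg.cond(Ap) is far smaller than np.linalg.cond(A), so an
# iterative solver converges in fewer iterations on (Ap, bp).
```

    The abstract's point is that this kind of scaling is robust: small errors in the scaling factors leave the preconditioned system nearly as well conditioned, which suits an inaccurate analog optical processor.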

  8. Highly scalable linear solvers on thousands of processors.

    SciTech Connect

    Domino, Stefan Paul; Karlin, Ian; Siefert, Christopher; Hu, Jonathan Joseph; Robinson, Allen Conrad; Tuminaro, Raymond Stephen

    2009-09-01

    In this report we summarize research into new parallel algebraic multigrid (AMG) methods. We first provide an introduction to parallel AMG. We then discuss our research in parallel AMG algorithms for very large scale platforms. We detail significant improvements in the AMG setup phase, in particular to a matrix-matrix multiplication kernel. We present a smoothed aggregation AMG algorithm with fewer communication synchronization points, and discuss its links to domain decomposition methods. Finally, we discuss a multigrid smoothing technique that utilizes two message-passing layers for use on multicore processors.

  9. Optoelectronic parallel-matching architecture: architecture description, performance estimation, and prototype demonstration.

    PubMed

    Kagawa, K; Nitta, K; Ogura, Y; Tanida, J; Ichioka, Y

    2001-01-10

    We propose an optoelectronic parallel-matching architecture (PMA) that provides more powerful global-processing capabilities than conventional parallel-computing architectures. The PMA is composed of a global processor called a parallel-matching (PM) module and multiple processing elements (PE's). The PM module is implemented by a large-fan-out free-space optical interconnection and a PM smart-pixel array (PM-SPA). In the proposed architecture, by means of the PM module each PE can monitor the other PE's by use of several kinds of global data matching as well as interprocessor communication. Theoretical evaluation of the performance shows that the proposed PMA provides tremendous improvement in global processing. A prototype demonstrator of the PM module was constructed on the basis of state-of-the-art optoelectronic devices and a diffractive optical element. The prototype is intended for use in a multiple-processor system composed of 4 x 4 PE's that are completely connected through bit-serial optical communication channels. The PM-SPA is emulated by a complex programmable device and a complementary metal-oxide semiconductor photodetector array. On the prototype demonstrator the fundamental operations of the PM module were verified at 15 MHz.

  10. A scalable parallel open architecture data acquisition system for low to high rate experiments, test beams and all SSC (Superconducting Super Collider) detectors

    SciTech Connect

    Barsotti, E.; Booth, A.; Bowden, M.; Swoboda, C. ); Lockyer, N.; VanBerg, R. )

    1989-12-01

    A new era of high-energy physics research is beginning, requiring accelerators with much higher luminosities and interaction rates in order to discover new elementary particles. As a consequence, both orders-of-magnitude higher data rates from the detector and online processing power well beyond the capabilities of current high-energy physics data acquisition systems are required. This paper describes a new data acquisition system architecture which draws heavily from the communications industry, is totally parallel (i.e., without any bottlenecks), is capable of data rates of hundreds of gigabytes per second from the detector into an array of online processors (i.e., a processor farm), and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms. The main features of the system architecture are standard interface ICs to detector subsystems wherever possible, fiber-optic digital data transmission from the near-detector electronics, a self-routing parallel event builder, and the use of industry-supported, high-level-language-programmable processors in the proposed BCD system for both triggers and online filters. A brief status report of an ongoing project at Fermilab to build the self-routing parallel event builder is also given. 3 figs., 1 tab.

  11. Survey of new vector computers: The CRAY 1S from CRAY research; the CYBER 205 from CDC and the parallel computer from ICL - architecture and programming

    NASA Technical Reports Server (NTRS)

    Gentzsch, W.

    1982-01-01

    Problems which can arise with vector and parallel computers are discussed in a user-oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN, and DAP (distributed array processor) FORTRAN, and the systems' performance is compared. The addition of parts of two N x N arrays is considered, and the influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.

  12. Advances in Domain Mapping of Massively Parallel Scientific Computations

    SciTech Connect

    Leland, Robert W.; Hendrickson, Bruce A.

    2015-10-01

    One of the most important concerns in parallel computing is the proper distribution of workload across processors. For most scientific applications on massively parallel machines, the best approach to this distribution is to employ data parallelism; that is, to break the datastructures supporting a computation into pieces and then to assign those pieces to different processors. Collectively, these partitioning and assignment tasks comprise the domain mapping problem.
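
    The "break the data structures into pieces and assign those pieces to different processors" step can be sketched in a few lines (a minimal 1-D block mapping; real domain mappers also balance communication volume between the pieces):

```python
def block_partition(n_items, n_procs):
    # Split indices 0..n_items-1 into n_procs contiguous pieces whose
    # sizes differ by at most one, the simplest data-parallel mapping.
    base, extra = divmod(n_items, n_procs)
    pieces, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        pieces.append(range(start, start + size))
        start += size
    return pieces
```

    For example, block_partition(10, 3) assigns 4, 3, and 3 items to the three processors, covering every index exactly once.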

  13. Electrostatically focused addressable field emission array chips (AFEA's) for high-speed massively parallel maskless digital E-beam direct write lithography and scanning electron microscopy

    DOEpatents

    Thomas, Clarence E.; Baylor, Larry R.; Voelkl, Edgar; Simpson, Michael L.; Paulus, Michael J.; Lowndes, Douglas H.; Whealton, John H.; Whitson, John C.; Wilgen, John B.

    2002-12-24

    Systems and methods are described for addressable field emission array (AFEA) chips. A method of operating an addressable field-emission array includes: generating a plurality of electron beams from a plurality of emitters that compose the addressable field-emission array; and focusing at least one of the plurality of electron beams with an on-chip electrostatic focusing stack. The systems and methods provide advantages including the avoidance of space-charge blow-up.

  14. Green Secure Processors: Towards Power-Efficient Secure Processor Design

    NASA Astrophysics Data System (ADS)

    Chhabra, Siddhartha; Solihin, Yan

    With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.

  15. Acousto-optic/CCD real-time SAR data processor

    NASA Technical Reports Server (NTRS)

    Psaltis, D.

    1983-01-01

    The SAR processor, which uses an acousto-optic device as the input electronic-to-optical transducer and a 2-D CCD image sensor operated in the time-delay-and-integrate (TDI) mode, is presented. The CCD serves as the optical detector and simultaneously operates as an array of optically addressed correlators. The lines of the focused SAR image form continuously (at the radar PRF) at the final row of the CCD. The principles of operation of this processor, its performance characteristics, the state of the art of the devices used, and experimental results are outlined. The methods by which this processor can be made flexible, so that it can be dynamically adapted to changing SAR geometries, are discussed.

  16. Distributed job scheduling in SCI Local Area MultiProcessors

    SciTech Connect

    Agasaveeran, S.; Li, Qiang

    1996-12-31

    Local Area MultiProcessors (LAMP) is a network of personal workstations with distributed shared physical memory provided by high-performance technologies such as SCI. LAMP is more tightly coupled than a traditional local area network (LAN) but more loosely coupled than bus-based multiprocessors. This paper presents a distributed scheduling algorithm that exploits the distributed shared memory in SCI-LAMP to schedule idle remote processors among the requesting workstations. It addresses fairness by allocating remote processing capacity to the requesting workstations based on their priorities, following the decay-usage scheduling approach. The performance of the algorithm in scheduling both sequential and parallel jobs is evaluated by simulation. It is found that higher priority nodes achieve faster job response times and higher speedups than lower priority nodes. Low scheduling overhead allows finer-grained sharing of remote processors than in a LAN.
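
    The decay-usage idea can be sketched as follows (an illustrative sketch only; the field names and the exact priority formula are hypothetical, not taken from the paper):

```python
def decayed_usage(usage, used_now, decay=0.5):
    # Exponentially forget old CPU usage, then add the newest slice,
    # so recent consumption weighs more than ancient history.
    return decay * usage + used_now

def grant_processor(requests):
    # Give the idle remote processor to the requester whose base
    # priority, discounted by its decayed usage, is currently highest.
    return max(requests, key=lambda r: r["priority"] / (1.0 + r["usage"]))
```

    A workstation that has recently consumed much remote capacity sees its effective priority fall, which is what makes the allocation fair over time.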

  17. Selectable Lightweight Attack Munition Operating Component of the Gate Array

    DTIC Science & Technology

    1991-04-01

    Contents: timer string; time out; clock check; self check; input processor; antidisturbance processor; safety pin processor; output control; SLAM gate array. ... it to become activated. At this point, the system performs a self-test. When the safety pin is removed, the timing ...

  18. Prototyping Parallel and Distributed Programs in Proteus

    DTIC Science & Technology

    1990-10-01

    Highly-parallel processors: applications for highly-parallel machines such as the CM-2 or the iPSC are programmed using data parallelism [Cole90, Gibb89]. [Gibb89] Gibbons, P.B., "A more practical PRAM model", in: Proceedings of the First ACM ...

  19. Partitioning in parallel processing of production systems

    SciTech Connect

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.

  20. ENABLE -- A systolic 2nd level trigger processor for track finding and e/π discrimination for ATLAS/LHC

    SciTech Connect

    Klefenz, F.; Noffz, K.H.; Zoz, R. . Lehrstuhl fuer Informatik V); Maenner, R. . Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen)

    1994-08-01

    The Enable Machine is a systolic 2nd level trigger processor for the transition radiation detector (TRD) of ATLAS/LHC. It is developed within the EAST/RD-11 collaboration at CERN. The task of the processor is to find electron tracks and to reject pion tracks according to the EAST benchmark algorithm in less than 10 µs. Tracks are identified by template matching in a (ψ,z) region of interest (RoI) selected by a 1st level trigger. In the (ψ,z) plane tracks of constant curvature are straight lines, and the relevant lines form mask templates. Track identification is done by histogramming the coincidences of the templates and the RoI data for each possible track. The Enable Machine is an array processor that handles tracks of the same slope in parallel, and tracks of different slope in a pipeline. It is composed of two units, the Enable histogrammer unit and the Enable z/ψ-board. The interface daughter board is equipped with a HIPPI interface developed at JINR/Dubna, and Xilinx 'corner turning' data converter chips. Enable uses programmable gate arrays (Xilinx) for histogramming and synchronous SRAMs for pattern storage. With a clock rate of 40 MHz the trigger decision time is 6.5 µs and the latency 7.0 µs. The Enable Machine is scalable in the RoI size as well as in the number of tracks processed, and can be adapted to different recognition tasks and detector setups. The prototype of the Enable Machine was tested in a beam time of the RD6 collaboration at CERN in October 1993.

  1. Meteorological Processors and Accessory Programs

    EPA Pesticide Factsheets

    Surface and upper air data, provided by NWS, are important inputs for air quality models. Before these data are used in some of the EPA dispersion models, meteorological processors are used to manipulate the data.

  2. Benchmarking NWP Kernels on Multi- and Many-core Processors

    NASA Astrophysics Data System (ADS)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc., (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
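
    The first kernel characteristic listed, computational (arithmetic) intensity, is straightforward to compute (a sketch; the daxpy traffic estimate assumes no cache reuse):

```python
def arithmetic_intensity(flops, bytes_moved):
    # Flops per byte of memory traffic: low values mean the kernel is
    # memory-bandwidth bound on most multi- and many-core processors.
    return flops / bytes_moved

# daxpy (y = a*x + y) over n doubles: 2n flops; x is read, y is read
# and written, i.e. 3 arrays * 8n bytes of traffic with no cache reuse.
n = 1_000_000
ai = arithmetic_intensity(2 * n, 24 * n)  # 1/12 flop per byte
```

    Stencil-heavy NWP kernels typically sit at similarly low intensities, which is why memory bandwidth pressure appears alongside flops in the characterization.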

  3. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    NASA Astrophysics Data System (ADS)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low-latency, low-power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in the H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder can be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit depths and better color sub-sampling patterns such as YUV 4:2:2 or 4:4:4. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264-compliant deblocking filter.

  4. A Parallel Vector Machine for the PM Programming Language

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2016-04-01

    PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using
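
    The masked-vector mechanism described above can be sketched with NumPy (a sketch of the idea only; PM's vectorised virtual machine is not NumPy-based):

```python
import numpy as np

def masked_where(cond, then_fn, else_fn, x):
    # Vectorised if-then-else: compute a Boolean mask over the whole
    # vector, apply one branch to the masked-in lanes and the other
    # branch to the remaining lanes, as a masked VM instruction would.
    out = np.empty_like(x)
    mask = cond(x)
    out[mask] = then_fn(x[mask])
    out[~mask] = else_fn(x[~mask])
    return out

# abs() for negative lanes, sqrt() for the rest, in one pass
x = np.array([-2.0, -1.0, 4.0, 9.0])
y = masked_where(lambda v: v < 0, np.abs, np.sqrt, x)  # [2., 1., 2., 3.]
```

    The location-list representation mentioned in the abstract corresponds to switching from the Boolean mask above to an index array (np.nonzero(mask)) once active lanes become sparse.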

  5. Parallel Markov chain Monte Carlo simulations.

    PubMed

    Ren, Ruichao; Orkoulas, G

    2007-06-07

    With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
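
    The sequential-updating idea can be sketched for a non-interacting lattice gas (an illustrative sketch; the function names are hypothetical and the paper's study treats a two-dimensional model):

```python
import random

def sweep(lattice, start, stop, rng):
    # Metropolis sweep over sites [start, stop): each occupied site
    # tries to hop to a random neighbour; for a non-interacting
    # lattice gas every hop into an empty site is accepted.
    n = len(lattice)
    for i in range(start, stop):
        if lattice[i] == 1:
            j = (i + rng.choice((-1, 1))) % n
            if lattice[j] == 0:
                lattice[i], lattice[j] = 0, 1

def decomposed_step(lattice, n_domains, rng):
    # Domain decomposition with *sequential* sub-domain updating:
    # the domains take their sweeps in a fixed order, so two workers
    # never attempt conflicting boundary moves at the same instant.
    size = len(lattice) // n_domains
    for d in range(n_domains):
        sweep(lattice, d * size, (d + 1) * size, rng)
```

    In an actual parallel run, domains that are not neighbours can sweep concurrently; the sequential ordering only needs to hold between domains that share a boundary.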

  6. Parallel Markov chain Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Ren, Ruichao; Orkoulas, G.

    2007-06-01

    With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.

  7. Parallel distributed free-space optoelectronic computer engine using flat plug-on-top optics package

    NASA Astrophysics Data System (ADS)

    Berger, Christoph; Ekman, Jeremy T.; Wang, Xiaoqing; Marchand, Philippe J.; Spaanenburg, Henk; Kiamilev, Fouad E.; Esener, Sadik C.

    2000-05-01

    We report on ongoing work on a free-space optical interconnect system that will demonstrate a Fast Fourier Transform calculation distributed among six processor chips. Logically, the processors are arranged in two linear chains, where each element communicates optically with its nearest neighbors. Physically, the setup consists of a large motherboard; several multi-chip carrier modules, which hold the processor/driver chips and the optoelectronic chips (arrays of lasers and detectors); and several plug-on-top optics modules, which provide the optical links between the chip carrier modules. The system design tries to satisfy numerous constraints, such as compact size, potential for mass production, suitability for large arrays (up to 1024 parallel channels), compatibility with standard electronics fabrication and packaging technology, potential for active misalignment compensation by integrating MEMS technology, and suitability for testing different imaging topologies. We present the system architecture together with details of key components and modules, and report on first experiences with prototype modules of the setup.

  8. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    Data Reimbursement, Basis for Proposed Order, § 791.45 Processors: (a) Generally, processors will be ... processors as well as manufacturers to assume direct testing and data reimbursement responsibilities. (40 CFR, Protection of Environment, revised as of 2010-07-01.)

  9. ELIPS: Toward a Sensor Fusion Processor on a Chip

    NASA Technical Reports Server (NTRS)

    Daud, Taher; Stoica, Adrian; Tyson, Thomas; Li, Wei-te; Fabunmi, James

    1998-01-01

    The paper presents the concept and initial tests from the hardware implementation of a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) processor is developed to seamlessly combine rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor data in compact low-power VLSI. The first demonstration of the ELIPS concept targets interceptor functionality; other applications, mainly in robotics and autonomous systems, are considered for the future. The main assumption behind ELIPS is that fuzzy, rule-based, and neural forms of computation can serve as the main primitives of an "intelligent" processor. Thus, in the same way classic processors are designed to optimize the hardware implementation of a set of fundamental operations, ELIPS is developed as an efficient implementation of computational intelligence primitives, and relies on a set of fuzzy set, fuzzy inference, and neural modules built in programmable analog hardware. The hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Following software demonstrations on several sets of interceptor data, three important ELIPS building blocks (a fuzzy set preprocessor, a rule-based fuzzy system, and a neural network) have been fabricated in analog VLSI hardware and have demonstrated microsecond processing times.

  10. PVM Enhancement for Beowulf Multiple-Processor Nodes

    NASA Technical Reports Server (NTRS)

    Springer, Paul

    2006-01-01

    A recent version of the Parallel Virtual Machine (PVM) computer program has been enhanced to enable use of multiple processors in a single node of a Beowulf system (a cluster of personal computers that runs the Linux operating system). A previous version of PVM had been enhanced by addition of a software port, denoted BEOLIN, that enables the incorporation of a Beowulf system into a larger parallel processing system administered by PVM, as though the Beowulf system were a single computer in the larger system. BEOLIN spawns tasks on (that is, automatically assigns tasks to) individual nodes within the cluster. However, BEOLIN does not enable the use of multiple processors in a single node. The present enhancement adds support for a parameter in the PVM command line that enables the user to specify which Internet Protocol host address the code should use in communicating with other Beowulf nodes. This enhancement also provides for the case in which each node in a Beowulf system contains multiple processors. In this case, by making multiple references to a single node, the user can cause the software to spawn multiple tasks on the multiple processors in that node.

  11. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    PubMed

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  12. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    PubMed Central

    Sharma, Anuj; Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  13. Scalable Parallel Algebraic Multigrid Solvers

    SciTech Connect

    Bank, R; Lu, S; Tong, C; Vassilevski, P

    2005-03-23

    The authors propose a parallel algebraic multilevel algorithm (AMG), which has the novel feature that the subproblem residing in each processor is defined over the entire partition domain, although the vast majority of unknowns for each subproblem are associated with the partition owned by the corresponding processor. This feature ensures that a global coarse description of the problem is contained within each of the subproblems. The advantages of this approach are that interprocessor communication is minimized in the solution process while an optimal order of convergence rate is preserved; and the speed of local subproblem solvers can be maximized using the best existing sequential algebraic solvers.

  14. Scheduling Tasks In Parallel Processing

    NASA Technical Reports Server (NTRS)

    Price, Camille C.; Salama, Moktar A.

    1989-01-01

    Algorithms sought to minimize time and cost of computation. Report describes research on scheduling of computational tasks in system of multiple identical data processors operating in parallel. Computational intractability requires use of suboptimal heuristic algorithms. First algorithm called "list heuristic", variation of classical list scheduling. Second algorithm called "cluster heuristic" applied to tightly coupled tasks and consists of four phases. Third algorithm called "exchange heuristic", iterative-improvement algorithm beginning with initial feasible assignment of tasks to processors and periods of time. Fourth algorithm is iterative one for optimal assignment of tasks and based on concept called "simulated annealing" because of mathematical resemblance to aspects of physical annealing processes.
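
The fourth heuristic above rests on simulated annealing: always accept a move that improves the schedule, and occasionally accept a worsening move with a probability that shrinks as a "temperature" is lowered. A minimal sketch, not the report's algorithm; the makespan cost, linear cooling schedule, and parameters are illustrative assumptions:

```python
import math
import random

def schedule_cost(assignment, task_times, n_procs):
    """Makespan: the latest finishing time over all processors."""
    loads = [0.0] * n_procs
    for task, proc in enumerate(assignment):
        loads[proc] += task_times[task]
    return max(loads)

def anneal_schedule(task_times, n_procs, steps=20000, t0=10.0, seed=0):
    """Assign tasks to processors by simulated annealing (illustrative parameters)."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_procs) for _ in task_times]
    cost = schedule_cost(assign, task_times, n_procs)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling
        task = rng.randrange(len(task_times))
        old = assign[task]
        assign[task] = rng.randrange(n_procs)     # move one task at random
        new_cost = schedule_cost(assign, task_times, n_procs)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:
            assign[task] = old                    # reject: undo the move
    return assign, cost
```

A real scheduler of the kind the report studies would also weigh communication and precedence constraints, not only processor load.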

  15. Fast combinatorial RNS processors for DSP applications

    SciTech Connect

    Di Claudio, E.D.; Piazza, F.; Orlandi, G.

    1995-05-01

    It is known that RNS VLSI processors can parallelize fixed-point addition and multiplication operations by the use of the Chinese Remainder Theorem (CRT). The required modular operations, however, must use specialized hardware whose design and implementation can create several problems. In this paper a modified residue arithmetic, called pseudo-RNS is introduced in order to alleviate some of the RNS problems when Digital Signal Processing (DSP) structures are implemented. Pseudo-RNS requires only the use of modified binary processors and exhibits a speed performance comparable with other RNS traditional approaches. Some applications of the pseudo-RNS to common DSP architectures, such as multipliers and filters, are also presented in this paper. They are compared in terms of the Area-Time Square product versus other RNS and weighted binary structures. It is proven that existing combinatorial or look-up table approaches for RNS are tailored to small designs or special applications, while the pseudo-RNS approach remains competitive also for complex systems. 32 refs.
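
The appeal of RNS arithmetic described above is that addition and multiplication decompose into independent, fully parallel operations on small residues, with the CRT recovering the weighted-binary result. A Python sketch of plain RNS (the paper's pseudo-RNS refinement is not reproduced here):

```python
def to_rns(x, moduli):
    """Represent x by its residues modulo pairwise-coprime moduli."""
    return [x % m for m in moduli]

def rns_add(a, b, moduli):
    # Each residue channel adds independently -- no carries between channels.
    return [(x + y) % m for x, y, m in zip(a, b, moduli)]

def rns_mul(a, b, moduli):
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]

def from_rns(residues, moduli):
    """Recover the integer via the Chinese Remainder Theorem."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        x += r * Mi * pow(Mi, -1, m)
    return x % M
```

The result is valid as long as it stays below the product of the moduli (here 7*11*13 = 1001).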

  16. Multigrid on massively parallel architectures

    SciTech Connect

    Falgout, R D; Jones, J E

    1999-09-17

    The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.

  17. Single-Point Access to Data Distributed on Many Processors

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    The functions and data structures are defined that would be necessary to implement the Chapel concepts of distributions, domains, allocation, and access, together with the interfaces to the compiler for transformations from Chapel source to the run-time implementation of these concepts. A complete set of object-oriented operators is defined that enables one to access elements of a distributed array through regular arithmetic index sets, giving the programmer the illusion that all the elements are collocated on a single processor. This means that arbitrary regions of the arrays can be fragmented and distributed across multiple processors with a single point of access. This is important because it can significantly improve programmer productivity by allowing programmers to concentrate on the high-level details of the algorithm without worrying about the efficiency and communication details of the underlying representation.
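
The illusion of a collocated array over fragmented storage reduces, at bottom, to a mapping from a global index to an (owner processor, local index) pair. A hypothetical block-distribution mapping, illustrative only and not Chapel's actual implementation:

```python
def block_map(i, n, p):
    """Map global index i of an n-element array, block-distributed over p
    processors, to (owner processor, local index).

    Assumed layout: the first (n % p) processors hold ceil(n/p) elements,
    the rest hold floor(n/p).
    """
    q, r = divmod(n, p)
    big = r * (q + 1)            # total elements held by the larger blocks
    if i < big:
        return i // (q + 1), i % (q + 1)
    return r + (i - big) // q, (i - big) % q
```

With n = 10 and p = 4 the block sizes are 3, 3, 2, 2, so global index 6 lives at local index 0 on processor 2.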

  18. Processing pharus data with the generic SAR processor

    SciTech Connect

    Otten, M.P.G.

    1996-11-01

    The Generic SAR Processor (GSP) is a SAR processing environment created to process airborne and spaceborne SAR data with a maximum amount of flexibility, while at the same time providing a user-friendly and powerful environment for handling and analyzing SAR data, including polarimetric calibration. PHARUS is an airborne polarimetric C-band SAR utilizing an active (solid-state) phased array. The absence of mechanical antenna stabilization and the use of electronic beam steering, in combination with high-PRF polarimetric operation under motion conditions that can be severe, require a very flexible SAR processor. The GSP is designed to handle this type of SAR data with a very flexible motion compensation, azimuth compression, and radiometric correction approach. First experiences with the processing of PHARUS data show that this is a valid approach to obtaining high quality polarimetric imagery with a phased array SAR. 4 refs., 5 figs.

  19. A test vector generator for a radar signal processor

    NASA Astrophysics Data System (ADS)

    Robins, C. B.

    1991-02-01

    This report documents the test vector generator (TVG) system developed for the purpose of testing a radar signal processor. This system simulates an eight channel radar receiver by providing input data for testing the signal processor test bed. The TVG system outputs 128-bit wide data samples at variable rates up to and including 10 million samples per second. The TVG memory array is one million samples deep. Variably sized output vectors can be addressed within the memory array and the vectors can be concatenated, repeated, and reshuffled in real time under the control of a single board computer. The TVG is seen as having applications on a variety of programs. Discussions of adapting and scaling the system to these other applications are presented.

  20. Parallelization of the Pipelined Thomas Algorithm

    NASA Technical Reports Server (NTRS)

    Povitsky, A.

    1998-01-01

    In this study the following questions are addressed. Is it possible to improve the parallelization efficiency of the Thomas algorithm? How should the Thomas algorithm be formulated in order to get solved lines that are used as data for other computational tasks while processors are idle? To answer these questions, two-step pipelined algorithms (PAs) are introduced formally. It is shown that the idle processor time is invariant with respect to the order of backward and forward steps in PAs starting from one outermost processor. The advantage of PAs starting from two outermost processors is small. Versions of the pipelined Thomas algorithms considered here fall into the category of PAs. These results show that the parallelization efficiency of the Thomas algorithm cannot be improved directly. However, the processor idle time can be used if some data has been computed by the time processors become idle. To achieve this goal the Immediate Backward pipelined Thomas Algorithm (IB-PTA) is developed in this article. The backward step is computed immediately after the forward step has been completed for the first portion of lines. This enables the completion of the Thomas algorithm for some of these lines before processors become idle. An algorithm for generating a static processor schedule recursively is developed. This schedule is used to switch between forward and backward computations and to control communications between processors. The advantage of the IB-PTA over the basic PTA is the presence of solved lines, which are available for other computations, by the time processors become idle.
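
For reference, the underlying sequential Thomas algorithm that the pipelined variants distribute is a forward elimination followed by a back substitution over a tridiagonal system; those two sweeps are exactly what the pipelined algorithms interleave across processors. A standard single-processor sketch:

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-, main-, and super-diagonals
    a, b, c and right-hand side d. a[0] and c[-1] are ignored.
    """
    n = len(b)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    # Forward elimination sweep.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution sweep.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The data dependence of each sweep on the previous row is what forces the pipelining discussed in the article: a processor owning the next block of rows can only start once its neighbor's sweep reaches the block boundary.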

  1. Asynchronous parallel status comparator

    DOEpatents

    Arnold, J.W.; Hart, M.M.

    1992-12-15

    Disclosed is an apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals correspond to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition. 4 figs.

  2. Asynchronous parallel status comparator

    DOEpatents

    Arnold, Jeffrey W.; Hart, Mark M.

    1992-01-01

    Apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals correspond to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition.

  3. Implementation of a Configurable Fault Tolerant Processor (CFTP)

    DTIC Science & Technology

    2003-03-01

    Keywords: Programmable Gate Array (FPGA), Single Event Upset (SEU), 16-Bit RISC. ... Appendix A: CFTP Schematics and Code. Appendix A contains all the schematics and VHDL code files that were specifically built for this thesis. It does not ... the associated KDLX files, as well as the state machine design from which the Interrupt VHDL code was derived. The VHDL files for the KDLX processor were not ...

  4. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    SciTech Connect

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory. (authors)
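
The iterated processor-pair-wise balancing idea can be illustrated with a dimension-exchange sketch: in round d, each processor averages its load with the partner whose rank differs in bit d, so after log2(N) rounds every load equals the global mean. This is an illustrative analogue, not the authors' algorithm (which balances Monte Carlo particle counts, not abstract loads):

```python
def balance(loads):
    """Dimension-exchange load averaging over processor pairs.

    Pairs are ranks differing in one bit; assumes len(loads) is a power of 2.
    """
    n = len(loads)
    loads = list(loads)
    dim = n.bit_length() - 1       # log2(n) rounds
    for d in range(dim):
        for p in range(n):
            q = p ^ (1 << d)       # partner differs in bit d
            if p < q:              # each pair exchanges once per round
                avg = (loads[p] + loads[q]) / 2.0
                loads[p] = loads[q] = avg
    return loads
```

Each processor talks to only log2(N) partners in total, which is where the O(log(N)) run time of such pairwise schemes comes from.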

  5. Bipartite memory network architectures for parallel processing

    SciTech Connect

    Smith, W.; Kale, L.V. . Dept. of Computer Science)

    1990-01-01

    Parallel architectures are broadly classified as either shared memory or distributed memory architectures. In this paper, the authors propose a third family of architectures, called bipartite memory network architectures. In this architecture, processors and memory modules constitute a bipartite graph, where each processor is allowed to access a small subset of the memory modules, and each memory module allows access from a small set of processors. The architecture is particularly suitable for computations requiring dynamic load balancing. The authors explore the properties of this architecture by examining the Perfect Difference set based topology for the graph. Extensions of this topology are also suggested.

  6. CRBLASTER: A Fast Parallel-Processing Program for Cosmic Ray Rejection in Space-Based Observations

    NASA Astrophysics Data System (ADS)

    Mighell, K.

    Many astronomical image analysis tasks are based on algorithms that can be described as being embarrassingly parallel - where the analysis of one subimage generally does not affect the analysis of another subimage. Yet few parallel-processing astrophysical image-analysis programs exist that can easily take full advantage of today's fast multi-core servers costing a few thousand dollars. One reason for the shortage of state-of-the-art parallel-processing astrophysical image-analysis codes is that the writing of parallel codes has been perceived to be difficult. I describe a new fast parallel-processing image-analysis program called CRBLASTER which does cosmic ray rejection using van Dokkum's L.A.Cosmic algorithm. CRBLASTER is written in C using the industry-standard Message Passing Interface library. Processing a single 800 x 800 Hubble Space Telescope Wide-Field Planetary Camera 2 (WFPC2) image takes 1.9 seconds using 4 processors on an Apple Xserve with two dual-core 3.0-GHz Intel Xeons; the efficiency of the program running with the 4 cores is 82%. The code has been designed to be used as a software framework for the easy development of parallel-processing image-analysis programs using embarrassingly parallel algorithms; all that needs to be done is to replace the core image-processing task (in this case the C function that performs the L.A.Cosmic algorithm) with an alternative image-analysis task based on a single-processor algorithm. I describe the design and implementation of the program and then discuss how it could possibly be used to quickly perform time-critical analysis applications such as those involved with space surveillance, or to perform complex calibration tasks as part of the pipeline processing of images from large focal plane arrays.
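
The framework idea, split the image, run an interchangeable single-processor core task on each piece, and rejoin, can be sketched independently of MPI. Everything here (function names, the serial `map` stand-in for a parallel map) is illustrative, not CRBLASTER's actual API:

```python
def split(image, n):
    """Split a list-of-rows image into n row-wise subimages."""
    k, r = divmod(len(image), n)
    out, start = [], 0
    for i in range(n):
        size = k + (1 if i < r else 0)   # spread the remainder evenly
        out.append(image[start:start + size])
        start += size
    return out

def run_framework(image, core_task, n_pieces, par_map=map):
    """Apply a single-processor core task to each subimage independently.

    par_map defaults to the builtin serial map; a process pool's map or an
    MPI scatter/gather would slot in unchanged. Swapping core_task changes
    the analysis, which is the framework's point.
    """
    pieces = split(image, n_pieces)
    return [row for piece in par_map(core_task, pieces) for row in piece]
```

Because subimages are independent (embarrassing parallelism), correctness does not depend on which map is used, only speed does.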

  7. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's complement operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
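
The dynamic programming recurrence the chip implements in systolic hardware is, in software form, the Smith-Waterman local-alignment recurrence: each cell takes the best of a substitution, a deletion, an insertion, or zero. A scalar sketch with illustrative weights (the chip supports general, per-sequence weight functions):

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences s and t
    (linear gap penalty; the weights here are illustrative).
    """
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,                      # local: never go negative
                          H[i - 1][j - 1] + sub,  # substitution/match
                          H[i - 1][j] + gap,      # gap in t
                          H[i][j - 1] + gap)      # gap in s
            best = max(best, H[i][j])
    return best
```

In the systolic array, each of the 16 processing elements holds one column of this table and the anti-diagonals advance one step per clock.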

  8. An Efficient Solution Method for Multibody Systems with Loops Using Multiple Processors

    NASA Technical Reports Server (NTRS)

    Ghosh, Tushar K.; Nguyen, Luong A.; Quiocho, Leslie J.

    2015-01-01

    This paper describes a multibody dynamics algorithm formulated for parallel implementation on multiprocessor computing platforms using the divide-and-conquer approach. The system of interest is a general topology of rigid and elastic articulated bodies with or without loops. The algorithm divides the multibody system into a number of smaller sets of bodies in chain or tree structures, called "branches" at convenient joints called "connection points", and uses an Order-N (O (N)) approach to formulate the dynamics of each branch in terms of the unknown spatial connection forces. The equations of motion for the branches, leaving the connection forces as unknowns, are implemented in separate processors in parallel for computational efficiency, and the equations for all the unknown connection forces are synthesized and solved in one or several processors. The performances of two implementations of this divide-and-conquer algorithm in multiple processors are compared with an existing method implemented on a single processor.

  9. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  10. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  11. Communication-Avoiding Parallel Recursive Algorithms for Matrix Multiplication

    DTIC Science & Technology

    2013-05-17

    ... given subproblem share the same m-digit suffix. After the above communication is performed, the layout of Si and Ti has parameters (n/2, P/7, s - 1) ... Algorithm 2 (CAPS, in detail). Input: A, B are n x n matrices; P = number of processors; rank = processor number, base-7, as an array; M = local ...

  12. Parallel network simulations with NEURON.

    PubMed

    Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

    2006-10-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.

  13. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    NASA Technical Reports Server (NTRS)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  14. Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster

    NASA Astrophysics Data System (ADS)

    Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah

    In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message passing library and a master/slave algorithm. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results conducted on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
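
The sequential kernel each slave runs is a 3 x 3 Sobel convolution; the master's job is only to scatter row bands and gather results. A pure-Python sketch of the kernel itself (the MPI scatter/gather is omitted, and border pixels are simply left at zero):

```python
def sobel(image):
    """Gradient magnitude (|Gx| + |Gy|) of a 2-D grayscale image
    given as a list of lists of numbers.
    """
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = sum(kx[a][b] * image[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(ky[a][b] * image[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            out[i][j] = abs(gx) + abs(gy)
    return out
```

In the master/slave scheme, each slave would receive its row band plus one halo row above and below, since the 3 x 3 stencil reaches into neighboring bands.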

  15. Parallel Recording of Neurotransmitters Release from Chromaffin Cells Using a 10 × 10 CMOS IC Potentiostat Array with On-Chip Working Electrodes

    PubMed Central

    Kim, Brian Namghi; Herbst, Adam D.; Kim, Sung June; Minch, Bradley A.; Lindau, Manfred

    2012-01-01

    Neurotransmitter release is modulated by many drugs and molecular manipulations. We present an active CMOS-based electrochemical biosensor array with high throughput capability (100 electrodes) for on-chip amperometric measurement of neurotransmitter release. The high-throughput of the biosensor array will accelerate the data collection needed to determine statistical significance of changes produced under varying conditions, from several weeks to a few hours. The biosensor is designed and fabricated using a combination of CMOS integrated circuit (IC) technology and a photolithography process to incorporate platinum working electrodes on-chip. We demonstrate the operation of an electrode array with integrated high-gain potentiostats and output time-division multiplexing with minimum dead time for readout. The on-chip working electrodes are patterned by conformal deposition of Pt and lift-off photolithography. The conformal deposition method protects the underlying electronic circuits from contact with the electrolyte that covers the electrode array during measurement. The biosensor was validated by simultaneous measurement of amperometric currents from 100 electrodes in response to dopamine injection, which revealed the time course of dopamine diffusion along the surface of the biosensor array. The biosensor simultaneously recorded neurotransmitter release successfully from multiple individual living chromaffin cells. The biosensor was capable of resolving small and fast amperometric spikes reporting release from individual vesicle secretions. We anticipate that this device will accelerate the characterization of the modulation of neurotransmitter secretion from neuronal and endocrine cells by pharmacological and molecular manipulations of the cells. PMID:23084756

  16. Cluster Algorithm Special Purpose Processor

    NASA Astrophysics Data System (ADS)

    Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.

    We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
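
The Wolff algorithm that the processor hard-wires grows a single cluster of aligned spins, adding each aligned neighbor with probability 1 - exp(-2*beta), then flips the whole cluster. A software sketch for the pure ferromagnetic 2D Ising model in units with J = 1 (the SPP's random-bond and spin-glass variants would modify the bond probabilities):

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One Wolff cluster flip on an L x L periodic Ising lattice.

    spins is a flat list of +1/-1, site index = x + y*L.
    Returns the size of the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)   # bond-activation probability
    seed = rng.randrange(L * L)
    s0 = spins[seed]
    cluster = {seed}
    stack = [seed]
    while stack:
        site = stack.pop()
        x, y = site % L, site // L
        for nb in ((x + 1) % L + y * L, (x - 1) % L + y * L,
                   x + ((y + 1) % L) * L, x + ((y - 1) % L) * L):
            # Only aligned neighbors may join, each with probability p_add.
            if spins[nb] == s0 and nb not in cluster and rng.random() < p_add:
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:
        spins[site] = -s0                 # flip the whole cluster at once
    return len(cluster)
```

At low temperature (large beta) the cluster spans the lattice, which is exactly why Wolff updates beat single-spin flips near criticality.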

  17. Parallel methods for the flight simulation model

    SciTech Connect

    Xiong, Wei Zhong; Swietlik, C.

    1994-06-01

    The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared memory and distributed architectures, consisting of an eight processor Alliant FX/8, a twenty-four processor Sequent Symmetry, a Cray XMP, an IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One such application model is the Flight Simulation Model, which uses a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to a full day of CPU time on a DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.

  18. On the relationship between parallel computation and graph embedding

    SciTech Connect

    Gupta, A.K.

    1989-01-01

    The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sized binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called ({alpha},{beta})-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the ({alpha},{beta})-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.
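
The two conventional cost measures used above are easy to state concretely: dilation is the longest host-graph distance to which any embedded guest edge is stretched, and load is the largest number of guest processors any single host processor must simulate. A small sketch (the ({alpha},{beta})-utilization measure is not reproduced):

```python
def embedding_cost(guest_edges, mapping, host_dist):
    """Dilation and load of an embedding of a guest graph into a host graph.

    guest_edges: iterable of (u, v) guest edges
    mapping:     dict, guest node -> host node
    host_dist:   function giving shortest-path distance between host nodes
    """
    # Dilation: worst stretch of any guest edge in the host.
    dilation = max(host_dist(mapping[u], mapping[v]) for u, v in guest_edges)
    # Load: most guest nodes placed on any one host node.
    hosts = list(mapping.values())
    load = max(hosts.count(h) for h in set(hosts))
    return dilation, load
```

Together, dilation and load bound the time to simulate one guest step on the host, which is why the embeddings in the thesis minimize both.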

  19. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09], which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.
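
The reason contingency statistics parallelize at all is that per-processor tables reduce by elementwise addition, an associative operation that can be combined in a tree; the difficulty alluded to above is that the table itself (unlike a fixed handful of moments) can grow with the data, inflating communication. A sketch of the reduction:

```python
from collections import Counter

def local_contingency(pairs):
    """Contingency table of the (x, y) observations held on one processor."""
    return Counter(pairs)

def reduce_tables(tables):
    """Merge per-processor tables. The global table is the elementwise sum
    of cell counts, so the reduction is associative and commutative and can
    be evaluated as a tree across processors.
    """
    total = Counter()
    for t in tables:
        total.update(t)   # adds counts cell by cell
    return total
```

The communicated payload is one table per reduction step, which is why speed-up degrades when the number of distinct (x, y) cells is large.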

  20. Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

    NASA Astrophysics Data System (ADS)

    Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

    2013-03-01

    With the development of international wireless communication standards, there is an increasing computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA approaches. In addition, computation-intensive functions process large amounts of data in parallel, which fosters the development of single instruction multiple data (SIMD) architectures in SDR platforms. A new way must therefore be found to prototype SDR processors efficiently. In this paper we present a bit- and cycle-accurate model of programmable SIMD SDR processors in the machine description language LISA. LISA is a language for instruction-set architecture description that enables rapid modeling at the architectural level. In order to evaluate the suitability of our proposed processor, three common baseband functions (FFT, FIR digital filtering, and matrix multiplication) have been mapped onto the SDR platform. Analytical results showed that the SDR processor achieved up to a 47.1% performance boost relative to the compared processor.

  1. Design of a massively parallel computer using bit serial processing elements

    NASA Technical Reports Server (NTRS)

    Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

    1995-01-01

    A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.

  2. Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.

    DTIC Science & Technology

    1994-11-01

    Simulations of subsonic fluid dynamics, for example the flow of air inside wind musical instruments, achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. SUBJECT TERMS: AI, MIT, Artificial Intelligence, Distributed Computing, Workstation Cluster, Network, Fluid Dynamics, Musical Instruments.

  3. Parallel hypergraph partitioning for scientific computing.

    SciTech Connect

    Heaphy, Robert; Devine, Karen Dragon; Catalyurek, Umit; Bisseling, Robert; Hendrickson, Bruce Alan; Boman, Erik Gunnar

    2005-07-01

    Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.
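
    The claim that hypergraphs model communication volume exactly rests on the connectivity-1 metric: a hyperedge spanning k parts contributes k - 1 to the cut. A minimal sketch (illustrative only, not the Sandia package's code):

    ```python
    def connectivity_cut(hyperedges, part):
        """Connectivity-1 metric: each hyperedge spanning k parts costs k - 1.

        For parallel sparse matrix-vector multiplication this equals the
        communication volume exactly, whereas a plain graph edge-cut
        only approximates it.
        """
        cost = 0
        for edge in hyperedges:
            parts_spanned = {part[v] for v in edge}
            cost += len(parts_spanned) - 1
        return cost

    # Vertices 0..3 assigned to two parts; hyperedges model matrix rows.
    part = {0: 0, 1: 0, 2: 1, 3: 1}
    hyperedges = [[0, 1], [0, 2, 3], [1, 2]]
    cut = connectivity_cut(hyperedges, part)  # 0 + 1 + 1 = 2
    ```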

  4. A Course on Reconfigurable Processors

    ERIC Educational Resources Information Center

    Shoufan, Abdulhadi; Huss, Sorin A.

    2010-01-01

    Reconfigurable computing is an established field in computer science. Teaching this field to computer science students demands special attention due to limited student experience in electronics and digital system design. This article presents a compact course on reconfigurable processors, which was offered at the Technische Universitat Darmstadt,…

  5. Processor Emulator with Benchmark Applications

    SciTech Connect

    Lloyd, G. Scott; Pearce, Roger; Gokhale, Maya

    2015-11-13

    A processor emulator and a suite of benchmark applications have been developed to assist in characterizing the performance of data-centric workloads on current and future computer architectures. Some of the applications have been collected from other open source projects. For more details on the emulator and an example of its usage, see reference [1].

  6. Parallel Fock matrix construction with distributed shared memory model for the FMO-MO method.

    PubMed

    Umeda, Hiroaki; Inadomi, Yuichi; Watanabe, Toshio; Yagi, Toru; Ishimoto, Takayoshi; Ikegami, Tsutomu; Tadano, Hiroto; Sakurai, Tetsuya; Nagashima, Umpei

    2010-10-01

    A parallel Fock matrix construction program for the FMO-MO method has been developed with a distributed shared memory model. To construct the large Fock matrix required by FMO-MO calculations, a distributed parallel algorithm was designed to make full use of local memory and reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation on a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% on 1,024 processors. A large FMO-MO application to the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6-31G level of theory, with the frontier orbitals extracted by a Sakurai-Sugiura eigensolver. It takes 11.3 h for the FMO calculation, 49.1 h for the Fock matrix construction, and 10 min to extract 94 eigencomponents on a PC cluster using 256 processors.

  7. Parallel methods for dynamic simulation of multiple manipulator systems

    NASA Technical Reports Server (NTRS)

    Mcmillan, Scott; Sadayappan, P.; Orin, David E.

    1993-01-01

    In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.

  8. 7 CFR 1208.18 - Processor.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE PROCESSED RASPBERRY PROMOTION, RESEARCH, AND INFORMATION ORDER Processed Raspberry Promotion, Research, and Information Order Definitions § 1208.18 Processor. Processor means a person engaged in the preparation of raspberries for...

  9. 7 CFR 1208.18 - Processor.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE PROCESSED RASPBERRY PROMOTION, RESEARCH, AND INFORMATION ORDER Processed Raspberry Promotion, Research, and Information Order Definitions § 1208.18 Processor. Processor means a person engaged in the preparation of raspberries for...

  10. Entanglement in a Quantum Annealing Processor

    DTIC Science & Technology

    2016-09-07

    Entanglement in a Quantum Annealing Processor. T. Lanting, A. J. Przybysz, A. Yu. Smirnov, F. M. Spedalieri, M. H. Amin, A. J. Berkley, R. ...promising path to a practical quantum processor. We have built a series of architecturally scalable QA processors consisting of networks of manufactured...such processor, demonstrating quantum coherence in these systems. We present experimental evidence that, during a critical portion of QA, the qubits...

  11. Dual-Sampler Processor Digitizes CCD Output

    NASA Technical Reports Server (NTRS)

    Salomon, P. M.

    1986-01-01

    Circuit for processing output of charge-coupled device (CCD) imager provides increased time for analog-to-digital conversion, thereby reducing bandwidth required for video processing. Instead of one sample-and-hold circuit of conventional processor, improved processor includes two sample-and-hold circuits alternated with each other. Dual-sampler processor operates with lower bandwidth and with timing requirements less stringent than those of single-sample processor.

  12. Preconditioning of real-time optical Wiener filters for array processing

    NASA Astrophysics Data System (ADS)

    Ghosh, Anjan; Paparao, Palacharla

    1992-07-01

    In adaptive array processors, a performance measure, such as mean square error or signal-to-noise ratio, converges to the optimum Wiener solution starting from an initial setting. The choice of adaptive algorithm for solving the Wiener filtering problem is mainly guided by the desired processing time. In an optical realization for direct calculation of the optimum weights, the covariance matrix and vector for a Wiener filter are computed at high speed on acousto-optic processors. The resulting linear system of equations can be solved on an iterative optical processor. The matrix and vector data should be recomputed in every iteration for better tracking and adaptation. This introduces variations in their values due to time-varying jamming and interference noise and to optical errors and noise. The time-variant steepest descent algorithm is a simple method that converges to the common solution. In this paper, we describe a real-time preconditioning technique for such nonstationary iterative methods. Preconditioning progressively lowers the condition number of each matrix in the sequence, thereby improving the convergence speed and the accuracy of the solution. The preconditioning process involves matrix-matrix multiplications that can be performed at high speed on parallel optical processors. Results of simulations illustrate the superlinear convergence obtained from preconditioning.
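
    The role of preconditioning in such an iterative Wiener solve can be sketched numerically. The sketch below solves R w = p (covariance matrix R, cross-correlation vector p) by steepest descent with a diagonal (Jacobi) preconditioner; Jacobi is our illustrative choice, and the paper's preconditioner differs, but the principle is the same: lowering the condition number of the preconditioned matrix speeds convergence.

    ```python
    def precond_steepest_descent(R, p, iters=200):
        """Solve R w = p by steepest descent preconditioned with M = diag(R).

        Toy sketch of the iterative solve described in the abstract; the
        optical implementation performs the matrix products on parallel
        acousto-optic hardware rather than in software.
        """
        n = len(p)
        w = [0.0] * n
        for _ in range(iters):
            # Residual r = p - R w.
            r = [p[i] - sum(R[i][j] * w[j] for j in range(n)) for i in range(n)]
            # Preconditioned search direction d = M^-1 r.
            d = [r[i] / R[i][i] for i in range(n)]
            # Exact line search: alpha = (r . d) / (d . R d).
            Rd = [sum(R[i][j] * d[j] for j in range(n)) for i in range(n)]
            denom = sum(d[i] * Rd[i] for i in range(n))
            if denom == 0.0:
                break
            alpha = sum(r[i] * d[i] for i in range(n)) / denom
            w = [w[i] + alpha * d[i] for i in range(n)]
        return w

    # Toy 2x2 "covariance" system; exact solution is w = [1/11, 7/11].
    R = [[4.0, 1.0], [1.0, 3.0]]
    p = [1.0, 2.0]
    w = precond_steepest_descent(R, p)
    ```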

  13. 7 CFR 926.13 - Processor.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 8 2010-01-01 2010-01-01 false Processor. 926.13 Section 926.13 Agriculture... and Orders; Fruits, Vegetables, Nuts), DEPARTMENT OF AGRICULTURE DATA COLLECTION, REPORTING AND... Processor. Processor means any person who receives or acquires fresh or frozen cranberries or cranberries...

  14. Processor architecture for airborne SAR systems

    NASA Technical Reports Server (NTRS)

    Glass, C. M.

    1983-01-01

    Digital processors for spaceborne imaging radars and application of the technology developed for airborne SAR systems are considered. Transferring algorithms and implementation techniques from airborne to spaceborne SAR processors offers obvious advantages. The following topics are discussed: (1) a quantification of the differences in processing algorithms for airborne and spaceborne SARs; and (2) an overview of three processors for airborne SAR systems.

  15. Parallel matrix transpose algorithms on distributed memory concurrent computers

    SciTech Connect

    Choi, J.; Walker, D.W.; Dongarra, J.J. |

    1993-10-01

    This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. It is assumed that the matrix is distributed over a P x Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed with LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A{center_dot}B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A{sup T}{center_dot}B{sup T}, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
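
    The GCD/LCM structure of the communication schedule described above can be sketched directly (an illustrative helper, not code from the PUMMA package):

    ```python
    from math import gcd

    def transpose_schedule(P, Q):
        """Communication pattern for the block-scattered transpose on a P x Q template.

        If gcd(P, Q) == 1 the transpose requires a complete exchange; otherwise
        processors split into gcd(P, Q) groups whose communication is overlapped,
        and the matrix is transposed in lcm(P, Q) // gcd(P, Q) steps.
        """
        g = gcd(P, Q)
        lcm = P * Q // g
        steps = lcm // g
        pattern = "complete exchange" if g == 1 else f"{g} overlapped groups"
        return g, steps, pattern

    print(transpose_schedule(4, 6))  # gcd 2, lcm 12 -> 6 steps
    print(transpose_schedule(3, 5))  # relatively prime -> complete exchange
    ```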

  16. Parallel Pascal - An extended Pascal for parallel computers

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  17. FPGA-based reconfigurable processor for ultrafast interlaced ultrasound and photoacoustic imaging.

    PubMed

    Alqasemi, Umar; Li, Hai; Aguirre, Andrés; Zhu, Quing

    2012-07-01

    In this paper, we report, to the best of our knowledge, a unique field-programmable gate array (FPGA)-based reconfigurable processor for real-time interlaced co-registered ultrasound and photoacoustic imaging and its application in imaging tumor dynamic response. The FPGA is used to control, acquire, store, delay-and-sum, and transfer the data for real-time co-registered imaging. The FPGA controls the ultrasound transmission and ultrasound and photoacoustic data acquisition process of a customized 16-channel module that contains all of the necessary analog and digital circuits. The 16-channel module is one of multiple modules plugged into a motherboard; their beamformed outputs are made available for a digital signal processor (DSP) to access using an external memory interface (EMIF). The FPGA performs a key role through ultrafast reconfiguration and adaptation of its structure to allow real-time switching between the two imaging modes, including transmission control, laser synchronization, internal memory structure, beamforming, and EMIF structure and memory size. It performs another role by parallel accessing of internal memories and multi-thread processing to reduce the transfer of data and the processing load on the DSP. Furthermore, because the laser will be pulsing even during ultrasound pulse-echo acquisition, the FPGA ensures that the laser pulses are far enough from the pulse-echo acquisitions by appropriate time-division multiplexing (TDM). A co-registered ultrasound and photoacoustic imaging system consisting of four FPGA modules (64-channels) is constructed, and its performance is demonstrated using phantom targets and in vivo mouse tumor models.
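
    The delay-and-sum operation the FPGA performs can be illustrated with a minimal software sketch (integer sample delays only; the real hardware adds interpolation and apodization weights, and this is not the authors' code):

    ```python
    def delay_and_sum(channels, delays):
        """Delay-and-sum beamforming sketch.

        Shift each channel by its per-element delay (in samples) and sum,
        so that echoes arriving from the focal point add coherently.
        """
        n = min(len(ch) - d for ch, d in zip(channels, delays))
        return [sum(ch[d + i] for ch, d in zip(channels, delays)) for i in range(n)]

    # Three channels carrying the same pulse at different arrival times.
    pulse = [0, 1, 2, 1, 0]
    ch0 = [0, 0] + pulse + [0, 0]
    ch1 = [0] + pulse + [0, 0, 0]
    ch2 = pulse + [0, 0, 0, 0]
    out = delay_and_sum([ch0, ch1, ch2], delays=[2, 1, 0])
    # Coherent peak: 3 channels x amplitude 2 = 6.
    ```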

  18. A programmable power processor for a 25 kW power module. [on Shuttle Orbiter

    NASA Technical Reports Server (NTRS)

    Kapustka, R. E.; Lanier, J. R., Jr.

    1978-01-01

    The paper presents the concept and major design problems of a programmable power processor for the 25 kW electrical power system for the Shuttle Orbiter. The load will be handled by three parallel power stages operated in phase sequence with each power transistor having its own commutating diode and filter inductor. The power stages will be run at a fixed frequency of 10 kHz with the 'on'-time variable up to 100%. The input filter bank in the breadboard programmable power processor is planned to be a series-parallel combination of tantalum cased tantalum wet-slug capacitors.

  19. Method for simultaneous overlapped communications between neighboring processors in a multiple

    DOEpatents

    Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

    1991-01-01

    A parallel computing system and method having improved performance, wherein a program is run concurrently on a plurality of nodes to reduce total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing efficient communication between the processors and between the system and input and output devices. A method is also disclosed for locating defective nodes within the computing system.

  20. Singular value decomposition with systolic arrays

    NASA Technical Reports Server (NTRS)

    Ipsen, I. C. F.

    1984-01-01

    Systolic arrays for determining the singular value decomposition of an m×n (m ≥ n) matrix A of bandwidth w are presented. After A has been reduced to bidiagonal form B by means of Givens plane rotations, the singular values of B are computed by the Golub-Reinsch iteration. The products of the plane rotations form the matrices of left and right singular vectors. Assuming each processor can compute or supply a plane rotation, O(wn) processors accomplish the reduction to bidiagonal form in O(np) steps, where p is the number of superdiagonals. A constant number of processors then determines each singular value in about 6n steps. The singular vectors are computed either by rerouting the rotations through the arrays used for the reduction to bidiagonal form, or along the way by employing another rectangular array of O(wm) processors.
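
    The basic operation each systolic processor computes or applies is a Givens plane rotation, which zeroes one matrix entry at a time. A minimal sketch of that primitive (illustrative, not the systolic scheduling itself):

    ```python
    from math import hypot

    def givens(a, b):
        """Compute c, s of a Givens plane rotation that zeroes b:

            [ c  s] [a]   [r]
            [-s  c] [b] = [0]

        Applied in sequence, such rotations reduce a banded matrix
        to bidiagonal form.
        """
        r = hypot(a, b)
        if r == 0.0:
            return 1.0, 0.0
        return a / r, b / r

    c, s = givens(3.0, 4.0)
    # Applying the rotation to the pair (3, 4) yields (5, 0).
    r1 = c * 3.0 + s * 4.0
    r2 = -s * 3.0 + c * 4.0
    ```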