Science.gov

Sample records for parallel processor array

  1. Parallel processor configuration for adaptive antenna arrays

    SciTech Connect

    Wiener, A.I.

    1986-03-11

    This patent describes a signal processing system receiving detected signals from the subarray sensors of an adaptive phased array and delivering to the array data processing system an output of weighted sensor signals, the weighted sensor signals having the signal components of multiple jamming sources in the detected signals nulled and maximizing the gain of desirable components of the detected signals. The signal processing system consists of: a parallel analog combiner receiving the detected signals from the subarray sensors of an adaptive phased array, the parallel analog combiner applying a first set of output weights to each of the detected signals from each of the subarray sensors, and combining the resultant weighted analog signals into the combiner output signal, the parallel analog combiner producing the first set of output weights by combining a first set of output weight change values with its first set of old output weight values; an error generator receiving the combiner output signal and one of the detected signals which is used as a reference signal and generating an error signal; a parallel processor receiving the detected signals from the subarray sensors, the parallel processor applying a second set of output weights to the detected signals and producing the output of weighted sensor signals to the array data processing system, the parallel processor producing the second set of output weights by combining a second set of output weight change values with its second set of old output weight values; and an algorithm circuit receiving the error signal from the error generator and the weighted output signals from the parallel processor and providing a first set of output weight change values to the parallel analog combiner, and a second set of output weight change values to the parallel processor.

  2. Titanic: a VLSI based content addressable parallel array processor

    SciTech Connect

    Weems, C.; Levitan, S.; Foster, C.

    1982-01-01

    A design is presented for a content addressable parallel array processor (CAPAP) which is both practical and feasible. Its practicality stems from an extensive program of research into real applications of content addressability and parallelism. The feasibility of the design stems from development under a set of conservative engineering constraints tied to limitations of VLSI technology. 1 ref.

  3. Integration of IR focal plane arrays with massively parallel processor

    NASA Astrophysics Data System (ADS)

    Esfandiari, P.; Koskey, P.; Vaccaro, K.; Buchwald, W.; Clark, F.; Krejca, B.; Rekeczky, C.; Zarandy, A.

    2008-04-01

    The intent of this investigation is to replace the low fill factor visible sensor of a Cellular Neural Network (CNN) processor with an InGaAs Focal Plane Array (FPA) using both bump bonding and epitaxial layer transfer techniques for use in the Ballistic Missile Defense System (BMDS) interceptor seekers. The goal is to fabricate a massively parallel digital processor with a local as well as a global interconnect architecture. Currently, this unique CNN processor is capable of processing a target scene in excess of 10,000 frames per second with its visible sensor. What makes the CNN processor so unique is that each processing element includes memory, local data storage, local and global communication devices and a visible sensor supported by a programmable analog or digital computer program.

  4. Digital Parallel Processor Array for Optimum Path Planning

    NASA Technical Reports Server (NTRS)

    Kemeny, Sabrina E. (Inventor); Fossum, Eric R. (Inventor); Nixon, Robert H. (Inventor)

    1996-01-01

    The invention computes the optimum path across a terrain or topology represented by an array of parallel processor cells interconnected between neighboring cells by links extending along different directions to the neighboring cells. Such an array is preferably implemented as a high-speed integrated circuit. The computation of the optimum path is accomplished by, in each cell, receiving stimulus signals from neighboring cells along corresponding directions, determining and storing the identity of a direction along which the first stimulus signal is received, broadcasting a subsequent stimulus signal to the neighboring cells after a predetermined delay time, whereby stimulus signals propagate throughout the array from a starting one of the cells. After propagation of the stimulus signal throughout the array, a master processor traces back from a selected destination cell to the starting cell along an optimum path of the cells in accordance with the identity of the directions stored in each of the cells.
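
    The propagate-then-trace-back scheme described above can be illustrated in software. The following is a minimal Python sketch, not the patented circuit: it assumes each cell's delay stands for the cost of crossing that cell, simulates the asynchronous stimulus wave with an event queue, and lets a "master" routine follow the stored directions back from the destination. The names `propagate`, `trace_back` and `DIRECTIONS`, and the sample grid, are hypothetical.

```python
import heapq

# Direction a stimulus arrives FROM, mapped to the (row, col) offset of the sender.
DIRECTIONS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


def propagate(delays, start):
    """Simulate the stimulus wave; return the first-arrival direction of each cell."""
    rows, cols = len(delays), len(delays[0])
    first_from = {start: None}  # the starting cell has no predecessor
    events = []                 # heap of (arrival_time, receiving_cell, direction)

    def broadcast(cell, fire_time):
        r, c = cell
        for name, (dr, dc) in DIRECTIONS.items():
            # The neighbor that sees `cell` in direction `name`.
            nr, nc = r - dr, c - dc
            if 0 <= nr < rows and 0 <= nc < cols:
                heapq.heappush(events, (fire_time, (nr, nc), name))

    broadcast(start, delays[start[0]][start[1]])
    while events:
        t, cell, name = heapq.heappop(events)
        if cell in first_from:
            continue            # only the first stimulus received is stored
        first_from[cell] = name
        # Rebroadcast after this cell's own (assumed cost-encoding) delay.
        broadcast(cell, t + delays[cell[0]][cell[1]])
    return first_from


def trace_back(first_from, start, goal):
    """Follow stored directions from the destination back to the start."""
    path = [goal]
    while path[-1] != start:
        r, c = path[-1]
        dr, dc = DIRECTIONS[first_from[path[-1]]]
        path.append((r + dr, c + dc))
    return path[::-1]


# Hypothetical terrain: the middle column is expensive to cross near the top.
grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
route = trace_back(propagate(grid, (0, 0)), (0, 0), (0, 2))
print(route)
```

    In this sketch the wave reaches the destination first along the cheap cells, so the traced route detours around the high-delay column rather than crossing it.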

  5. Parallel processing in a host plus multiple array processor system for radar

    NASA Technical Reports Server (NTRS)

    Barkan, B. Z.

    1983-01-01

    Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.

  6. Fast parallel implementation of multidimensional data-domain FORTRAN codes on distributed-memory processor arrays

    NASA Astrophysics Data System (ADS)

    Reale, F.; Barbera, M.; Sciortino, S.

    1992-11-01

    We illustrate a general and straightforward approach to develop FORTRAN parallel two-dimensional data-domain applications on distributed-memory systems, such as those based on transputers. We have aimed at achieving flexibility for different processor topologies and processor numbers, non-homogeneous processor configurations and coarse load-balancing. We have assumed a master-slave architecture as basic programming model in the framework of a domain decomposition approach. After developing a library of high-level general network and communication routines, based on low-level system-dependent libraries, we have used it to parallelize some specific applications: an elementary 2-D code, useful as a pattern and guide for other more complex applications, and a 2-D hydrodynamic code for astrophysical studies. Code parallelization is achieved by splitting the original code into two independent codes, one for the master and the other for the slaves, and then by adding coordinated calls to network setting and message-passing routines into the programs. The parallel applications have been implemented on a Meiko Computing Surface hosted by a SUN 4 workstation and running CSTools software package. After the basic network and communication routines were developed, the task of parallelizing the 2-D hydrodynamic code took approximately 12 man hours. The parallel efficiency of the code ranges between 98% and 58% on arrays between 2 and 20 T800 transputers, on a relatively small computational mesh (~3000 cells). Arrays consisting of a limited number of faster Intel i860 processors achieve a high parallel efficiency on large computational grids (> 10000 grid points) with performances in the class of minisupercomputers.

  7. Array processor architecture

    NASA Technical Reports Server (NTRS)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lockstep but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.

  8. Programming parallel processors

    SciTech Connect

    Babb, R.G. II

    1987-01-01

    This book surveys the major commercially available, scientific parallel computers with emphasis on how they are programmed. For each machine, the way in which parallel performance can be assessed is shown for the same, small example program. A wide range of parallel machines is covered, from superminis to parallel vector supercomputers, including both shared memory and message-passing machines. Topics covered include: exploiting multiprocessors: issues and options; Alliant FX/8; BBN Butterfly Parallel Processor; CRAY X-MP; FPS T Series Parallel Processor; IBM 3090; Intel iPSC Concurrent Computer; Loral Dataflo LDF 100; and Sequent Balance Series.

  9. Spaceborne Processor Array

    NASA Technical Reports Server (NTRS)

    Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

    2008-01-01

    A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor-memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.

  10. Array processors in chemistry

    SciTech Connect

    Ostlund, N.S.

    1980-01-01

    The field of attached scientific processors ("array processors") is surveyed, and an attempt is made to indicate their present and possible future use in computational chemistry. The current commercial products from Floating Point Systems, Inc., Datawest Corporation, and CSP, Inc. are discussed.

  11. Climate modelling using parallel processors

    NASA Astrophysics Data System (ADS)

    Dash, S. K.; Selvakumar, S.; Jha, B.

    A spectral General Circulation Model at horizontal resolutions T21 and T42 has been integrated up to 30 d on 16 and 32 processors of Meiko T800. The model at resolution T21 is also implemented on 16 processors (T800) of a parallel computer (CHIPPS) built in India. The wallclock timings of model integration for 1, 10 and 30 d are noted and the speedup and efficiency of 16 and 32 processors have been computed. Results show that a T42 parallel model with nine levels in the vertical takes less than 36 elapsed minutes on 32 processors for 1 d integration. In case of T21 model integration, the maximum speedup and efficiency achieved on 16 processors are about 10 and 63%, respectively. When the horizontal resolution of the model is doubled to T42, the maximum speedup and efficiency obtained on 32 processors are about 9 and 29%, respectively. It is also found that when the physical parametrisation schemes are included in the model and thereby the number of arithmetic operations is increased, the speedup and efficiency of 16 as well as 32 processors increase compared to the case with no physics in the model.
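
    The speedup and efficiency figures quoted above follow the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of processors. A small Python sketch of that arithmetic, using the rounded speedups from the abstract (the function names are illustrative, and the abstract's wallclock timings are not reproduced here):

```python
def speedup(t_serial, t_parallel):
    """Speedup = time on one processor / time on n processors."""
    return t_serial / t_parallel


def efficiency(s, n_processors):
    """Parallel efficiency = speedup / number of processors."""
    return s / n_processors


# Rounded speedups as quoted in the abstract ("about 10" and "about 9").
for label, s, n in (("T21", 10, 16), ("T42", 9, 32)):
    print(f"{label}: speedup {s} on {n} processors -> efficiency {efficiency(s, n):.1%}")
```

    With these rounded inputs the efficiencies come out at 62.5% and roughly 28%, in line with the quoted "about 63%" and "about 29%"; the small differences are presumably due to rounding of the speedups.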

  12. VLSI array processor

    NASA Astrophysics Data System (ADS)

    Greenwood, E.

    1982-07-01

    The Arithmetic Processor Unit (APU) data base design check was completed. Minor design rule violations were corrected and design improvements were made. The APU mask set has been fabricated and checked. Initial checking of all mask layers revealed a design rule problem in one layer. That layer was corrected, refabricated and checked out. The mask set has been delivered to the chip fabrication area. The fabrication process has been initiated. All work on the Array Processor Demonstration System (APDS) has been suspended at CHI until the additionally requested funding was received. That funding has been authorized and CHI will begin work on the APDS in July. The following activities are planned in the following quarter: 1) Complete fabrication of the first lot of VLSI APU devices. 2) Complete integration and check-out of the APDS simulator. 3) Complete integration and check-out of the APU breadboard. 4) Verify the VLSI APU wafer tests with the APU breadboard. 5) Complete check-out of the APDS using the APU breadboard.

  13. Parallel processor engine model program

    NASA Technical Reports Server (NTRS)

    Mclaughlin, P.

    1984-01-01

    The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

  14. Optical systolic array processor using residue arithmetic

    NASA Technical Reports Server (NTRS)

    Jackson, J.; Casasent, D.

    1983-01-01

    The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
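
    To make the idea of working "totally in residue notation" concrete, here is a minimal Python sketch of a matrix-vector product carried out independently in several small modular channels and decoded at the end with the Chinese remainder theorem. The moduli, function names and sample data are illustrative assumptions, not the parameters of the optical system; inputs are assumed to be non-negative integers whose true results stay below the product of the moduli.

```python
from math import prod

# Pairwise coprime moduli (an illustrative choice); dynamic range = 5*7*9*11 = 3465.
MODULI = (5, 7, 9, 11)


def to_residues(x):
    """Encode a non-negative integer as one small residue per modulus."""
    return tuple(x % m for m in MODULI)


def from_residues(residues):
    """Decode residues back to an integer via the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


def matvec_residue(matrix, vector):
    """Matrix-vector product where every multiply-add stays inside one channel."""
    result = []
    for row in matrix:
        channels = []
        for m in MODULI:
            acc = 0
            for a, v in zip(row, vector):
                acc = (acc + (a % m) * (v % m)) % m  # never exceeds m * m
            channels.append(acc)
        result.append(from_residues(channels))
    return result


A = [[2, 1, 3], [4, 0, 5], [1, 7, 2]]
x = [3, 4, 5]
print(matvec_residue(A, x))
```

    Each channel only ever handles values smaller than its modulus squared, which is the sense in which residue arithmetic reduces the dynamic range demanded of any single processing element; the channels have no carries between them and could run in parallel.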

  15. Parallel Analog-to-Digital Image Processor

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C.

    1987-01-01

    Proposed integrated-circuit network of many identical units converts analog outputs of imaging arrays of x-ray or infrared detectors to digital outputs. Converter located near imaging detectors, within cryogenic detector package. Because converter output digital, lends itself well to multiplexing and to postprocessing for correction of gain and offset errors peculiar to each picture element and its sampling and conversion circuits. Analog-to-digital image processor is massively parallel system for processing data from array of photodetectors. System built as compact integrated circuit located near focal plane. Buffer amplifier for each picture element has different offset.
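
    The per-pixel gain and offset correction mentioned above is often done digitally with a two-point calibration. The Python sketch below shows one common form of it, assuming a dark frame gives each pixel's offset and a uniformly illuminated frame of known level gives its gain; this procedure, the function names and the sample numbers are assumptions for illustration and are not taken from the brief.

```python
def calibrate(dark_frame, flat_frame, flat_level):
    """Derive per-pixel offset and gain from a dark frame and a flat frame."""
    offsets = list(dark_frame)
    gains = [(f - d) / flat_level for f, d in zip(flat_frame, dark_frame)]
    return offsets, gains


def correct(raw_frame, offsets, gains):
    """Remove each pixel's own offset, then divide out its own gain."""
    return [(r - o) / g for r, o, g in zip(raw_frame, offsets, gains)]


# Two hypothetical pixels with different offsets (10, 12) and gains (1.0, 2.0).
offsets, gains = calibrate(dark_frame=[10, 12], flat_frame=[110, 212], flat_level=100)
print(correct([60, 112], offsets, gains))
```

    Because every pixel is corrected independently with its own pair of constants, the operation maps naturally onto one processing unit per picture element.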

  16. The AIS-5000 parallel processor

    SciTech Connect

    Schmitt, L.A.; Wilson, S.S.

    1988-05-01

    The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has better cost/performance characteristics than two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance benchmarks are given for certain simple and complex functions.

  17. Rectangular Array Of Digital Processors For Planning Paths

    NASA Technical Reports Server (NTRS)

    Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

    1993-01-01

    Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmit to other neighboring processors, and process repeats until signals propagated across entire field.

  18. Array Processor Has Power and Flexibility

    NASA Technical Reports Server (NTRS)

    Barnes, G. H.; Lundstrom, S. F.; Shafer, P. E.

    1982-01-01

    Proposed processor architecture would have flexibility of a multi-processor and computational power of a lockstep array. Using an efficient interconnection network, it accommodates a large number of individual processors and memory modules. Array architecture would be suitable for very large scientific simulation problems and other applications.

  19. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    NASA Astrophysics Data System (ADS)

    Barr, David R. W.; Dudek, Piotr

    2009-12-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  20. Ultrafast Fourier-transform parallel processor

    SciTech Connect

    Greenberg, W.L.

    1980-04-01

    A new, flexible, parallel-processing architecture is developed for a high-speed, high-precision Fourier transform processor. The processor is intended for use in 2-D signal processing including spatial filtering, matched filtering and image reconstruction from projections.

  1. The Use of a Microcomputer Based Array Processor for Real Time Laser Velocimeter Data Processing

    NASA Technical Reports Server (NTRS)

    Meyers, James F.

    1990-01-01

    The application of an array processor to laser velocimeter data processing is presented. The hardware is described along with the method of parallel programming required by the array processor. A portion of the data processing program is described in detail. The increase in computational speed of a microcomputer equipped with an array processor is illustrated by comparative testing with a minicomputer.

  2. Fault-tolerant parallel processor

    SciTech Connect

    Harper, R.E.; Lala, J.H.

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which also provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.

  3. Parallel processor programs in the Federal Government

    NASA Technical Reports Server (NTRS)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  4. Performance limitations in parallel processor simulations

    NASA Technical Reports Server (NTRS)

    O'Grady, E. Pearse; Wang, Chung-Hsien

    1987-01-01

    A jet-engine model is partitioned and simulated on a parallel processor system consisting of five 8086/8087 floating-point computers. The simulation uses Heun's integration method. A near-optimal parallel simulation (in the sense of minimum execution time) achieves speedup of only 2.13 and efficiency of 42.6 percent, in effect wasting 57.4 percent of the available processing power. A detailed analysis identifies and graphically demonstrates why the system fails to achieve ideal performance (viz., speedup of 5 and efficiency of 100 percent). Inherent characteristics of the problem equations and solution algorithm account for the loss of nearly half of the available processing power. Overheads associated with interprocessor communication and processor synchronization account for only a small fraction of the lost processing power. The effects of these and other factors which limit parallel processor performance are illustrated through real-time timing-analyzer traces describing the run/idle status of the parallel processors during the simulation.
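
    The abstract names Heun's integration method without giving equations. As a reminder of what one time step involves, here is a minimal Python sketch of the standard Heun (explicit trapezoidal) predictor-corrector step for a system dy/dt = f(t, y). The function names and the exponential-decay test problem are illustrative assumptions and have nothing to do with the authors' jet-engine model.

```python
def heun_step(f, t, y, h):
    """Advance the state vector y by one step of size h using Heun's method."""
    k1 = f(t, y)                                       # slope at the start
    predictor = [yi + h * ki for yi, ki in zip(y, k1)]  # Euler predictor
    k2 = f(t + h, predictor)                           # slope at the predicted end
    # Corrector: average the two slopes (trapezoidal rule).
    return [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, k1, k2)]


def integrate(f, y0, t_end, steps):
    """Integrate from t = 0 to t_end in a fixed number of equal steps."""
    h = t_end / steps
    y = list(y0)
    for n in range(steps):
        y = heun_step(f, n * h, y, h)
    return y


def decay(t, y):
    """Test problem dy/dt = -y, whose exact solution is exp(-t)."""
    return [-yi for yi in y]


result = integrate(decay, [1.0], 1.0, 100)
print(result[0])  # close to exp(-1)
```

    Within a step, the components of each slope vector can in principle be evaluated on different processors, but the second slope evaluation needs the complete predictor from the first. That ordering is one example of the kind of algorithmic dependency that can leave processors idle, although the abstract does not say which dependencies dominated in the authors' analysis.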

  5. Assignment Of Finite Elements To Parallel Processors

    NASA Technical Reports Server (NTRS)

    Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.

    1990-01-01

    Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
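
    A simulated-annealing mapping of this kind can be sketched briefly. The Python example below is a generic illustration, not the authors' algorithm: it assumes a hypothetical cost equal to the largest processor load plus a weighted count of element adjacencies that cross processor boundaries, and it uses a simple linear cooling schedule. All names and parameters are assumptions.

```python
import math
import random


def cost(assign, edges, n_procs, comm_weight=1.0):
    """Approximate run time: busiest processor's load plus communication across cuts."""
    loads = [0] * n_procs
    for p in assign:
        loads[p] += 1
    cut = sum(1 for a, b in edges if assign[a] != assign[b])
    return max(loads) + comm_weight * cut


def anneal(n_elems, edges, n_procs, steps=2000, t0=2.0, seed=0):
    """Search for a low-cost element-to-processor assignment by simulated annealing."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_procs) for _ in range(n_elems)]
    current = cost(assign, edges, n_procs)
    best, best_cost = assign[:], current
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling, never exactly zero
        i = rng.randrange(n_elems)
        old = assign[i]
        assign[i] = rng.randrange(n_procs)     # propose moving one element
        new = cost(assign, edges, n_procs)
        # Always accept improvements; accept uphill moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = assign[:], new
        else:
            assign[i] = old                    # reject the move
    return best, best_cost


# Hypothetical mesh: a chain of 8 elements mapped onto 2 processors.
chain = [(i, i + 1) for i in range(7)]
mapping, mapping_cost = anneal(8, chain, 2)
print(mapping, mapping_cost)
```

    For this toy chain the best possible cost under the assumed model is 5 (four elements per processor and one cut edge); the annealer is a heuristic and is not guaranteed to reach it on every run or seed.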

  6. The monarch parallel processor hardware design

    SciTech Connect

    Rettberg, R.D.; Crowther, W.R.; Carvey, P.P.; Tomlinson, R.S.

    1990-04-01

    The authors report on their development of the Monarch parallel processor. Today, the Monarch's design is largely done and well into implementation. The high-speed interconnection network has been tested with two-micron switch chips, logging more than 30,000 device hours of operation at 125 megabits per second passing over 10^16 bits. The processor's logic design is almost complete and simulated. The memory controller and concentrator remain to be designed. The authors have analyzed the software in detail with the use of hand-coded examples, a simulator, and a rudimentary compiler. The authors are currently seeking support to finish the implementation.

  7. Grundy: Parallel Processor Architecture Makes Programming Easy

    NASA Astrophysics Data System (ADS)

    Meier, Robert J.

    1985-12-01

    Grundy, an architecture for parallel processing, facilitates the use of high-level languages. In Grundy, several thousand simple processors are dispersed throughout the address space and the concept of machine state is replaced by an invocation frame, a data structure of local variables, program counter, and pointers to superprocesses (parents), subprocesses (children), and concurrent processes (siblings). Each instruction execution consists of five phases. An instruction is fetched, the instruction is decoded, the sources are fetched, the operation is performed, and the destination is written. This breakdown of operations is easily pipelinable. The instruction format of Grundy is completely orthogonal, so Grundy machine code consists of a set of register transfer control bits. The process state pointers are used to collect unused resources such as processors and memory. Joseph Mahon[1] found that as the degree of physical parallelism increases, throughput, including overhead, increases even if extra overhead is needed to split logical processes. As stack pointer, accumulators, and index registers facilitate using high-level languages on conventional computers, pointers to parents, children, and siblings simplify the use of a run-time operating system. The ability to ignore the physical structure of a large number of simple processors supports the use of structured programming. A very simple processor cell allows the replication of approximately 16 32-bit processors on a single Very Large Scale Integration chip. (2M lambda[2]) A bootstrapper and Input/Output channels can be hardwired (using ROM cells and pseudo-processor cells) into a 100 chip computer that is expected to have over 500 processors, 500K memory, and a network supporting up to 64 concurrent messages between 1000 nodes. These sizes are merely typical and not limits.

  8. A novel VLSI processor architecture for supercomputing arrays

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

    1993-01-01

    Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different classes of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such as to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process have led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.

  9. Associative massively parallel processor for video processing

    NASA Astrophysics Data System (ADS)

    Krikelis, Argy; Tawiah, T.

    1996-03-01

    Massively parallel processing architectures have matured primarily through image processing and computer vision applications. The similarity of processing requirements between these areas and video processing suggests that they should be very appropriate for video processing applications. This research describes the use of an associative massively parallel processing based system for video compression, which includes architectural and system description, discussion of the implementation of compression tasks such as DCT/IDCT, Motion Estimation and Quantization, and system evaluation. The core of the processing system is the ASP (Associative String Processor) architecture, a modular, massively parallel, programmable and inherently fault-tolerant fine-grain SIMD processing architecture incorporating a string of identical APEs (Associative Processing Elements), a reconfigurable inter-processor communication network and a Vector Data Buffer for fully-overlapped data input-output. For video compression applications a prototype system is developed, which uses ASP modules to implement the required compression tasks. This scheme leads to a linear speed up of the computation by simply adding more APEs to the modules.

  10. Scalable Unix tools on parallel processors

    SciTech Connect

    Gropp, W.; Lusk, E.

    1994-12-31

    The introduction of parallel processors that run a separate copy of Unix on each process has introduced new problems in managing the user's environment. This paper discusses some generalizations of common Unix commands for managing files (e.g. ls) and processes (e.g. ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.

  11. Multiple-fold clustered processor mesh array

    NASA Technical Reports Server (NTRS)

    Pechanek, Gerald G.; Vassiliadis, Stamatis; Delgado, Jose G.

    1993-01-01

    The multiple-fold clustered processor mesh array is a triangular organization of clustered processing elements. This multiple-fold array maintains functional equivalence to the nearest neighbor mesh computer with uni-directional interprocessor communications, but with half the number of connection wires. In addition, the connectivity of the multiple-folded organization is superior to the standard square mesh due to the improved connectivity between the clustered processors. One of the primary application areas targeted is High Performance Architectures for image processing.

  12. Broadband monitoring simulation with massively parallel processors

    NASA Astrophysics Data System (ADS)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Moreover, these techniques allow one to obtain multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when the optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result, reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over a wavelength grid. These terms can be computed simultaneously in different threads of computation, which opens great opportunities for parallelization. Multi-core and multi-processor systems can provide accelerations of up to several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable of performing floating-point operations efficiently.

  13. Intermediate-level computer-vision-processing algorithm development for the content-addressable-array parallel processor. Quarterly status report No. 3 for period ending 29 November 1986

    SciTech Connect

    Not Available

    1986-12-15

    During this quarter a set of seven benchmark problems was developed and analyzed for the IUA. These included Hough Transform, Convex Hull, Voronoi Diagram, Minimal Spanning Tree, Visibility of Vertices in a projected 3-dimensional model, subgraph isomorphism, and the minimum-cost path between points in a weighted graph. These problems are commonly considered intermediate-level processing in many vision research groups. Parallel implementations of UMass intermediate-level processing algorithms, such as Boldt's line merging and Anandan's motion analysis, continued to develop. A commercial processor, the TMS320C25, was chosen as the Intermediate Communications and Associative Processor (ICAP) processing element. The TMS320C25 has the advantages that it is a five-million-instruction-per-second signal-processing unit with a fast multiplier and software support for fast floating-point operations. It also has a built-in 5 Mb/s serial port that will interface well with the intermediate-level communications network. Also being explored is a set of group-theoretic network topologies with respect to the communication needs of intermediate-level processing. This has required the analysis of the classes of communication needed in each of the algorithms implemented.

  14. APEmille: a parallel processor in the teraflop range

    NASA Astrophysics Data System (ADS)

    Panizzi, E.

    1997-02-01

    APEmille is a SIMD parallel processor under development at the Italian National Institute for Nuclear Physics (INFN). It is the third machine of the APE family, following Ape and Ape100 and delivering peak performance in the Tflops range. APEmille is very well suited for Lattice QCD applications, both for its hardware characteristics and for its software and language features. APEmille is an array of custom arithmetic processors arranged on a tridimensional torus. The replicated processor is a pipelined VLIW device performing integer and single/double precision IEEE floating point operations. The processor is optimized for complex computations and has a peak performance of 528 Mflops at 66 MHz. Each replica has 8 Mbytes of locally addressable RAM. In principle an array of 2048 nodes is able to break the Tflops barrier. Two other custom processors are used for program flow control, global addressing and inter-node communications. Fast nearest neighbour communications as well as longer distance communications and data broadcast are available. APEmille is interfaced to the external world by a PCI interface and a HIPPI channel. A network of PCs acts as the host computer. The APE operating system and the cross compiler run on it. A powerful programming language named TAO is provided and is highly optimized for QCD. A C++ compiler is foreseen. The TAO language is as simple as Fortran but as powerful as object oriented languages. Specific data structures, operators and even statements can be defined by the user for each different application. Effort has been made to define the language constructs for QCD.
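
    The peak figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given above (528 Mflops, 66 MHz, 2048 nodes); the variable names are illustrative and not part of any APE software.

```python
# Figures quoted in the abstract above.
PEAK_PER_NODE_MFLOPS = 528   # peak performance of one processor
CLOCK_MHZ = 66               # processor clock
NODES = 2048                 # array size said to break the Tflops barrier

# Floating-point operations per clock cycle implied by the two peak figures.
flops_per_cycle = PEAK_PER_NODE_MFLOPS / CLOCK_MHZ

# Aggregate peak of the full array in Tflops (1 Tflops = 1e6 Mflops).
array_peak_tflops = PEAK_PER_NODE_MFLOPS * NODES / 1e6

print(f"{flops_per_cycle:.0f} flops per cycle per node")
print(f"{array_peak_tflops:.3f} Tflops peak for {NODES} nodes")
```

    Eight operations per cycle would be consistent with one complex multiply-add (four real multiplications and four real additions) per clock, which fits the statement that the processor is optimized for complex computations; the abstract itself does not spell this out.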

  15. Contextual classification on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    1987-01-01

    Classifiers are often used to produce land cover maps from multispectral Earth observation imagery. Conventionally, these classifiers have been designed to exploit the spectral information contained in the imagery. Very few classifiers exploit the spatial information content of the imagery, and the few that do rarely exploit spatial information content in conjunction with spectral and/or temporal information. A contextual classifier that exploits spatial and spectral information in combination through a general statistical approach was studied. Early test results obtained from an implementation of the classifier on a VAX-11/780 minicomputer were encouraging, but they are of limited meaning because they were produced from small data sets. An implementation of the contextual classifier is presented on the Massively Parallel Processor (MPP) at Goddard that for the first time makes feasible the testing of the classifier on large data sets.

  16. Phased array antenna beamforming using optical processor

    NASA Technical Reports Server (NTRS)

    Anderson, L. P.; Boldissar, F.; Chang, D. C. D.

    1991-01-01

    The feasibility of optical processor based beamforming for microwave array antennas is investigated. The primary focus is on systems utilizing the 20/30 GHz communications band and a transmit configuration exclusively to serve this band. A mathematical model is developed for computation of candidate design configurations. The model is capable of determination of the necessary design parameters required for spatial aspects of the microwave 'footprint' (beam) formation. Computed example beams transmitted from geosynchronous orbit are presented to demonstrate network capabilities. The effect of the processor on the output microwave signal to noise quality at the antenna interface is also considered.

  17. Global Arrays Parallel Programming Toolkit

    SciTech Connect

    Nieplocha, Jaroslaw; Krishnan, Manoj Kumar; Palmer, Bruce J.; Tipparaju, Vinod; Harrison, Robert J.; Chavarría-Miranda, Daniel

    2011-01-01

    The two predominant classes of programming models for parallel computing are distributed memory and shared memory. Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers this characteristic can have a negative impact on performance and scalability. Careful code restructuring to increase data reuse and replacing fine grain load/stores with block access to shared data can address the problem and yield performance for shared memory that is competitive with message-passing. However, this performance comes at the cost of compromising the ease of use that the shared memory model advertises. Distributed memory models, such as message-passing or one-sided communication, offer performance and scalability but they are difficult to program. The Global Arrays toolkit attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed by the programmer. This management is achieved by calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be specified by the programmer and hence managed. GA is related to the global address space languages such as UPC, Titanium, and, to a lesser extent, Co-Array Fortran. In addition, by providing a set of data-parallel operations, GA is also related to data-parallel languages such as HPF, ZPL, and Data Parallel C. 
However, the Global Array programming model is implemented as a library that works with most languages used for technical computing and does not rely on compiler technology for achieving parallel efficiency. It also supports a combination of task- and data-parallelism and is available as an extension of the message passing (MPI) model. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. Virtually all the scalable architectures possess non-uniform memory access characteristics that reflect their multi-level memory hierarchies. These hierarchies typically comprise processor registers, multiple levels of cache, local memory, and remote memory. Over time, both the number of levels and the cost (in processor cycles) of accessing deeper levels have been increasing. It is important for any scalable programming model to address memory hierarchy since it is critical to the efficient execution of scalable applications.

  18. Scan line graphics generation on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1988-01-01

    Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.

  19. Fast Hough Transform On A Mesh Connected Processor Array

    NASA Astrophysics Data System (ADS)

    Kannar, C. S.; Chuang, Henry Y. H.

    1988-02-01

    Hough transform is an effective method for the detection of the shape of object boundaries in image pattern analysis. Since the Hough transform is very computation intensive, it is essential to parallelize the computation. However, an effective parallel algorithm is harder to obtain because it requires global information. In this paper we present an efficient parallel Hough transform algorithm for the detection of straight lines using mesh connected processor arrays. While other parallel algorithms take either O(n2) or O(n2) time, where n is the number of distinct values of a parameter and N is the number of edge pixels, our algorithm takes O(n) time.
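
    For orientation, a sequential straight-line Hough transform can be sketched as follows. This is a plain serial baseline, not the mesh-connected algorithm of the paper; the function name, the accumulator layout and the rounding of rho to integer bins are assumptions made for illustration. Its cost is proportional to N times n (edge pixels times parameter values), which is the work a parallel version has to distribute.

```python
import math
from collections import Counter

def hough_lines(edge_pixels, n_theta=180):
    """Vote in (theta index, rounded rho) space for every edge pixel."""
    votes = Counter()
    for x, y in edge_pixels:                  # N edge pixels
        for t in range(n_theta):              # n distinct parameter values
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho))] += 1
    return votes

# Ten collinear edge pixels on the vertical line x = 5.
pixels = [(5, y) for y in range(0, 100, 10)]
votes = hough_lines(pixels)
(best_t, best_rho), count = votes.most_common(1)[0]
print(best_t, best_rho, count)
```

    In this toy input all ten pixels vote for the cell with theta index 0 and rho 5, which corresponds to the line x = 5.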

  20. Acceleration of computer-generated hologram by Greatly Reduced Array of Processor Element with Data Reduction

    NASA Astrophysics Data System (ADS)

    Sugiyama, Atsushi; Masuda, Nobuyuki; Oikawa, Minoru; Okada, Naohisa; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

    2014-11-01

    We have implemented a computer-generated hologram (CGH) calculation on Greatly Reduced Array of Processor Element with Data Reduction (GRAPE-DR) processors. The cost of CGH calculation is enormous, but CGH calculation is well suited to parallel computation. The GRAPE-DR is a multicore processor that has 512 processor elements. The GRAPE-DR supports a double-precision floating-point operation and can perform CGH calculation with high accuracy. The calculation speed of the GRAPE-DR system is seven times faster than that of a personal computer with an Intel Core i7-950 processor.

  1. Breadboard Signal Processor for Arraying DSN Antennas

    NASA Technical Reports Server (NTRS)

    Jongeling, Andre; Sigman, Elliott; Chandra, Kumar; Trinh, Joseph; Soriano, Melissa; Navarro, Robert; Rogstad, Stephen; Goodhart, Charles; Proctor, Robert; Jourdan, Michael; Rayhrer, Benno

    2008-01-01

    A recently developed breadboard version of an advanced signal processor for arraying many antennas in NASA's Deep Space Network (DSN) can accept inputs in a 500-MHz-wide frequency band from six antennas. The next breadboard version is expected to accept inputs from 16 antennas, and a following developed version is expected to be designed according to an architecture that will be scalable to accept inputs from as many as 400 antennas. These and similar signal processors could also be used for combining multiple wide-band signals in non-DSN applications, including very-long-baseline interferometry and telecommunications. This signal processor performs functions of a wide-band FX correlator and a beam-forming signal combiner. [The term "FX" signifies that the digital samples of two given signals are fast Fourier transformed (F), then the fast Fourier transforms of the two signals are multiplied (X) prior to accumulation.] In this processor, the signals from the various antennas are broken up into channels in the frequency domain (see figure). In each frequency channel, the data from each antenna are correlated against the data from each other antenna; this is done for all antenna baselines (that is, for all antenna pairs). The results of the correlations are used to obtain calibration data to align the antenna signals in both phase and delay. Data from the various antenna frequency channels are also combined and calibration corrections are applied. The frequency-domain data thus combined are then synthesized back to the time domain for passing on to a telemetry receiver.
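
    The "F then X" order described in this abstract can be illustrated with a small sketch. It is a toy model under stated assumptions, not the DSN processor: a naive DFT stands in for the FFT, one spectrum is multiplied by the complex conjugate of the other (the usual cross-correlation convention, which the abstract does not state explicitly), and the function names and test signal are hypothetical.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, standing in for the FFT ("F" step)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fx_cross_spectrum(blocks_a, blocks_b):
    """Accumulate A[k] * conj(B[k]) over successive blocks from two antennas."""
    n = len(blocks_a[0])
    acc = [0j] * n
    for a, b in zip(blocks_a, blocks_b):
        fa, fb = dft(a), dft(b)                    # "F": transform each signal
        for k in range(n):
            acc[k] += fa[k] * fb[k].conjugate()    # "X": multiply, then accumulate
    return acc

# Antenna b sees the same signal as antenna a, delayed by 3 samples (circularly).
n, delay = 16, 3
a = [1.0, 0.5, -0.25] + [0.0] * (n - 3)
b = [a[(t - delay) % n] for t in range(n)]

cross = fx_cross_spectrum([a], [b])
# A pure delay appears as a phase slope across frequency channels;
# channel 1 alone is enough to recover it in this noise-free example.
estimated_delay = cmath.phase(cross[1]) * n / (2 * cmath.pi)
print(round(estimated_delay, 6))
```

    The phase of each frequency channel of the accumulated cross-spectrum is the kind of quantity from which phase and delay calibration between a pair of antennas could be derived.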

  2. Chemical network problems solved on NASA/Goddard's massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Cho, Seog Y.; Carmichael, Gregory R.

    1987-01-01

    The single instruction stream, multiple data stream Massively Parallel Processor (MPP) unit consists of 16,384 bit serial arithmetic processors configured as a 128 x 128 array whose speed can exceed that of current supercomputers (Cyber 205). The applicability of the MPP for solving reaction network problems is presented and discussed, including the mapping of the calculation to the architecture, and CPU timing comparisons.

  3. Calculating real Delbrück amplitudes on parallel processors

    NASA Astrophysics Data System (ADS)

    Kahane, Sylvian

    1991-12-01

    Calculation of the real Delbrück scattering amplitudes is parallelized by concurrent evaluation of 20 four-dimensional integrals. Two approaches were used: (a) a farm of master and worker tasks, and (b) the Cubix concept of parallelization. We discuss load balancing, timing and the efficiency of the implementation.

  4. Massively parallel MRI detector arrays.

    PubMed

    Keil, Boris; Wald, Lawrence L

    2013-04-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  5. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called ultimate SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  6. Parallel processor-based raster graphics system architecture

    DOEpatents

    Littlefield, Richard J. (Seattle, WA)

    1990-01-01

    An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.

  7. Multithreaded processor architecture for parallel symbolic computation. Technical report

    SciTech Connect

    Fujita, T.

    1987-09-01

    This paper describes the Multilisp Architecture for Symbolic Applications (MASA), which is a multithreaded processor architecture for parallel symbolic computation with various features intended for effective Multilisp program execution. The principal mechanisms exploited for this processor are multiple contexts, interleaved pipeline execution from separate instruction streams, and synchronization based on a bit in each memory cell. The tagged architecture approach is taken for Lisp program execution, and trap conditions are provided for future object manipulation and garbage collection.

  8. Singular value decomposition utilizing parallel algorithms on graphical processors

    SciTech Connect

    Kotas, Charlotte W; Barhen, Jacob

    2011-01-01

    One of the current challenges in underwater acoustic array signal processing is the detection of quiet targets in the presence of noise. In order to enable robust detection, one of the key processing steps requires data and replica whitening. This, in turn, involves the eigen-decomposition of the sample spectral matrix, $C_x = \frac{1}{K}\sum_{k=1}^{K} X(k)X^H(k)$, where X(k) denotes a single frequency snapshot with an element for each element of the array. By employing the singular value decomposition (SVD) method, the eigenvectors and eigenvalues can be determined directly from the data without computing the sample covariance matrix, reducing the computational requirements for a given level of accuracy (van Trees, Optimum Array Processing). (Recall that the SVD of a complex matrix A involves determining V, $\Sigma$, and U such that $A = U \Sigma V^H$, where U and V are orthonormal and $\Sigma$ is a positive, real, diagonal matrix containing the singular values of A. U and V are the eigenvectors of $AA^H$ and $A^HA$, respectively, while the singular values are the square roots of the eigenvalues of $AA^H$.) Because it is desirable to be able to compute these quantities in real time, an efficient technique for computing the SVD is vital. In addition, emerging multicore processors like graphical processing units (GPUs) are bringing parallel processing capabilities to an ever increasing number of users. Since the computational tasks involved in array signal processing are well suited for parallelization, it is expected that these computations will be implemented using GPUs as soon as users have the necessary computational tools available to them. Thus, it is important to have an SVD algorithm that is suitable for these processors. This work explores the effectiveness of two different parallel SVD implementations on an NVIDIA Tesla C2050 GPU (14 multiprocessors, 32 cores per multiprocessor, 1.15 GHz clock speed).
The first algorithm is based on a two-step algorithm which bidiagonalizes the matrix using Householder transformations, and then diagonalizes the intermediate bidiagonal matrix through implicit QR shifts. This is similar to that implemented for real matrices by Lahabar and Narayanan ("Singular Value Decomposition on GPU using CUDA", IEEE International Parallel Distributed Processing Symposium 2009). The implementation is done in a hybrid manner, with the bidiagonalization stage done using the GPU while the diagonalization stage is done using the CPU, with the GPU used to update the U and V matrices. The second algorithm is based on a one-sided Jacobi scheme utilizing a sequence of pair-wise column orthogonalizations such that A is replaced by AV until the resulting matrix is sufficiently orthogonal (that is, equal to $U\Sigma$). V is obtained from the sequence of orthogonalizations, while $\Sigma$ can be found from the square root of the diagonal elements of $A^HA$ and, once $\Sigma$ is known, U can be found from column scaling the resulting matrix. These implementations utilize CUDA Fortran and NVIDIA's CUBLAS library. The primary goal of this study is to quantify the comparative performance of these two techniques against themselves and other standard implementations (for example, MATLAB). Considering that there is significant overhead associated with transferring data to the GPU and with synchronization between the GPU and the host CPU, it is also important to understand when it is worthwhile to use the GPU in terms of the matrix size and number of concurrent SVDs to be calculated.
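
    The one-sided Jacobi scheme described above can be sketched serially for small real matrices. This is a minimal illustration assuming real-valued input and plain Python lists; it is not the authors' CUDA Fortran implementation, and the function name, tolerance and sweep limit are hypothetical choices. Columns are orthogonalized pair by pair, the same rotations are accumulated into V, and the singular values and U are read off from the column norms of the rotated matrix.

```python
import math

def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=50):
    """Return (U, sigma, V) with A = U * diag(sigma) * V^T for a real matrix A."""
    m, n = len(a), len(a[0])
    w = [row[:] for row in a]                       # working copy, converges to U*Sigma
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(w[i][p] ** 2 for i in range(m))
                beta = sum(w[i][q] ** 2 for i in range(m))
                gamma = sum(w[i][p] * w[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                        # columns already orthogonal
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the working matrix and to V.
                for mat, rows in ((w, m), (v, n)):
                    for i in range(rows):
                        x, y = mat[i][p], mat[i][q]
                        mat[i][p] = c * x - s * y
                        mat[i][q] = s * x + c * y
        if not rotated:
            break
    # Singular values are the column norms; U follows by column scaling.
    sigma = [math.sqrt(sum(w[i][j] ** 2 for i in range(m))) for j in range(n)]
    u = [[w[i][j] / sigma[j] if sigma[j] > 0 else 0.0 for j in range(n)]
         for i in range(m)]
    return u, sigma, v

A = [[3.0, 0.0], [4.0, 5.0]]
U, sigma, V = one_sided_jacobi_svd(A)
print(sorted(sigma))
```

    The singular values are returned in column order rather than sorted. Rotations on disjoint column pairs are independent of one another, which is one reason this scheme is often considered for parallel hardware.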

  9. Global synchronization of parallel processors using clock pulse width modulation

    DOEpatents

    Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

    2013-04-02

    A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and perform a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

  10. Bispectrum signal processing on HNC's SIMD numerical array processor (SNAP)

    SciTech Connect

    Means, R.W.; Wallach, B.; Busby, D.; Lengel, R.C. Jr.

    1993-12-31

    Supercomputers and parallel processors are increasingly being applied to problems traditionally described as signal and image processing problems. The primary activities occurring in either processing area are detection, enhancement, and classification of signals embedded in additive noise. The bispectrum is a processing technique that can be used for improving the detection of signals in noise. It is an order N^2 operation performed over a two-dimensional frequency plane and, because of computational demands, has not been used much in practice. HNC has developed a commercially available SIMD Numerical Array Processor (SNAP) and implemented Tracor's computationally demanding bispectrum signal processing code as a submission for the Gordon Bell prize. The SNAP is a SIMD array of parallel processors connected in a linear ring. A SNAP system with 32 processors (SNAP-32) demonstrated a performance of over 7.5 GFLOPS per million dollars.
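
    Why the bispectrum is an order N^2 operation over a two-dimensional frequency plane can be seen from a direct single-segment estimate, B(f1, f2) = X(f1) X(f2) X*(f1 + f2). The sketch below is an assumption-laden illustration, not the Tracor code: it uses a naive DFT, a single data segment without averaging, and hypothetical function names.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def bispectrum(x):
    """Single-segment estimate B[f1][f2] = X[f1] * X[f2] * conj(X[f1 + f2])."""
    n = len(x)
    spec = dft(x)
    # One product per (f1, f2) pair: n * n entries in the frequency plane.
    return [[spec[f1] * spec[f2] * spec[(f1 + f2) % n].conjugate()
             for f2 in range(n)]
            for f1 in range(n)]

# Three cosines at bins 2, 3 and 5; since 2 + 3 = 5 they form a coupled triple.
n = 16
x = [math.cos(2 * math.pi * 2 * t / n)
     + math.cos(2 * math.pi * 3 * t / n)
     + math.cos(2 * math.pi * 5 * t / n) for t in range(n)]

B = bispectrum(x)
print(abs(B[2][3]), abs(B[2][2]))
```

    The entry at (2, 3) is large because bins 2, 3 and 5 are all occupied, while (2, 2) vanishes because bin 4 is empty. In practice the estimate is typically averaged over many segments, which multiplies the already quadratic cost.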

  11. Adaptive domain decomposition for Monte Carlo simulations on parallel processors

    NASA Technical Reports Server (NTRS)

    Wilmoth, Richard G.

    1990-01-01

    A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.

  12. Adaptive domain decomposition for Monte Carlo simulations on parallel processors

    NASA Technical Reports Server (NTRS)

    Wilmoth, Richard G.

    1991-01-01

    A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.

  13. DFT algorithms for bit-serial GaAs array processor architectures

    NASA Technical Reports Server (NTRS)

    Mcmillan, Gary B.

    1988-01-01

    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.

  14. Dynamic overset grid communication on distributed memory parallel processors

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  15. Real-time trajectory optimization on parallel processors

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
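
    The reported speed-up of 7.2 on 32 nodes can be turned into two standard derived quantities. The sketch below computes the parallel efficiency and the Karp-Flatt experimentally determined serial fraction; applying the latter assumes an Amdahl-type model in which all lost speed-up is attributed to non-parallel work, which the abstract does not claim, and the function names are illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors

def karp_flatt_serial_fraction(speedup, processors):
    """Karp-Flatt metric: effective serial fraction implied by a measured speed-up."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)

# Figures quoted in the abstract: 64-stage Goddard problem, 32 nodes versus 1 node.
speedup, nodes = 7.2, 32
efficiency = parallel_efficiency(speedup, nodes)
serial_fraction = karp_flatt_serial_fraction(speedup, nodes)

print(f"efficiency = {efficiency:.3f}")
print(f"effective serial fraction = {serial_fraction:.3f}")
```

    On a message-passing machine such as the iPSC/860, part of this effective serial fraction may in fact be communication overhead rather than inherently sequential computation.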

  16. SCC-100 parallel processor for real-time imaging

    NASA Astrophysics Data System (ADS)

    Jacobi, William J.; Kendall, William B.; Wadsworth, Leo A.

    1990-09-01

    The SCC-100 parallel processor utilizes a fully-programmable 32-bit MIMD architecture optimized for image and signal processing. Applications include image registration, clutter suppression, velocity filtering, multispectral processing, medical imaging and computer vision research, as well as radar and sonar signal processing. The first SCC-100 processor, with 19 nodes and a peak throughput in excess of 1 GFLOPS, was recently delivered. A micro-miniature version using hybrid wafer-scale integration is currently under development for space applications.

  17. Experience with a multiprocessor based on eight FPS 120B array processors

    SciTech Connect

    Bucher, I.Y.; Frederickson, P.O.; Moore, J.W.

    1981-01-01

    The rate of increase in the speed of monoprocessors is no longer keeping pace with the needs of the laboratory; accordingly, the use of parallel processors in large scientific computations is being investigated. As an initial experiment, a particle-in-cell plasma simulation was adapted to run on a star graph architecture consisting of a UNIVAC 1110 as hub, and up to eight Floating Point Systems AP120B array processors at the other vertices. Subdivision of tasks among processors and measured results are discussed.

  18. Simulation of binary tree and pyramid architectures on optical reconfigurable array processors

    NASA Astrophysics Data System (ADS)

    Hossain, M.; Ghanta, S.

    1994-08-01

    Computations on binary tree and pyramid topologies are powerful and are widely used parallel algorithm design techniques. Reconfigurable characteristics of optics provide an efficient way of solving problems in parallel. In this paper, we first design an optical vector processor that exploits the reconfigurable characteristics. A number of these processors connected in one-dimensional or two-dimensional arrays give ORBS (optical reconfigurable bus system) or ORMS (optical reconfigurable mesh system). We simulate the computations based on both binary tree and pyramid topologies on ORBS and ORMS respectively.

  19. Orbital Systolic Algorithms and Array Processors for Solution of the Algebraic Path Problem

    NASA Astrophysics Data System (ADS)

    Sedukhin, Stanislav G.; Miyazaki, Toshiaki; Kuroda, Kenichi

    The algebraic path problem (APP) is a general framework which unifies several solution procedures for a number of well-known matrix and graph problems. In this paper, we present a new 3-dimensional (3-D) orbital algebraic path algorithm and corresponding 2-D toroidal array processors which solve the n × n APP in the theoretically minimal number of 3n time-steps. The coordinated time-space scheduling of the computing and data movement in this 3-D algorithm is based on the modular function which preserves the main technological advantages of systolic processing: simplicity, regularity, locality of communications, pipelining, etc. Our design of the 2-D systolic array processors is based on a classical 3-D → 2-D space transformation. We have also shown how a data manipulation (copying and alignment) can be effectively implemented in these array processors in a massively-parallel fashion by using a matrix-matrix multiply-add operation.
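
    To illustrate how one procedure can cover several matrix and graph problems, the sketch below runs the classical sequential triple loop (Floyd-Warshall style) with the two semiring operations passed in as parameters. It is not the orbital systolic algorithm of the paper, and it assumes a semiring in which the closure of every diagonal element is the multiplicative identity, which holds for shortest paths with non-negative weights and for reachability. All names are illustrative.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def algebraic_path(a: List[List[T]],
                   add: Callable[[T, T], T],
                   mul: Callable[[T, T], T]) -> List[List[T]]:
    """Sequential APP solver for semirings whose element closure is the identity."""
    n = len(a)
    d = [row[:] for row in a]  # work on a copy of the input matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Combine the current value with the path routed through node k.
                d[i][j] = add(d[i][j], mul(d[i][k], d[k][j]))
    return d


INF = float("inf")

# Shortest paths: "add" is min, "mul" is +.
weights = [[0, 3, INF],
           [INF, 0, 1],
           [2, INF, 0]]
shortest = algebraic_path(weights, min, lambda x, y: x + y)

# Reachability (transitive closure): "add" is or, "mul" is and.
edges = [[True, True, False],
         [False, True, True],
         [False, False, True]]
reachable = algebraic_path(edges, lambda x, y: x or y, lambda x, y: x and y)
```

    Only the pair of operations changes between the two calls; the loop structure is shared, which is the property that makes the problem attractive for a fixed array of identical processing elements.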

  20. Potential of minicomputer/array-processor system for nonlinear finite-element analysis

    NASA Technical Reports Server (NTRS)

    Strohkorb, G. A.; Noor, A. K.

    1983-01-01

    The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.

  1. Computations on the massively parallel processor at the Goddard Space Flight Center

    NASA Technical Reports Server (NTRS)

    Strong, James P.

    1991-01-01

    Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
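
    The bitonic sort mentioned above is a fixed network of compare-exchange steps in which element i is paired with element i XOR j. The sketch below simulates that network sequentially in plain Python; it is not the MPP implementation and does not model the processor interconnect. It assumes the input length is a power of two.

```python
def bitonic_sort(values):
    """Sort a sequence whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


result = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
```

    On a parallel array, every iteration of the inner `for` loop is independent, so each compare-exchange stage can run on all element pairs at once.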

  2. Method and structure for skewed block-cyclic distribution of lower-dimensional data arrays in higher-dimensional processor grids

    DOEpatents

    Chatterjee, Siddhartha (Yorktown Heights, NY); Gunnels, John A. (Brewster, NY)

    2011-11-08

    A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.

  3. Analog parallel processor hardware for high speed pattern recognition

    NASA Technical Reports Server (NTRS)

    Daud, T.; Tawel, R.; Langenbacher, H.; Eberhardt, S. P.; Thakoor, A. P.

    1990-01-01

    A VLSI-based analog processor for fully parallel, associative, high-speed pattern matching is reported. The processor consists of two main components: an analog memory matrix for storage of a library of patterns, and a winner-take-all (WTA) circuit for selection of the stored pattern that best matches an input pattern. An inner product is generated between the input vector and each of the stored memories. The resulting values are applied to a WTA network for determination of the closest match. Patterns with up to 22 percent overlap are successfully classified with a WTA settling time of less than 10 microseconds. Applications such as star pattern recognition and mineral classification with bounded overlap patterns have been successfully demonstrated. This architecture has a potential for an overall pattern matching speed in excess of 10^9 bits per second for a large memory.
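
    The two stages described above, inner products against every stored pattern followed by a winner-take-all selection, can be mimicked digitally in a few lines. This is only a software analogy of the analog circuit; the pattern library and function name are made up for illustration.

```python
def best_match(patterns, query):
    """Return (index of best-matching pattern, list of inner-product scores)."""
    # Stage 1: one inner product per stored pattern (done in parallel in hardware).
    scores = [sum(p * q for p, q in zip(pattern, query)) for pattern in patterns]
    # Stage 2: winner-take-all, here simply the index of the largest score.
    winner = max(range(len(scores)), key=scores.__getitem__)
    return winner, scores


# Hypothetical library of binary patterns.
library = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]
winner, scores = best_match(library, [1, 1, 0, 1])
```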

  4. Optimal mapping of irregular finite element domains to parallel processors

    NASA Technical Reports Server (NTRS)

    Flower, J.; Otto, S.; Salama, M.

    1987-01-01

    Mapping the solution domain of n-finite elements into N-subdomains that may be processed in parallel by N-processors is an optimal one if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.

  5. Ring-array processor distribution topology for optical interconnects.

    PubMed

    Li, Y; Ha, B; Wang, T; Wang, S; Katz, A; Lu, X J; Kanterakis, E

    1992-09-10

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included. PMID:20733739

  6. Ring-array processor distribution topology for optical interconnects

    NASA Technical Reports Server (NTRS)

    Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

    1992-01-01

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.

  7. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  8. Parallel processors and nonlinear structural dynamics algorithms and software

    NASA Technical Reports Server (NTRS)

    Belytschko, Ted

    1990-01-01

    Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs.

  9. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  10. Frequency-multiplexed and pipelined iterative optical systolic array processors

    NASA Technical Reports Server (NTRS)

    Casasent, D.; Jackson, J.; Neuman, C.

    1983-01-01

    Optical matrix processors using acoustooptic transducers are described, with emphasis on new systolic array architectures using frequency multiplexing in addition to space and time multiplexing. A Kalman filtering application is considered in a case study from which the operations required on such a system can be defined. This also serves as a new and powerful application for iterative optical processors. The importance of pipelining the data flow and the ordering of the operations performed in a specific application of such a system are also noted. Several examples of how to effectively achieve this are included. A new technique for handling bipolar data on such architectures is also described.

  11. A taxonomy of reconfiguration techniques for fault-tolerant processor arrays

    SciTech Connect

    Chean, M.; Fortes, J.A.B.

    1990-01-01

    The authors overview, characterize, and classify some typical reconfiguration schemes in light of a proposed taxonomy. This taxonomy can be used as a guide for future research in design and analysis of reconfiguration schemes. Studying how to evaluate fault-tolerant arrays and how to exploit application characteristics to achieve dependable computing are important complementary directions of research towards reliable processor-array design. A related research problem is that of functional reconfiguration, that is, learning how to configure the topology of a parallel system to implement a different function or run a different application. Important directions of research include how to apply or extend processor-array reconfiguration algorithms to other topologies and how to marry functional and fault-tolerance reconfiguration requirements and solutions. The Diogenes approach discussed in this article is a case where this goal is naturally achieved.

  12. Implementation of SAR interferometric map generation using parallel processors

    SciTech Connect

    Doren, N.; Wahl, D.E.

    1998-07-01

    Interferometric fringe maps are generated by accurately registering a pair of complex SAR images of the same scene imaged from two very similar geometries, and calculating the phase difference between the two images by averaging over a neighborhood of pixels at each spatial location. The phase difference (fringe) map resulting from this IFSAR operation is then unwrapped and used to calculate the height estimate of the imaged terrain. Although the method used to calculate interferometric fringe maps is well known, it is generally executed in a post-processing mode well after the image pairs have been collected. In that mode of operation, there is little concern about algorithm speed and the method is normally implemented on a single processor machine. This paper describes how the interferometric map generation is implemented on a distributed-memory parallel processing machine. This particular implementation is designed to operate on a 16 node Power-PC platform and to generate interferometric maps in near real-time. The implementation is able to accommodate large translational offsets, along with a slight amount of rotation which may exist between the interferometric pair of images. If the number of pixels in the IFSAR image is large enough, the implementation accomplishes nearly linear speed-up times with the addition of processors.

  13. An informal introduction to program transformation and parallel processors

    SciTech Connect

    Hopkins, K.W.

    1994-08-01

    In the summer of 1992, I had the opportunity to participate in a Faculty Research Program at Argonne National Laboratory. I worked under Dr. Jim Boyle on a project transforming code written in pure functional Lisp to Fortran code to run on distributed-memory parallel processors. To perform this project, I had to learn three things: the transformation system, the basics of distributed-memory parallel machines, and the Lisp programming language. Each of these topics in computer science was unfamiliar to me as a mathematician, but I found that they (especially parallel processing) are greatly impacting many fields of mathematics and science. Since most mathematicians have some exposure to computers, but certainly are not computer scientists, I felt it was appropriate to write a paper summarizing my introduction to these areas and how they can fit together. This paper is not meant to be a full explanation of the topics, but an informal introduction for the "mathematical layman." I place myself in that category as well, as my previous use of computers was as a classroom demonstration tool.

  14. On program restructuring, scheduling, and communication for parallel processor systems

    SciTech Connect

    Polychronopoulos, Constantine D.

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented. 69 refs., 74 figs., 14 tabs.
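
    As a rough illustration of the general idea behind loop coalescing, the sketch below rewrites a doubly nested loop as a single loop over n * m iterations and recovers the original indices from the single counter. It shows the index arithmetic only and is not Parafrase's implementation; the function names are illustrative.

```python
def nested(n, m):
    """Visit the iteration space with the original two nested loops."""
    return [(i, j) for i in range(n) for j in range(m)]


def coalesced(n, m):
    """Visit the same iteration space with one loop of n * m iterations."""
    visited = []
    for t in range(n * m):
        # Recover the original loop indices from the single counter.
        i, j = divmod(t, m)
        visited.append((i, j))
    return visited


same_order = nested(3, 4) == coalesced(3, 4)
```

    With a single loop, a scheduler has one iteration range to divide among processors rather than a nest of ranges.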

  15. VLSI array processor R&D status report

    NASA Astrophysics Data System (ADS)

    Greenwood, E.

    1982-01-01

    Detail design of the Arithmetic Processor Unit (APU) chip has been completed. All cell types (100) have been run through the design rule check (DRC) programs, corrected and verified. DRC runs on the entire chip have been run and all corrections have been made. Fifteen out of eighteen of the chip DRC corrections have been verified. The metal, polysilicon and information data layers of the APU layout are shown. The attached drawing, titled 'VLSI Array Processor Arithmetic Processor Unit Chip Plan', is a detail drawing of the APU Chip Plan. The functional level simulator of the APU has been built and verified using a set of APU diagnostic code. A gate level logic simulation of the APU has been built. The APU breadboard modules have been fabricated and check out has been initiated. The Array Processor Demonstration System (APDS) modules are in the wire-wrap process. The APDS and APU microcode assembler have been built and checked out. The linker and loader for the APDS have also been built.

  16. Optimal evaluation of array expressions on massively parallel machines

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Teng, Shang-Hua

    1992-01-01

    We investigate the problem of evaluating FORTRAN 90 style array expressions on massively parallel distributed-memory machines. On such machines, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression. The choice of where to perform the operation then affects this cost. We present algorithms based on dynamic programming to solve this problem efficiently for a wide variety of interconnection schemes, including multidimensional grids and rings, hypercubes, and fat-trees. We also consider expressions containing operations that change the shape of the arrays, and show that our approach extends naturally to handle this case.

  17. Holographic optical backplane hardware implementation for parallel and distributed processors

    NASA Astrophysics Data System (ADS)

    Kim, Richard C.; Lin, Freddie S.

    1991-09-01

    A working model of an optical backplane has been built to demonstrate the feasibility of incorporating free space, multifaceted and angularly-multiplexed, holographic interconnect technology to enhance the electronic processing architecture. This new design will allow special configurations for parallel and distributed processing and can be made compatible with standard electrical bus connections. The current demonstrator unit contains four transceiver boards in a standard 19 in. rack-mount chassis. It can support bidirectional 125 MHz transmission per channel with a loss budget (allowable optical attenuation) of 30 dB for large fan-out (> 20 boards). Interconnection holograms have been designed to compensate for the large wavelength drift of laser diodes expected to be the result of temperature fluctuations in the processor box. The design also allows a large mechanical tolerance for board misalignment and vibration. Multiple interconnection patterns, each set representing a particular architecture, can be recorded on a single substrate to provide reconfiguration. The proposed holo-backplane can interconnect multiple transmitters and/or receivers (each could support different logic and/or signal levels) per board to realize truly flexible processing schemes.

  18. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

    1999-08-24

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

  19. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, Robert J.; Brooks, III, Eugene D.; Haigh, Ronald E.; DeGroot, Anthony J.

    1999-01-01

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

  20. On nonlinear finite element analysis in single-, multi- and parallel-processors

    NASA Technical Reports Server (NTRS)

    Utku, S.; Melosh, R.; Islam, M.; Salama, M.

    1982-01-01

    Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem; therefore, the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e., partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
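
    The statement that each Newton-Raphson step amounts to a linear problem can be seen in a one-degree-of-freedom sketch. The hardening spring below (stiffness constants and load) is a hypothetical example, not a problem from the report; in a finite element code the scalar division would be a linear system solve with the tangent stiffness matrix.

```python
def newton_raphson(residual, tangent, u0, tol=1e-12, max_iter=50):
    """Solve residual(u) = 0 by repeatedly solving the linearized problem."""
    u = u0
    for _ in range(max_iter):
        r = residual(u)
        if abs(r) < tol:
            return u
        # Linear step: tangent(u) * du = -r (a matrix solve in the general case).
        du = -r / tangent(u)
        u += du
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical hardening spring: K_LINEAR * u + K_CUBIC * u**3 = LOAD.
K_LINEAR, K_CUBIC, LOAD = 1.0, 1.0, 2.0


def spring_residual(u):
    return K_LINEAR * u + K_CUBIC * u ** 3 - LOAD


def spring_tangent(u):
    return K_LINEAR + 3.0 * K_CUBIC * u ** 2


displacement = newton_raphson(spring_residual, spring_tangent, u0=0.0)
```

    For these constants the equilibrium equation u + u^3 = 2 has the exact root u = 1, which the iteration approaches from the starting guess of zero.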

  1. Serial multiplier arrays for parallel computation

    NASA Technical Reports Server (NTRS)

    Winters, Kel

    1990-01-01

    Arrays of systolic serial-parallel multiplier elements are proposed as an alternative to conventional SIMD mesh serial adder arrays for applications that are multiplication intensive and require few stored operands. The design and operation of a number of multiplier and array configurations featuring locality of connection, modularity, and regularity of structure are discussed. A design methodology combining top-down and bottom-up techniques is described to facilitate development of custom high-performance CMOS multiplier element arrays as well as rapid synthesis of simulation models and semicustom prototype CMOS components. Finally, a differential version of NORA dynamic circuits requiring a single-phase uncomplemented clock signal is introduced for this application.

  2. Method for searching a database system including parallel processors

    SciTech Connect

    Kahle, B.; Stanfill, C.W.

    1989-09-26

    This patent describes a process for searching for relevant documents in a database. It comprises forming a database by storing for each of a plurality of documents at least one table of hash codes representing words in the document, the table(s) that represent the words in each different document being stored in a different digital data processor, each hash code comprising information at a plurality of bit locations; forming a query having at least one word and a point value of relevance assigned to each word; testing if the word in the query is in the database; adding at each digital data processor the point value associated with the queried word to a total point value for the document if the hash code is found at all the bit locations corresponding to the queried word that are tested in that processor; and providing identification of those documents in the database with high total point values.
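
    The scoring scheme can be loosely sketched as one bit table per document, with each word setting several bit locations and a query word earning its points only when all of its bit locations are set. The sketch below is an assumption-laden illustration, not the patented method: the table size, the number of bit locations per word, and the use of SHA-256 to derive them are all choices made here, and the per-document loop stands in for the separate processors.

```python
import hashlib

TABLE_BITS = 256       # assumed table size, not from the patent
HASHES_PER_WORD = 3    # assumed number of bit locations per word


def bit_locations(word):
    """Derive a few bit positions for a word from a SHA-256 digest."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [int.from_bytes(digest[2 * i:2 * i + 2], "big") % TABLE_BITS
            for i in range(HASHES_PER_WORD)]


def build_table(text):
    """Build one document's bit table, stored as a Python integer."""
    table = 0
    for word in text.split():
        for bit in bit_locations(word):
            table |= 1 << bit
    return table


def score(table, query):
    """Add a word's point value when all of its bit locations are set."""
    total = 0
    for word, points in query.items():
        if all((table >> bit) & 1 for bit in bit_locations(word)):
            total += points
    return total


documents = ["parallel processor array", "garbage collection algorithm"]
tables = [build_table(text) for text in documents]   # one table per "processor"
query = {"parallel": 3, "processor": 2}
totals = [score(table, query) for table in tables]
```

    Because different words can set the same bits, a table of this kind can report a word that is not in the document, but it never misses a word that is.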

  3. Some parallel algorithms on the four processor Cray X-MP4 supercomputer

    SciTech Connect

    Kincaid, D.R.; Oppe, T.C.

    1988-05-01

    Three numerical studies of parallel algorithms on a four processor Cray X-MP4 supercomputer are presented. These numerical experiments involve the following: a parallel version of ITPACKV 2C, a package for solving large sparse linear systems, a parallel version of the conjugate gradient method with line Jacobi preconditioning, and several parallel algorithms for computing the LU-factorization of dense matrices. 27 refs., 4 tabs.

  4. Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer

    1997-01-01

    A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.

  5. Periodic Application of Concurrent Error Detection in Processor Array Architectures. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Chen, Paul Peichuan

    1993-01-01

    Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.

  6. HVDC control system based on parallel digital signal processors

    SciTech Connect

    Maharsi, Y.; Do, V.Q.; Sood, V.K.; Casoria, S.; Belanger, J.

    1995-05-01

    A numerical HVDC control system operating in real time has been developed for a simulator to be used for operator training. The control system, implemented with digital signal processors (DSPs), consists of typical HVDC control functions such as the synchronizing unit, the regulation unit, the protection unit, the firing unit, the tap changer and the reactive power regulation unit. Results from the steady-state and the transient performance validation tests carried out on the IREQ power system simulator are provided.

  7. Using algebra for massively parallel processor design and utilization

    NASA Technical Reports Server (NTRS)

    Campbell, Lowell; Fellows, Michael R.

    1990-01-01

    This paper summarizes the authors' advances in the design of dense processor networks. Within is reported a collection of recent constructions of dense symmetric networks that provide the largest known values for the number of nodes that can be placed in a network of a given degree and diameter. The constructions are in the range of current potential engineering significance and are based on groups of automorphisms of finite-dimensional vector spaces.

  8. A garbage collection algorithm for shared memory parallel processors

    SciTech Connect

    Crammond, J.

    1988-12-01

    This paper describes a technique for adapting the Morris sliding garbage collection algorithm to execute on parallel machines with shared memory. The algorithm is described within the framework of an implementation of the parallel logic language Parlog. However, the algorithm is a general one and can easily be adapted to parallel Prolog systems and to other languages. The performance of the algorithm executing a few simple Parlog benchmarks is analyzed. Finally, it is shown how the technique for parallelizing the sequential algorithm can be adapted for a semi-space copying algorithm.

  9. Parallel processors and nonlinear structural dynamics algorithms and software

    NASA Technical Reports Server (NTRS)

    Belytschko, T.

    1986-01-01

    A nonlinear structural dynamics program with an element library that exploits parallel processing is under development. The aim is to exploit scheduling-allocation so that parallel processing and vectorization can effectively be treated in a general purpose program. As a byproduct an automatic scheme for assigning time steps was devised. A rudimentary form of the program is complete and has been tested; it shows substantial advantage can be taken of parallelism. In addition, a stability proof for the subcycling algorithm has been developed.

  10. Parallel processors and nonlinear structural dynamics algorithms and software

    NASA Technical Reports Server (NTRS)

    Belytschko, Ted

    1989-01-01

    A nonlinear structural dynamics finite element program was developed to run on a shared memory multiprocessor with pipeline processors. The program, WHAMS, was used as a framework for this work. The program employs explicit time integration and has the capability to handle both the nonlinear material behavior and large displacement response of 3-D structures. The elasto-plastic material model uses an isotropic strain hardening law which is input as a piecewise linear function. Geometric nonlinearities are handled by a corotational formulation in which a coordinate system is embedded at the integration point of each element. Currently, the program has an element library consisting of a beam element based on Euler-Bernoulli theory and triangular and quadrilateral plate elements based on Mindlin theory.

  11. Evaluation of fault-tolerant parallel-processor architectures over long space missions

    NASA Technical Reports Server (NTRS)

    Johnson, Sally C.

    1989-01-01

    The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.

  12. Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (inventor); Bejczy, Antal K. (inventor)

    1994-01-01

    In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.

  13. Design of an optical content-addressable parallel processor for expert systems

    NASA Astrophysics Data System (ADS)

    Louri, Ahmed; Na, Jongwhoa

    1995-08-01

    The slow execution speed of current rule-based systems (RBS's) has restricted their application areas. To improve the speed of RBS's, researchers have proposed various electronic multiprocessor systems as well as optical systems. However, the electronic systems still suffer in performance from the large amount of required time-consuming pattern-matching and comparison operations at the core of RBS's. And optical systems do not fully exploit the available parallelism in RBS's. We propose an optical content-addressable parallel processor for expert systems. The processor executes the three basic RBS operations, match, select, and act, in a highly parallel fashion. Additionally, it extracts and exploits all possible parallelism in an RBS. Distinctive features of the proposed system include (1) the data (knowledge) and control information to exploit the parallelism of optics in the three RBS units; (2) capability of processing general-domain knowledge expressed in terms of variables, numbers, symbols, and comparison operators such as greater than and less than; (3) the parallel optical match unit, which performs the two-dimensional optical pattern matching and comparison operations; (4) a novel conflict-resolution algorithm to resolve conflicts in a single step within the optical select unit. The three units and the general-knowledge representation scheme are designed to make the optical content-addressable parallel processor for expert systems suitable for any high-speed general-purpose RBS.

  14. On fault-tolerant structure, distributed fault-diagnosis, reconfiguration, and recovery of the array processors

    SciTech Connect

    Hosseini, S.H.

    1989-07-01

    The increasing need for the design of high-performance computers has led to the design of special purpose computers such as array processors. This paper studies the design of fault-tolerant array processors. First, it is shown how hardware redundancy can be employed in the existing structures in order to make them capable of withstanding the failure of some of the array links and processors. Then distributed fault-tolerance schemes are introduced for the diagnosis of the faulty elements, reconfiguration, and recovery of the array. Fault tolerance is maintained by the cooperation of processors in a decentralized form of control without the participation of any type of hardcore or fault-free central controller such as a host computer.

  15. Fast Fourier Transform Algorithm For Two-Dimensional Array Of Processors

    NASA Astrophysics Data System (ADS)

    Przytula, K. Wojtek; Nash, J. Greg; Hansen, Siegfried

    1988-01-01

    The paper discusses mapping of Fast Fourier Transform (FFT), Haar Transform, and Hadamard Transform algorithms onto a small, two-dimensional, mesh-connected array of processors. The FFT algorithm is an in-place, decimation-in-frequency, Cooley-Tukey algorithm in radix-2 and radix-4 versions applied to multidimensional, complex inputs. The data flow of the algorithms has been implemented on the array using an efficient, regular data transfer pattern, uniform for all the algorithms. The inputs and constants used in the algorithms are prestored in the local memories of the processors. The mapping makes it possible to reduce significantly the number of memory locations needed for the constants. A partitioning scheme has been developed for the algorithms which allows us to execute them with inputs of arbitrary size on a small processor array. An algorithm has also been proposed for the processor array that efficiently unscrambles the bit-reversed output of the FFT algorithm. The processors of the array have East, West, North, South interconnections with their nearest neighbors. The local memory of the processors is small, on the order of hundreds of locations. The processors are controlled in Single Instruction Multiple Data Stream (SIMD) mode and can be selectively disabled using simple masks, consisting of combinations of rows or columns.
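
    As a point of reference for the abstract above, the following is a minimal serial sketch of an in-place, radix-2, decimation-in-frequency FFT whose output lands in bit-reversed order, followed by a separate unscrambling pass. It is not the paper's array mapping or its unscrambling algorithm; the function names fft_dif_inplace and bit_reverse_unscramble are hypothetical and chosen only for illustration.

```python
import cmath


def fft_dif_inplace(x):
    """Radix-2 decimation-in-frequency FFT, computed in place.

    The result is left in bit-reversed index order, as is typical
    for in-place DIF formulations. Length must be a power of two.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                # Twiddle factor for a butterfly block of size 2 * span.
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                a, b = x[start + k], x[start + k + span]
                x[start + k] = a + b
                x[start + k + span] = (a - b) * w
        span //= 2
    return x


def bit_reverse_unscramble(x):
    """Swap each element with the one at its bit-reversed index."""
    n = len(x)
    bits = n.bit_length() - 1
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if i < j:  # swap each pair only once
            x[i], x[j] = x[j], x[i]
    return x


samples = [complex(v) for v in [1, 2, 3, 4, 0, 0, 0, 0]]
spectrum = bit_reverse_unscramble(fft_dif_inplace(list(samples)))
```

    On a processor array the butterflies of each stage would be spread across processors; this sketch only shows the data dependencies and why an unscrambling step is needed.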

  16. Parallel processors and nonlinear structural dynamics algorithms and software

    NASA Technical Reports Server (NTRS)

    Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.

    1989-01-01

    The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the CONNECTION Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the CONNECTION Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the CONNECTION Machine is capable of outperforming the CRAY XMP/14.

  17. Modular high-temperature gas-cooled reactor simulation using parallel processors

    SciTech Connect

    Ball, S.J.; Conklin, J.C.

    1989-01-01

    The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user ("operator") interaction. The benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses. 3 refs., 3 figs.

  18. Parallel calculation of multi-electrode array correlation networks.

    PubMed

    Ribeiro, Pedro; Simonotto, Jennifer; Kaiser, Marcus; Silva, Fernando

    2009-11-15

    When calculating correlation networks from multi-electrode array (MEA) data, one works with extensive computations. Unfortunately, as the MEAs grow bigger, the time needed for the computation grows even more: calculating pair-wise correlations for current 60 channel systems can take hours on normal commodity computers whereas for future 1000 channel systems it would take almost 280 times as long, given that the number of pairs increases with the square of the number of channels. Even taking into account the increase of speed in processors, soon it can be unfeasible to compute correlations in a single computer. Parallel computing is a way to sustain reasonable calculation times in the future. We provide a general tool for rapid computation of correlation networks which was tested for: (a) a single computer cluster with 16 cores, (b) the Newcastle Condor System utilizing idle processors of university computers and (c) the inter-cluster, with 192 cores. Our reusable tool provides a simple interface for neuroscientists, automating data partition and job submission, and also allowing coding in any programming language. It is also sufficiently flexible to be used in other high-performance computing environments. PMID:19666054
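
    The scaling argument in the abstract above can be checked with a short sketch. The quadratic estimate (1000/60)^2 is roughly 278, and the exact count of unordered channel pairs gives a ratio of roughly 282, both consistent with "almost 280 times". The round-robin split of pairs across workers is only an assumed, illustrative partitioning; it is not the data partition used by the authors' tool, and the names pair_count and partition_pairs are hypothetical.

```python
def pair_count(channels):
    """Number of unordered channel pairs: n * (n - 1) / 2."""
    return channels * (channels - 1) // 2


def partition_pairs(channels, workers):
    """Deal all channel pairs round-robin into one chunk per worker."""
    pairs = [(i, j) for i in range(channels) for j in range(i + 1, channels)]
    return [pairs[w::workers] for w in range(workers)]


quadratic_estimate = (1000 / 60) ** 2                 # about 278
exact_pair_ratio = pair_count(1000) / pair_count(60)  # about 282

# Split the 60-channel workload across 16 workers (an assumed count).
chunks = partition_pairs(60, 16)
```

    Because each pair-wise correlation is independent of the others, any partition with near-equal chunk sizes gives a balanced workload, provided each pair costs about the same to compute.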

  19. Coupled cluster algorithms for networks of shared memory parallel processors

    NASA Astrophysics Data System (ADS)

    Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.

    2007-05-01

    As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS (General Atomic and Molecular Electronic Structure System) program suite [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363] and the Distributed Data Interface (DDI) [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]; however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm are presented on several large-scale clusters of SMPs.

  20. Dynamic barrier architecture for multi-mode fine-grain parallelism using conventional processors

    SciTech Connect

    Cohen, W.E.; Dietz, H.G.; Sponaugle, J.B.

    1994-12-31

    Parallel computers constructed using conventional processors offer the potential to achieve large improvements in execution speed at reasonable cost; however, these machines tend to efficiently implement only coarse-grain MIMD parallelism. To achieve the best possible speedup through parallel execution, a computer must be capable of effectively using all the different types of parallelism that exist in each program. A combination of SIMD, VLIW, and MIMD parallelism, at a variety of granularity levels, exists in most applications; thus, hardware that can support multiple types of parallelism can achieve better performance with a wider range of codes. In this paper, we introduce a new hardware barrier architecture that provides the full DBM functionality we discussed, but can be implemented with much simpler hardware. This mechanism can be used to efficiently support multi-mode moderate-width parallelism with instruction-level granularity (i.e., synchronization cost is approximately one LOAD instruction).

  1. Real-time tracking with a 3D-Flow processor array

    SciTech Connect

    Crosetto, D.

    1993-06-01

    Real-time track finding has to date been performed with CAM (Content Addressable Memories) or with fast coincidence logic, because the processing scheme was thought to have much slower performance. Advances in technology, together with a new architectural approach, make it feasible to also explore the computing technique for real-time track finding, with the advantage over the CAM approach of implementing algorithms that can find more parameters, such as the sagitta, curvature, pt, etc. The report describes real-time track finding using a new computing technique based on the 3D-Flow array processor system. This system consists of a fixed interconnection architecture scheme, allowing flexible algorithm implementation on a scalable platform. The 3D-Flow parallel processing system for track finding is scalable in size and performance by increasing the number of processors, the speed, or the number of pipelined stages. The present article describes the conceptual idea and the design stage of the project.

  2. Preliminary study on the potential usefulness of array processor techniques for structural synthesis

    NASA Technical Reports Server (NTRS)

    Feeser, L. J.

    1980-01-01

    The effects of the use of array processor techniques within the structural analyzer program, SPAR, are simulated in order to evaluate the potential analysis speedups which may result. In particular, the connection of a Floating Point System AP120 processor to the PRIME computer is discussed. Measurements of execution, input/output, and data transfer times are given. Using these data, estimates are made as to the relative speedups that can be executed in a more complete implementation on an array processor maxi-mini computer system.

  3. A parallel encryption algorithm for dual-core processor based on chaotic map

    NASA Astrophysics Data System (ADS)

    Liu, Jiahui; Song, Dahua; Xu, Yiqiu

    2011-12-01

    In this paper, we propose a parallel chaos-based encryption scheme in order to take advantage of the dual-core processor. The chaos-based cryptosystem is combinatorially generated by the logistic map and the Fibonacci sequence. The Fibonacci sequence is employed to convert the value of the logistic map to integer data. The parallel algorithm is designed with a master/slave communication model with the Message Passing Interface (MPI). The experimental results show that the chaotic cryptosystem possesses good statistical properties, and the parallel algorithm provides better performance than the serial version of the algorithm. It is suitable for encryption/decryption of large sensitive data or multimedia.

  4. Reduction of solar vector magnetograph data using a microMSP array processor

    NASA Technical Reports Server (NTRS)

    Kineke, Jack

    1990-01-01

    The processing of raw data obtained by the solar vector magnetograph at NASA-Marshall requires extensive arithmetic operations on large arrays of real numbers. The objectives of this summer faculty fellowship study are to: (1) learn the programming language of the MicroMSP Array Processor and adapt some existing data reduction routines to exploit its capabilities; and (2) identify other applications and/or existing programs which lend themselves to array processor utilization which can be developed by undergraduate student programmers under the provisions of project JOVE.

  5. Digital signal array processor for NSLS booster power supply upgrade

    SciTech Connect

    Olsen, R.; Dabrowski, J.; Murray, J.

    1993-07-01

    The booster at the NSLS is being upgraded from 0.75 to 2 pulses per second. To accomplish this, new power supplies for the dipole, quadrupole, and sextupole have been installed. This paper will outline the design and function of the digital signal processor used as the primary control element in the power supply control system.

  6. Aligning parallel arrays to reduce communication

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Schreiber, Robert; Gilbert, John R.; Chatterjee, Siddhartha

    1994-01-01

    Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.

  7. Array distribution in data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

    1994-01-01

    We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.

  8. High-speed Systolic Array Processor (HISSAP) system development synopsis: Lesson learned. Final report, Oct 83-Oct 90

    SciTech Connect

    Loughlin, J.P.

    1991-05-01

    This report documents the design rationale of the High Speed Systolic Array Processor (HiSSAP) testbed. In addition to reviewing general parallel processing topics, the impact of the HiSSAP testbed architecture on the top level design of the diagnostic and software mapping tools is described. Based on the experience gained in the mapping of matrix-based algorithms on the testbed hardware, specific recommendations are presented in the form of lessons learned, which are intended to offer guidance in the development of future Navy signal processing systems.

  9. Parallel microfluidic arrays for SPRi detection

    NASA Astrophysics Data System (ADS)

    Ouellet, Eric; Lausted, Christopher; Hood, Leroy; Lagally, Eric T.

    2008-08-01

    Surface Plasmon Resonance imaging (SPRi) is a label-free technique for the quantitation of binding affinities and concentrations for a wide variety of target molecules. Although SPRi is capable of determining binding constants for multiple ligands in parallel, current commercial instruments are limited to a single analyte stream and a limited number of ligand spots. Measurement of target concentration also requires the serial introduction of different target concentrations; such repeated experiments are conducted manually and are therefore time-intensive. Likewise, the equilibrium determination of concentration for known binding affinity requires long times due to diffusion-limited kinetics to a surface-immobilized ligand. We have developed an integrated microfluidic array using soft lithography techniques for SPRi-based detection and determination of binding affinities for DNA aptamers against human alpha-thrombin. The device consists of 264 element-addressable chambers isolated by microvalves. The resulting 700 pL volumes surrounding each ligand spot promise to decrease measurement time through reaction rate-limited kinetics. The device also contains a dilution network for simultaneous interrogation of up to six different target concentrations, further speeding detection times. Finally, the element-addressable design of the array allows interrogation of multiple ligands against multiple targets.

  10. Parallel microfluidic arrays for SPRi detection

    NASA Astrophysics Data System (ADS)

    Ouellet, Eric; Lausted, Christopher; Lin, Tao; Yang, Cheng-Wei; Hood, Leroy; Lagally, Eric T.

    2010-04-01

    Surface Plasmon Resonance imaging (SPRi) is a label-free technique for the quantitation of binding affinities and concentrations for a wide variety of target molecules. Although SPRi is capable of determining binding constants for multiple ligands in parallel, current commercial instruments are limited to a single analyte stream and a limited number of ligand spots. Measurement of target concentration also requires the serial introduction of different target concentrations; such repeated experiments are conducted manually and are therefore time-intensive. Likewise, the equilibrium determination of concentration for known binding affinity requires long times due to diffusion-limited kinetics to a surface-immobilized ligand. We have developed an integrated microfluidic array using soft lithography techniques for SPRi-based detection and determination of binding affinities for DNA aptamers against human alpha-thrombin. The device consists of 264 element-addressable chambers of 700 pL each isolated by microvalves. The device also contains a dilution network for simultaneous interrogation of up to six different target concentrations, further speeding detection times. The element-addressable design of the array allows interrogation of multiple ligands against multiple targets, and analytes from individual chambers may be collected for downstream analysis.

  11. On job assignment for a parallel system of processor sharing queues

    SciTech Connect

    Bonomi, F.

    1990-07-01

    Interest in the job assignment problem for parallel queues has been recently stimulated by research in the area of load balancing in distributed systems, where one is concerned with assigning tasks or processes to processors in order to achieve optimal system performance. However, most of the studies found in the literature refer to a system of parallel queues with FCFS service discipline, while it is well known that the processor-sharing (PS) service discipline is often a better model for CPU scheduling in time-shared computer systems. In this paper, the authors underline some interesting peculiarities of the assignment problem with PS queues as compared to the usual case of the FCFS systems. Also, they propose an approach to the design of assignment algorithms which, in this case, produces solutions performing better than the well-known join-the-shortest-queue (JSQ) assignment rule.
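
    For readers unfamiliar with the baseline named in the abstract above, the join-the-shortest-queue (JSQ) rule can be sketched in a few lines. This illustrates only the baseline rule, not the assignment approach the authors propose; the tie-breaking choice (lowest index) and the function names join_shortest_queue and assign_arrivals are assumptions made for the example.

```python
def join_shortest_queue(queue_lengths):
    """Return the index of the queue holding the fewest jobs.

    Ties are broken in favor of the lowest index (an assumed convention).
    """
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def assign_arrivals(queue_lengths, arrivals):
    """Assign a burst of arrivals one by one using JSQ, with no departures."""
    lengths = list(queue_lengths)
    choices = []
    for _ in range(arrivals):
        target = join_shortest_queue(lengths)
        lengths[target] += 1
        choices.append(target)
    return lengths, choices


final_lengths, choices = assign_arrivals([0, 0, 0], 5)
```

    JSQ looks only at the current number of jobs in each queue; it ignores job sizes and server speeds.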

  12. Efficient exploitation of parallelism in general purpose processor-based systems for image and video processing applications

    NASA Astrophysics Data System (ADS)

    Debes, Eric; Moschetti, Fulvio

    2002-01-01

    This paper proposes a classification of the parallelisms in general-purpose processor based systems in three main categories. One category is the intra-processor parallelism that includes multimedia instructions and superscalar and VLIW architectures. The former takes advantage of data parallelism. The latter benefit from instruction level parallelism. Another category is the inter-processor parallelism. We consider the parallelism between processors inside shared memory symmetric multiprocessor systems and in distributed memory clusters of workstations. Finally, in the last category, main features of the system level parallelism are studied including the input/output operations, the memory hierarchy and the exploitation of external processing. The potential gain is studied for each type of parallelism available in general-purpose processor based systems from a theoretical point of view as well as for existing image and video applications. The results in this paper showed that the exploitation of the different levels of parallelism available in PC workstations can lead to considerable gains in speed when optimizing a multimedia application. Finally the results of this work can be used to influence the design of new multimedia systems and media processors.

  13. Construction of a parallel processor for simulating manipulators and other mechanical systems

    NASA Technical Reports Server (NTRS)

    Hannauer, George

    1991-01-01

    This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

  14. Data flow analysis of a highly parallel processor for a level 1 pixel trigger

    SciTech Connect

    Cancelo, G.; Gottschalk, Erik Edward; Pavlicek, V.; Wang, M.; Wu, J.

    2003-01-01

    The present work describes the architecture and data flow analysis of a highly parallel processor for the Level 1 Pixel Trigger for the BTeV experiment at Fermilab. First the Level 1 Trigger system is described. Then the major components are analyzed by resorting to mathematical modeling. Also, behavioral simulations are used to confirm the models. Results from modeling and simulations are fed back into the system in order to improve the architecture, eliminate bottlenecks, allocate sufficient buffering between processes and obtain other important design parameters. An interesting feature of the current analysis is that the models can be extended to a large class of architectures and parallel systems.

  15. An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

    SciTech Connect

    Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.; Catalyurek, Umit V.; Kurc, Tahsin; Sadayappan, Ponnuswamy; Saltz, Joel H.

    2009-08-01

    Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed-parallelism), has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.

  16. Optical content-addressable parallel processor for high-speed database processing.

    PubMed

    Louri, A; Hatch, J A

    1994-12-10

    We extend the concept of optical content-addressable parallel processing [Appl. Opt. 31, 3241 (1992)] to a novel architecture designed specifically for the parallel and high-speed implementation of database operations called optical content-addressable parallel processor for relational database processing (OCAPPRP). An OCAPPRP combines a parallel model of computation, associative processing, with parallel and high-speed technology optics. The architecture is developed to provide optimal support for high-speed parallel equivalence (pattern matching) and relative-magnitude searches (greater than and less than). Distinctive features of the proposed architecture include (1) a two-dimensional match-compare unit for two-dimensional pattern matching, (2) constant-time retrieval of database entries, (3) an optical word and bit-parallel relative-magnitude single-step algorithm, and (4) the capability of constant-time sorting. Since relational database operations rely heavily on parallel equivalence or relative-magnitude searches, database processing is an excellent candidate for implementation on an OCAPPRP. The architecture delivers a speedup factor of n over conventional optical database architectures, where n is the number of rows in a database table. We present an overview of the architecture followed by its optical implementation. The representative relational database operations, intersection, and selection are outlined to illustrate the architecture's potential for efficiently supporting high-speed database processing. PMID:20963048

  17. Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Gibson, Garth Alan

    1990-01-01

    During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems but, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
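
    The parity idea in the last sentence can be illustrated with a minimal sketch. It assumes byte-wise XOR parity over equal-length blocks and a failure whose position is already known (self-identifying); the function names and sample data are hypothetical and are not taken from the thesis.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors and the parity block."""
    return xor_blocks(surviving_blocks + [parity])


# Three hypothetical data disks plus one parity disk.
data = [b"disk", b"arra", b"ys!!"]
parity = xor_blocks(data)

# The failed disk reports itself, so its index is known.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
recovered = reconstruct(survivors, parity)
```

    Because XOR is its own inverse, one extra disk is enough to recover any single known-position loss; it cannot locate a failure by itself, which is why the self-identifying property matters.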

  18. Fast structural design and analysis via hybrid domain decomposition on massively parallel processors

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel

    1993-01-01

    A hybrid domain decomposition framework for static, transient and eigen finite element analyses of structural mechanics problems is presented. Its basic ingredients include physical substructuring and/or automatic mesh partitioning, mapping algorithms, 'gluing' approximations for fast design modifications and evaluations, and fast direct and preconditioned iterative solvers for local and interface subproblems. The overall methodology is illustrated with the structural design of a solar viewing payload that is scheduled to fly in March 1993. This payload has been entirely designed and validated by a group of undergraduate students at the University of Colorado using the proposed hybrid domain decomposition approach on a massively parallel processor. Performance results are reported on the CRAY Y-MP/8 and the iPSC-860/64 Touchstone systems, which represent both extreme parallel architectures. The hybrid domain decomposition methodology is shown to outperform leading solution algorithms and to exhibit an excellent parallel scalability.

  19. Interconnection arrangement of routers of processor boards in array of cabinets supporting secure physical partition

    DOEpatents

    Tomkins, James L. (Albuquerque, NM); Camp, William J. (Albuquerque, NM)

    2007-07-17

    A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure includes routers in service or compute processor boards distributed in an array of cabinets connected in series on each board and to respective routers in neighboring row cabinet boards with the routers in series connection coupled to routers in series connection in respective neighboring column cabinet boards. The array can include disconnect cabinets or respective routers in all boards in each cabinet connected in a toroid. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.

  20. Implementation of context independent code on a new array processor: The Super-65

    NASA Technical Reports Server (NTRS)

    Colbert, R. O.; Bowhill, S. A.

    1981-01-01

    The feasibility of rewriting standard uniprocessor programs into code which contains no context-dependent branches is explored. Context independent code (CIC) would contain no branches that might require different processing elements to branch different ways. In order to investigate the possibilities and restrictions of CIC, several programs were recoded into CIC and a four-element array processor was built. This processor (the Super-65) consisted of three 6502 microprocessors and the Apple II microcomputer. The results obtained were somewhat dependent upon the specific architecture of the Super-65 but within bounds, the throughput of the array processor was found to increase linearly with the number of processing elements (PEs). The slope of throughput versus PEs is highly dependent on the program and varied from 0.33 to 1.00 for the sample programs.

  1. Parallel implementation of RX anomaly detection on multi-core processors: impact of data partitioning strategies

    NASA Astrophysics Data System (ADS)

    Molero, Jose M.; Garzón, Ester M.; García, Inmaculada; Plaza, Antonio

    2011-11-01

    Anomaly detection is an important task for remotely sensed hyperspectral data exploitation. One of the most widely used and successful algorithms for anomaly detection in hyperspectral images is the Reed-Xiaoli (RX) algorithm. Despite its wide acceptance and high computational complexity when applied to real hyperspectral scenes, few documented parallel implementations of this algorithm exist, in particular for multi-core processors. The advantage of multi-core platforms over other specialized parallel architectures is that they are a low-power, inexpensive, widely available and well-known technology. A critical issue in the parallel implementation of RX is the sample covariance matrix calculation, which can be approached in global or local fashion. This aspect is crucial for the RX implementation since the consideration of a local or global strategy for the computation of the sample covariance matrix is expected to affect both the scalability of the parallel solution and the anomaly detection results. In this paper, we develop new parallel implementations of the RX in multi-core processors and specifically investigate the impact of different data partitioning strategies when parallelizing its computations. For this purpose, we consider both global and local data partitioning strategies in the spatial domain of the scene, and further analyze their scalability in different multi-core platforms. The numerical effectiveness of the considered solutions is evaluated using receiver operating characteristics (ROC) curves, analyzing their capacity to detect thermal hot spots (anomalies) in hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer system over the World Trade Center in New York, five days after the terrorist attacks of September 11th, 2001.

  2. Series-parallel method of direct solar array regulation

    NASA Technical Reports Server (NTRS)

    Gooder, S. T.

    1976-01-01

    A 40 watt experimental solar array was directly regulated by shorting out appropriate combinations of series and parallel segments of a solar array. Regulation switches were employed to control the array at various set-point voltages between 25 and 40 volts. Regulation to within ±0.5 volt was obtained over a range of solar array temperatures and illumination levels as an active load was varied from open circuit to maximum available power. A fourfold reduction in regulation switch power dissipation was achieved with series-parallel regulation as compared to the usual series-only switching for direct solar array regulation.

  3. Software development on the High-Speed Systolic Array Processor (HISSAP): Lessons learned. Final report, Mar 88-Mar 91

    SciTech Connect

    Tirpak, F.M.

    1991-06-01

    This report documents the lessons learned in programming the Naval Ocean System Center's (NOSC's) High-Speed Systolic Array Processor (HISSAP) testbed. The procedures used for code generation, along with the programming utilities provided in the software development environment, are discussed with regard to their impact on the efficient implementation of algorithms on a parallel processing system such as HISSAP. This information is intended for considerations pertaining to software-development environments in future Navy parallel processing systems. Many of HISSAP's software-development utilities played key roles in the implementation of two computationally intensive algorithms: the Multiple-Signal Classification algorithm (MUSIC) and a four-channel, narrowband, finite-impulse response (FIR) filter. The introduction of utilities not included with the HISSAP tools would undoubtedly have increased the speed and efficiency of software development.

  4. An Analog Processor Array Implementing Interconnect-Efficient Reference Data Shift and SAD/SSD Extraction for Motion Estimation

    NASA Astrophysics Data System (ADS)

    Poikonen, Jonne; Laiho, Mika; Paasio, Ari; Koskinen, Lauri; Halonen, Kari

    2009-12-01

    A cellular analog processor array for use in variable block-size motion estimation with a new simple method for shifting reference image data is presented. The new shift method leads to a greatly reduced number of neighborhood connections for each cell of the array, and allows for all shifts within the [8,8] search area to be performed in a single step, with simple digital controls. The new shift circuitry, together with some other cell and system level optimizations, reduces silicon area and array layout complexity, enabling faster and more efficient parallel full search motion estimation hardware. A [InlineEquation not available: see fulltext.] cell parallel analog test array for reference-shift with a maximum block-size of [InlineEquation not available: see fulltext.], as well as absolute value/quadratic processing for variable block-size analog motion estimation (AME) has been designed in a 0.13 [InlineEquation not available: see fulltext.]m CMOS technology.

  5. Block iterative restoration of astronomical images with the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Heap, Sara R.; Lindler, Don J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.

  6. Solving linear programs under uncertainty, using decomposition, importance sampling and parallel processors. Progress report

    SciTech Connect

    Dantzig, G.B.; Glynn, P.; Infanger, G.

    1994-03-01

    Planning under uncertainty is a fundamental problem of decision science whose solution could advance man's ability to plan, schedule, design, and control complex situations. The goal is to develop efficient methods for solving an important class of planning problems, namely linear programs whose parameters (coefficients, right hand sides) are not known with certainty. The research concentrated on theoretical tasks of decomposition and importance sampling techniques, implementation, and software development issues and on applications. Research is continuing on use of parallel processors for solving stochastic programs.

  7. An Investigation into Reliability, Availability, and Serviceability (RAS) Features for Massively Parallel Processor Systems

    SciTech Connect

    KELLY, SUZANNE M.; OGDEN, JEFFREY BRANDON

    2002-10-01

    A study has been completed into the RAS features necessary for Massively Parallel Processor (MPP) systems. As part of this research, a use case model was built of how RAS features would be employed in an operational MPP system. Use cases are an effective way to specify requirements so that all involved parties can easily understand them. This technique is in contrast to laundry lists of requirements that are subject to misunderstanding as they are without context. As documented in the use case model, the study included a look at incorporating system software and end-user applications, as well as hardware, into the RAS system.

  8. Estimating water flow through a hillslope using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

    1988-01-01

    A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.

  9. Parallelism analysis of the memory system in single-chip VLIW video signal processors

    NASA Astrophysics Data System (ADS)

    Wu, Zhao; Wolf, Wayne H.

    1998-03-01

    This paper presents a design study of the memory system for a very long instruction word (VLIW) video signal processor (VSP). The gap between memory and modern processors is continuously becoming wider, and thus memory systems have been a subject of active research for a long time. However, memory issues in VLIW machines have not yet been addressed. Real-time video signal processing requires a fast memory with high bandwidth and high connectivity. Efficient memory system design is particularly important for VSPs that combine significant amounts of memory on-chip with the processor, which we expect to become common in the next generation of VSPs. In this paper we use trace-driven methodology to analyze the parallelism, especially that of memory operations, in video applications. With a scheduling range of up to one billion operations, we analyzed large traces of several real applications including H.263, MPEG2 and MPEG4. We found that even with a conservative configuration the average speedup is more than 8.

  10. Parallel scheduling of recursively defined arrays

    NASA Technical Reports Server (NTRS)

    Myers, T. J.; Gokhale, M. B.

    1986-01-01

    A new method of automatic generation of concurrent programs which constructs arrays defined by sets of recursive equations is described. It is assumed that the time of computation of an array element is a linear combination of its indices, and integer programming is used to seek a succession of hyperplanes along which array elements can be computed concurrently. The method can be used to schedule equations involving variable length dependency vectors and mutually recursive arrays. Portions of the work reported here have been implemented in the PS automatic program generation system.
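
    The hyperplane idea can be sketched as follows. The abstract states that integer programming is used to find the schedule; this illustration swaps in a brute-force search over small non-negative coefficients, and it assumes constant dependency vectors d, meaning that element A[x] needs A[x - d]. The function names and the example recurrence are hypothetical.

```python
from collections import defaultdict
from itertools import product


def find_schedule(dependencies, max_coeff=3):
    """Find small non-negative coefficients a with a . d >= 1 for every dependency d.

    The time of element x is then t(x) = a . x, so every element is computed
    strictly after the elements it depends on.
    """
    dims = len(dependencies[0])
    candidates = sorted(
        product(range(max_coeff + 1), repeat=dims),
        key=lambda a: (sum(a), a),
    )
    for a in candidates:
        if all(sum(ai * di for ai, di in zip(a, d)) >= 1 for d in dependencies):
            return a
    return None  # no valid schedule within the searched coefficient range


def wavefronts(shape, coeffs):
    """Group array indices by time step; each group lies on one hyperplane."""
    fronts = defaultdict(list)
    for index in product(*(range(n) for n in shape)):
        t = sum(c * i for c, i in zip(coeffs, index))
        fronts[t].append(index)
    return [fronts[t] for t in sorted(fronts)]


# Example recurrence: A[i][j] depends on A[i-1][j] and A[i][j-1].
deps = [(1, 0), (0, 1)]
coeffs = find_schedule(deps)
fronts = wavefronts((3, 3), coeffs)
```

    For this recurrence the search yields t(i, j) = i + j, so the elements on each anti-diagonal of the 3 x 3 array can be computed concurrently.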

  11. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  12. Parallel collective resonances in arrays of gold nanorods.

    PubMed

    Vitrey, Alan; Aigouy, Lionel; Prieto, Patricia; García-Martín, José Miguel; González, María U

    2014-01-01

    In this work we discuss the excitation of parallel collective resonances in arrays of gold nanoparticles. Parallel collective resonances result from the coupling of the nanoparticles localized surface plasmons with diffraction orders traveling in the direction parallel to the polarization vector. While they provide field enhancement and delocalization as the standard collective resonances, our results suggest that parallel resonances could exhibit greater tolerance to index asymmetry in the environment surrounding the arrays. The near- and far-field properties of these resonances are analyzed, both experimentally and numerically. PMID:24645987

  13. The Square Kilometre Array Science Data Processor. Preliminary compute platform design

    NASA Astrophysics Data System (ADS)

    Broekema, P. C.; van Nieuwpoort, R. V.; Bal, H. E.

    2015-07-01

    The Square Kilometre Array is a next-generation radio-telescope, to be built in South Africa and Western Australia. It is currently in its detailed design phase, with procurement and construction scheduled to start in 2017. The SKA Science Data Processor is the high-performance computing element of the instrument, responsible for producing science-ready data. This is a major IT project, with the Science Data Processor expected to challenge the computing state-of-the art even in 2020. In this paper we introduce the preliminary Science Data Processor design and the principles that guide the design process, as well as the constraints to the design. We introduce a highly scalable and flexible system architecture capable of handling the SDP workload.

  14. Applications of array processors in the analysis of remote sensing images

    NASA Technical Reports Server (NTRS)

    Ramapriyan, H. K.; Strong, J. P.

    1984-01-01

    The architectures, programming characteristics, and ranges of application of past, present, and planned array processors for the digital processing of remote-sensing images are compared. Such functions as radiometric and geometric corrections, principal-components analysis, cluster coding, histogram generation, grey-level mapping, convolution, classification, and mensuration and modeling operations are considered, and both pipeline-type and single-instruction/multiple-data-stream (SIMD) arrays are evaluated. Numerical results are presented in a table, and it is found that the pipeline-type arrays normally used with minicomputers increase their speed significantly at low cost, while even further gains are provided by the more expensive SIMD arrays. Most image-processing operations become I/O-limited when SIMD arrays are used with current I/O devices.

  15. Evaluation of soft-core processors on a Xilinx Virtex-5 field programmable gate array.

    SciTech Connect

    Learn, Mark Walter

    2011-04-01

    Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable field programmable gate array (FPGA)-based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA-based soft-core processors for use in future NBA systems: the MicroBlaze (uB), the open-source Leon3, and the licensed Leon3. Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration.

  16. Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1992-01-01

    In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time-varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, there are few evaluation efforts addressing this issue.

  17. Application of an array processor to the analysis of magnetic data for the Doublet III tokamak

    SciTech Connect

    Wang, T.S.; Saito, M.T.

    1980-08-01

    Discussed herein is a fast computational technique employing the Floating Point Systems AP-190L array processor to analyze magnetic data for the Doublet III tokamak, a fusion research device. Interpretation of the experimental data requires the repeated solution of a free-boundary nonlinear partial differential equation, which describes the magnetohydrodynamic (MHD) equilibrium of the plasma. For this particular application, we have found that the array processor is only 1.4 and 3.5 times slower than the CDC-7600 and CRAY computers, respectively. The overhead on the host DEC-10 computer was kept to a minimum by chaining the complete Poisson solver and free-boundary algorithm into one single-load module using the vector function chainer (VFC). A simple time-sharing scheme for using the MHD code is also discussed.

  18. Parallel Spectral Acquisition with an Ion Cyclotron Resonance Cell Array.

    PubMed

    Park, Sung-Gun; Anderson, Gordon A; Navare, Arti T; Bruce, James E

    2016-01-19

    Mass measurement accuracy is a critical analytical figure-of-merit in most areas of mass spectrometry application. However, the time required for acquisition of high-resolution, high mass accuracy data limits many applications and is an aspect under continual pressure for development. Current efforts target implementation of higher electrostatic and magnetic fields because ion oscillatory frequencies increase linearly with field strength. As such, the time required for spectral acquisition of a given resolving power and mass accuracy decreases linearly with increasing fields. Mass spectrometer developments to include multiple high-resolution detectors that can be operated in parallel could further decrease the acquisition time by a factor of n, the number of detectors. Efforts described here resulted in development of an instrument with a set of Fourier transform ion cyclotron resonance (ICR) cells as detectors that constitute the first MS array capable of parallel high-resolution spectral acquisition. ICR cell array systems consisting of three or five cells were constructed with printed circuit boards and installed within a single superconducting magnet and vacuum system. Independent ion populations were injected and trapped within each cell in the array. Upon filling the array, all ions in all cells were simultaneously excited and ICR signals from each cell were independently amplified and recorded in parallel. Presented here are the initial results of successful parallel spectral acquisition, parallel mass spectrometry (MS) and MS/MS measurements, and parallel high-resolution acquisition with the MS array system. PMID:26669509

  19. Simulation study of a parallel processor with unbalanced loads. Master's thesis

    SciTech Connect

    Moore, T.S.

    1987-12-01

    The purpose of this thesis was twofold: first, to estimate the impact of unbalanced computational loads on a parallel-processing architecture via Monte Carlo simulation; and second, to investigate the impact of representing the dynamics of the parallel-processing problem via animated simulation. It is constrained to the hypercube architecture in which each node is connected in a predetermined topology and allowed to communicate to other nodes through calls to the operating system. Routing of messages through the network is fixed and specified within the operating system. Message transmission preempts nodal processing, causing internodal communications to complicate the concurrent operation of the network. Two independent variables are defined: 1) the degree of imbalance characterizes the nature or severity of the load imbalance, and 2) the degree of locality characterizes the node loadings with respect to node locations across the cube. A SLAM II simulation model of a generic 16-node hypercube was constructed in which each node processes a predetermined number of computational tasks and, following each task, sends a message to a single randomly chosen receiver node. An experiment was designed in which the independent variables, degree of imbalance and degree of locality, were varied across two computation-to-IO ratios to determine their separate and interactive effects on the dependent variable, job speedup. ANOVA and regression techniques were used to estimate the relationship between load imbalance, locality, computation-to-IO ratio, and their interactions to job speedup. Results show that load imbalance severely impacts a parallel-processor's performance.

  20. Evaluation of the Intel iWarp parallel processor for space flight applications

    NASA Technical Reports Server (NTRS)

    Hine, Butler P., III; Fong, Terrence W.

    1993-01-01

    The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.

  1. On-board landmark navigation and attitude reference parallel processor system

    NASA Technical Reports Server (NTRS)

    Gilbert, L. E.; Mahajan, D. T.

    1978-01-01

    An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.
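
    The matching step described above (summation of absolute differences over the search area) can be sketched in a few lines. This is a serial, single-process illustration, not the multi-microprocessor implementation; the early cutoff against the best error found so far is one simple form of sequential similarity detection, and all names and sample data are hypothetical.

```python
def sad_with_cutoff(image, template, top, left, cutoff):
    """Sum of absolute differences at one offset, abandoned once it reaches cutoff."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(image[top + r][left + c] - value)
            if total >= cutoff:
                return None  # this offset cannot beat the current best
    return total


def best_match(image, template):
    """Return ((top, left), error) for the offset with the smallest error."""
    th, tw = len(template), len(template[0])
    best_pos, best_err = None, float("inf")
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            err = sad_with_cutoff(image, template, top, left, best_err)
            if err is not None:
                best_pos, best_err = (top, left), err
    return best_pos, best_err


# Small known image (landmark) searched for inside a larger image.
image = [
    [1, 2, 3, 4],
    [5, 9, 8, 6],
    [7, 7, 6, 5],
    [0, 1, 2, 3],
]
template = [[9, 8], [7, 6]]
position, error = best_match(image, template)
```

    Each candidate offset is independent of the others, which is what makes the search area easy to divide among several processors.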

  2. High-performance parallel processors based on star-coupled wavelength division multiplexing optical interconnects

    DOEpatents

    Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.

    2002-01-01

    As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance has been shown to scale to approximately 100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.

  3. Parallel pipeline networking and signal processing with field-programmable gate arrays (FPGAs) and VCSEL-MSM smart pixels

    NASA Astrophysics Data System (ADS)

    Kuznia, C. B.; Sawchuk, Alexander A.; Zhang, Liping; Hoanca, Bogdan; Hong, Sunkwang; Min, Chris; Pansatiankul, Dhawat E.; Alpaslan, Zahir Y.

    2000-05-01

    We present a networking and signal processing architecture called Transpar-TR (Translucent Smart Pixel Array-Token-Ring) that utilizes smart pixel technology to perform 2D parallel optical data transfer between digital processing nodes. Transpar-TR moves data through the network in the form of 3D packets (2D spatial and 1D time). By utilizing many spatial parallel channels, Transpar-TR can achieve high throughput, low latency communication between nodes, even with each channel operating at moderate data rates. The 2D array of optical channels is created by an array of smart pixels, each with an optical input and optical output. Each smart pixel consists of two sections, an optical network interface and ALU-based processor with local memory. The optical network interface is responsible for transmitting and receiving optical data packets using a slotted token ring network protocol. The smart pixel array operates as a single-instruction multiple-data processor when processing data. The Transpar-TR network, consisting of networked smart pixel arrays, can perform pipelined parallel processing very efficiently on 2D data structures such as images and video. This paper discusses the Transpar-TR implementation in which each node is the printed circuit board integration of a VCSEL-MSM chip, a transimpedance receiver array chip and an FPGA chip.

  4. Parallel Access of Out-Of-Core Dense Extendible Arrays

    SciTech Connect

    Otoo, Ekow J; Rotem, Doron

    2007-07-26

    Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto linear consecutive locations that correspond to the linear ordering of the multi-dimensional indices. Two conventional mappings used are the row-major order and the column-major order of multi-dimensional arrays. Such conventional mappings of dense array files highly limit the performance of applications and the extendibility of the dataset. Firstly, an array file that is organized in, say, row-major order causes applications that subsequently access the data in column-major order to have abysmal performance. Secondly, any subsequent expansion of the array file is limited to only one dimension. Expansions of such out-of-core conventional arrays along arbitrary dimensions require storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves the two limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element when passed its k-dimensional index. We also give the inverse function, F-1*() for deriving the k-dimensional index when given the linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows for the growth of the extendible array without reorganization and no significant performance degradation of applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial-vectors are replicated and maintained in each node that accesses sub-array elements.
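
    The two conventional mappings the abstract contrasts can be sketched as follows. This illustrates only the standard row-major and column-major address calculations, not the authors' F*() mapping or its axial vectors, which the record does not specify. The function names are hypothetical.

```python
def row_major_offset(index, shape):
    """Linear offset of a k-dimensional index when the last index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Linear offset of a k-dimensional index when the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset
```

    In a 3 x 4 array, element (1, 2) sits at offset 6 in row-major order and offset 7 in column-major order. Growing the row-major array to 5 x 4 leaves that offset at 6, but growing it to 3 x 5 moves the element to offset 7. This is the one-dimension extendibility limit the abstract refers to.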

  5. Feasibility of using the Massively Parallel Processor for large eddy simulations and other Computational Fluid Dynamics applications

    NASA Technical Reports Server (NTRS)

    Bruno, John

    1984-01-01

    The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations are presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed, and from it were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This, combined with the prospect of building a faster and larger MPP-like machine, holds the promise of achieving the sustained gigaflop rates that are required for the numerical simulations in CFD.

  6. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    DOE PAGESBeta

    O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; Anderson, Steve; Woodward, Paul; Dietz, Hank

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  7. Bayesian image reconstruction for emission tomography incorporating Good's roughness prior on massively parallel processors.

    PubMed Central

    Miller, M I; Roysam, B

    1991-01-01

    Since the introduction by Shepp and Vardi [Shepp, L. A. & Vardi, Y. (1982) IEEE Trans. Med. Imaging 1, 113-121] of the expectation-maximization algorithm for the generation of maximum-likelihood images in emission tomography, a number of investigators have applied the maximum-likelihood method to imaging problems. Though this approach is promising, it is now well known that the unconstrained maximum-likelihood approach has two major drawbacks: (i) the algorithm is computationally demanding, resulting in reconstruction times that are not acceptable for routine clinical application, and (ii) the unconstrained maximum-likelihood estimator has a fundamental noise artifact that worsens as the iterative algorithm climbs the likelihood hill. In this paper the computation issue is addressed by proposing an implementation on the class of massively parallel single-instruction, multiple-data architectures. By restructuring the superposition integrals required for the expectation-maximization algorithm as the solutions of partial differential equations, the local data passage required for efficient computation on this class of machines is satisfied. For dealing with the "noise artifact" a Markov random field prior determined by Good's rotationally invariant roughness penalty is incorporated. These methods are demonstrated on the single-instruction multiple-data class of parallel processors, with the computation times compared with those on conventional and hypercube architectures. PMID:2014243

  8. Parallel arrays of Josephson junctions for submillimeter local oscillators

    NASA Technical Reports Server (NTRS)

    Pance, Aleksandar; Wengler, Michael J.

    1992-01-01

    In this paper we discuss the influence of the DC biasing circuit on operation of parallel biased quasioptical Josephson junction oscillator arrays. Because of nonuniform distribution of the DC biasing current along the length of the bias lines, there is a nonuniform distribution of magnetic flux in superconducting loops connecting every two junctions of the array. These DC self-field effects determine the state of the array. We present analysis and time-domain numerical simulations of these states for four biasing configurations. We find conditions for the in-phase states with maximum power output. We compare arrays with small and large inductances and determine the low inductance limit for nearly-in-phase array operation. We show how arrays can be steered in H-plane using the externally applied DC magnetic field.

  9. Feasibility study for the implementation of NASTRAN on the ILLIAC 4 parallel processor

    NASA Technical Reports Server (NTRS)

    Field, E. I.

    1975-01-01

    The ILLIAC IV, a fourth generation multiprocessor using parallel processing hardware concepts, is operational at Moffett Field, California. Its capability to excel at matrix manipulation makes the ILLIAC well suited for performing structural analyses using the finite element displacement method. The feasibility of modifying the NASTRAN (NASA structural analysis) computer program to make effective use of the ILLIAC IV was investigated. The characteristics of the ILLIAC and of the ARPANET, a telecommunications network which spans the continent and makes the ILLIAC accessible to nearly all major industrial centers in the United States, are summarized. Two distinct approaches are studied: retaining NASTRAN as it now operates on many of the host computers of the ARPANET to process the input and output while using the ILLIAC only for the major computational tasks, and installing NASTRAN to operate entirely in the ILLIAC environment. Though both alternatives offer similar and significant increases in computational speed over modern third generation processors, the full installation of NASTRAN on the ILLIAC is recommended. Specifications are presented for performing that task, with corresponding manpower estimates and schedules.

  10. NOSC (Naval Ocean Systems Center) advanced systolic array processor (ASAP). Professional paper for period ending August 1987

    SciTech Connect

    Loughlin, J.P.

    1987-12-01

    Design of a high-speed (250 million 32-bit floating-point operations per second) two-dimensional systolic array composed of 16-bit/slice microsequencer structured processors is presented. System-design features such as broadcast data flow, tag bit movement, and integrated diagnostic test registers are described. The software development tools needed to map complex matrix-based signal-processing algorithms onto the systolic-processor system are described.

  11. Mobile and replicated alignment of arrays in data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

  12. Real-Time Adaptive Lossless Hyperspectral Image Compression using CCSDS on Parallel GPGPU and Multicore Processor Systems

    NASA Technical Reports Server (NTRS)

    Hopson, Ben; Benkrid, Khaled; Keymeulen, Didier; Aranki, Nazeeh; Klimesh, Matt; Kiely, Aaron

    2012-01-01

    The proposed CCSDS (Consultative Committee for Space Data Systems) Lossless Hyperspectral Image Compression Algorithm was designed to facilitate a fast hardware implementation. This paper analyses that algorithm with regard to available parallelism and describes fast parallel implementations in software for GPGPU and Multicore CPU architectures. We show that careful software implementation, using hardware acceleration in the form of GPGPUs or even just multicore processors, can exceed the performance of existing hardware and software implementations by up to 11x and break the real-time barrier for the first time for a typical test application.

  13. Investigations on the usefulness of the Massively Parallel Processor for study of electronic properties of atomic and condensed matter systems

    NASA Technical Reports Server (NTRS)

    Das, T. P.

    1988-01-01

    The usefulness of the Massively Parallel Processor (MPP) for investigation of electronic structures and hyperfine properties of atomic and condensed matter systems was explored. The major effort was directed towards the preparation of algorithms for parallelization of the computational procedure being used on serial computers for electronic structure calculations in condensed matter systems. Detailed descriptions of investigations and results are reported, including MPP adaptation of self-consistent charge extended Hueckel (SCCEH) procedure, MPP adaptation of the first-principles Hartree-Fock cluster procedure for electronic structures of large molecules and solid state systems, and MPP adaptation of the many-body procedure for atomic systems.

  14. Parallel computation of optimized arrays for 2-D electrical imaging surveys

    NASA Astrophysics Data System (ADS)

    Loke, M. H.; Wilkinson, P. B.; Chambers, J. E.

    2010-12-01

    Modern automatic multi-electrode survey instruments have made it possible to use non-traditional arrays to maximize the subsurface resolution from electrical imaging surveys. Previous studies have shown that one of the best methods for generating optimized arrays is to select the set of array configurations that maximizes the model resolution for a homogeneous earth model. The Sherman-Morrison Rank-1 update is used to calculate the change in the model resolution when a new array is added to a selected set of array configurations. This method had the disadvantage that it required several hours of computer time even for short 2-D survey lines. The algorithm was modified to calculate the change in the model resolution rather than the entire resolution matrix. This reduces the computer time and memory required as well as the computational round-off errors. The matrix-vector multiplications for a single add-on array were replaced with matrix-matrix multiplications for 28 add-on arrays to further reduce the computer time. The temporary variables were stored in the double-precision Single Instruction Multiple Data (SIMD) registers within the CPU to minimize computer memory access. A further reduction in the computer time is achieved by using the computer graphics card Graphics Processor Unit (GPU) as a highly parallel mathematical coprocessor. This makes it possible to carry out the calculations for 512 add-on arrays in parallel using the GPU. The changes reduce the computer time by more than two orders of magnitude. The algorithm used to generate an optimized data set adds a specified number of new array configurations after each iteration to the existing set. The resolution of the optimized data set can be increased by adding a smaller number of new array configurations after each iteration. Although this increases the computer time required to generate an optimized data set with the same number of data points, the new fast numerical routines have made this practical on commonly available microcomputers.
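
    The Sherman-Morrison rank-1 update mentioned in the abstract has a standard closed form: given the inverse of A, the inverse of A + u v^T is A^-1 - (A^-1 u)(v^T A^-1) / (1 + v^T A^-1 u). The sketch below implements that generic identity in plain Python. How the authors apply it to the resolution matrix, and their SIMD and GPU batching, are not reproduced here. The function name is hypothetical.

```python
def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T) given a_inv = A^-1.

    a_inv is a list of rows (n x n); u and v are length-n lists.
    """
    n = len(a_inv)
    # A^-1 u (column vector) and v^T A^-1 (row vector)
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    vt_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("rank-1 update makes the matrix singular")
    return [
        [a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]
```

    The update costs on the order of n^2 operations, against roughly n^3 for inverting the modified matrix from scratch, which is why it suits a search that tests many candidate add-on arrays.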

  15. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    PubMed

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-08-01

    Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable. PMID:26737215

  16. Microchannel cross load array with dense parallel input

    DOEpatents

    Swierkowski, Stefan P.

    2004-04-06

    An architecture or layout for microchannel arrays using T or Cross (+) loading for electrophoresis or other injection and separation chemistry that are performed in microfluidic configurations. This architecture enables a very dense layout of arrays of functionally identical shaped channels and it also solves the problem of simultaneously enabling efficient parallel shapes and biasing of the input wells, waste wells, and bias wells at the input end of the separation columns. One T load architecture uses circular holes with common rows, but not columns, which allows the flow paths for each channel to be identical in shape, using multiple mirror image pieces. Another T load architecture enables the access hole array to be formed on a biaxial, collinear grid suitable for EDM micromachining (square holes), with common rows and columns.

  17. Parallel vacuum arc discharge with microhollow array dielectric and anode

    SciTech Connect

    Feng, Jinghua; Zhou, Lin; Fu, Yuecheng; Zhang, Jianhua; Xu, Rongkun; Chen, Faxin; Li, Linbo; Meng, Shijian

    2014-07-15

    An electrode configuration with microhollow array dielectric and anode was developed to obtain parallel vacuum arc discharge. Compared with the conventional electrodes, more than 10 parallel microhollow discharges were ignited for the new configuration, which increased the discharge area significantly and made the cathode erode more uniformly. The vacuum discharge channel number could be increased effectively by decreasing the distances between holes or increasing the arc current. Experimental results revealed that plasmas ejected from the adjacent hollow and the relatively high arc voltage were two key factors leading to the parallel discharge. The characteristics of plasmas in the microhollow were investigated as well. The spectral line intensity and electron density of plasmas in the microhollow increased markedly with the decrease of the microhollow diameter.

  18. Analog processor design for potentiometric sensor array and its applications in smart living space

    NASA Astrophysics Data System (ADS)

    Chung, Danny Wen-Yaw; Tsai, You-Lin; Liu, Tai-Tsun; Leu, Chun-Liang; Yang, Chung-Huang; Pijanowska, Dorota G.; Torbicz, Wladyslaw; Grabiec, Piotr B.; Jaroszewicz, Bohdan

    2007-04-01

    This paper presents an analog processor design for an ion sensitive field effect transistor (ISFET)-based flow-through system and its application in smart living space. The dynamic flow-cell measurement yields more information than stationary measurement and is useful in environmental monitoring and electronic tongue systems. Multi-channel floating source readout circuitry has been developed for flow-through analysis of an ISFET-based array. The flow injection analysis system with two different ISFET structures has been investigated using performance parameters such as sensitivity, uniformity, and response time of pH sensing. In addition, a self-tuning multi-sensor water quality monitoring system based on the adaptive-network-based fuzzy inference system (ANFIS) learning method is developed. The results can be directly used in drinking water and swimming pool monitoring for improving living space and quality.

  19. Implementation of VQ algorithms on a reconfigurable array processor. Professional paper

    SciTech Connect

    Henderson, T.B.; Thyagarajan, K.S.

    1991-05-01

    Vector quantization is being widely used in image data compression applications due to the facts that it is capable of achieving fractional bit rates with reasonable complexity and that the decoding is a very simple table look-up scheme. In image encoding, a vector quantizer accepts a block of pixels and outputs an address of the best matching tile stored in a codebook. The matching algorithm requires a large number of basic arithmetic operations in typical applications. Since real time coding is required in many video applications, the need for dedicated processing architectures arises naturally. This paper investigates the mapping of VQ algorithms onto an array processor to achieve near real-time compression of video images.
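
    The encoding and decoding steps described in the abstract can be sketched as follows: the encoder searches the codebook for the entry closest to a pixel block and emits its address, and the decoder is a table look-up. This is a serial illustration using squared Euclidean distance as the assumed matching criterion. The array-processor mapping studied in the paper is not shown, and the function names are hypothetical.

```python
def vq_encode(block, codebook):
    """Return the codebook address of the entry nearest to `block`.

    `block` and each codebook entry are flat lists of pixel values of
    equal length; distance is squared Euclidean (an assumption).
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda k: squared_distance(block, codebook[k]))


def vq_decode(address, codebook):
    """Decoding is a plain table look-up."""
    return codebook[address]
```

    The search cost grows with the codebook size times the block length for every block encoded, which is the arithmetic load that motivates a dedicated parallel architecture. Each distance computation is independent, so codebook entries can be compared concurrently.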

  20. Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor

    NASA Technical Reports Server (NTRS)

    Moore, J. Strother

    1992-01-01

    Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.

  1. Medical ultrasound digital beamforming on a massively parallel processing array platform

    NASA Astrophysics Data System (ADS)

    Chen, Paul; Butts, Mike; Budlong, Brad

    2008-03-01

    Digital beamforming has been widely used in modern medical ultrasound instruments. Flexibility is the key advantage of a digital beamformer over the traditional analog approach. Unlike analog delay lines, digital delay can be programmed to implement new ways of beam shaping and beam steering without hardware modification. Digital beamformers can also be focused dynamically by tracking the depth and focusing the receive beam as the depth increases. By constantly updating an element weight table, a digital beamformer can dynamically increase aperture size with depth to maintain constant lateral resolution and reduce sidelobe noise. Because ultrasound digital beamformers have high I/O bandwidth and processing requirements, traditionally they have been implemented using ASICs or FPGAs that are costly both in time and in money. This paper introduces a sample implementation of a digital beamformer that is programmed in software on a Massively Parallel Processor Array (MPPA). The system consists of a host PC and a PCI Express-based beamformer accelerator with an Ambric Am2045 MPPA chip and 512 Mbytes of external memory. The Am2045 has 336 asynchronous RISCDSP processors that communicate through a configurable structure of channels, using a self-synchronizing communication protocol.
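
    The core operation of a receive beamformer, delaying each element's samples and summing them with per-element weights, can be sketched as follows. This is a minimal delay-and-sum illustration with fixed integer sample delays. Dynamic focusing would recompute the delays and weights as depth increases, and the Am2045 implementation in the record is not reproduced. The function and parameter names are hypothetical.

```python
def delay_and_sum(channels, delays, weights):
    """Combine per-element sample streams into one beamformed output.

    channels: list of equal-rate sample lists, one per transducer element
    delays:   integer sample delay applied to each element
    weights:  apodization weight for each element
    """
    # Only produce samples for which every delayed channel has data.
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(w * ch[d + t] for ch, d, w in zip(channels, delays, weights))
        for t in range(n_out)
    ]
```

    Because the delays and weights are ordinary data, they can be reprogrammed without hardware changes. This is the flexibility the abstract attributes to digital beamformers.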

  2. Scalable Unix commands for parallel processors : a high-performance implementation.

    SciTech Connect

    Ong, E.; Lusk, E.; Gropp, W.

    2001-06-22

    We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.

  3. A parallel FPGA implementation for real-time 2D pixel clustering for the ATLAS Fast Tracker Processor

    NASA Astrophysics Data System (ADS)

    Sotiropoulou, C. L.; Gkaitatzis, S.; Annovi, A.; Beretta, M.; Kordas, K.; Nikolaidis, S.; Petridou, C.; Volpi, G.

    2014-10-01

    The parallel 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors from inner ATLAS read out drivers (RODs) at full rate, for a total of 760 Gb/s, as sent by the RODs after level-1 triggers. Clustering serves two purposes: the first is to reduce the high rate of the received data before further processing, and the second is to determine the cluster centroid to obtain the best spatial measurement. For the pixel detectors the clustering is implemented by using a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The cluster detection window size can be adjusted for optimizing the cluster identification process. Additionally, the implementation can be parallelized by instantiating multiple cores to identify different clusters independently, thus exploiting more FPGA resources. This flexibility makes the implementation suitable for a variety of demanding image processing applications. The implementation is robust against bit errors in the input data stream and drops all data that cannot be identified. In the unlikely event of missing control words, the implementation will ensure stable data processing by inserting the missing control words in the data stream. The 2D pixel clustering implementation is developed and tested in both single flow and parallel versions. The first parallel version with 16 parallel cluster identification engines is presented. The input data from the RODs are received through S-Links and the processing units that follow the clustering implementation also require a single data stream, therefore data parallelizing (demultiplexing) and serializing (multiplexing) modules are introduced in order to accommodate the parallelized version and restore the data stream afterwards. The results of the first hardware tests of the single flow implementation on the custom FTK input mezzanine (IM) board are presented. We report on the integration of 16 parallel engines in the same FPGA and the resulting performance. The parallel 2D-clustering implementation has sufficient processing power to meet the specification for the Pixel layers of ATLAS, for up to 80 overlapping pp collisions that correspond to the maximum LHC luminosity planned until 2022.

  4. Parallel programming

    SciTech Connect

    Perrott, R.H.

    1987-01-01

    This book examines the major hardware developments and programming concepts that have influenced the introduction of parallelism. It provides an overview of some of the features of specific machine architectures and their interaction with developments in software technology. The independent areas of multiprocessor and distributed programming, programming array and vector processors, and data flow programming are also examined in detail. Topics covered include: hardware technology developments; software technology developments; mutual exclusion; process synchronization; message passing primitives; Modula-2; Pascal Plus; Ada; Occam: a distributed computing language; Cray-1 FORTRAN translator: CFT; CDC Cyber FORTRAN; Illiac IV CFD FORTRAN; distributed array processor FORTRAN; Actus: a Pascal-based language; data flow programming.

  5. A digital magnetic resonance imaging spectrometer using digital signal processor and field programmable gate array

    NASA Astrophysics Data System (ADS)

    Liang, Xiao; Binghe, Sun; Yueping, Ma; Ruyan, Zhao

    2013-05-01

    A digital spectrometer for low-field magnetic resonance imaging is described. A digital signal processor (DSP) is utilized as the pulse programmer on which a pulse sequence is executed as a subroutine. Field programmable gate array (FPGA) devices that are logically mapped into the external addressing space of the DSP work as auxiliary controllers of gradient control, radio frequency (rf) generation, and rf receiving separately. The pulse programmer triggers an event by setting the 32-bit control register of the corresponding FPGA, and then the FPGA automatically carries out the event function according to preset configurations in cooperation with other devices; accordingly, event control of the spectrometer is flexible and efficient. Digital techniques are in widespread use: gradient control is implemented in real-time by a FPGA; rf source is constructed using direct digital synthesis technique, and rf receiver is constructed using digital quadrature detection technique. Well-designed performance is achieved, including 1 μs time resolution of the gradient waveform, 1 μs time resolution of the soft pulse, and 2 MHz signal receiving bandwidth. Both rf synthesis and rf digitalization operate at the same 60 MHz clock, therefore, the frequency range of transmitting and receiving is from DC to 27 MHz. A majority of pulse sequences have been developed, and the imaging performance of the spectrometer has been validated through a large number of experiments. Furthermore, the spectrometer is also suitable for relaxation measurement in nuclear magnetic resonance field.

  6. A digital magnetic resonance imaging spectrometer using digital signal processor and field programmable gate array.

    PubMed

    Liang, Xiao; Binghe, Sun; Yueping, Ma; Ruyan, Zhao

    2013-05-01

    A digital spectrometer for low-field magnetic resonance imaging is described. A digital signal processor (DSP) is utilized as the pulse programmer on which a pulse sequence is executed as a subroutine. Field programmable gate array (FPGA) devices that are logically mapped into the external addressing space of the DSP work as auxiliary controllers of gradient control, radio frequency (rf) generation, and rf receiving separately. The pulse programmer triggers an event by setting the 32-bit control register of the corresponding FPGA, and then the FPGA automatically carries out the event function according to preset configurations in cooperation with other devices; accordingly, event control of the spectrometer is flexible and efficient. Digital techniques are in widespread use: gradient control is implemented in real-time by a FPGA; rf source is constructed using direct digital synthesis technique, and rf receiver is constructed using digital quadrature detection technique. Well-designed performance is achieved, including 1 μs time resolution of the gradient waveform, 1 μs time resolution of the soft pulse, and 2 MHz signal receiving bandwidth. Both rf synthesis and rf digitalization operate at the same 60 MHz clock, therefore, the frequency range of transmitting and receiving is from DC to ~27 MHz. A majority of pulse sequences have been developed, and the imaging performance of the spectrometer has been validated through a large number of experiments. Furthermore, the spectrometer is also suitable for relaxation measurement in nuclear magnetic resonance field. PMID:23742570

  7. QLISP for parallel processors. Final report, 15 July 1986-31 July 1988

    SciTech Connect

    McCarthy, J.

    1989-01-01

    The goal of the QLISP project at Stanford is to gain experience with the shared-memory, queue-based approach to parallel Lisp, by implementing the QLISP language on an actual multiprocessor, and by developing a symbolic algebra system as a testbed application. The experiments performed on the simulator included: 1. Algorithms for sorting and basic data-structure manipulation for polynomials. 2. Partitioning and scheduling methods for parallel programming. 3. Parallelizing the production rule system OPS5.

  8. Numerical methods for matrix computations using arrays of processors. Final report, 15 August 1983-15 October 1986

    SciTech Connect

    Golub, G.H.

    1987-04-30

    The basic objective of this project was to consider a large class of matrix computations with particular emphasis on algorithms that can be implemented on arrays of processors. In particular, methods useful for sparse matrix computations were investigated. These computations arise in a variety of applications such as the solution of partial differential equations by multigrid methods and in the fitting of geodetic data. Some of the methods developed have already found their use on some of the newly developed architectures.

  9. High-performance computational chemistry: Hartree-Fock electronic structure calculations on massively parallel processors.

    SciTech Connect

    Tilson, J. L.; Minkoff, M.; Wagner, A. F.; Shepard, R.; Sutton, P.; Harrison, R. J.; Kendall, R. A.; Wong, A. T.; PNNL

    1999-01-01

    The parallel performance of the NWChem version 1.2α parallel direct-SCF code has been characterized on five massively parallel supercomputers (IBM SP, Kendall Square KSR-2, CRAY T3D and T3E, and Intel Touchstone DELTA) using single-point energy calculations on seven molecules of varying size (up to 389 atoms) and composition (first-row atoms, halogens, and transition metals). The authors compare the performance using both replicated-data and distributed-data algorithms and the original McMurchie-Davidson and recently incorporated TEXAS integrals packages.

  10. A longitudinal multi-bunch feedback system using parallel digital signal processors

    SciTech Connect

    Sapozhnikov, L.; Fox, J.D.; Olsen, J.J.; Oxoby, G.; Linscott, I.; Drago, A.; Serio, M.

    1993-12-01

    A programmable longitudinal feedback system based on four AT&T 1610 digital signal processors has been developed as a component of the PEP-II R&D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacing of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunches at the 250 MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies correction signals back on the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented.

  11. Experimental results for a photonic time reversal processor for the adaptive control of an ultra wideband phased array antenna

    NASA Astrophysics Data System (ADS)

    Zmuda, Henry; Fanto, Michael; McEwen, Thomas

    2008-04-01

    This paper describes a new concept for a photonic implementation of a time reversed RF antenna array beamforming system. The process does not require analog to digital conversion to implement and is therefore particularly suited for high bandwidth applications. Significantly, propagation distortion due to atmospheric effects, clutter, etc. is automatically accounted for with the time reversal process. The approach utilizes the reflection of an initial interrogation signal from off an extended target to precisely time match the radiating elements of the array so as to re-radiate signals precisely back to the target's location. The backscattered signal(s) from the desired location is captured by each antenna and used to modulate a pulsed laser. An electrooptic switch acts as a time gate to eliminate any unwanted signals such as those reflected from other targets whose range is different from that of the desired location resulting in a spatial null at that location. A chromatic dispersion processor is used to extract the exact array parameters of the received signal location. Hence, other than an approximate knowledge of the steering direction needed only to approximately establish the time gating, no knowledge of the target position is required, and hence no knowledge of the array element time delay is required. Target motion and/or array element jitter is automatically accounted for. Presented here are experimental results that demonstrate the ability of a photonic processor to perform the time-reversal operation on ultra-short electronic pulses.

  12. Failure analysis in a highly parallel processor for L1 triggering

    SciTech Connect

    Cancelo, G.; Gottschalk, Erik Edward; Pavlicek, V.; Wang, M.; Wu, J.

    2003-12-01

    This paper studies how processor failures affect the dataflow of the Level 1 Trigger in the BTeV experiment proposed to run at Fermilab's Tevatron. The failure analysis is crucial for a system with over 2500 processing nodes and a number of storage units and communication links of the same order of magnitude. This paper is based on models of the L1 Trigger architecture and shows the dynamics of the architecture's dataflow. The dataflow analysis provides insight into how system variables are affected by single component failures and provides key information to the implementation of error recovery strategies. The analysis includes both short-term failures from which the system can recover quickly and long-term failures which imply a more drastic error-recovery strategy. The modeling results are supported by behavioral simulations of the L1 Trigger processing BTeV's GEANT Monte Carlo data.

  13. Fast String Search on Multicore Processors: Mapping fundamental algorithms onto parallel hardware

    SciTech Connect

    Scarpazza, Daniele P.; Villa, Oreste; Petrini, Fabrizio

    2008-04-01

    String searching is one of these basic algorithms. It has a host of applications, including search engines, network intrusion detection, virus scanners, spam filters, and DNA analysis, among others. The Cell processor, with its multiple cores, promises to speed-up string searching a lot. In this article, we show how we mapped string searching efficiently on the Cell. We present two implementations: • The fast implementation supports a small dictionary size (approximately 100 patterns) and provides a throughput of 40 Gbps, which is 100 times faster than reference implementations on x86 architectures. • The heavy-duty implementation is slower (3.3-4.3 Gbps), but supports dictionaries with tens of thousands of strings.

  14. Low-power, real-time digital video stabilization using the HyperX parallel processor

    NASA Astrophysics Data System (ADS)

    Hunt, Martin A.; Tong, Lin; Bindloss, Keith; Zhong, Shang; Lim, Steve; Schmid, Benjamin J.; Tidwell, J. D.; Willson, Paul D.

    2011-06-01

    Coherent Logix has implemented a digital video stabilization algorithm for use in soldier systems and small unmanned air / ground vehicles that focuses on significantly reducing the size, weight, and power as compared to current implementations. The stabilization application was implemented on the HyperX architecture using a dataflow programming methodology and the ANSI C programming language. The initial implementation is capable of stabilizing an 800 x 600, 30 fps, full color video stream with a 53 ms frame latency using a single 100 DSP core HyperX hx3100™ processor running at less than 3 W power draw. By comparison, an Intel Core2 Duo processor running the same base algorithm on a 320 x 240, 15 fps stream consumes on the order of 18 W. The HyperX implementation is an overall 100x improvement in performance (processing bandwidth increase times power improvement) over the GPP based platform. In addition, the implementation only requires a minimal number of components to interface directly to the imaging sensor and helmet mounted display or the same computing architecture can be used to generate software defined radio waveforms for communications links. In this application, the global motion due to the camera is measured using a feature based algorithm (11 x 11 Difference of Gaussian filter and Features from Accelerated Segment Test) and model fitting (Random Sample Consensus). Features are matched in consecutive frames and a control system determines the affine transform to apply to the captured frame that will remove or dampen the camera / platform motion on a frame-by-frame basis.

  15. Fast 2D DOA Estimation Algorithm by an Array Manifold Matching Method with Parallel Linear Arrays.

    PubMed

    Yang, Lisheng; Liu, Sheng; Li, Dong; Jiang, Qingping; Cao, Hailin

    2016-01-01

    In this paper, the problem of two-dimensional (2D) direction-of-arrival (DOA) estimation with parallel linear arrays is addressed. Two array manifold matching (AMM) approaches, in this work, are developed for the incoherent and coherent signals, respectively. The proposed AMM methods estimate the azimuth angle only with the assumption that the elevation angles are known or estimated. The proposed methods are time efficient since they do not require eigenvalue decomposition (EVD) or peak searching. In addition, the complexity analysis shows the proposed AMM approaches have lower computational complexity than many current state-of-the-art algorithms. The estimated azimuth angles produced by the AMM approaches are automatically paired with the elevation angles. More importantly, for estimating the azimuth angles of coherent signals, the aperture loss issue is avoided since a decorrelation procedure is not required for the proposed AMM method. Numerical studies demonstrate the effectiveness of the proposed approaches. PMID:26907301

  16. Representing S-expressions for the efficient evaluation of Lisp on parallel processors

    SciTech Connect

    Harrison, W.L. III; Padua, D.A.

    1986-03-01

    Present methods for exploiting parallelism in Lisp programs perform poorly upon lists (long, flat s-expressions), as such structures must be both created and traversed sequentially. While such a serial operation may be masked by overlapping it with other computation (by virtue of process spawning, or by the use of a mechanism such as futures), it represents a lost (and potentially large) source of parallelism. In this paper we describe the representation of s-expressions employed in PARCEL (Project for the Automatic Restructuring and Concurrent Evaluation of Lisp), which facilitates the creation and access of lists, without compromising the performance of functions which manipulate s-expressions of a more general shape. Using this representation, the PARCEL compiler translates Lisp programs written in a subset of the Scheme dialect (which allows for global variables and atom properties) into code for a large, tightly coupled shared memory multiprocessor. 12 refs.

  17. Retinal Parallel Processors: More than 100 Independent Microcircuits Operate within a Single Interneuron

    PubMed Central

    Grimes, William N.; Zhang, Jun; Graydon, Cole W.; Kachar, Bechara; Diamond, Jeffrey S.

    2010-01-01

    SUMMARY Most neurons are highly polarized cells with branched dendrites that receive and integrate synaptic inputs and extensive axons that deliver action potential output to distant targets. By contrast, amacrine cells, a diverse class of inhibitory interneurons in the inner retina, collect input and distribute output within the same neuritic network. The extent to which most amacrine cells integrate synaptic information and distribute their output is poorly understood. Here, we show that single A17 amacrine cells provide reciprocal feedback inhibition to presynaptic bipolar cells via hundreds of independent microcircuits operating in parallel. The A17 uses specialized morphological features, biophysical properties, and synaptic mechanisms to isolate feedback microcircuits and maximize its capacity to handle many independent processes. This example of a neuron employing distributed parallel processing rather than spatial integration provides insights into how unconventional neuronal morphology and physiology can maximize network function while minimizing wiring cost. PMID:20346762

  18. Parallel modified signed-digit arithmetic using an optoelectronic shared content-addressable-memory processor.

    PubMed

    Ha, B; Li, Y

    1994-06-10

    Addition is the most primitive arithmetic operation in digital computation. Other arithmetic operations such as subtraction, multiplication, and division can all be performed by addition together with some logic operations. With the binary number system, addition speed is inevitably limited by the carry-propagation schemes. On the other hand, carry-free addition is possible when the modified signed-digit (MSD) number representation is used. We propose a novel optoelectronic scheme to handle the parallel MSD addition and subtraction operations. An optoelectronic shared content-addressable memory is introduced. The shared content-addressable memory uses free-space optical processing to handle the large amount of parallel memory access operations and uses electronics to postprocess and derive logic decisions. We analyze the accuracy that the required optical hardware can deliver by using a statistical cross-talk-rate model that we propose. We also evaluate other important device and system performance parameters, such as the memory capacity or the maximum number of parallel bits the adder can handle in terms of a given cross-talk rate at a certain repetition rate, the corresponding diffraction-limited memory density, and the system's power efficiency. To confirm the underlying operational principles of the proposed optoelectronic shared content-addressable-memory MSD adder, we design and perform initial experiments for handling 8-bit MSD number addition and subtraction and present the results. PMID:20885756
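
    The carry-free property mentioned in the abstract above can be illustrated in software. The sketch below uses a standard textbook digit-selection rule for radix-2 signed-digit addition, in which each result digit depends only on the operand digits at that position and the two positions below it. It is not the optoelectronic scheme of the paper, and the function names are hypothetical.

```python
def msd_value(digits):
    """Integer value of an MSD number (digits in {-1, 0, 1}, most significant first)."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def msd_add(x, y):
    """Add two equal-length MSD numbers without carry propagation.

    Returns a digit list one position longer than the inputs.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    n = len(x)
    xs, ys = x[::-1], y[::-1]          # work least significant digit first
    transfer = [0] * (n + 1)           # transfer digit entering each position
    interim = [0] * (n + 1)            # interim sum digit at each position
    for i in range(n):
        s = xs[i] + ys[i]
        # Looking one position down tells us the sign of the incoming transfer,
        # so the interim digit can be chosen to absorb it without overflow.
        lower_nonneg = i == 0 or (xs[i - 1] >= 0 and ys[i - 1] >= 0)
        if s == 2:
            t, w = 1, 0
        elif s == 1:
            t, w = (1, -1) if lower_nonneg else (0, 1)
        elif s == -1:
            t, w = (0, -1) if lower_nonneg else (-1, 1)
        elif s == -2:
            t, w = -1, 0
        else:
            t, w = 0, 0
        transfer[i + 1] = t
        interim[i] = w
    # Every position is finished independently: no carry ripples further.
    total = [interim[i] + transfer[i] for i in range(n + 1)]
    return total[::-1]


if __name__ == "__main__":
    print(msd_add([1, 0, 1], [0, 1, 1]))  # 5 + 3 -> [1, 0, 0, 0]
```

    Because both loops touch each position independently, every digit could in principle be computed at the same time, which is the property a parallel adder exploits.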

  19. Evaluation of the Leon3 soft-core processor within a Xilinx radiation-hardened field-programmable gate array.

    SciTech Connect

    Learn, Mark Walter

    2012-01-01

    The purpose of this document is to summarize the work done to evaluate the performance of the Leon3 soft-core processor in a radiation environment while instantiated in a radiation-hardened static random-access memory based field-programmable gate array. This evaluation will look at the differences between two soft-core processors: the open-source Leon3 core and the fault-tolerant Leon3 core. Radiation testing of these two cores was conducted at the Texas A&M University Cyclotron facility and Lawrence Berkeley National Laboratory. The results of these tests are included within the report along with designs intended to improve the mitigation of the open-source Leon3. The test setup used for evaluating both versions of the Leon3 is also included within this document.

  20. Trajectory optimization for real-time guidance. I - Time-varying LQR on a parallel processor

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L.; Park, Kihong

    1990-01-01

    A key algorithmic element of a real-time trajectory optimization hardware/software implementation, the quadratic program (QP) solver element, is presented. The purpose of the effort is to make nonlinear trajectory optimization fast enough to provide real-time commands during guidance of a vehicle such as an aeromaneuvering orbiter. Many methods of nonlinear programming require the solution of a QP at each iteration. In the trajectory optimization case the QP has a special dynamic programming structure, a LQR-like structure. QP algorithm speed is increased by taking advantage of this special structure and by parallel implementation.

  1. Multimode power processor

    DOEpatents

    O'Sullivan, George A.; O'Sullivan, Joseph A.

    1999-01-01

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

  2. Multimode power processor

    DOEpatents

    O'Sullivan, G.A.; O'Sullivan, J.A.

    1999-07-27

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.

  3. Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

    PubMed

    Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

    2012-01-01

    We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ~180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. PMID:22407006

  4. Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank

    NASA Astrophysics Data System (ADS)

    Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.

    2014-05-01

    Wideband receive-mode beamforming applications in wireless location, electronically-scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes including the well-known true time-delay and the phased array beamformers have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum based techniques. These beamforming algorithms offer the desired selectivity at the cost of a high computational complexity and frequency-dependent far-field array patterns. A novel approach to receiver beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks for the synthesis of relatively frequency independent RF beams at an order of magnitude lower multiplier complexity compared to FFT or FIR filter based conventional algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth, fast analog-to-digital converters (ADCs) for RF-to-bits type direct conversion of wideband antenna element signals. Fast digital implementation platforms that can realize high-precision recursive filter structures necessary for real-time beamforming, at RF radio bandwidths, are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. There exists native support for a larger bandwidth than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal having content with N-fold (B = N Fclk/2) bandwidth compared to the maximum clock frequency Fclk Hz of the digital VLSI platform under consideration.
Such increase in bandwidth is achieved without use of polyphase signal processing or time-interleaved ADC methods. That is, all digital processors operate at the same Fclk clock frequency without phasing, while wideband operation is achieved by sub-sampling of narrower sub-bands at the RF channelizer outputs.

  5. Implementation of an EPICS IOC on an Embedded Soft Core Processor Using Field Programmable Gate Arrays

    SciTech Connect

    Douglas Curry; Alicia Hofler; Hai Dong; Trent Allison; J. Hovater; Kelly Mahoney

    2005-09-20

    At Jefferson Lab, we have been evaluating soft core processors running an EPICS IOC over μClinux on our custom hardware. A soft core processor is a flexible CPU architecture that is configured in the FPGA as opposed to a hard core processor which is fixed in silicon. Combined with an on-board Ethernet port, the technology incorporates the IOC and digital control hardware within a single FPGA. By eliminating the general purpose computer IOC, the designer is no longer tied to a specific platform, e.g. PC, VME, or VXI, to serve as the intermediary between the high level controls and the field hardware. This paper will discuss the design and development process as well as specific applications for JLab's next generation low-level RF controls and Machine Protection Systems.

  6. Multimedia OC12 parallel interface using VCSEL array to achieve high-performance cost-effective optical interconnections

    NASA Astrophysics Data System (ADS)

    Chang, Edward S.

    1996-09-01

    The multimedia communication needs high-performance, cost-effective communication techniques to transport data for the fast-growing multimedia traffic resulting from the recent deployment of World Wide Web (WWW), media-on-demand, and other multimedia applications. To transport a large volume of multimedia data, high-performance servers are required to perform media processing and transfer. Typically, the high-performance multimedia server is a massively parallel processor with a high number of I/O ports, high storage capacity, fast signal processing, and excellent cost-performance. The parallel I/O ports of the server are connected to multiple clients through a network switch which uses parallel links in both switch-to-server and switch-to-client connections. In addition to media processing and storage, media communication is also a major function of the multimedia system. Without a high-performance communication network, a high-performance server cannot deliver its full capacity of service to clients. Fortunately, there are many advanced communication technologies developed for networking, which can be adopted by the multimedia communication to economically deliver the full capacity of a high-performance multimedia service to clients. The VCSEL array technology has been developed for gigabit-rate parallel optical interconnections because of its high bandwidth, small-size, and easy-fabrication advantages. Several firms are developing multifiber, low-skew, low-cost ribbon cables to transfer signals from a VCSEL array. The OC12 SONET data-rate is widely used by high-performance multimedia communications for its high-data-rate and cost-effectiveness. Therefore, the OC12 VCSEL parallel optical interconnection is the ideal technology to meet the high-performance low-cost requirements for delivering affordable multimedia services to mass users.
This paper describes a multimedia OC12 parallel optical interconnection using a VCSEL array transceiver, a multifiber ribbon cable, and MT connectors to achieve a high-performance, low-cost parallel link. A logical model of a multimedia server with parallel connections to an ATM switch, and to clients is presented. The design of the parallel optical link is analyzed. Furthermore, the link configured for testing, the test method, and test results are presented to confirm the analysis and to assure reliable link performance.

  7. Design and numerical evaluation of a volume coil array for parallel MR imaging at ultrahigh fields

    PubMed Central

    Pang, Yong; Wong, Ernest W.H.; Yu, Baiying

    2014-01-01

    In this work, we propose and investigate a volume coil array design method using different types of birdcage coils for MR imaging. Unlike the conventional radiofrequency (RF) coil arrays of which the array elements are surface coils, the proposed volume coil array consists of a set of independent volume coils including a conventional birdcage coil, a transverse birdcage coil, and a helix birdcage coil. The magnetic fluxes of these three birdcage coils are intrinsically cancelled, yielding a highly decoupled volume coil array. In contrast to conventional non-array type volume coils, the volume coil array would be beneficial in improving MR signal-to-noise ratio (SNR) and also gain the capability of implementing parallel imaging. The volume coil array is evaluated at the ultrahigh field of 7T using FDTD numerical simulations, and the g-factor map at different acceleration rates was also calculated to investigate its parallel imaging performance. PMID:24649435

  8. Massively parallel computation of lattice associative memory classifiers on multicore processors

    NASA Astrophysics Data System (ADS)

    Ritter, Gerhard X.; Schmalz, Mark S.; Hayden, Eric T.

    2011-09-01

    Over the past quarter century, concepts and theory derived from neural networks (NNs) have featured prominently in the literature of pattern recognition. Implementationally, classical NNs based on the linear inner product can present performance challenges due to the use of multiplication operations. In contrast, NNs having nonlinear kernels based on Lattice Associative Memories (LAM) theory tend to concentrate primarily on addition and maximum/minimum operations. More generally, the emergence of LAM-based NNs, with their superior information storage capacity, fast convergence and training due to relatively lower computational cost, as well as noise-tolerant classification has extended the capabilities of neural networks far beyond the limited applications potential of classical NNs. This paper explores theory and algorithmic approaches for the efficient computation of LAM-based neural networks, in particular lattice neural nets and dendritic lattice associative memories. Of particular interest are massively parallel architectures such as multicore CPUs and graphics processing units (GPUs). Originally developed for video gaming applications, GPUs hold the promise of high computational throughput without compromising numerical accuracy. Unfortunately, currently-available GPU architectures tend to have idiosyncratic memory hierarchies that can produce unacceptably high data movement latencies for relatively simple operations, unless careful design of theory and algorithms is employed. Advantageously, some GPUs (e.g., the Nvidia Fermi GPU) are optimized for efficient streaming computation (e.g., concurrent multiply and add operations). As a result, the linear or nonlinear inner product structures of NNs are inherently suited to multicore GPU computational capabilities. 
In this paper, the authors' recent research in lattice associative memories and their implementation on multicores is overviewed, with results that show utility for a wide variety of pattern classification applications using classical NNs or lattice-based NNs. Dataflow diagrams are presented in terms of a parameterized model of data burden and LAM partitioning.

  9. Graphics-processor-unit-based parallelization of optimized baseline wander filtering algorithms for long-term electrocardiography.

    PubMed

    Niederhauser, Thomas; Wyss-Balmer, Thomas; Haeberlin, Andreas; Marisa, Thanks; Wildhaber, Reto A; Goette, Josef; Jacomet, Marcel; Vogel, Rolf

    2015-06-01

    Long-term electrocardiogram (ECG) often suffers from relevant noise. Baseline wander in particular is pronounced in ECG recordings using dry or esophageal electrodes, which are dedicated for prolonged registration. While analog high-pass filters introduce phase distortions, reliable offline filtering of the baseline wander implies a computational burden that has to be put in relation to the increase in signal-to-baseline ratio (SBR). Here, we present a graphics processor unit (GPU)-based parallelization method to speed up offline baseline wander filter algorithms, namely the wavelet, finite, and infinite impulse response, moving mean, and moving median filter. Individual filter parameters were optimized with respect to the SBR increase based on ECGs from the Physionet database superimposed to autoregressive modeled, real baseline wander. A Monte-Carlo simulation showed that for low input SBR the moving median filter outperforms any other method but negatively affects ECG wave detection. In contrast, the infinite impulse response filter is preferred in case of high input SBR. However, the parallelized wavelet filter is processed 500 and four times faster than these two algorithms on the GPU, respectively, and offers superior baseline wander suppression in low SBR situations. Using a signal segment of 64 mega samples that is filtered as entire unit, wavelet filtering of a seven-day high-resolution ECG is computed within less than 3 s. Taking the high filtering speed into account, the GPU wavelet filter is the most efficient method to remove baseline wander present in long-term ECGs, with which computational burden can be strongly reduced. PMID:25675449
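
    Of the filters compared in the abstract above, the moving median is the simplest to state: estimate the baseline at each sample as the median of a surrounding window, then subtract it. The sketch below is a plain sequential version for illustration only. It is not the authors' GPU implementation, and the function name, the default window length, and the truncated-window edge handling are assumptions.

```python
from statistics import median


def remove_baseline_moving_median(signal, window=5):
    """Subtract a moving-median baseline estimate from each sample.

    window is a hypothetical odd window length in samples; windows are
    truncated at the ends of the signal rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    filtered = []
    for i, sample in enumerate(signal):
        start = max(0, i - half)
        stop = min(len(signal), i + half + 1)
        baseline = median(signal[start:stop])   # robust to narrow spikes
        filtered.append(sample - baseline)
    return filtered


if __name__ == "__main__":
    # Constant offset of 10 with one narrow spike: the offset is removed,
    # the spike survives.
    ecg_like = [10.0] * 11
    ecg_like[5] = 15.0
    print(remove_baseline_moving_median(ecg_like))
```

    Each output sample depends only on its own window of input samples, so the per-sample work is independent and can be distributed across parallel threads.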

  10. Appendix E: Parallel Pascal development system

    NASA Technical Reports Server (NTRS)

    1985-01-01

    The Parallel Pascal Development System enables Parallel Pascal programs to be developed and tested on a conventional computer. It consists of several system programs, including a Parallel Pascal to standard Pascal translator, and a library of Parallel Pascal subprograms. The library includes subprograms for using Parallel Pascal on a parallel system with a fixed degree of parallelism, such as the Massively Parallel Processor, to conveniently manipulate arrays which have dimensions larger than the hardware. Programs can be conveniently tested with small arrays on the conventional computer before attempting to run on a parallel system.

  11. Analysis of a parallel-arrayed power regulating system

    NASA Technical Reports Server (NTRS)

    Colburn, B. K.; Horton, H. M.; Honnell, M. A.

    1979-01-01

    A power regulation system incorporating n-parallel power supplies employing PWM switching regulators is studied. Analysis of individual unit operation and coupled-system parameter sensitivity is considered from an operations viewpoint. A detailed example is included to illustrate parallel system operation for 18 such units powered by solar-cell banks.

  12. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    NASA Technical Reports Server (NTRS)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  13. Computer Processor Allocator

    Energy Science and Technology Software Center (ESTSC)

    2004-03-01

    The Compute Processor Allocator (CPA) provides an efficient and reliable mechanism for managing and allotting processors in a massively parallel (MP) computer. It maintains information in a database on the health, configuration, and allocation of each processor. This persistent information is factored into each allocation decision. The CPA runs in a distributed fashion to avoid a single point of failure.

  14. Electrostatic quadrupole array for focusing parallel beams of charged particles

    DOEpatents

    Brodowski, John

    1982-11-23

    An array of electrostatic quadrupoles, capable of providing strong electrostatic focusing simultaneously on multiple beams, is easily fabricated from a single array element comprising a support rod and multiple electrodes spaced at intervals along the rod. The rods are secured to four terminals which are isolated by only four insulators. This structure requires bias voltage to be supplied to only two terminals and eliminates the need for individual electrode bias and insulators, as well as increases life by eliminating beam plating of insulators.

  15. Research of control system stability in solar array simulator with continuous power amplifier of parallel type

    NASA Astrophysics Data System (ADS)

    Mizrah, E. A.; Tkachev, S. B.; Shtabel, N. V.

    2015-10-01

    Solar array simulators are nonlinear control systems designed to reproduce the static and dynamic characteristics of a solar array. Solar array characteristics depend on illumination, temperature, the space environment, and other factors. During on-earth testing of spacecraft power systems, it is difficult to achieve stable simulator operation with loads of different impedance over a wide range of load regulation. In this article, the authors propose a research method for absolute process stability in solar array simulators and present results of absolute stability research for a solar array simulator with a continuous parallel-type power amplifier.

  16. Parallel array of independent thermostats for column separations

    DOEpatents

    Foret, Frantisek; Karger, Barry L.

    2005-08-16

    A thermostat array including an array of two or more capillary columns (10) or two or more channels in a microfabricated device is disclosed. A heat conductive material (12) surrounds each individual column or channel in the array, each individual column or channel being thermally insulated from every other individual column or channel. One or more independently controlled heating or cooling elements (14) is positioned adjacent to individual columns or channels within the heat conductive material, each heating or cooling element being connected to a source of heating or cooling, and one or more independently controlled temperature sensing elements (16) is positioned adjacent to the individual columns or channels within the heat conductive material. Each temperature sensing element is connected to a temperature controller.

  17. VLSI processor with a configurable processing element array for balanced feature extraction in high-resolution images

    NASA Astrophysics Data System (ADS)

    Zhu, Hongbo; Shibata, Tadashi

    2014-01-01

    A VLSI processor employing a configurable processing element array (PEA) is developed for a newly proposed balanced feature extraction algorithm. In the algorithm, the input image is divided into square regions and the number of features is determined by noise effect analysis in each region. Regions of different sizes are used according to the resolutions and contents of input images. Therefore, inside the PEA, processing elements are hierarchically grouped for feature extraction in regions of different sizes. A proof-of-concept chip is fabricated using a 0.18 µm CMOS technology with a 32 × 32 PEA. From measurement results, a speed of 7.5 kfps is achieved for feature extraction in 128 × 128 pixel regions when operating the chip at 45 MHz, and a speed of 55 fps is also achieved for feature extraction in 1920 × 1080 pixel images.

  18. A fast adaptive convex hull algorithm on two-dimensional processor arrays with a reconfigurable BUS system

    NASA Technical Reports Server (NTRS)

    Olariu, S.; Schwing, J.; Zhang, J.

    1991-01-01

    A bus system that can change dynamically to suit computational needs is referred to as reconfigurable. We present a fast adaptive convex hull algorithm on a two-dimensional processor array with a reconfigurable bus system (2-D PARBS, for short). Specifically, we show that computing the convex hull of a planar set of n points takes O(log n/log m) time on a 2-D PARBS of size mn x n with 3 ≤ m ≤ n. Our result implies that the convex hull of n points in the plane can be computed in O(1) time on a 2-D PARBS of size n^1.5 x n.

  19. High-speed, automatic controller design considerations for integrating array processor, multi-microprocessor, and host computer system architectures

    NASA Technical Reports Server (NTRS)

    Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.

    1985-01-01

    Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.

  20. Language and environment support for parallel array object I/O on distributed environments

    SciTech Connect

    Lee, J.K.; Tsaur, I.K.; Hwang, S.Y.

    1995-12-01

    This paper describes a parallel file object environment to support distributed array store on shared-nothing distributed computing environments. Our environment enables programmers to extend the concept of array distribution from memory levels to file levels. It allows parallel I/O according to the distribution of objects in an application. When objects are read and/or written by multiple applications using different distributions, we present a novel scheme to help programmers to select the best data distribution pattern according to minimum amount of remote data movements for the store of array objects on distributed file systems.

  1. Stream Processors

    NASA Astrophysics Data System (ADS)

    Erez, Mattan; Dally, William J.

    Stream processors, like other multicore architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
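
    The gather-compute-scatter form mentioned above can be sketched in a few lines. This is only a conceptual, serial illustration: the function name, the flat list standing in for off-chip memory, and the index lists are hypothetical and do not correspond to any particular stream processor's programming interface.

```python
def gather_compute_scatter(memory, read_indices, write_indices, kernel):
    """Apply `kernel` to gathered elements and scatter the results back."""
    # Gather: copy the needed elements into a local buffer
    # (standing in for explicitly managed on-chip storage).
    local = [memory[i] for i in read_indices]
    # Compute: run the kernel over the local buffer only.
    results = [kernel(value) for value in local]
    # Scatter: write the results back to their destinations.
    for index, result in zip(write_indices, results):
        memory[index] = result
    return memory


if __name__ == "__main__":
    mem = [1, 2, 3, 4, 0, 0]
    print(gather_compute_scatter(mem, [0, 2], [4, 5], lambda v: v * 10))
```

    Separating the three phases keeps every memory transfer explicit, which is the software-managed data movement the abstract describes.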

  2. Experience in highly parallel processing using DAP

    NASA Technical Reports Server (NTRS)

    Parkinson, D.

    1987-01-01

    Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.

  3. NEUSORT2.0: a multiple-channel neural signal processor with systolic array buffer and channel-interleaving processing schedule.

    PubMed

    Chen, Tung-Chien; Yang, Zhi; Liu, Wentai; Chen, Liang-Gee

    2008-01-01

    An emerging class of neuroprosthetic devices aims to provide aggressive performance by integrating more complicated signal processing hardware into the neural recording system with a large number of electrodes. However, the traditional parallel structure duplicating one neural signal processor (NSP) multiple times for multiple channels places a heavy burden on chip area. The serial structure sequentially switching the processing task between channels requires a bulky memory to store neural data and may have a long processing delay. In this paper, a memory hierarchy of systolic array buffer is proposed to support signal processing interleaved channel by channel on a cycle basis to match the data flow of the optimized multiple-channel frontend interface circuitry. The NSP can thus be tightly coupled to the analog frontend interface circuitry and perform signal processing for multiple channels in real time without any bulky memory. Based on our previous one-channel NSP of NEUSORT1.0 [1], the proposed memory hierarchy is realized on NEUSORT2.0 for a 16-channel neural recording system. Compared to 16 instances of NEUSORT1.0, NEUSORT2.0 demonstrates an 81.50% saving in terms of the area × power factor. PMID:19163846

  4. Parallel RNA extraction using magnetic beads and a droplet array

    PubMed Central

    Shi, Xu; Chen, Chun-Hong; Gao, Weimin; Meldrum, Deirdre R.

    2015-01-01

    Nucleic acid extraction is a necessary step for most genomic/transcriptomic analyses, but it often requires complicated mechanisms to be integrated into a lab-on-a-chip device. Here, we present a simple, effective configuration for rapidly obtaining purified RNA from low concentration cell medium. This Total RNA Extraction Droplet Array (TREDA) utilizes an array of surface-adhering droplets to facilitate the transportation of magnetic purification beads seamlessly through individual buffer solutions without solid structures. The fabrication of TREDA chips is rapid and does not require a microfabrication facility or expertise. The process takes less than 5 minutes. When purifying mRNA from bulk marine diatom samples, its repeatability and extraction efficiency are comparable to conventional tube-based operations. We demonstrate that TREDA can extract the total mRNA of about 10 marine diatom cells, indicating that the sensitivity of TREDA approaches single-digit cell numbers. PMID:25519439

  5. "Multipoint Force Feedback" Leveling of Massively Parallel Tip Arrays in Scanning Probe Lithography.

    PubMed

    Noh, Hanaul; Jung, Goo-Eun; Kim, Sukhyun; Yun, Seong-Hun; Jo, Ahjin; Kahng, Se-Jong; Cho, Nam-Joon; Cho, Sang-Joon

    2015-09-16

    Nanoscale patterning with massively parallel 2D array tips is of significant interest in scanning probe lithography. A challenging task for tip-based large area nanolithography is maintaining parallel tip arrays at the same contact point with a sample substrate in order to pattern a uniform array. Here, polymer pen lithography is demonstrated with a novel leveling method to account for the magnitude and direction of the total applied force of tip arrays by a multipoint force sensing structure integrated into the tip holder. This high-precision approach results in a 0.001 slope of feature edge length variation over 1 cm wide tip arrays. The position sensitive leveling operates in a fully automated manner and is applicable to recently developed scanning probe lithography techniques of various kinds which can enable "desktop nanofabrication." PMID:26081390

  6. High-performance ultra-low power VLSI analog processor for data compression

    NASA Technical Reports Server (NTRS)

    Tawel, Raoul (Inventor)

    1996-01-01

    An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
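
    The comparison described above amounts to a nearest-codebook-vector search, as used in vector quantization. The sketch below is a serial software analogue, not the patented analog circuit. The function name is hypothetical, and squared Euclidean distance is an assumed closeness measure, since the abstract does not state which one the hardware uses.

```python
def nearest_codebook_index(input_vector, codebook):
    """Return the index of the codebook vector closest to `input_vector`."""
    best_index = -1
    best_distance = float("inf")
    for index, code_vector in enumerate(codebook):
        # Per-component comparison followed by a combination step,
        # mirroring one column of processor cells and its combiner.
        distance = sum((x - c) ** 2 for x, c in zip(input_vector, code_vector))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


if __name__ == "__main__":
    codebook = [[0, 0], [10, 10], [0, 10]]   # N = 3 codebook vectors, M = 2 components
    print(nearest_codebook_index([1, 9], codebook))
```

    In the apparatus the N column comparisons happen simultaneously, whereas this loop visits the codebook vectors one after another.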

  7. A frequency and sensitivity tunable microresonator array for high-speed quantum processor readout

    NASA Astrophysics Data System (ADS)

    Whittaker, J. D.; Swenson, L. J.; Volkmann, M. H.; Spear, P.; Altomare, F.; Berkley, A. J.; Bumble, B.; Bunyk, P.; Day, P. K.; Eom, B. H.; Harris, R.; Hilton, J. P.; Hoskinson, E.; Johnson, M. W.; Kleinsasser, A.; Ladizinsky, E.; Lanting, T.; Oh, T.; Perminov, I.; Tolkacheva, E.; Yao, J.

    2016-01-01

    Superconducting microresonators have been successfully utilized as detection elements for a wide variety of applications. With multiplexing factors exceeding 1000 detectors per transmission line, they are the most scalable low-temperature detector technology demonstrated to date. For high-throughput applications, fewer detectors can be coupled to a single wire but utilize a larger per-detector bandwidth. For all existing designs, fluctuations in fabrication tolerances result in a non-uniform shift in resonance frequency and sensitivity, which ultimately limits the efficiency of bandwidth utilization. Here, we present the design, implementation, and initial characterization of a superconducting microresonator readout integrating two tunable inductances per detector. We demonstrate that these tuning elements provide independent control of both the detector frequency and sensitivity, allowing us to maximize the transmission line bandwidth utilization. Finally, we discuss the integration of these detectors in a multilayer fabrication stack for high-speed readout of the D-Wave quantum processor, highlighting the use of control and routing circuitry composed of single-flux-quantum loops to minimize the number of control wires at the lowest temperature stage.

  8. Achieving supercomputer performance for neural net simulation with an array of digital signal processors

    SciTech Connect

    Muller, U.A.; Baumle, B.; Kohler, P.; Gunzinger, A.; Guggenbuhl, W.

    1992-10-01

    Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.

  9. Optoelectronic smart-pixel arrays for parallel processing

    NASA Astrophysics Data System (ADS)

    Wherrett, B. S.; Desmulliez, M. P. Y.

    1996-09-01

    The transition of thinking from the use of all-optical logic arrays, to optoelectronic smart-pixel arrays, within digital information processing is described. Detectors with low conversion energies and fast modulators, developed from devices originally intended as bistable elements, now provide one of the technologies being pursued for the optics - electronics interface. The key advantage provided by optics is the huge bandwidth off-chip (THz); electronics provides high density locally interconnected (on-chip) logic devices. Applications that exploit this combination are being sought. One possible area is in the sorting of data sets, where non-local interconnections between stages and modest logic functionality per stage, are required in order to implement fast algorithms. The expected performance of a smart-pixel sorting module, such as that under construction by the Scottish Collaborative Initiative in Optoelectronic Sciences (SCIOS) is summarized. The move from all-optical to hybrid technologies does not eradicate the need for further advances in materials and in the processing control of materials with nonlinear optical (electro-absorption and electro-optic) responses.

  10. Fully parallel write/read in resistive synaptic array for accelerating on-chip learning

    NASA Astrophysics Data System (ADS)

    Gao, Ligang; Wang, I.-Ting; Chen, Pai-Yu; Vrudhula, Sarma; Seo, Jae-sun; Cao, Yu; Hou, Tuo-Hung; Yu, Shimeng

    2015-11-01

    A neuro-inspired computing paradigm beyond the von Neumann architecture is emerging and it generally takes advantage of massive parallelism and is aimed at complex tasks that involve intelligence and learning. The cross-point array architecture with synaptic devices has been proposed for on-chip implementation of the weighted sum and weight update in the learning algorithms. In this work, forming-free, silicon-process-compatible Ta/TaOx/TiO2/Ti synaptic devices are fabricated, in which >200 levels of conductance states could be continuously tuned by identical programming pulses. In order to demonstrate the advantages of parallelism of the cross-point array architecture, a novel fully parallel write scheme is designed and experimentally demonstrated in a small-scale crossbar array to accelerate the weight update in the training process, at a speed that is independent of the array size. Compared to the conventional row-by-row write scheme, it achieves >30× speed-up and >30× improvement in energy efficiency as projected in a large-scale array. If realistic synaptic device characteristics such as device variations are taken into an array-level simulation, the proposed array architecture is able to achieve ∼95% recognition accuracy of MNIST handwritten digits, which is close to the accuracy achieved by software using the ideal sparse coding algorithm.

  11. CombinePlt and CombineThs user manual: Merging multiple, processor-local plot and time-history data bases produced during a parallel calculation

    SciTech Connect

    Procassini, R.J.; DeGroot, A.J.

    1995-06-01

    The CombinePlt and CombineThs post-processing utilities are designed to merge the data in multiple, processor-local plot and time-history data bases produced by the parallel versions of the analysis codes DYNA3D, NIKE3D or PING into a serial data base which is compatible with the existing versions of the GRIZ and THUG visualization tools. These utilities make use of the partition assignment file produced by the PartMesh suite of pre-processing utilities to map the data from the processor-local order to global order. These utilities are also capable of translating 64-bit IEEE data bases into 32-bit IEEE data bases which are required for post-processing with GRIZ or THUG on an SGI workstation.

  12. CombinePlt and CombineThs user manual: Merging multiple, processor-local plot and time-history data bases produced during a parallel calculation. Revision 1

    SciTech Connect

    Procassini, R.J.; DeGroot, A.J.

    1995-09-21

    The CombinePlt and CombineThs post-processing utilities are designed to merge the data in multiple, processor-local plot and time-history data bases produced by the parallel versions of the analysis codes DYNA3D, NIKE3D or PING into a serial database which is compatible with the existing versions of the GRIZ and THUG visualization tools. These utilities make use of the partition assignment file produced by the PartMesh suite of pre-processing utilities to map the data from the processor-local order to global order. These utilities are also capable of translating 64-bit IEEE data bases into 32-bit IEEE data bases which are required for post-processing with GRIZ or THUG on an SGI workstation.

  13. Breast ultrasound tomography with two parallel transducer arrays: preliminary clinical results

    NASA Astrophysics Data System (ADS)

    Huang, Lianjie; Shin, Junseob; Chen, Ting; Lin, Youzuo; Intrator, Miranda; Hanson, Kenneth; Epstein, Katherine; Sandoval, Daniel; Williamson, Michael

    2015-03-01

    Ultrasound tomography has great potential to provide quantitative estimations of physical properties of breast tumors for accurate characterization of breast cancer. We design and manufacture a new synthetic-aperture breast ultrasound tomography system with two parallel transducer arrays. The distance between these two transducer arrays is adjustable for scanning breasts of different sizes. The ultrasound transducer arrays are translated vertically to scan the entire breast slice by slice and acquire ultrasound transmission and reflection data for whole-breast ultrasound imaging and tomographic reconstructions. We use the system to acquire patient data at the University of New Mexico Hospital for clinical studies. We present some preliminary imaging results of in vivo patient ultrasound data. Our preliminary clinical imaging results show the promise of our breast ultrasound tomography system with two parallel transducer arrays for breast cancer imaging and characterization.

  14. Development of Microreactor Array Chip-Based Measurement System for Massively Parallel Analysis of Enzymatic Activity

    NASA Astrophysics Data System (ADS)

    Hosoi, Yosuke; Akagi, Takanori; Ichiki, Takanori

    Microarray chip technology such as DNA chips, peptide chips and protein chips is one of the promising approaches for achieving high-throughput screening (HTS) of biomolecule function since it has great advantages in feasibility of automated information processing due to one-to-one indexing between array position and molecular function as well as massively parallel sample analysis as a benefit of down-sizing and large-scale integration. Mostly, however, the function that can be evaluated by such microarray chips is limited to affinity of target molecules. In this paper, we propose a new HTS system of enzymatic activity based on microreactor array chip technology. A prototype of the automated and massively parallel measurement system for fluorometric assay of enzymatic reactions was developed by the combination of microreactor array chips and a highly-sensitive fluorescence microscope. Design strategy of microreactor array chips and an optical measurement platform for the high-throughput enzyme assay are discussed.

  15. Parallel processing on the Livermore VAX 11/780-4 parallel processor system with compatibility to Cray Research, Inc. (CRI) multitasking. Version 1

    SciTech Connect

    Werner, N.E.; Van Matre, S.W.

    1985-05-01

    This manual describes the CRI Subroutine Library and Utility Package. The CRI library provides Cray multitasking functionality on the four-processor shared memory VAX 11/780-4. Additional functionality has been added for more flexibility. A discussion of the library, utilities, error messages, and example programs is provided.

  16. Using a Cray Y-MP as an array processor for a RISC Workstation

    NASA Technical Reports Server (NTRS)

    Lamaster, Hugh; Rogallo, Sarah J.

    1992-01-01

    As microprocessors increase in power, the economics of centralized computing has changed dramatically. At the beginning of the 1980s, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate to use in a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation that can have a sufficient number of arithmetic operations to amortize the cost of an RPC call. An experiment which demonstrates that matrix multiplication can be executed remotely on a large system to speed the execution over that experienced on a workstation is described.
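
    The amortization argument above can be made concrete with a back-of-the-envelope count. The sketch assumes dense n x n matrices, a conventional triple-loop product costing about 2n^3 floating-point operations, and a transfer of 3n^2 values (two operands sent, one result returned). The function name is hypothetical and no figures from the report are used.

```python
def operations_per_value_moved(n):
    """Arithmetic operations per matrix element transferred for an n x n product."""
    flops = 2 * n ** 3          # n^3 multiplications plus roughly n^3 additions
    values_moved = 3 * n ** 2   # send A and B, receive C
    return flops / values_moved


if __name__ == "__main__":
    for size in (30, 300, 3000):
        print(size, operations_per_value_moved(size))
```

    The ratio grows linearly with n (it equals 2n/3), so for large enough matrices the arithmetic dominates the data movement that a remote call requires.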

  17. Development of a ground signal processor for digital synthetic array radar data

    NASA Technical Reports Server (NTRS)

    Griffin, C. R.; Estes, J. M.

    1981-01-01

    A modified APQ-102 sidelooking array radar (SLAR) in a B-57 aircraft test bed is used, with other optical and infrared sensors, in remote sensing of Earth surface features for various users at NASA Johnson Space Center. The video from the radar is normally recorded on photographic film and subsequently processed photographically into high resolution radar images. Using a high speed sampling (digitizing) system, the two receiver channels of cross- and co-polarized video are recorded on wideband magnetic tape along with radar and platform parameters. These data are subsequently reformatted and processed into digital synthetic aperture radar images with the image data available on magnetic tape for subsequent analysis by investigators. The system design and results obtained are described.

  18. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  19. Sequence information signal processor

    DOEpatents

    Peterson, John C. (Alta Loma, CA); Chow, Edward T. (San Dimas, CA); Waterman, Michael S. (Culver City, CA); Hunkapillar, Timothy J. (Pasadena, CA)

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
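
    The scoring scheme described above resembles a local-alignment dynamic program. The following serial sketch uses a Smith-Waterman-style recurrence as a stand-in. The recurrence, the function name, and the match, mismatch, and gap values are assumptions for illustration; the abstract does not specify them.

```python
def best_local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Highest local-alignment score between sequences `a` and `b`."""
    previous = [0] * (len(b) + 1)   # scores for the previous element of `a`
    best = 0
    for x in a:                     # one pass per stored element of `a`
        current = [0]
        for j, y in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if x == y else mismatch)
            # Best-scoring segment ending at this pair of elements (never below 0).
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


if __name__ == "__main__":
    print(best_local_alignment_score("GGACGT", "ACGTCC"))
```

    In the circuit, each element of one sequence is held by its own processor and the other sequence streams past, so work that this sketch does in nested loops is spread across the linear array.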

  20. Mitigation of cache memory using an embedded hard-core PPC440 processor in a Virtex-5 Field Programmable Gate Array.

    SciTech Connect

    Learn, Mark Walter

    2010-02-01

    Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not available to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA other than TMR. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.

  1. A class of parallel algorithms for computation of the manipulator inertia matrix

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.

  2. A Systolic Array-Based FPGA Parallel Architecture for the BLAST Algorithm

    PubMed Central

    Guo, Xinyu; Wang, Hong; Devabhaktuni, Vijay

    2012-01-01

    A design of a systolic array-based Field Programmable Gate Array (FPGA) parallel architecture for the Basic Local Alignment Search Tool (BLAST) algorithm is proposed. BLAST is a heuristic biological sequence alignment algorithm which has been used by bioinformatics experts. In contrast to other designs that detect at most one hit per clock cycle, our design applies a Multiple Hits Detection Module, which is a pipelining systolic array, to search for multiple hits in a single clock cycle. Further, we designed a Hits Combination Block which combines overlapping hits from the systolic array into one hit. These implementations completed the first and second steps of the BLAST architecture and achieved significant speedup compared with previously published architectures. PMID:25969747
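
    As a software analogue of the two stages named above (hit detection, then combining overlapping hits), the following Python sketch finds exact word matches and merges those that overlap on the same diagonal. It is a sketch under stated assumptions, not the FPGA design: the word length `w=3`, the exact-match rule, and the function names are hypothetical.

```python
from collections import defaultdict


def find_hits(query, subject, w=3):
    """Return (query_pos, subject_pos) for every exact w-letter word match."""
    index = defaultdict(list)
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    hits = []
    for s in range(len(subject) - w + 1):
        for q in index.get(subject[s:s + w], ()):
            hits.append((q, s))
    return hits


def combine_hits(hits, w=3):
    """Merge hits that overlap on the same diagonal (subject_pos - query_pos).

    Returns sorted (query_start, subject_start, length) tuples.
    """
    by_diag = defaultdict(list)
    for q, s in hits:
        by_diag[s - q].append(q)
    merged = []
    for diag, starts in by_diag.items():
        starts.sort()
        begin, end = starts[0], starts[0] + w
        for q in starts[1:]:
            if q < end:                      # overlaps the current run
                end = max(end, q + w)
            else:                            # gap: close the run, start a new one
                merged.append((begin, begin + diag, end - begin))
                begin, end = q, q + w
        merged.append((begin, begin + diag, end - begin))
    return sorted(merged)


combined = combine_hits(find_hits("ACGTAC", "TTACGTACGG"))
```

    For this toy input, six word hits collapse into three combined hits, one of which spans the whole six-letter query.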

  3. Anisotropic charge and heat conduction through arrays of parallel elliptic cylinders in a continuous medium

    NASA Astrophysics Data System (ADS)

    Martin, James E.; Ribaudo, Troy

    2013-04-01

    Arrays of circular pores in silicon can exhibit a phononic bandgap when the lattice constant is smaller than the phonon scattering length, and so have become of interest for use as thermoelectric materials, due to the large reduction in thermal conductivity that this bandgap can cause. The reduction in electrical conductivity is expected to be less, because the lattice constant of these arrays is engineered to be much larger than the electron scattering length. As a result, electron transport through the effective medium is well described by the diffusion equation, and the Seebeck coefficient is expected to increase. In this paper, we develop an expression for the purely diffusive thermal (or electrical) conductivity of a composite comprised of square or hexagonal arrays of parallel circular or elliptic cylinders of one material in a continuum of a second material. The transport parallel to the cylinders is straightforward, so we consider the transport in the two principal directions normal to the cylinders, using a self-consistent local field calculation based on the point dipole approximation. There are two limiting cases: large negative contrast (e.g., pores in a conductor) and large positive contrast (conducting pillars in air). In the large negative contrast case, the transport is only slightly affected parallel to the major axis of the elliptic cylinders but can be significantly affected parallel to the minor axis, even in the limit of zero volume fraction of pores. The positive contrast case is just the opposite: the transport is only slightly affected parallel to the minor axis of the pillars but can be significantly affected parallel to the major axis, even in the limit of zero volume fraction of pillars. The analytical results are compared to extensive FEA calculations obtained using Comsol™ and the agreement is generally very good, provided the cylinders are sufficiently small compared to the lattice constant.

  4. Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit

    SciTech Connect

    Daily, Jeffrey A.; Lewis, Robert R.

    2011-11-30

    Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement called Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.

  5. Hydrodynamic Crystals: Collective Dynamics of Regular Arrays of Spherical Particles in a Parallel-Wall Channel

    NASA Astrophysics Data System (ADS)

    Baron, M.; Bławzdziewicz, J.; Wajnryb, E.

    2008-05-01

    Simulations of over 10³ hydrodynamically coupled solid spheres are performed to investigate collective motion of linear trains and regular square arrays of particles suspended in a fluid bounded by two parallel walls. Our novel accelerated Stokesian-dynamics algorithm relies on simplifications associated with the Hele-Shaw asymptotic far-field form of the flow scattered by the particles. The simulations reveal propagation of particle-displacement waves, deformation, and rearrangements of a particle lattice, propagation of dislocation defects in ordered arrays, and long-lasting coexistence of ordered and disordered regions.

  6. Cross-link induced linear and curved polymer channel waveguide arrays for massively parallel optical interconnects

    NASA Astrophysics Data System (ADS)

    Chen, Ray T.

    1993-01-01

    A single-mode polymer-based channel waveguide array with 1250 channels/cm packaging density on a cross-link induced photopolymeric thin film is reported. This array works at 1.31 micrometers and 0.63 micrometers. Curved waveguides with radii of curvature (ROC) from 1 mm to 40 mm were demonstrated. Waveguide propagation loss in the neighborhood of 0.1 dB/cm was demonstrated for both linear and curved waveguides. Interconnectivity for various interconnection architectures including cross bar, hypercube, daisy chain and star is further considered. Multiple layers of optical interconnects may be required for an optical backplane involving massively parallel highly distributed computing systems.

  7. The effect of steroids on peripheral blood lymphocytes containing parallel tubular arrays.

    PubMed Central

    Payne, C. M.; Glasser, L.

    1978-01-01

    The response of lymphocytes containing cytoplasmic inclusions called parallel tubular arrays (PTA) was determined after the administration of the glucocorticoid dexamethasone to 10 healthy volunteers. The percentage of these lymphocytes was found to increase during the lymphopenia induced by steroid administration. The size and number of parallel tubular arrays per cell showed no differences before and after steroid administration, indicating that the increase was a result of a change in the proportion of whole cells. This indicates, for the first time, that a morphologically defined population of lymphocytes from the normal peripheral circulation has been linked to a specific response, i.e., steroid resistance. The possible mechanism of steroid resistance is discussed. PMID:686151

  8. Excitation of a Parallel Plate Waveguide by an Array of Rectangular Waveguides

    NASA Technical Reports Server (NTRS)

    Rengarajan, Sembiam

    2011-01-01

    This work addresses the problem of excitation of a parallel plate waveguide by an array of rectangular waveguides that arises in applications such as the continuous transverse stub (CTS) antenna and dual-polarized parabolic cylindrical reflector antennas excited by a scanning line source. In order to design the junction region between the parallel plate waveguide and the linear array of rectangular waveguides, waveguide sizes have to be chosen so that the input match is adequate for the range of scan angles for both polarizations. Electromagnetic wave scattering at the junction between a parallel plate waveguide and an array of rectangular waveguides is analyzed by formulating coupled integral equations for the aperture electric field at the junction. The integral equations are solved by the method of moments (MoM). In order to make the computational process efficient and accurate, the method of weighted averaging was used to evaluate rapidly oscillating integrals encountered in the moment matrix. In addition, the real axis spectral integral is evaluated in a deformed contour for speed and accuracy. The MoM results for a large finite array have been validated by comparing its reflection coefficients with corresponding results for an infinite array generated by the commercial finite element code, HFSS. Once the aperture electric field is determined by MoM, the input reflection coefficients at each waveguide port, and coupling for each polarization over the range of useful scan angles, are easily obtained. Results for the input impedance and coupling characteristics for both the vertical and horizontal polarizations are presented over a range of scan angles. It is shown that the scan range is limited to about 35° for both polarizations and therefore the optimum waveguide is a square of size equal to about 0.62 free space wavelength.
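
    One way to sanity-check the pairing of a roughly 35 degree scan range with a 0.62-wavelength element is the textbook grating-lobe criterion for a uniformly spaced linear array, d/λ ≤ 1/(1 + sin θ). The abstract does not say that this criterion is what sets the limit, so the Python sketch below is an assumption, as are the function names and the reading of the scan range as being in degrees.

```python
import math


def max_spacing_wavelengths(scan_deg):
    """Largest element spacing (in wavelengths) that keeps grating lobes
    out of visible space when the beam is scanned to scan_deg."""
    return 1.0 / (1.0 + math.sin(math.radians(scan_deg)))


def max_scan_deg(spacing_wavelengths):
    """Largest grating-lobe-free scan angle (degrees) for a given spacing.

    Valid for spacings of at least half a wavelength.
    """
    return math.degrees(math.asin(1.0 / spacing_wavelengths - 1.0))


spacing_at_35 = max_spacing_wavelengths(35.0)  # roughly 0.64 wavelength
scan_at_062 = max_scan_deg(0.62)               # roughly 38 degrees
```

    Under this criterion a 0.62-wavelength spacing leaves a small margin beyond 35 degrees, which is at least consistent with the figures quoted in the abstract.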

  9. Photon detection with parallel asynchronous processing

    NASA Technical Reports Server (NTRS)

    Coon, D. D.; Perera, A. G. U.

    1990-01-01

    An approach to photon detection with a parallel asynchronous signal processor is described. The visible or IR photon-detection capability of the silicon p(+)-n-n(+) detectors and the parallel asynchronous processing are addressed separately. This approach would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the devices would form a 2D array processor with a 2D array of inputs located directly behind a focal-plane detector array. A 2D image data stream would propagate in neuronlike asynchronous pulse-coded form through the laminar processor. Such systems can integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The possibility of multispectral image processing is addressed.

  10. Nanopore arrays in a silicon membrane for parallel single-molecule detection: DNA translocation.

    PubMed

    Zhang, Miao; Schmidt, Torsten; Jemt, Anders; Sahlén, Pelin; Sychugov, Ilya; Lundeberg, Joakim; Linnros, Jan

    2015-08-01

    Optical nanopore sensing offers great potential in single-molecule detection, genotyping, or DNA sequencing for high-throughput applications. However, one of the bottlenecks for fluorophore-based biomolecule sensing is the lack of an optically optimized membrane with a large array of nanopores, which has large pore-to-pore distance, small variation in pore size and low background photoluminescence (PL). Here, we demonstrate parallel detection of single-fluorophore-labeled DNA strands (450 bps) translocating through an array of silicon nanopores that fulfills the above-mentioned requirements for optical sensing. The nanopore array was fabricated using electron beam lithography and anisotropic etching followed by electrochemical etching resulting in pore diameters down to ~7 nm. The DNA translocation measurements were performed in a conventional wide-field microscope tailored for effective background PL control. The individual nanopore diameter was found to have a substantial effect on the translocation velocity, where smaller openings slow the translocation enough for the event to be clearly detectable in the fluorescence. Our results demonstrate that a uniform silicon nanopore array combined with wide-field optical detection is a promising alternative with which to realize massively-parallel single-molecule detection. PMID:26180050

  11. A flexible programmable signal processor for next generation fighter aircraft

    NASA Astrophysics Data System (ADS)

    Rowlett, R.; Stewart, C.; Mayor, M.

    The performance requirements of next generation Programmable Signal Processors (PSPs) for military radar applications are examined. Consideration is given to processor performance criteria (throughput rate, parameter changes, mode changes) in connection with several air-to-air radar modes including long range search, single target tracking, track-while-scan, and ECCM. Air-to-ground radar modes are also examined, with emphasis given to Moving Target Indication (MTI), Doppler mode, SAR, and terrain following/avoidance. It is shown that next-generation PSPs will require processing speeds on the order of 1 billion complex operations per second. It is pointed out that conventional array processor architectures similar to those in current PSPs will need significantly larger memory bandwidths to achieve the required throughput rates. However, the use of parallel architectures such as systolic arrays and wavefront arrays can achieve such speeds with much lower memory bandwidth requirements.

  12. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    PubMed

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep Learning algorithm is widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of its best-in-class recognition accuracy compared to hand-crafted algorithm and shallow learning based algorithms. Long learning time caused by its complex structure, however, limits its usage only in high-cost servers or many-core GPU platforms so far. On the other hand, the demand on customized pattern recognition within personal devices will grow gradually as more deep learning applications will be developed. This paper presents a SoC implementation to enable deep learning applications to run with low cost platforms such as mobile or portable devices. Different from conventional works which have adopted massively-parallel architecture, this work adopts task-flexible architecture and exploits multiple parallelism to cover complex functions of convolutional deep belief network which is one of popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable system. The implemented 2.5 mm ×4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art. PMID:26780817
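
    The efficiency figure quoted above can be reproduced from the other numbers in the abstract. The short Python check below is only arithmetic on the reported values; the function name is hypothetical. It suggests that the 1.93 TOPS/W figure corresponds to peak throughput divided by peak power rather than average power.

```python
def tops_per_watt(gops, milliwatts):
    """Convert a throughput in GOPS and a power in mW to TOPS/W."""
    return (gops / 1000.0) / (milliwatts / 1000.0)


at_peak_power = tops_per_watt(411.3, 213.1)     # about 1.93 TOPS/W
at_average_power = tops_per_watt(411.3, 185.0)  # about 2.22 TOPS/W
```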

  13. SIMD massively parallel processing system for real-time image processing

    NASA Astrophysics Data System (ADS)

    Chen, Xiaochu; Zhang, Ming; Yao, Qingdong; Liu, Jilin; Ye, Hong; Wu, Song; Li, Dongxiao; Zhang, Yong; Ding, Lei; Yao, Zhongyang; Yang, Weijian; Pan, Qiaohai

    1998-09-01

    This paper describes the embedded SIMD massively parallel processor that we have developed for real-time image processing applications, such as real-time small target detection and tracking and video processing. The processor array is based on the SIMD chip BAP-128 of our own design, and uses the high performance DSP TMS320C31, which can effectively perform serial and floating point calculations, as the host of the SIMD processor array. As a result, the system is able to perform a variety of image processing tasks in real time. Furthermore, the processor will be connected with a MIMD parallel processor to construct a heterogeneous parallel processor for more complex real-time ATR (Automatic Target Recognition) and computer vision applications.

  14. Comparing a new laser strainmeter array with an adjacent, parallel running quartz tube strainmeter array

    NASA Astrophysics Data System (ADS)

    Kobe, Martin; Jahr, Thomas; Pöschel, Wolfgang; Kukowski, Nina

    2016-03-01

    In summer 2011, two new laser strainmeters about 26.6 m long were installed in N-S and E-W directions parallel to an existing quartz tube strainmeter system at the Geodynamic Observatory Moxa, Thuringia/Germany. This kind of installation is unique in the world and allows the direct comparison of measurements of horizontal length changes with different types of strainmeters for the first time. For the comparison of both data sets, we used the tidal analysis over three years, the strain signals resulting from drilling a shallow 100 m deep borehole on the ground of the observatory and long-period signals. The tidal strain amplitude factors of the laser strainmeters are found to be much closer to theoretical values (85%-105% N-S and 56%-92% E-W) than those of the quartz tube strainmeters. A first data analysis shows that the new laser strainmeters are more sensitive in the short-periodic range with an improved signal-to-noise ratio and distinctly more stable during long-term drifts of environmental parameters such as air pressure or groundwater level. We compared the signal amplitudes of both strainmeter systems at variable signal periods and found frequency-dependent amplitude differences. Confirmed by the tidal parameters, we have now a stable and high resolution laser strainmeter system that serves as calibration reference for quartz tube strainmeters.

  15. High-performance SPAD array detectors for parallel photon timing applications

    NASA Astrophysics Data System (ADS)

    Rech, I.; Cuccato, A.; Antonioli, S.; Cammi, C.; Gulinatti, A.; Ghioni, M.

    2012-02-01

    Over the past few years there has been a growing interest in monolithic arrays of single photon avalanche diodes (SPAD) for spatially resolved detection of faint ultrafast optical signals. SPADs implemented in planar technologies offer the typical advantages of microelectronic devices (small size, ruggedness, low voltage, low power, etc.). Furthermore, they have inherently higher photon detection efficiency than PMTs and are able to provide, besides sensitivities down to single photons, very high acquisition speeds. To make SPAD arrays increasingly competitive in time-resolved applications, it is necessary to address problems such as electrical crosstalk between adjacent pixels; moreover, single-photon timing electronics with picosecond resolution have to be developed. In this paper we present a new instrument suitable for single-photon imaging applications and made up of 32 time-resolved parallel channels. The 32x1 pixel array that includes SPAD detectors represents the system core, and an embedded data elaboration unit performs on-board data processing for single-photon counting applications. Photon-timing information is exported through a custom parallel cable that can be connected to an external multichannel TCSPC system.

  16. A Full Parallel Event Driven Readout Technique for Area Array SPAD FLIM Image Sensors

    PubMed Central

    Nie, Kaiming; Wang, Xinlei; Qiao, Jun; Xu, Jiangtao

    2016-01-01

    This paper presents a full parallel event driven readout method which is implemented in an area array single-photon avalanche diode (SPAD) image sensor for high-speed fluorescence lifetime imaging microscopy (FLIM). The sensor only records and reads out effective time and position information by adopting full parallel event driven readout method, aiming at reducing the amount of data. The image sensor includes four 8 × 8 pixel arrays. In each array, four time-to-digital converters (TDCs) are used to quantize the time of photons’ arrival, and two address record modules are used to record the column and row information. In this work, Monte Carlo simulations were performed in Matlab in terms of the pile-up effect induced by the readout method. The sensor’s resolution is 16 × 16. The time resolution of TDCs is 97.6 ps and the quantization range is 100 ns. The readout frame rate is 10 Mfps, and the maximum imaging frame rate is 100 fps. The chip’s output bandwidth is 720 MHz with an average power of 15 mW. The lifetime resolvability range is 5–20 ns, and the average error of estimated fluorescence lifetimes is below 1% by employing CMM to estimate lifetimes. PMID:26828490
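
    The timing figures in this abstract fit together arithmetically, which the following Python check makes explicit. The 10-bit converter width is inferred from the quoted numbers and is not stated in the abstract; the function and variable names are hypothetical.

```python
import math


def tdc_bins(range_s, lsb_s):
    """Number of time bins needed to cover range_s at a resolution of lsb_s."""
    return range_s / lsb_s


bins = tdc_bins(100e-9, 97.6e-12)       # about 1024.6
bits = round(math.log2(bins))           # 10, i.e. 2**10 = 1024 bins (inferred)
lsb_ps = 100e-9 / 2 ** bits * 1e12      # 97.65625 ps, quoted as 97.6 ps

# Readout frames accumulated into one image frame: 10 Mfps / 100 fps.
readouts_per_image = 10e6 / 100
```

    At these rates each image frame is built from 100,000 readout frames, which gives a sense of how many photon events can contribute to each lifetime estimate.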

  17. Parallel and series fed microstrip array with high efficiency and low cross polarization

    NASA Technical Reports Server (NTRS)

    Huang, John (inventor)

    1995-01-01

    A microstrip array antenna for a vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications with a physical area of 1.7 m by 0.17 m comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, are terminated with half wavelength long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

  18. Numerical Study of a Crossed Loop Coil Array for Parallel Magnetic Resonance Imaging

    NASA Astrophysics Data System (ADS)

    Hernández, J.; Solis, S. E.; Rodriguez, A. O.

    2008-08-01

    A coil design has been recently proposed by Temnikov (Instrum Exp Tech. 2005;48;636-637), with higher experimental signal-to-noise ratio than that of the birdcage coil. It is also claimed that it is possible to individually tune it with a single chip capacitor. This coil design shows a great resemblance to the gradiometer coil. These results motivated us to numerically simulate a three-coil array for parallel magnetic resonance imaging and in vivo magnetic resonance spectroscopy with multinuclear capability. The magnetic field was numerically simulated by solving Maxwell's equations with the finite element method. Uniformity profiles were calculated at the midsection for one single coil and showed good agreement with the experimental data. Then, two more coils were added to form two different coil arrays: coil elements were equally distributed at an angle of 30°. Then, uniformity profiles were calculated again for all cases at the midsection. Despite the strong interaction among all coil elements, very good field uniformity can be achieved. These numerical results indicate that this coil array may be a good choice for parallel magnetic resonance imaging.

  19. A Full Parallel Event Driven Readout Technique for Area Array SPAD FLIM Image Sensors.

    PubMed

    Nie, Kaiming; Wang, Xinlei; Qiao, Jun; Xu, Jiangtao

    2016-01-01

    This paper presents a full parallel event driven readout method which is implemented in an area array single-photon avalanche diode (SPAD) image sensor for high-speed fluorescence lifetime imaging microscopy (FLIM). The sensor only records and reads out effective time and position information by adopting full parallel event driven readout method, aiming at reducing the amount of data. The image sensor includes four 8 × 8 pixel arrays. In each array, four time-to-digital converters (TDCs) are used to quantize the time of photons' arrival, and two address record modules are used to record the column and row information. In this work, Monte Carlo simulations were performed in Matlab in terms of the pile-up effect induced by the readout method. The sensor's resolution is 16 × 16. The time resolution of TDCs is 97.6 ps and the quantization range is 100 ns. The readout frame rate is 10 Mfps, and the maximum imaging frame rate is 100 fps. The chip's output bandwidth is 720 MHz with an average power of 15 mW. The lifetime resolvability range is 5-20 ns, and the average error of estimated fluorescence lifetimes is below 1% by employing CMM to estimate lifetimes. PMID:26828490

  20. Dynamic scheduling and planning parallel observations on large Radio Telescope Arrays with the Square Kilometre Array in mind

    NASA Astrophysics Data System (ADS)

    Buchner, Johannes

    2011-12-01

    Scheduling, the task of producing a time table for resources and tasks, is well-known to be a difficult problem the more resources are involved (an NP-hard problem). This is about to become an issue in radio astronomy as observatories consisting of hundreds to thousands of telescopes are planned and operated. The Square Kilometre Array (SKA), which Australia and New Zealand bid to host, is aiming for scales where current approaches -- in construction, operation but also scheduling -- are insufficient. Although manual scheduling is common today, the problem is becoming complicated by the demand for (1) independent sub-arrays doing simultaneous observations, which requires the scheduler to plan parallel observations and (2) dynamic re-scheduling on changed conditions. Both of these requirements apply to the SKA, especially in the construction phase. We review the scheduling approaches taken in the astronomy literature, as well as investigate techniques from human schedulers and today's observatories. The scheduling problem is specified in general for scientific observations and in particular on radio telescope arrays. Also taken into account is the fact that the observatory may be oversubscribed, requiring the scheduling problem to be integrated with a planning process. We solve this long-term scheduling problem using a time-based encoding that works in the very general case of observation scheduling. This research then compares algorithms from various approaches, including fast heuristics from CPU scheduling, Linear Integer Programming, Genetic algorithms, and Branch-and-Bound enumeration schemes. Measures include not only goodness of the solution, but also scalability and re-scheduling capabilities. In conclusion, we have identified a fast and good scheduling approach that allows (re-)scheduling difficult and changing problems by combining heuristics with a Genetic algorithm using block-wise mutation operations. We are able to explain and eradicate two problems in the literature: the inability of a GA to properly improve schedules and the generation of schedules with frequent interruptions. Finally, we demonstrate the scheduling framework for several operating telescopes: (1) dynamic re-scheduling with the AUT Warkworth 12m telescope, (2) scheduling for the Australian Mopra 22m telescope and (3) scheduling for the Allen Telescope Array. Furthermore, we discuss the applicability of the presented scheduling framework to the Atacama Large Millimeter/submillimeter Array (ALMA, in construction) and the SKA. In particular, during the development phase of the SKA, this dynamic, scalable scheduling framework can accommodate changing conditions.
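
    To make the ideas of a time-based encoding and a block-wise mutation concrete, here is a toy Python sketch. It is not the thesis's framework: the fitness function, the penalty for switching between tasks, the elitist selection, the absence of crossover and of the heuristic seeding mentioned above, and every name and parameter are assumptions made for illustration. A schedule is a list of time slots, each holding a task name or `None` for idle.

```python
import random


def fitness(schedule, tasks, switch_penalty=0.5):
    """Reward requested slots by priority; penalise switches between tasks."""
    score = 0.0
    used = {}
    for name in schedule:
        if name is None:
            continue
        used[name] = used.get(name, 0) + 1
        need, priority = tasks[name]
        if used[name] <= need:          # extra slots beyond the request earn nothing
            score += priority
    switches = sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)
    return score - switch_penalty * switches


def block_mutation(schedule, tasks, rng, max_block=4):
    """Overwrite one contiguous block of slots with a single task (or idle)."""
    child = list(schedule)
    length = rng.randint(1, max_block)
    start = rng.randrange(0, len(child) - length + 1)
    choice = rng.choice(list(tasks) + [None])
    child[start:start + length] = [choice] * length
    return child


def evolve(tasks, n_slots, generations=200, pop_size=20, seed=0):
    """Mutation-only evolutionary loop that keeps the better half each round."""
    rng = random.Random(seed)
    population = [[None] * n_slots for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, tasks), reverse=True)
        history.append(fitness(population[0], tasks))
        survivors = population[: pop_size // 2]
        children = [block_mutation(rng.choice(survivors), tasks, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda s: fitness(s, tasks))
    return best, history


# Hypothetical tasks: name -> (requested slots, priority per slot).
tasks = {"A": (4, 3.0), "B": (5, 2.0), "C": (3, 1.0)}
best, history = evolve(tasks, n_slots=12)
```

    Mutating whole blocks rather than single slots tends to produce contiguous runs of one task, which is one plausible reading of how block-wise operations avoid schedules with frequent interruptions.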

  1. Multi-focus parallel detection of fluorescent molecules at picomolar concentration with photonic nanojets arrays

    SciTech Connect

    Ghenuche, Petru; Torres, Juan de; Ferrand, Patrick; Wenger, Jérôme

    2014-09-29

    Fluorescence sensing and fluorescence correlation spectroscopy (FCS) are powerful methods to detect and characterize single molecules; yet, their use has been restricted by expensive and complex optical apparatus. Here, we present a simple integrated design using a self-assembled bi-dimensional array of microspheres to realize multi-focus parallel detection scheme for FCS. We simultaneously illuminate and collect the fluorescence from several tens of microspheres, which all generate their own photonic nanojet to efficiently excite the molecules and collect the fluorescence emission. Each photonic nanojet contributes to the global detection volume, reaching FCS detection volumes of several tens of femtoliters while preserving the fluorescence excitation and collection efficiencies. The microspheres photonic nanojets array enables FCS experiments at low picomolar concentrations with a drastic reduction in apparatus cost and alignment constraints, ideal for microfluidic chip integration.

  2. Nanopore arrays in a silicon membrane for parallel single-molecule detection: fabrication

    NASA Astrophysics Data System (ADS)

    Schmidt, Torsten; Zhang, Miao; Sychugov, Ilya; Roxhed, Niclas; Linnros, Jan

    2015-08-01

    Solid state nanopores enable translocation and detection of single bio-molecules such as DNA in buffer solutions. Here, sub-10 nm nanopore arrays in silicon membranes were fabricated by using electron-beam lithography to define etch pits and by using a subsequent electrochemical etching step. This approach effectively decouples positioning of the pores and the control of their size, where the pore size essentially results from the anodizing current and time in the etching cell. Nanopores with diameters as small as 7 nm, fully penetrating 300 nm thick membranes, were obtained. The presented fabrication scheme to form large arrays of nanopores is attractive for parallel bio-molecule sensing and DNA sequencing using optical techniques. In particular the signal-to-noise ratio is improved compared to other alternatives such as nitride membranes suffering from a high-luminescence background.

  3. Parallelization and improvements of the generalized born model with a simple sWitching function for modern graphics processors.

    PubMed

    Arthur, Evan J; Brooks, Charles L

    2016-04-15

    Two fundamental challenges of simulating biologically relevant systems are the rapid calculation of the energy of solvation and the trajectory length of a given simulation. The Generalized Born model with a Simple sWitching function (GBSW) addresses these issues by using an efficient approximation of Poisson-Boltzmann (PB) theory to calculate each solute atom's free energy of solvation, the gradient of this potential, and the subsequent forces of solvation without the need for explicit solvent molecules. This study presents a parallel refactoring of the original GBSW algorithm and its implementation on newly available, low cost graphics chips with thousands of processing cores. Depending on the system size and nonbonded force cutoffs, the new GBSW algorithm offers speed increases of between one and two orders of magnitude over previous implementations while maintaining similar levels of accuracy. We find that much of the algorithm scales linearly with an increase of system size, which makes this water model cost effective for solvating large systems. Additionally, we utilize our GPU-accelerated GBSW model to fold the model system chignolin, and in doing so we demonstrate that these speed enhancements now make accessible folding studies of peptides and potentially small proteins. © 2016 Wiley Periodicals, Inc. PMID:26786647

  4. Layer-to-layer parallel fluidic transportation system by addressable fluidic gate arrays.

    PubMed

    Morimoto, Takashi; Konishi, Satoshi

    2008-09-01

    This paper presents addressable fluidic gate arrays for a layer-to-layer parallel fluidic transportation system. The proposed addressable fluidic gate consists of double valves driven by pneumatic pressure. One of the double valves is controlled by the row channel and the other is controlled by the column channel for row/column addressing. Our study applies addressable fluidic gate arrays to layer-to-layer transportation beyond a typical in-plane fluidic network system. The layer-to-layer transportation makes it possible to collect targeted samples from a testing well plate. 3 x 3 fluidic gate arrays based on the proposed concept are developed and tested. A single PDMS valve (φ400 µm) can be closed by 75.0 kPa. The demonstrated fluidic system is based on all PDMS structures by taking account of its disposable use. This paper also reports a dome-shaped chamber for robust sealing and a switching valve with a bistable diaphragm for memory function. PMID:18818812

  5. High-throughput fabrication of micrometer-sized compound parabolic mirror arrays by using parallel laser direct-write processing

    NASA Astrophysics Data System (ADS)

    Yan, Wensheng; Cumming, Benjamin P.; Gu, Min

    2015-07-01

    Micrometer-sized parabolic mirror arrays have significant applications in both light emitting diodes and solar cells. However, low fabrication throughput has been identified as a major obstacle for the mirror arrays towards large-scale applications due to the serial nature of the conventional method. Here, the mirror arrays are fabricated by using a parallel laser direct-write processing, which addresses this barrier. In addition, it is demonstrated that the parallel writing is able to fabricate complex arrays besides simple arrays and thus offers wider applications. Optical measurements show that each single mirror confines the full-width at half-maximum value to as small as 17.8 µm at the height of 150 µm whilst providing a transmittance of up to 68.3% at a wavelength of 633 nm in good agreement with the calculation values.

  6. A micromachined silicon parallel acoustic delay line (PADL) array for real-time photoacoustic tomography (PAT)

    NASA Astrophysics Data System (ADS)

    Cho, Young Y.; Chang, Cheng-Chung; Wang, Lihong V.; Zou, Jun

    2015-03-01

    To achieve real-time photoacoustic tomography (PAT), massive transducer arrays and data acquisition (DAQ) electronics are needed to receive the PA signals simultaneously, which results in complex and high-cost ultrasound receiver systems. To address this issue, we have developed a new PA data acquisition approach using acoustic time delay. Optical fibers were used as parallel acoustic delay lines (PADLs) to create different time delays in multiple channels of PA signals. This makes the PA signals reach a single-element transducer at different times. As a result, they can be properly received by single-channel DAQ electronics. However, due to their small diameter and fragility, using optical fibers as acoustic delay lines poses a number of challenges in the design, construction and packaging of the PADLs, thereby limiting their performance and use in real imaging applications. In this paper, we report the development of new silicon PADLs, which are directly made from silicon wafers using advanced micromachining technologies. The silicon PADLs have very low acoustic attenuation and distortion. A linear array of 16 silicon PADLs was assembled into a handheld package with one common input port and one common output port. To demonstrate its real-time PAT capability, the silicon PADL array (with its output port interfaced with a single-element transducer) was used to receive 16 channels of PA signals simultaneously from a tissue-mimicking optical phantom sample. The reconstructed PA image matches well with the imaging target. Therefore, the silicon PADL array can provide a 16× reduction in the ultrasound DAQ channels for real-time PAT.
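
    The time-delay idea in this abstract can be illustrated with a minimal Python sketch. It assumes idealized delay lines (no attenuation or distortion) and a per-line delay at least as long as each signal, so that the channels never overlap at the transducer. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
def multiplex(channels, delay_samples):
    """Sum all channels into one trace, delaying channel k by k * delay_samples.

    Models a single-element transducer at the common output port of the
    delay-line array (idealized: lossless, distortion-free lines).
    """
    if any(len(ch) > delay_samples for ch in channels):
        raise ValueError("each signal must fit inside one delay window")
    trace = [0.0] * (len(channels) * delay_samples)
    for k, ch in enumerate(channels):
        for t, value in enumerate(ch):
            trace[k * delay_samples + t] += value
    return trace


def demultiplex(trace, n_channels, delay_samples):
    """Recover the channels by cutting the single trace into time windows."""
    return [trace[k * delay_samples:(k + 1) * delay_samples]
            for k in range(n_channels)]
```

    With 16 channels, one acquisition channel records the whole trace and `demultiplex` separates it afterwards, which is the sense in which the DAQ channel count drops by a factor of 16.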

  7. Fast Confocal Raman Imaging Using a 2-D Multifocal Array for Parallel Hyperspectral Detection.

    PubMed

    Kong, Lingbo; Navas-Moreno, Maria; Chan, James W

    2016-01-19

    We present the development of a novel confocal hyperspectral Raman microscope capable of imaging at speeds up to 100 times faster than conventional point-scan Raman microscopy under high noise conditions. The microscope utilizes scanning galvomirrors to generate a two-dimensional (2-D) multifocal array at the sample plane, generating Raman signals simultaneously at each focus of the array pattern. The signals are combined into a single beam and delivered through a confocal pinhole before being focused through the slit of a spectrometer. To separate the signals from each row of the array, a synchronized scan mirror placed in front of the spectrometer slit positions the Raman signals onto different pixel rows of the detector. We devised an approach to deconvolve the superimposed signals and retrieve the individual spectra at each focal position within a given row. The galvomirrors were programmed to scan different focal arrays following Hadamard encoding patterns. A key feature of the Hadamard detection is the reconstruction of individual spectra with improved signal-to-noise ratio. Using polystyrene beads as test samples, we demonstrated not only that our system images faster than a conventional point-scan method but that it is especially advantageous under noisy conditions, such as when the CCD detector operates at fast read-out rates and high temperatures. This is the first demonstration of multifocal confocal Raman imaging in which parallel spectral detection is implemented along both axes of the CCD detector chip. We envision this novel 2-D multifocal spectral detection technique can be used to develop faster imaging spontaneous Raman microscopes with lower cost detectors. PMID:26654100
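
    A minimal Python sketch of Hadamard-multiplexed detection follows. It assumes on/off illumination patterns given by an S-matrix derived from a Sylvester Hadamard matrix, and noise-free measurements; the abstract does not specify the actual patterns or the deconvolution procedure, so the function names and the choice of seven foci are illustrative only.

```python
def sylvester_hadamard(order):
    """Return a +1/-1 Hadamard matrix; `order` must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def s_matrix(order):
    """0/1 patterns of size order-1: drop row and column 0, map +1 -> 0, -1 -> 1."""
    h = sylvester_hadamard(order)
    return [[(1 - v) // 2 for v in row[1:]] for row in h[1:]]


def encode(patterns, spectra):
    """Each measurement is the sum of the spectra of the foci that are switched on."""
    n_chan = len(spectra[0])
    return [[sum(p * s[c] for p, s in zip(pattern, spectra))
             for c in range(n_chan)]
            for pattern in patterns]


def decode(patterns, measurements):
    """Recover per-focus spectra using S^-1 = 2/(N+1) * (2*S^T - J)."""
    n = len(patterns)
    n_chan = len(measurements[0])
    scale = 2.0 / (n + 1)
    return [[scale * sum((2 * patterns[j][i] - 1) * measurements[j][c]
                         for j in range(n))
             for c in range(n_chan)]
            for i in range(n)]
```

    Calling `decode(patterns, encode(patterns, spectra))` with `patterns = s_matrix(8)` returns the seven input spectra up to floating-point rounding. Because each measurement collects light from about half of the foci at once, detector noise is averaged over several measurements per recovered spectrum, which is the usual argument for the signal-to-noise benefit mentioned in the abstract.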

  8. Recent advances in image reconstruction, coil sensitivity calibration, and coil array design for SMASH and generalized parallel MRI.

    PubMed

    Sodickson, Daniel K; McKenzie, Charles A; Ohliger, Michael A; Yeh, Ernest N; Price, Mark D

    2002-01-01

    Parallel magnetic resonance imaging (MRI) techniques use spatial information from arrays of radiofrequency (RF) detector coils to accelerate imaging. A number of parallel MRI techniques have been described in recent years, and numerous clinical applications are currently being explored. The advent of practical parallel imaging presents various challenges for image reconstruction and RF system design. Recent advances in tailored SiMultaneous Acquisition of Spatial Harmonics (SMASH) image reconstructions are summarized. These advances enable robust SMASH imaging in arbitrary image planes with a wide range of coil array geometries. A generalized formalism is described which may be used to understand the relations between SMASH and SENSE, to derive typical implementations of each as special cases, and to form hybrid techniques combining some of the advantages of both. Accurate knowledge of coil sensitivities is crucial for parallel MRI, and errors in calibration represent one of the most common and the most pernicious sources of error in parallel image reconstructions. As one example, motion of the patient and/or the coil array between the sensitivity reference scan and the accelerated acquisition can lead to calibration errors and reconstruction artifacts. Self-calibrating parallel MRI approaches that address this problem by eliminating the need for external sensitivity references are reviewed. The ultimate achievable signal-to-noise ratio (SNR) for parallel MRI studies is closely tied to the geometry and sensitivity patterns of the coil arrays used for spatial encoding. Several parallel imaging array designs that depart from the traditional model of overlapped adjacent loop elements are described. PMID:11755091

  9. Large-scale parallel arrays of silicon nanowires via block copolymer directed self-assembly.

    PubMed

    Farrell, Richard A; Kinahan, Niall T; Hansel, Stefan; Stuen, Karl O; Petkov, Nikolay; Shaw, Matthew T; West, Laetitia E; Djara, Vladimir; Dunne, Robert J; Varona, Olga G; Gleeson, Peter G; Jung, Soon-Jung; Kim, Hye-Young; Koleśnik, Maria M; Lutz, Tarek; Murray, Christopher P; Holmes, Justin D; Nealey, Paul F; Duesberg, Georg S; Krstić, Vojislav; Morris, Michael A

    2012-05-21

    Extending the resolution and spatial proximity of lithographic patterning below critical dimensions of 20 nm remains a key challenge with very-large-scale integration, especially if the persistent scaling of silicon electronic devices is sustained. One approach, which relies upon the directed self-assembly of block copolymers by chemical-epitaxy, is capable of achieving high density 1 : 1 patterning with critical dimensions approaching 5 nm. Herein, we outline an integration-favourable strategy for fabricating high areal density arrays of aligned silicon nanowires by directed self-assembly of a PS-b-PMMA block copolymer nanopatterns with a L₀ (pitch) of 42 nm, on chemically pre-patterned surfaces. Parallel arrays (5 × 10⁶ wires per cm) of uni-directional and isolated silicon nanowires on insulator substrates with critical dimension ranging from 15 to 19 nm were fabricated by using precision plasma etch processes; with each stage monitored by electron microscopy. This step-by-step approach provides detailed information on interfacial oxide formation at the device silicon layer, the polystyrene profile during plasma etching, final critical dimension uniformity and line edge roughness variation nanowire during processing. The resulting silicon-nanowire array devices exhibit Schottky-type behaviour and a clear field-effect. The measured values for resistivity and specific contact resistance were ((2.6 ± 1.2) × 10⁵ Ω cm) and ((240 ± 80) Ω cm²) respectively. These values are typical for intrinsic (un-doped) silicon when contacted by high work function metal albeit counterintuitive as the resistivity of the starting wafer (∼10 Ω cm) is 4 orders of magnitude lower. In essence, the nanowires are so small and consist of so few atoms, that statistically, at the original doping level each nanowire contains less than a single dopant atom and consequently exhibits the electrical behaviour of the un-doped host material. Moreover this indicates that the processing successfully avoided unintentional doping. Therefore our approach permits tuning of the device steps to contact the nanowires functionality through careful selection of the initial bulk starting material and/or by means of post processing steps e.g. thermal annealing of metal contacts to produce high performance devices. We envision that such a controllable process, combined with the precision patterning of the aligned block copolymer nanopatterns, could prolong the scaling of nanoelectronics and potentially enable the fabrication of dense, parallel arrays of multi-gate field effect transistors. PMID:22481430

  10. Modeling of the phase lag causing fluidelastic instability in a parallel triangular tube array

    NASA Astrophysics Data System (ADS)

    Khalifa, Ahmed; Weaver, David; Ziada, Samir

    2013-11-01

    Fluidelastic instability is considered a critical flow induced vibration mechanism in tube and shell heat exchangers. It is believed that a finite time lag between tube vibration and fluid response is essential to predict the phenomenon. However, the physical nature of this time lag is not fully understood. This paper presents a fundamental study of this time delay using a parallel triangular tube array with a pitch ratio of 1.54. A computational fluid dynamics (CFD) model was developed and validated experimentally in an attempt to investigate the interaction between tube vibrations and flow perturbations at lower reduced velocities Ur=1-6 and Reynolds numbers Re=2000-12 000. The numerical predictions of the phase lag are in reasonable agreement with the experimental measurements for the range of reduced velocities Ug/fd=6-7. It was found that there are two propagation mechanisms; the first is associated with the acoustic wave propagation at low reduced velocities, Ur<2, and the second mechanism for higher reduced velocities is associated with the vorticity shedding and convection. An empirical model of the two mechanisms is developed and the phase lag predictions are in reasonable agreement with the experimental and numerical measurements. The developed phase lag model is then coupled with the semi-analytical model of Lever and Weaver to predict the fluidelastic stability threshold. Improved predictions of the stability boundaries for the parallel triangular array were achieved. In addition, the present study has explained why fluidelastic instability does not occur below some threshold reduced velocity.

  11. Collective dynamics and pattern formation in 2D regular arrays of spherical particles in Stokes flow between two parallel walls

    NASA Astrophysics Data System (ADS)

    Blawzdziewicz, Jerzy; Wajnryb, Eligiusz; Baron, Matthew; Khurana, Nidhi

    2008-03-01

    We present results of our numerical and theoretical investigations of collective dynamics of linear trains and regular square arrays of spherical particles suspended in a fluid bounded by two parallel walls. The simulations reveal propagation of particle-displacement waves, deformation and rearrangements of a particle lattice, propagation of dislocation-like defects in ordered arrays, and transitions between ordered and disordered regions that can coexist for a long time. We argue that ordered motion of the arrays is associated with the dipolar form of the quasi-2D asymptotic far-field flow produced by the particles. We also show that the overall deformation of the arrays can be described using a macroscopic theory where the array is treated as a 2D effective medium. The theory predicts a fingering instability near the array corners, and this instability is confirmed by our microscopic simulations.

  12. Femtosecond laser fabrication of micro/nano-channel array devices for parallelized fluorescence detection

    NASA Astrophysics Data System (ADS)

    Canfield, Brian; Hofmeister, William; Davis, Lloyd

    2013-03-01

    Cost-effective pharmaceutical drug discovery depends on increasing assay throughput while reducing reagent needs. Ultrasensitive, highly parallelized fluorescence-based platforms that incorporate a nano/micro-fluidic chip with an array of closely spaced channels would meet this need. We discuss the use of direct femtosecond laser machining to fabricate prototype fluidic chips with arrays of more than one hundred closely spaced channels. Traditional machining techniques involve overlapping focal spots from many laser pulses while scanning the substrate in order to create channels. However, this procedure is not only lengthy but may allow thermal effects to accumulate that degrade the quality of both the channel profile and surrounding substrate material. We are developing a different method for machining a line with just a single pulse, using a combination of cylindrical lenses and an aspheric lens to reshape a near-Gaussian beam into a tight line focus. Channels on the order of 1 micron wide, 5 microns deep, and nearly 2000 microns long may be made this way. We also address the critical issue of mitigating the high autofluorescence responses that arise from the creation of defects by fs-laser machining in fused silica.

  13. Parallel processing

    SciTech Connect

    Krishnamurthy, E.V.

    1989-01-01

    This book provides an introduction to the fundamental principles and practice of parallel processing. After a general introduction to the many facets of parallelism, the first part of the book is devoted to the development of a coherent theoretical framework. Particular attention is paid to the modeling, semantics and complexity of interacting parallel processes. The second part of the book considers the more practical aspects such as parallel processor architecture, parallel and distributed programming, and concurrent transaction handling in databases.

  14. Control scheme for microcomputers being used in multiprocessor arrays

    SciTech Connect

    Meng, J.; Gin, F.

    1984-06-01

    In general, microcomputer central processor devices are completely controllable from memory and memory control lines. By interjecting a controlling processor between the central processor chip and its memory, and using the central processor memory ready signal for synchronization, data can be supplied to the microprocessor either from an attached memory or from the controlling processor. The controlling processor may also download codes into the microprocessor's memory to be used either as programs or as data. By manipulating restart, hold and interrupt signal lines in addition to the memory lines, total control is achieved. Such a scheme can be used to orchestrate the simultaneous application of arrays of microcomputers to single large problems or to many discrete smaller problems. We describe the details of such connections to three commercially available devices: a Motorola 68000, an Advanced Micro Devices 29116 and a National Semiconductor NS32032 and indicate how our scheme may be used to connect such devices into a cooperating parallel array.

  15. MVSP: multithreaded VLIW stream processor

    NASA Astrophysics Data System (ADS)

    Sardashti, Somayeh; Ghasemi, Hamid Reza; Fatemi, Omid

    2006-02-01

    Stream processing is a new trend in computer architecture design which fills the gap between inflexible special-purpose media architectures and programmable architectures with low computational ability for media processing. Stream processors are designed for computationally intensive media applications characterized by high data parallelism and producer-consumer locality with little global data reuse. In this paper, we propose a new stream processor, named MVSP. This processor is a programmable stream processor based on Imagine [1]. MVSP exploits the TLP, DLP, SP and ILP parallelisms inherent in media applications. A full simulator of MVSP has been implemented and several media workloads composed of EEMBC [2] benchmarks have been applied. The simulation results show performance and functional unit utilization improvements of more than a factor of two in comparison with the Imagine processor.

  16. New computing environments: Parallel, vector and systolic

    SciTech Connect

    Wouk, A.

    1986-01-01

    This book presents papers on supercomputers and array processors. Topics considered include nested dissection, the systolic level 2 BLAS, parallel processing a hydrodynamic shock wave problem, MACH-1, portable standard LISP on the Cray, distributed combinator evaluation, performance and library issues, scale problems, multiprocessor architecture, the MIDAS multiprocessor system, parallel algorithms for incompressible and compressible flows on a multiprocessor, and parallel algorithms for elliptic equations.

  17. Systolic-array optimizing compiler

    SciTech Connect

    Lam, M.S.L.

    1987-01-01

    The WARP machine is a linear array of ten programmable processors and is capable of executing 100 million floating-point operations per second (100 MFLOPS). The individual processors, or cells, derive their performance from a wide instruction set and a high degree of internal pipelining and parallelism. Can an array of high-performance cells be programmed to cooperate at a fine grain of parallelism? The author's thesis is that systolic arrays of high-performance cells can be programmed effectively using a high-level language. The solution has two components: a machine abstraction and compiler optimizations for systolic arrays, and code-scheduling techniques for horizontally microcoded or VLIW processors. In the proposed machine abstraction, individual cells are programmed in a high-level programming language; inter-cell communication is explicitly specified by asynchronous primitives: receive and send operations. This machine abstraction offers both efficiency and generality. It is shown that software pipelining is a practical and efficient code-scheduling technique for highly parallel and pipelined processors. The ideas and techniques in this thesis were validated by the implementation of an optimizing compiler for Warp.
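
    The machine abstraction described here (cells programmed individually, communicating through asynchronous receive and send operations) can be sketched in Python with threads and queues. This is only an illustration of the programming model, not the thesis's language or compiler; the polynomial-evaluation workload, the `cell` function and the `STOP` sentinel are assumptions made for the example.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel marking the end of the input stream


def cell(coeff, left, right):
    """One cell: receive (x, acc) from the left neighbour, send (x, acc*x + coeff) right."""
    while True:
        item = left.get()                 # receive (blocks until data arrives)
        if item is STOP:
            right.put(STOP)               # pass the shutdown signal down the array
            return
        x, acc = item
        right.put((x, acc * x + coeff))   # send one Horner step to the next cell


def evaluate_polynomial(coeffs, xs):
    """Evaluate coeffs[0]*x**(n-1) + ... + coeffs[n-1] for each x on a linear array."""
    links = [queue.Queue() for _ in range(len(coeffs) + 1)]
    cells = [threading.Thread(target=cell, args=(c, links[i], links[i + 1]))
             for i, c in enumerate(coeffs)]
    for t in cells:
        t.start()
    for x in xs:
        links[0].put((x, 0))              # host feeds the first cell
    links[0].put(STOP)
    results = []
    while True:
        item = links[-1].get()            # host drains the last cell
        if item is STOP:
            break
        results.append(item[1])
    for t in cells:
        t.join()
    return results
```

    Each cell holds one coefficient and performs one step of Horner's rule, so once the array is full, different cells work on different input values at the same time. The queues decouple the cells in the way the asynchronous receive/send abstraction suggests.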

  18. An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C

    SciTech Connect

    Coarfa, Cristian; Dotsenko, Yuri; Mellor-Crummey, John M.; Cantonnet, François; El-Ghazawi, Tarek; Mohanti, Ashrujit; Yao, Yiyi; Chavarría-Miranda, Daniel

    2005-06-10

    Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern cluster architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks limiting UPC performance on all platforms. We account for the root causes of these performance anomalies and show that they can be remedied with additional compiler improvements; in particular, we show that many of these obstacles can be resolved with adequate optimizations by the backend C compilers.

  19. Hardware multiplier processor

    DOEpatents

    Pierce, Paul E. (Albuquerque, NM)

    1986-01-01

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

  20. Hardware multiplier processor

    DOEpatents

    Pierce, P.E.

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

  1. Inner-product array processor for retrieval of stored images represented by bipolar binary (+1,-1) pixels using partial input trinary pixels represented by (+1,-1)

    NASA Technical Reports Server (NTRS)

    Liu, Hua-Kuang (Inventor); Awwal, Abdul A. S. (Inventor); Karim, Mohammad A. (Inventor)

    1993-01-01

    An inner-product array processor is provided with thresholding of the inner product during each iteration to make more significant the inner product employed in estimating a vector to be used as the input vector for the next iteration. While stored vectors and estimated vectors are represented in bipolar binary (1,-1), only those elements of an initial partial input vector that are believed to be common with those of a stored vector are represented in bipolar binary; the remaining elements of a partial input vector are set to 0. This mode of representation, in which the known elements of a partial input vector are in bipolar binary form and the remaining elements are set equal to 0, is referred to as trinary representation. The initial inner products corresponding to the partial input vector will then be equal to the number of known elements. Inner-product thresholding is applied to accelerate convergence and to avoid convergence to a negative input product.
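
    The retrieval loop outlined in this abstract can be sketched in Python as follows. The thresholding rule used here (keep an inner product only if it is at least the number of known input elements, otherwise set it to zero) is an assumption chosen to match the statement that the initial inner product equals the number of known elements; the patent's actual threshold and hardware are not described in the abstract. The function names and the `max_iters` parameter are hypothetical.

```python
def inner(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall(stored, partial, max_iters=10):
    """Retrieve a stored bipolar (+1/-1) vector from a trinary (+1/0/-1) partial input.

    Unknown elements of `partial` are 0. Returns the converged estimate,
    or None if no stored vector passes the threshold.
    """
    threshold = sum(1 for v in partial if v != 0)  # assumed rule: count of known elements
    estimate = list(partial)
    for _ in range(max_iters):
        products = [inner(v, estimate) for v in stored]
        # Thresholding: suppress weak (and negative) inner products.
        kept = [p if p >= threshold else 0 for p in products]
        if not any(kept):
            return None
        # New estimate: sign of the weighted sum of stored vectors (ties go to +1).
        new = [1 if sum(p * v[i] for p, v in zip(kept, stored)) >= 0 else -1
               for i in range(len(estimate))]
        if new == estimate:
            return estimate
        estimate = new
    return estimate
```

    For example, with three mutually orthogonal stored vectors of length 8 and a partial input that carries the first four elements of one of them, only the matching vector's inner product (4) reaches the threshold, so the first iteration already reproduces that stored vector and the second confirms convergence.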

  2. Efficient and low-latency pixel data transmission module for adaptive optics wavefront processor based on field-programmable gate array

    NASA Astrophysics Data System (ADS)

    Yang, Haifeng; Xia, Yunxia; Zhang, Haotian; Li, Mei; Rao, Changhui

    2015-06-01

    An efficient and low-latency pixel data transmission module for the adaptive optics wavefront processor is presented. A fiber-based custom real-time pixel data transfer protocol is developed, which has the characteristics of long transmission distance, low latency, and low protocol overhead. The hardware part of the suite has been verified on different circuit boards with different field-programmable gate arrays. Obtained results demonstrate that the transmission and protocol processing delay is only 413.5 ns, the transmission bandwidth is 3.125 Gbps, and the error rate is less than 10⁻¹². Under the same conditions of 156.25 MHz clock frequency and length of transmission line, compared with the serial front panel data port protocol adopted by the Thirty Meter Telescope and the European Southern Observatory, the transmission delay is reduced by a factor of 2.82 and our solution has a remarkably low logic occupancy rate. Also, the method has been applied in several actual projects.

  3. Atmospheric plasma jet array in parallel electric and gas flow fields for three-dimensional surface treatment

    NASA Astrophysics Data System (ADS)

    Cao, Z.; Walsh, J. L.; Kong, M. G.

    2009-01-01

    This letter reports on electrical and optical characteristics of a ten-channel atmospheric pressure glow discharge jet array in parallel electric and gas flow fields. Challenged with complex three-dimensional substrates including surgical tissue forceps and sloped plastic plate of up to 15°, the jet array is shown to achieve excellent jet-to-jet uniformity both in time and in space. Its spatial uniformity is four times better than a comparable single jet when both are used to treat a 15° sloped substrate. These benefits are likely from an effective self-adjustment mechanism among individual jets facilitated by individualized ballast and spatial redistribution of surface charges.

  4. Atmospheric plasma jet array in parallel electric and gas flow fields for three-dimensional surface treatment

    SciTech Connect

    Cao, Z.; Walsh, J. L.; Kong, M. G.

    2009-01-12

    This letter reports on electrical and optical characteristics of a ten-channel atmospheric pressure glow discharge jet array in parallel electric and gas flow fields. Challenged with complex three-dimensional substrates including surgical tissue forceps and sloped plastic plate of up to 15 deg., the jet array is shown to achieve excellent jet-to-jet uniformity both in time and in space. Its spatial uniformity is four times better than a comparable single jet when both are used to treat a 15 deg. sloped substrate. These benefits are likely from an effective self-adjustment mechanism among individual jets facilitated by individualized ballast and spatial redistribution of surface charges.

  5. Parallel self-mixing imaging system based on an array of vertical-cavity surface-emitting lasers

    SciTech Connect

    Tucker, John R.; Baque, Johnathon L.; Lim, Yah Leng; Zvyagin, Andrei V.; Rakic, Aleksandar D

    2007-09-01

    In this paper we investigate the feasibility of a massively parallel self-mixing imaging system based on an array of vertical-cavity surface-emitting lasers (VCSELs) to measure surface profiles of displacement, distance, velocity, and liquid flow rate. The concept of the system is demonstrated using a prototype to measure the velocity at different radial points on a rotating disk, and the velocity profile of diluted milk in a custom built diverging-converging planar flow channel. It is envisaged that a scaled up version of the parallel self-mixing imaging system will enable real-time surface profiling, vibrometry, and flowmetry.

  6. Architecture of the parallel recirculating pipeline

    NASA Astrophysics Data System (ADS)

    Wehner, William W., II; Brandt, James

    1990-11-01

    Current image analysis and image understanding applications in DoD systems require very high performance image pixel processing in real time. To attain the necessary performance within stringent system size, weight, and power constraints requires special-purpose parallel processing hardware architectures. At the same time, it is desirable to retain as much programmability as possible in order to rapidly adapt the hardware to new applications or evolving system requirements. The Parallel Recirculating Pipeline processor uses techniques adopted from image algebra and mathematical morphology to provide a low-cost, low-complexity, high-performance architecture that is suitable for silicon implementation and programmable in high-order languages. The parallel recirculating pipeline hardware architecture is based on a cellular array structure in which each cell is a pipelined neighborhood processor. Each processor cell transforms an entire image segment by successively executing an operation on small fixed-size neighborhoods around each pixel. By cascading a series of these operations, transforms on larger neighborhoods can be achieved. The parallel recirculating pipeline achieves cascading by allowing a series of cells to be connected in a pipelined fashion. Partial results can recirculate several times through the hardware pipeline via an external buffer memory. A virtual pipeline of any length is thus achieved. Several novel features of the architecture allow multiple pipelines to operate in parallel on strips of the same image. These features can support parallel expansion to a large number of processors with correspondingly
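
    The cascading idea in this abstract (a small fixed-size neighborhood operation applied repeatedly to obtain a transform over a larger neighborhood) can be illustrated with a short Python sketch. Binary dilation is used as the example operation because it comes from mathematical morphology; the abstract does not say which operations the hardware implements, and the function names and the zero-padding at the image border are assumptions.

```python
def dilate(image, radius=1):
    """Binary dilation over a (2*radius+1)-square neighborhood.

    `image` is a list of rows of 0/1 values; pixels outside the image count as 0.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                image[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))))
    return out


def recirculate(image, stage, passes):
    """Send the image through the same pipeline stage `passes` times."""
    for _ in range(passes):
        image = stage(image)
    return image
```

    Two passes through the 3 x 3 stage, `recirculate(image, dilate, 2)`, give the same result as a single 5 x 5 dilation, `dilate(image, radius=2)`, which is the sense in which recirculation yields a longer virtual pipeline from a fixed amount of hardware.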

  7. Tiled Multicore Processors

    NASA Astrophysics Data System (ADS)

    Taylor, Michael B.; Lee, Walter; Miller, Jason E.; Wentzlaff, David; Bratt, Ian; Greenwald, Ben; Hoffmann, Henry; Johnson, Paul R.; Kim, Jason S.; Psota, James; Saraf, Arvind; Shnidman, Nathan; Strumpen, Volker; Frank, Matthew I.; Amarasinghe, Saman; Agarwal, Anant

    For the last few decades Moore’s Law has continually provided exponential growth in the number of transistors on a single chip. This chapter describes a class of architectures, called tiled multicore architectures, that are designed to exploit massive quantities of on-chip resources in an efficient, scalable manner. Tiled multicore architectures combine each processor core with a switch to create a modular element called a tile. Tiles are replicated on a chip as needed to create multicores with any number of tiles. The Raw processor, a pioneering example of a tiled multicore processor, is examined in detail to explain the philosophy, design, and strengths of such architectures. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance. Central to achieving this goal is Raw’s ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Compared to a traditional superscalar processor, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x-9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand.

  8. Biological Information Signal Processor

    NASA Technical Reports Server (NTRS)

    Chow, Edward T.; Peterson, John C.; Yoo, Michael M.

    1993-01-01

    Biological Information Signal Processor (BISP) is computing system analyzing data on deoxyribonucleic acid (DNA) sequences for molecular genetic analysis. Includes coprocessors, specialized microprocessors complementing present and future computers by performing rapidly most-time-consuming DNA-sequence-analyzing functions, establishing relationships (alignments) between both global sequences and defining patterns in multiple sequences. Also includes state-of-art software and data-base systems on both conventional and parallel computer systems to augment analytical abilities of developmental coprocessors.

  9. Facial Recognition System with Compact Optical Parallel Correlator Using Vertical-Cavity Surface-Emitting Laser Array Module

    NASA Astrophysics Data System (ADS)

    Watanabe, Eriko; Arima, Nobuko; Kodate, Kashiko

    2004-08-01

    The design and trial fabrication of a two-dimensional light-source module is presented for application to an optical parallel correlator for facial recognition. The light-source module is composed of a vertical-cavity surface-emitting laser array and a multilevel zone-plate array as a collimating lens. This module is about 1/10 the size of the conventional light source module and the optical parallel correlator has a size of 16.1 × 13 × 23 cm³, weight of 4.4 kg, and a throughput time of 19 faces/s. In an experimental evaluation of the system through one-to-one correlation using a database of 300 front facial images, the false match and false non-match rates were less than 1%. The optical system presented here is therefore robust to a variety of changes in facial expressions and is highly applicable in security systems.

  10. Even-odd mode excitation for stability investigation of Cartesian feedback amplifier used in parallel transmit array.

    PubMed

    Shooshtary, S; Solbach, K

    2015-08-01

    A 7 Tesla Magnetic Resonance Imaging (MRI) system with parallel transmission (pTx) for 32 near-magnet Cartesian feedback loop power amplifiers (PA) with output power of 1kW is under construction at Erwin L. Hahn Institute for Magnetic Resonance Imaging. Variation of load impedance due to mutual coupling of neighborhood coils in the array may lead to instability of the Cartesian feedback loop amplifier. MRI safety requires unconditional stability of the PAs at any load. In order to avoid instability in the pTx system, conditions and limits of stability have to be investigated for every possible excitation mode for the coil array. In this work, an efficient method of stability check for an array of two transmit channels (Tx) with Cartesian feedback loop amplifier and a selective excitation mode for the coil array is proposed which allows extension of stability investigations to a large pTx array with any arbitrary excitation mode for the coil array. PMID:26736573

  11. Quadrature transmit array design using single-feed circularly polarized patch antenna for parallel transmission in MR imaging.

    PubMed

    Pang, Yong; Yu, Baiying; Vigneron, Daniel B; Zhang, Xiaoliang

    2014-02-01

    Quadrature coils are often desired in MR applications because they can improve MR sensitivity and also reduce excitation power. In this work, we propose, for the first time, a quadrature array design strategy for parallel transmission at 298 MHz using a single-feed circularly polarized (CP) patch antenna technique. Each array element is a nearly square ring microstrip antenna and is fed at a point on the diagonal of the antenna to generate quadrature magnetic fields. Compared with conventional quadrature coils, the single-feed structure is much simpler and more compact, making the quadrature coil array design practical. Numerical simulations demonstrate that the decoupling between elements is better than -35 dB for all the elements and the RF fields are homogeneous with deep penetration and quadrature behavior in the area of interest. Bloch equation simulation is also performed to simulate the excitation procedure by using an 8-element quadrature planar patch array to demonstrate its feasibility in parallel transmission at the ultrahigh field of 7 Tesla. PMID:24649430

  12. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spherical, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  13. Parallel nanomanufacturing via electrohydrodynamic jetting from microfabricated externally-fed emitter arrays

    NASA Astrophysics Data System (ADS)

    Ponce de Leon, Philip J.; Hill, Frances A.; Heubel, Eric V.; Velásquez-García, Luis F.

    2015-06-01

    We report the design, fabrication, and characterization of planar arrays of externally-fed silicon electrospinning emitters for high-throughput generation of polymer nanofibers. Arrays with as many as 225 emitters and with emitter density as large as 100 emitters cm⁻² were characterized using a solution of dissolved PEO in water and ethanol. Devices with emitter density as high as 25 emitters cm⁻² deposit uniform imprints comprising fibers with diameters on the order of a few hundred nanometers. Mass flux rates as high as 417 g hr⁻¹ m⁻² were measured, i.e., four times the reported production rate of the leading commercial free-surface electrospinning sources. Throughput increases with increasing array size at constant emitter density, suggesting the design can be scaled up with no loss of productivity. Devices with emitter density equal to 100 emitters cm⁻² fail to generate fibers but uniformly generate electrosprayed droplets. For the arrays tested, the largest measured mass flux resulted from arrays with larger emitter separation operating at larger bias voltages, indicating the strong influence of electrical field enhancement on the performance of the devices. Incorporation of a ground electrode surrounding the array tips helps equalize the emitter field enhancement across the array as well as control the spread of the imprints over larger distances.

  14. Discrete voltage states in one-dimensional parallel array of Josephson junctions

    NASA Astrophysics Data System (ADS)

    Braiman, Y.; Family, F.; Hentschel, H. G. E.

    1996-05-01

    We study low-voltage dynamics in highly discrete one-dimensional arrays of Josephson junctions. In particular, we focus on the resonant solutions emerging from the locking between the time period of the oscillations of the single junction with the spatial period of the wave propagating across the array. We find that the average voltage across the array scales as V ∝ (κ-κ_c)^{1/2}, where κ_c is the critical value of the coupling. The connections to high voltage solutions are discussed.

  15. Low Voltage Dynamics in 1D Parallel Arrays of Josephson Junctions

    NASA Astrophysics Data System (ADS)

    Braiman, Y.; Family, F.; Hentschel, G.

    1996-03-01

    We study low-voltage dynamics in highly discrete, slightly underdamped chains of Josephson junctions. In particular, we focus on the resonant solutions emerging from the locking between the temporal dynamics of the single junction with the spatial dynamics of the linear wave propagating across the array. We have developed a simple formalism to reduce the complexity of the equations and to calculate the average voltage in the array in the highly nonlinear regime. The average voltage across the array scales as V ∝ (κ-κ_c)^{1/2}, where κ is the coupling constant. The connections to high voltage solutions will be discussed.

  16. Integrated RF/shim coil array for parallel reception and localized B0 shimming in the human brain.

    PubMed

    Truong, Trong-Kha; Darnell, Dean; Song, Allen W

    2014-12-01

    The purpose of this work was to develop a novel integrated radiofrequency and shim (RF/shim) coil array that can perform parallel reception and localized B0 shimming in the human brain with the same coils, thereby maximizing both the signal-to-noise ratio and shimming efficiency. A 32-channel receive-only head coil array was modified to enable both RF currents (for signal reception) and direct currents (for B0 shimming) to flow in individual coil elements. Its in vivo performance was assessed in the frontal brain region, which is affected by large susceptibility-induced B0 inhomogeneities. The coil modifications did not reduce their quality factor or signal-to-noise ratio. Axial B0 maps and echo-planar images acquired in vivo with direct currents optimized to shim specific slices showed substantially reduced B0 inhomogeneities and image distortions in the frontal brain region. The B0 root-mean-square error in the anterior half of the brain was reduced by 60.3% as compared to that obtained with second-order spherical harmonic shimming. These results demonstrate that the integrated RF/shim coil array can perform parallel reception and localized B0 shimming in the human brain and provide a much more effective shimming than conventional spherical harmonic shimming alone, without taking up additional space in the magnet bore and without compromising the signal-to-noise ratio or shimming performance. PMID:25270602

  17. Integrated RF/shim coil array for parallel reception and localized B0 shimming in the human brain

    PubMed Central

    Truong, Trong-Kha; Darnell, Dean; Song, Allen W.

    2014-01-01

    The purpose of this work was to develop a novel integrated radiofrequency and shim (RF/shim) coil array that can perform parallel reception and localized B0 shimming in the human brain with the same coils, thereby maximizing both the signal-to-noise ratio and shimming efficiency. A 32-channel receive-only head coil array was modified to enable both RF currents (for signal reception) and direct currents (for B0 shimming) to flow in individual coil elements. Its in vivo performance was assessed in the frontal brain region, which is affected by large susceptibility-induced B0 inhomogeneities. The coil modifications did not reduce their quality factor or signal-to-noise ratio. Axial B0 maps and echo-planar images acquired in vivo with direct currents optimized to shim specific slices showed substantially reduced B0 inhomogeneities and image distortions in the frontal brain region. The B0 root-mean-square error in the anterior half of the brain was reduced by 60.3% as compared to that obtained with second-order spherical harmonic shimming. These results demonstrate that the integrated RF/shim coil array can perform parallel reception and localized B0 shimming in the human brain and provide a much more effective shimming than conventional spherical harmonic shimming alone, without taking up additional space in the magnet bore and without compromising the signal-to-noise ratio or shimming performance. PMID:25270602

  18. Cellular array processing simulation

    SciTech Connect

    Lee, H.C.; Preston, E.W.

    1981-01-01

    The cellular array processing simulation (CAPS) system is a high-level image language that runs on a multiprocessor configuration. CAPS is interpretively decoded on a conventional minicomputer with all image operation instructions executed on an array processor. CAPS was designed to be both modular and table driven so that it can be easily maintained and modified. CAPS uses the image convolution operator as one of its primitives and performs this cellular operation by decomposing it into parallel image steps. Among its features is the ability to observe the imagery in real time as a user's algorithm is executed. CAPS also contains a language processor that permits users to develop re-entrant image processing subroutines or algorithms. 4 references.

  19. Graded index linear and curved polymer channel waveguide arrays for massively parallel optical interconnects

    NASA Astrophysics Data System (ADS)

    Chen, Ray T.

    1992-11-01

    A single-mode polymer-based graded index channel waveguide array with 1250 channels/cm packaging density on a cross-link induced photopolymeric thin film is reported. This array works at 1.31 and 0.63 microns. Curved waveguides with radii of curvature from 1 to 40 mm were demonstrated. Waveguide propagation loss in the neighborhood of 0.1 dB/cm was experimentally confirmed at 1.31 microns.

  20. Two-dimensional parallel array technology as a new approach to automated combinatorial solid-phase organic synthesis

    PubMed

    Brennan; Biddison; Frauendorf; Schwarcz; Keen; Ecker; Davis; Tinder; Swayze

    1998-01-01

    An automated, 96-well parallel array synthesizer for solid-phase organic synthesis has been designed and constructed. The instrument employs a unique reagent array delivery format, in which each reagent utilized has a dedicated plumbing system. An inert atmosphere is maintained during all phases of a synthesis, and temperature can be controlled via a thermal transfer plate which holds the injection molded reaction block. The reaction plate assembly slides in the X-axis direction, while eight nozzle blocks holding the reagent lines slide in the Y-axis direction, allowing for the extremely rapid delivery of any of 64 reagents to 96 wells. In addition, there are six banks of fixed nozzle blocks, which deliver the same reagent or solvent to eight wells at once, for a total of 72 possible reagents. The instrument is controlled by software which allows the straightforward programming of the synthesis of a large number of compounds. This is accomplished by supplying a general synthetic procedure in the form of a command file, which calls upon certain reagents to be added to specific wells via lookup in a sequence file. The bottle position, flow rate, and concentration of each reagent are stored in a separate reagent table file. To demonstrate the utility of the parallel array synthesizer, a small combinatorial library of hydroxamic acids was prepared in high throughput mode for biological screening. Approximately 1300 compounds were prepared on a 10 μmole scale (3-5 mg) in a few weeks. The resulting crude compounds were generally >80% pure, and were utilized directly for high throughput screening in antibacterial assays. Several active wells were found, and the activity was verified by solution-phase synthesis of analytically pure material, indicating that the system described herein is an efficient means for the parallel synthesis of compounds for lead discovery. Copyright 1998 John Wiley & Sons, Inc. PMID:10099494

  1. Graphene under one-dimensional periodic potentials using DNA-assembled parallel nanotubes as a periodic gate array

    NASA Astrophysics Data System (ADS)

    Wu, Yong; Han, Si-Ping; Goddard, William; Bockrath, Marc

    2015-03-01

    Graphene under an applied one-dimensional (1D) periodic potential is predicted to show many interesting and unique phenomena such as electron supercollimation and additional Dirac points, and some progress has been made in observing graphene in this regime. Here, we use parallel nanotubes assembled using DNA linkers as a back gate to apply periodic or quasi-periodic 1D potentials to graphene layers. The pitch of the nanotube array can be controlled by the linker length, which we can vary from 8 nm to 20 nm. We can independently control the periodic potentials using the nanotube array and the carrier density using a top gate to study the transport properties of the system. Our latest results will be discussed.

  2. Apparatus for measuring local stress of metallic films, using an array of parallel laser beams during rapid thermal processing

    NASA Astrophysics Data System (ADS)

    Huang, R.; Taylor, C. A.; Himmelsbach, S.; Ceric, H.; Detzel, T.

    2010-05-01

    The novel apparatus described here was developed to investigate the thermo-mechanical behavior of metallic films on a substrate by acquiring the wafer curvature. It comprises an optical module producing and measuring an array of parallel laser beams, a high resolution scanning stage, a rapid thermal processing (RTP) chamber and several accessorial gas control modules. Unlike most traditional systems which only calculate the average wafer curvature, this system has the capability to measure the curvature locally in 30 ms. Consequently, the real-time development of biaxial stress involved in thin films can be fully captured during any thermal treatments such as temperature cycling or annealing processes. In addition, the multiple parallel laser beam technique cancels electrical, vibrational and other random noise sources that would otherwise make an in situ measurement very difficult. Furthermore, other advanced features such as the in situ acid treatment and active cooling extend the experimental conditions to provide new insights into thin film properties and material behavior.

  3. Silicon nanodisk array with a fin field-effect transistor for time-domain weighted sum calculation toward massively parallel spiking neural networks

    NASA Astrophysics Data System (ADS)

    Tohara, Takashi; Liang, Haichao; Tanaka, Hirofumi; Igarashi, Makoto; Samukawa, Seiji; Endo, Kazuhiko; Takahashi, Yasuo; Morie, Takashi

    2016-03-01

    A nanodisk array connected with a fin field-effect transistor is fabricated and analyzed for spiking neural network applications. This nanodevice performs weighted sums in the time domain using rising slopes of responses triggered by input spike pulses. The nanodisk arrays, which act as a resistance of several giga-ohms, are fabricated using a self-assembly bio-nano-template technique. Weighted sums are achieved with an energy dissipation on the order of 1 fJ, where the number of inputs can be more than one hundred. This amount of energy is several orders of magnitude lower than that of conventional digital processors.

  4. Analog Processor To Solve Optimization Problems

    NASA Technical Reports Server (NTRS)

    Duong, Tuan A.; Eberhardt, Silvio P.; Thakoor, Anil P.

    1993-01-01

    Proposed analog processor solves "traveling-salesman" problem, considered paradigm of global-optimization problems involving routing or allocation of resources. Includes electronic neural network and auxiliary circuitry based partly on concepts described in "Neural-Network Processor Would Allocate Resources" (NPO-17781) and "Neural Network Solves 'Traveling-Salesman' Problem" (NPO-17807). Processor based on highly parallel computing solves problem in significantly less time.

  5. Turbulent tube-flow heat transfer coefficients in the presence of flow imbalance in the tubes of a parallel array

    NASA Astrophysics Data System (ADS)

    Molki, M.

    Experiments were performed to study the effects of a controlled flow imbalance on the heat transfer coefficients in the entrance region of a tube situated in an array of parallel tubes. The array was modeled by two parallel tubes which were set into a large baffle plate to form a sharp-edged inlet, with air being drawn into the tube inlets from a large upstream plenum chamber. Quasi-local heat transfer coefficients were determined at various axial and circumferential locations along the length of the test section tube. A preselected, fixed Reynolds number was established in the test section tube while that of the other was varied systematically. Both Reynolds numbers ranged from 5000 to 88000. The separation distance between the tubes varied from 1.5 to 4.5 tube diameters. It was found that high degrees of flow imbalance played a decisive role in shaping the axial and circumferential distributions of the heat transfer coefficient in the thermal entrance region.

  6. Comparison of measurements and simulations of series-parallel incommensurate area superconducting quantum interference device arrays fabricated from YBa2Cu3O7-δ ion damage Josephson junctions

    NASA Astrophysics Data System (ADS)

    Cybart, Shane A.; Dalichaouch, T. N.; Wu, S. M.; Anton, S. M.; Drisko, J. A.; Parker, J. M.; Harteneck, B. D.; Dynes, R. C.

    2012-09-01

    We have fabricated series-parallel (two-dimensional) arrays of incommensurate superconducting quantum interference devices (SQUIDs) using YBa2Cu3O7-δ thin film ion damage Josephson junctions. The arrays initially consisted of a grid of Josephson junctions with 28 junctions in parallel and 565 junctions in series, for a total of 15,255 SQUIDs. The 28 junctions in the parallel direction were sequentially decreased by removing them with photolithography and ion milling to allow comparisons of voltage-magnetic field (V-B) characteristics for different parallel dimensions and area distributions. Comparisons of measurements for these different configurations reveal that the maximum voltage modulation with magnetic field is significantly reduced by both the self inductances of the SQUIDs and the mutual inductances between them. Based on these results, we develop a computer simulation model from first principles which simultaneously solves the differential equations of the junctions in the array while considering the effects of self inductance, mutual inductance, and non-uniformity of junction critical currents. We find that our model can accurately predict V-B for all of the array geometries studied. A second experiment is performed where we use photolithography and ion milling to split another 28 × 565 junction array into 6 decoupled arrays to further investigate mutual interactions between adjacent SQUIDs. This work conclusively shows that the magnetic fields generated by self currents in an incommensurate array severely reduce its performance by reducing the maximum obtainable modulation voltage.
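
    The SQUID total quoted in this abstract follows from the junction counts if each pair of adjacent junctions in a parallel row is assumed to close one SQUID loop, so that 28 junctions in parallel give 27 loops per row. That counting rule is an assumption made here, not a statement from the abstract, and the helper name `squid_count` is hypothetical. A short check of the arithmetic:

```python
def squid_count(n_parallel, n_series):
    """Number of SQUID loops in a series-parallel junction grid.

    Assumption: every pair of adjacent junctions in a parallel row
    closes exactly one loop, so a row of n_parallel junctions holds
    n_parallel - 1 SQUIDs.
    """
    return (n_parallel - 1) * n_series


# Junction counts taken from the abstract: 28 in parallel, 565 in series.
print(squid_count(28, 565))
```

    Under the same assumption, reducing the parallel dimension by one junction removes 565 SQUIDs from the array.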

  7. 3D optical interconnect mesh network for on-board parallel multiprocessor system based on EOPCB

    NASA Astrophysics Data System (ADS)

    Luo, Fengguang; Cao, Mingcui; Zhou, Xinjun; Xu, Jun; Luo, Zhixiang; Yuan, Jing; Zong, Liangjia; Feng, Yonghua; Chen, Chao; Zhang, Conghui

    2007-11-01

    A three-dimensional (3-D) 4×4×4 optical interconnect Mesh network scheme for parallel multiprocessor system based on polymer light waveguide electro-optical printed circuit board (EOPCB) is proposed in this paper. The Mesh topological structures of light waveguide interconnects for processor element chip-to-chip on a board, and board-to-board on backplane are constructed. The system consists of 64 processor element chips interconnected in a 3-D Mesh network configuration. Every processor board comprises 4×4 processor element chips with Mesh interconnection. Board-to-board Mesh interconnects are established on a backplane through light waveguide Mesh interconnect topological structure. An additional optical layer with light waveguide structure is used in conventional PCB to construct EOPCB. Vertical cavity surface emitting laser (VCSEL) array is used as optical transmitter array. PIN photodiode array is used as optical receiver array. A MT-compatible direct coupling method is presented to couple light beam between optical transmitter/receiver with light waveguide layer. The optical signals from a processor element chip on a board can transmit to another processor element chip on another board through light waveguide interconnection in the backplane. So 3-D optical interconnection Mesh network for parallel multiprocessor system can be realized by EOPCB.

  8. Parallel grid population

    DOEpatents

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
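
    The two-phase method described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the grid is one-dimensional and split into equal slabs, objects are intervals (lo, hi), worker threads stand in for processors, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def portion_bounds(n, lo, hi):
    """Split the range [lo, hi) into n equal grid portions."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]


def classify(object_set, portions):
    """Phase 1: one worker finds, for each object in its own set,
    every grid portion that at least partially bounds the object."""
    hits = []
    for obj_id, (a, b) in object_set:
        for p, (p_lo, p_hi) in enumerate(portions):
            if a < p_hi and b > p_lo:  # intervals overlap
                hits.append((p, obj_id))
    return hits


def populate(p, all_hits):
    """Phase 2: the worker that owns portion p collects every object
    that phase 1 found to overlap that portion."""
    return sorted(obj_id for hits in all_hits for q, obj_id in hits if q == p)


def parallel_grid_population(objects, n, lo=0.0, hi=1.0):
    portions = portion_bounds(n, lo, hi)
    items = list(enumerate(objects))
    object_sets = [items[i::n] for i in range(n)]  # n distinct object sets
    with ThreadPoolExecutor(max_workers=n) as pool:
        all_hits = list(pool.map(lambda s: classify(s, portions), object_sets))
        grid = list(pool.map(lambda p: populate(p, all_hits), range(n)))
    return grid


objects = [(0.1, 0.2), (0.4, 0.6), (0.7, 0.9), (0.0, 1.0)]
print(parallel_grid_population(objects, n=2))
```

    Because each worker writes only to its own grid portion in the second phase, no two workers update the same portion, which is the property the division by portion is meant to provide.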

  9. Development and characterization of hollow microprobe array as a potential tool for versatile and massively parallel manipulation of single cells.

    PubMed

    Nagai, Moeto; Oohara, Kiyotaka; Kato, Keita; Kawashima, Takahiro; Shibata, Takayuki

    2015-04-01

    Parallel manipulation of single cells is important for reconstructing in vivo cellular microenvironments and studying cell functions. To manipulate single cells and reconstruct their environments, development of a versatile manipulation tool is necessary. In this study, we developed an array of hollow probes using microelectromechanical systems fabrication technology and demonstrated the manipulation of single cells. We conducted a cell aspiration experiment with a glass pipette and modeled a cell using a standard linear solid model, which provided information for designing hollow stepped probes for minimally invasive single-cell manipulation. We etched a silicon wafer on both sides and formed through holes with stepped structures. The inner diameters of the holes were reduced by SiO2 deposition using plasma-enhanced chemical vapor deposition to trap cells on the tips. This fabrication process makes it possible to control the wall thickness, inner diameter, and outer diameter of the probes. With the fabricated probes, single cells were manipulated and placed in microwells at a single-cell level in a parallel manner. We studied the capture, release, and survival rates of cells at different suction and release pressures and found that the cell trapping rate was directly proportional to the suction pressure, whereas the release rate and viability decreased with increasing suction pressure. The proposed manipulation system makes it possible to place cells in a well array and observe the adherence, spreading, culture, and death of the cells. This system has potential as a tool for massively parallel manipulation and for three-dimensional hetero cellular assays. PMID:25749639

  10. Fully Integrated Linear Single Photon Avalanche Diode (SPAD) Array with Parallel Readout Circuit in a Standard 180 nm CMOS Process

    NASA Astrophysics Data System (ADS)

    Isaak, S.; Bull, S.; Pitter, M. C.; Harrison, Ian.

    2011-05-01

    This paper reports on the development of a SPAD device, fabricated in a UMC 0.18 μm CMOS process, and its subsequent use in an actively quenched single photon counting imaging system. A low-doped p- guard ring (t-well layer) encircles the active area to prevent premature reverse breakdown. The array is a 16×1 parallel output SPAD array, which comprises an actively quenched SPAD circuit in each pixel, with the current value set by an external resistor RRef = 300 kΩ. The SPAD I-V response, ID, was found to slowly increase until VBD was reached at excess bias voltage, Ve = 11.03 V, and then rapidly increase due to avalanche multiplication. Digital circuitry to control the SPAD array and perform the necessary data processing was designed in VHDL and implemented on an FPGA chip. At room temperature, the dark count was found to be approximately 13 kHz for most of the 16 SPAD pixels and the dead time was estimated to be 40 ns.

  11. Micromachined silicon parallel acoustic delay lines as time-delayed ultrasound detector array for real-time photoacoustic tomography

    NASA Astrophysics Data System (ADS)

    Cho, Y.; Chang, C.-C.; Wang, L. V.; Zou, J.

    2016-02-01

    This paper reports the development of a new 16-channel parallel acoustic delay line (PADL) array for real-time photoacoustic tomography (PAT). The PADLs were directly fabricated from single-crystalline silicon substrates using deep reactive ion etching. Compared with other acoustic delay lines (e.g., optical fibers), the micromachined silicon PADLs offer higher acoustic transmission efficiency, smaller form factor, easier assembly, and mass production capability. To demonstrate its real-time photoacoustic imaging capability, the silicon PADL array was interfaced with one single-element ultrasonic transducer followed by one channel of data acquisition electronics to receive 16 channels of photoacoustic signals simultaneously. A PAT image of an optically-absorbing target embedded in an optically-scattering phantom was reconstructed, which matched well with the actual size of the imaged target. Because the silicon PADL array allows a signal-to-channel reduction ratio of 16:1, it could significantly simplify the design and construction of ultrasonic receivers for real-time PAT.
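
    The 16:1 signal-to-channel reduction described in this abstract can be pictured as time-division multiplexing: each delay line shifts its signal by a different amount so that all 16 arrive one after another at a single receiver. The sketch below assumes evenly spaced delays, signals no longer than one delay step, and ideal lossless lines; the sample values and the names `multiplex` and `demultiplex` are hypothetical.

```python
def multiplex(channels, delay):
    """Merge all channels into one trace, shifting channel k by
    k * delay samples (the role played by the k-th delay line)."""
    length = delay * (len(channels) - 1) + max(len(c) for c in channels)
    trace = [0.0] * length
    for k, signal in enumerate(channels):
        for i, value in enumerate(signal):
            trace[k * delay + i] += value
    return trace


def demultiplex(trace, n_channels, delay):
    """Recover channel k as the k-th window of `delay` samples."""
    return [trace[k * delay:(k + 1) * delay] for k in range(n_channels)]


# Sixteen made-up signals of 4 samples each, with a delay step of 4 samples.
channels = [[float(k + 1)] * 4 for k in range(16)]
trace = multiplex(channels, delay=4)
recovered = demultiplex(trace, n_channels=16, delay=4)
print(len(trace), recovered == channels)
```

    If a signal were longer than the delay step, neighboring windows would overlap in the merged trace and the simple windowing above would no longer separate them.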

  12. Parallel high-throughput microanalysis of materials using microfabricated full bridge device arrays

    NASA Astrophysics Data System (ADS)

    Potyrailo, Radislav A.; Morris, William G.

    2004-01-01

    An array of microfabricated full bridge devices has been implemented for the rapid thermal microanalysis of polymers. In each microelectromechanical system device, four strain gauges were formed in silicon cantilevered microbeams and were configured as a Wheatstone bridge circuit. Glass transition temperatures Tg were measured by the quantitation of the strain produced in the sensor by the stress applied by a polymer layer to the cantilevered microbeams. The measured strain was analyzed as a function of chip temperature for the change in the slope, which was indicative of Tg. Resolution of Tg determinations of amorphous and crystalline polymers was <0.25 °C and <2.0 °C, respectively, being attractive for combinatorial screening of polymers. Our approach is a practical alternative to known methods for Tg determinations because of the immunity to the variations in the amount of deposited material and its viscosity, vapor pressure of employed solvent, and ease of multiplexing into dense sensor arrays.

  13. Implementation and Assessment of Advanced Analog Vector-Matrix Processor

    NASA Technical Reports Server (NTRS)

    Gary, Charles K.; Bualat, Maria G.; Lum, Henry, Jr. (Technical Monitor)

    1994-01-01

    This paper discusses the design and implementation of an analog optical vector-matrix coprocessor with a throughput of 128 Mops for a personal computer. Vector matrix calculations are inherently parallel, providing a promising domain for the use of optical calculators. However, to date, digital optical systems have proven too cumbersome to replace electronics, and analog processors have not demonstrated sufficient accuracy in large scale systems. The goal of the work described in this paper is to demonstrate a viable optical coprocessor for linear operations. The analog optical processor presented has been integrated with a personal computer to provide full functionality and is the first demonstration of an optical linear algebra processor with a throughput greater than 100 Mops. The optical vector matrix processor consists of a laser diode source, an acoustooptical modulator array to input the vector information, a liquid crystal spatial light modulator to input the matrix information, an avalanche photodiode array to read out the result vector of the vector matrix multiplication, as well as transport optics and the electronics necessary to drive the optical modulators and interface to the computer. The intent of this research is to provide a low cost, highly energy efficient coprocessor for linear operations. Measurements of the analog accuracy of the processor performing 128 Mops are presented along with an assessment of the implications for future systems. A range of noise sources, including cross-talk, source amplitude fluctuations, shot noise at the detector, and non-linearities of the optoelectronic components are measured and compared to determine the most significant source of error. The possibilities for reducing these sources of error are discussed. Also, the total error is compared with that expected from a statistical analysis of the individual components and their relation to the vector-matrix operation. The sufficiency of the measured accuracy of the processor is compared with that required for a range of typical problems. Calculations resolving alloy concentrations from spectral plume data of rocket engines are implemented on the optical processor, demonstrating its sufficiency for this problem. We also show how this technology can be easily extended to a 100 x 100 10 MHz (200 Gops) processor.
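
    As a rough check on the throughput quoted for the 100 x 100, 10 MHz extension, the sketch below counts operations for one vector-matrix product per clock cycle. It assumes (the record does not say so) that each matrix element contributes one multiply and one add; the function name is illustrative only.

```python
def vector_matrix_ops_per_second(rows: int, cols: int, clock_hz: float) -> float:
    """Operations per second when one rows x cols product completes per cycle.

    Assumption: every matrix element costs one multiply plus one add.
    """
    ops_per_product = 2 * rows * cols
    return ops_per_product * clock_hz


# 100 x 100 matrix at 10 MHz, expressed in units of 1e9 operations per second.
print(vector_matrix_ops_per_second(100, 100, 10e6) / 1e9)  # -> 200.0
```

    Under that counting convention the result is 2 x 10^11 operations per second, which agrees with the figure of 200 given in the abstract.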

  14. Comparative Analysis on the Performance of a Short String of Series-Connected and Parallel-Connected Photovoltaic Array Under Partial Shading

    NASA Astrophysics Data System (ADS)

    Vijayalekshmy, S.; Rama Iyer, S.; Beevi, Bisharathu

    2015-09-01

    The output power from a photovoltaic (PV) array decreases and the array exhibits multiple peaks when it is subjected to partial shading (PS). The power loss in the PV array varies with the array configuration, physical location and the shading pattern. This paper compares the relative performance of a PV array consisting of a short string of three PV modules for two different configurations. The mismatch loss, shading loss, fill factor and the power loss due to failure to track the global maximum power point are analysed for a series string with bypass diodes and a short parallel string using a MATLAB/Simulink model. The performance of the system is investigated for three different conditions of solar insolation for the same shading pattern. Results indicate that the power loss due to shading during PS is considerably greater in a series string than in a parallel string with the same number of modules.

  15. Database Reorganization in Parallel Disk Arrays with I/O Service Stealing

    NASA Technical Reports Server (NTRS)

    Zabback, Peter; Onyuksel, Ibrahim; Scheuermann, Peter; Weikum, Gerhard

    1996-01-01

    We present a model for data reorganization in parallel disk systems that is geared towards load balancing in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e. migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs versus benefits of cooling.
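
    The idea of disk cooling can be sketched as a single greedy step: find the hottest and coldest disks and migrate one extent between them. This is only an illustration under assumed data structures (a dict of disks, each a dict of extent heats); the selection rule and all names are hypothetical and are not taken from the paper, whose queueing model is not reproduced here.

```python
def disk_heat(extents):
    """Total heat (access rate) of a disk, given its extent -> heat mapping."""
    return sum(extents.values())


def cool_once(disks):
    """Move one extent from the hottest disk to the coldest disk.

    disks maps disk name -> {extent name: heat}.
    Returns (extent, source, target), or None if no move reduces the imbalance.
    """
    hottest = max(disks, key=lambda d: disk_heat(disks[d]))
    coldest = min(disks, key=lambda d: disk_heat(disks[d]))
    gap = disk_heat(disks[hottest]) - disk_heat(disks[coldest])
    # Moving an extent of heat h changes the gap to |gap - 2h|,
    # so only extents with 0 < h < gap make the two disks more balanced.
    candidates = {e: h for e, h in disks[hottest].items() if 0 < h < gap}
    if not candidates:
        return None
    extent = max(candidates, key=candidates.get)
    disks[coldest][extent] = disks[hottest].pop(extent)
    return extent, hottest, coldest


# Hypothetical heats, chosen only to exercise the function.
disks = {"d0": {"a": 50, "b": 30}, "d1": {"c": 10}, "d2": {"d": 20}}
print(cool_once(disks))  # -> ('a', 'd0', 'd1')
```

    Repeating the step until it returns None gives a simple balancing loop; weighing the cost of each migration against its benefit, as the abstract describes, would need the arrival-rate model from the paper.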

  16. Field programmable gate array based parallel strapdown algorithm design for strapdown inertial navigation systems.

    PubMed

    Li, Zong-Tao; Wu, Tie-Jun; Lin, Can-Long; Ma, Long-Hua

    2011-01-01

    A new generalized optimum strapdown algorithm with coning and sculling compensation is presented, in which the position, velocity and attitude updating operations are carried out based on the single-speed structure in which all computations are executed at a single updating rate that is sufficiently high to accurately account for high frequency angular rate and acceleration rectification effects. Different from existing algorithms, the updating rates of the coning and sculling compensations are unrelated to the number of the gyro incremental angle samples and the number of the accelerometer incremental velocity samples. When the output sampling rate of inertial sensors remains constant, this algorithm allows the updating rate of the coning and sculling compensation to be increased while using more gyro incremental angle and accelerometer incremental velocity samples, in order to improve the accuracy of the system. Then, in order to implement the new strapdown algorithm in a single FPGA chip, the parallelization of the algorithm is designed and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the FPGA device XC6VLX550T hardware platform on the basis of some fighter data. It is shown that this parallel strapdown algorithm on the FPGA platform can greatly decrease the execution time of the algorithm, relative to existing implementations on the DSP platform, to meet the real-time and high-precision requirements of the system in a highly dynamic environment. PMID:22164058

  17. Electro-optical processor for optimal control

    NASA Technical Reports Server (NTRS)

    Casasent, D.; Neuman, C.; Carlotto, M.

    1981-01-01

    An iterative optical processor has been developed for applications in the optimal control of advanced sensor systems. The processor is designed for the realization of the Richardson algorithm on bipolar data, using as input a linear array of LEDs. The usefulness of the processor has been demonstrated by the solution of the linear quadratic regulator problem for the optimal control signals of the F100 turbofan engine. In this case study, the algebraic Riccati equation matrix was solved by the use of a modified Kleinman algorithm along with the Richardson algorithm applied to a system of linear algebraic equations. Preliminary experimental results demonstrate the gradual convergence of the processor.
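
    For readers unfamiliar with the Richardson algorithm named above, a minimal software sketch of the iteration x <- x + omega * (b - A x) follows. It is a plain numerical illustration, not a model of the optical hardware; the step size omega, the test matrix and the iteration count are assumptions chosen so that the iteration converges.

```python
def richardson(A, b, omega, iterations):
    """Solve A x = b by Richardson iteration: x <- x + omega * (b - A x).

    Converges for a symmetric positive definite A when
    0 < omega < 2 / (largest eigenvalue of A).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * residual[i] for i in range(n)]
    return x


# Small symmetric positive definite example; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(richardson(A, b, omega=0.25, iterations=100))
```

    Each pass needs only a matrix-vector product and a vector update, which is the kind of operation an optical vector-matrix multiplier can supply.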

  18. Template-directed atomically precise self-organization of perfectly ordered parallel cerium silicide nanowire arrays on Si(110)-16×2 surfaces

    PubMed Central

    2013-01-01

    The perfectly ordered parallel arrays of periodic Ce silicide nanowires can self-organize with atomic precision on single-domain Si(110)-16×2 surfaces. The growth evolution of self-ordered parallel Ce silicide nanowire arrays is investigated over a broad range of Ce coverages on single-domain Si(110)-16×2 surfaces by scanning tunneling microscopy (STM). Three different types of well-ordered parallel arrays, consisting of uniformly spaced and atomically identical Ce silicide nanowires, are self-organized through the heteroepitaxial growth of Ce silicides on a long-range grating-like 16×2 reconstruction at the deposition of various Ce coverages. Each atomically precise Ce silicide nanowire consists of a bundle of chains and rows with different atomic structures. The atomic-resolution dual-polarity STM images reveal that the interchain coupling leads to the formation of the registry-aligned chain bundles within individual Ce silicide nanowire. The nanowire width and the interchain coupling can be adjusted systematically by varying the Ce coverage on a Si(110) surface. This natural template-directed self-organization of perfectly regular parallel nanowire arrays allows for the precise control of the feature size and positions within 0.2 nm over a large area. Thus, it is a promising route to produce parallel nanowire arrays in a straightforward, low-cost, high-throughput process. PMID:24188092

  19. Parallel multi-step nanolithography by nanoscale Cu-covered h-PDMS tip array

    NASA Astrophysics Data System (ADS)

    Chang, Yuan-Jen; Huang, Han-Kuan

    2014-09-01

    Tip-based nanolithography provides a flexible nanolithographic technology. Tip fabrication is one of the main challenges. In this paper, we propose to combine the dry etching of photoresist and electro-chemical machining to reduce the size of the tip opening. We successfully fabricate a tip opening with a diameter of 200 nm. After lithography and lift-off, gold dot patterns with a diameter of 280 nm are demonstrated. Moreover, a home-made multi-step exposure system is built and both the successful 14- and 44-step nanolithography by a tip array are also demonstrated in the paper.

  20. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygons, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  1. Advanced parallel processing with supercomputer architectures

    SciTech Connect

    Hwang, K.

    1987-10-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers.

  2. Optimal expression evaluation for data parallel architectures

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
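
    The placement problem described above can be illustrated with a brute-force dynamic program over an expression tree. This is not the efficient algorithm of the paper; it is a naive sketch that assumes unit-size arrays, a small set of numbered positions, and a caller-supplied distance metric. The tuple encoding of the tree and all names are hypothetical.

```python
def placement_costs(node, num_positions, distance):
    """Return a list whose entry p is the cheapest cost of having the value
    of `node` available at position p.

    node is ("leaf", home_position) or ("op", left_subtree, right_subtree).
    """
    if node[0] == "leaf":
        home = node[1]
        return [distance(home, p) for p in range(num_positions)]
    _, left, right = node
    left_cost = placement_costs(left, num_positions, distance)
    right_cost = placement_costs(right, num_positions, distance)
    # Evaluating the operation at q requires both operands to be at q.
    evaluate_at = [left_cost[q] + right_cost[q] for q in range(num_positions)]
    # The result may then be moved from q to any other position p.
    return [
        min(evaluate_at[q] + distance(q, p) for q in range(num_positions))
        for p in range(num_positions)
    ]


def line_distance(p, q):
    """Distance on a one-dimensional mesh (an assumed example metric)."""
    return abs(p - q)


# (A + B) * C with A stored at position 0 and B, C stored at position 4.
expr = ("op", ("op", ("leaf", 0), ("leaf", 4)), ("leaf", 4))
print(min(placement_costs(expr, 5, line_distance)))  # -> 4
```

    In this example the cheapest plan moves A across the mesh once and performs both operations where B and C already reside.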

  3. Transport of nonconductive and conductive droplets in a parallel plate array

    NASA Astrophysics Data System (ADS)

    Chatterjee, Debalina; Hetayothin, Boonta

    2005-03-01

    Electrowetting on dielectric technique is used to actuate conductive liquid droplets on electrodes patterned beneath a dielectric. Nonconductive liquids can be transported electrohydrodynamically inside channels. We show for the first time that it is possible to transport droplets of nonconductive liquids on dielectric surfaces, using modest voltages and frequencies (<100 V, <10 kHz). Ionic liquids, aqueous surfactants, buffers, and organic solutions can also be transported. Although conductive liquids show a significant change in liquid contact angle on application of potential, nonconductive liquids do not, suggesting a different mechanism of transport. The empirical criteria for moving droplets in a two-dimensional array are a liquid dielectric constant >= 4.3 and a molecular dipole moment >= 1.2 D. The transport mechanisms are discussed along with new microfluidic applications that these results suggest are now feasible.

  4. High-resolution parallel-detection sensor array using piezo-phototronics effect

    DOEpatents

    Wang, Zhong L.; Pan, Caofeng

    2015-07-28

    A pressure sensor element includes a substrate, a first type of semiconductor material layer and an array of elongated light-emitting piezoelectric nanostructures extending upwardly from the first type of semiconductor material layer. A p-n junction is formed between each nanostructure and the first type semiconductor layer. An insulative resilient medium layer is infused around each of the elongated light-emitting piezoelectric nanostructures. A transparent planar electrode, disposed on the resilient medium layer, is electrically coupled to the top of each nanostructure. A voltage source is coupled to the first type of semiconductor material layer and the transparent planar electrode and applies a biasing voltage across each of the nanostructures. Each nanostructure emits light in an intensity that is proportional to an amount of compressive strain applied thereto.

  5. Design and optimization of multi-class series-parallel linear electromagnetic array artificial muscle.

    PubMed

    Li, Jing; Ji, Zhenyu; Shi, Xuetao; You, Fusheng; Fu, Feng; Liu, Ruigang; Xia, Junying; Wang, Nan; Bai, Jing; Wang, Zhanxi; Qin, Xiansheng; Dong, Xiuzhen

    2014-01-01

    Skeletal muscle exhibiting complex and excellent precision has evolved for millions of years. Skeletal muscle has better performance and simpler structure compared with existing driving modes. Artificial muscle may be designed by analyzing and imitating properties and structure of skeletal muscle based on bionics, which has been focused on by bionic researchers, and a structure mode of linear electromagnetic array artificial muscle has been designed in this paper. Half sarcomere is the minimum unit of artificial muscle and electromagnetic model has been built. The structural parameters of artificial half sarcomere actuator were optimized to achieve better movement performance. Experimental results show that artificial half sarcomere actuator possesses great motion performance such as high response speed, great acceleration, small weight and size, robustness, etc., which presents a promising application prospect of artificial half sarcomere actuator. PMID:24211938

  6. A periodic array of nano-scale parallel slats for high-efficiency electroosmotic pumping.

    PubMed

    Kung, Chun-Fei; Wang, Chang-Yi; Chang, Chien-Cheng

    2013-12-01

    It is known that the electroosmotic (EO) flow rate through a nano-scale channel is extremely small. A channel made of a periodic array of slats is proposed to effectively promote the EO pumping, and thus greatly improve the EO flow rate. The geometrically simple array is complicated enough that four length scales are involved: the vertical period 2L, lateral period 2aL, width of the slat 2cL as well as the Debye length λD. The EO pumping rate is determined by the normalized lengths: a, c, or the perforation fraction of slats η=1-(c/a) and the dimensionless electrokinetic width K=L/λD. In a nano-scale channel, K is of order unity or less. EO pumping in both longitudinal and transverse directions (denoted as longitudinal EO pumping (LEOP) and transverse EO pumping (TEOP), respectively) is investigated by solving the Debye-Hückel approximation and viscous electro-kinetic equation. The main findings include that (i) the EO pumping rates of LEOP for small K are remarkably improved (by one order of magnitude) when we have longer slats (a≫1) and a large perforation fraction of slats (η > 0.7); (ii) the EO pumping rates of TEOP for small K can also be much improved but less significantly with longer slats and a large perforation fraction of slats. Nevertheless, it must be noted that in practice K cannot be made arbitrarily small as the criterion of φc≈0 for the reference potential at the channel center puts lower bounds on K; in other words, there are geometrical limits for the use of the Poisson-Boltzmann equation. PMID:24105905
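
    The two dimensionless groups defined in the abstract, η = 1 - (c/a) and K = L/λD, are easy to evaluate directly. The sketch below does only that; the numerical inputs are hypothetical values chosen for illustration and do not come from the paper.

```python
def perforation_fraction(a: float, c: float) -> float:
    """eta = 1 - c/a, from the normalized lateral period a and slat width c."""
    return 1.0 - c / a


def electrokinetic_width(half_period_L: float, debye_length: float) -> float:
    """K = L / lambda_D, with both lengths in the same unit."""
    return half_period_L / debye_length


# Hypothetical geometry: a = 10, c = 2, L = 5 nm, Debye length = 10 nm.
eta = perforation_fraction(10.0, 2.0)
K = electrokinetic_width(5e-9, 10e-9)
print(eta, K)
print(eta > 0.7)  # the regime the abstract associates with improved LEOP
```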

  7. Detection and characterization of an activity which aligns mesodermal cells into parallel arrays.

    PubMed

    Li, S F; Klajn, E; Marotta, R; Parish, R W

    1997-04-01

    A cell line of mesodermal origin, FS9, was found to release a Cell Orienting Factor into its culture medium. In contrast with the random migration of controls, the orienting activity causes migrating mesenchymal cells to form an orderly "halo" surrounding tissue explants; individual cells and their cytoskeletons are elongated and parallel to each other but at right angles to the explant. No effect on the rate of cell movement was apparent. The orienting activity could be quantified by counting the number of cells found within strings radiating at right angles to a single tissue explant in the presence of FS9 conditioned medium or by using NIH image analysis. A dose dependent relationship with half maximal activity occurring at a 25% dilution of conditioned medium was observed. Cells that migrated randomly in the absence of conditioned medium became oriented within 4 h of exposure to 50% conditioned medium. Conversely, when the conditioned medium was removed, parallel alignment was rapidly lost. The orienting activity was found in conditioned media from a variety of mesodermal derivatives. Transformation of Balb/c 3T3 cells using EJ-ras oncogene led to augmented production of the activity. Furthermore, insulin was required in serum-free medium to support its production. Laminin, fibronectin, collagen, and a range of pure cytokines neither promoted nor inhibited orientation. Cell alignment was also unaffected by treatments which interfered with cell-substrate interactions and motility including the addition of the RGD peptide or anti-integrin beta 1 and beta 3 antibodies. A protein is likely to be involved since the activity was heat and trypsin sensitive and non-dialysable. The possibility is discussed that the orienting activity is a novel protein(s) which alters intercellular interactions to promote the formation of an aligned pattern by migrating mesenchymal cells. PMID:9127262

  8. Supercomputing on massively parallel bit-serial architectures

    NASA Technical Reports Server (NTRS)

    Iobst, Ken

    1985-01-01

    Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.

  9. Parallel image-acquisition in continuous-wave electron paramagnetic resonance imaging with a surface coil array: Proof-of-concept experiments

    NASA Astrophysics Data System (ADS)

    Enomoto, Ayano; Hirata, Hiroshi

    2014-02-01

    This article describes a feasibility study of parallel image-acquisition using a two-channel surface coil array in continuous-wave electron paramagnetic resonance (CW-EPR) imaging. Parallel EPR imaging was performed by multiplexing of EPR detection in the frequency domain. The parallel acquisition system consists of two surface coil resonators and radiofrequency (RF) bridges for EPR detection. To demonstrate the feasibility of this method of parallel image-acquisition with a surface coil array, three-dimensional EPR imaging was carried out using a tube phantom. Technical issues in the multiplexing method of EPR detection were also clarified. We found that degradation in the signal-to-noise ratio due to the interference of RF carriers is a key problem to be solved.

  10. Parallel image-acquisition in continuous-wave electron paramagnetic resonance imaging with a surface coil array: Proof-of-concept experiments.

    PubMed

    Enomoto, Ayano; Hirata, Hiroshi

    2014-02-01

    This article describes a feasibility study of parallel image-acquisition using a two-channel surface coil array in continuous-wave electron paramagnetic resonance (CW-EPR) imaging. Parallel EPR imaging was performed by multiplexing of EPR detection in the frequency domain. The parallel acquisition system consists of two surface coil resonators and radiofrequency (RF) bridges for EPR detection. To demonstrate the feasibility of this method of parallel image-acquisition with a surface coil array, three-dimensional EPR imaging was carried out using a tube phantom. Technical issues in the multiplexing method of EPR detection were also clarified. We found that degradation in the signal-to-noise ratio due to the interference of RF carriers is a key problem to be solved. PMID:24374749

  11. Upset Characterization of the PowerPC405 Hard-core Processor Embedded in Virtex-II Pro Field Programmable Gate Arrays

    NASA Technical Reports Server (NTRS)

    Swift, Gary M.; Allen, Gregory S.; Farmanesh, Farhad; George, Jeffrey; Petrick, David J.; Chayab, Fayez

    2006-01-01

    Shown in this presentation are recent results for the upset susceptibility of the various types of memory elements in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively through appropriate design triplication and configuration scrubbing, these upsets of processor elements can dominate the system error rate. Data from irradiations with both protons and heavy ions are given and compared using available models.

  12. Parallel recognition of cancer cells using an addressable array of solid-state micropores.

    PubMed

    Ilyas, Azhar; Asghar, Waseem; Kim, Young-tae; Iqbal, Samir M

    2014-12-15

    Early stage detection and precise quantification of circulating tumor cells (CTCs) in the peripheral blood of cancer patients are important for early diagnosis. Early diagnosis improves the effectiveness of the therapy and results in better prognosis. Several techniques have been used for CTC detection but are limited by their need for dye tagging, low throughput and lack of statistical reliability at single cell level. Solid-state micropores can characterize each cell in a sample providing interesting information about cellular populations. We report a multi-channel device which utilized solid-state micropores array assembly for simultaneous measurement of cell translocation. This increased the throughput of measurement and as the cells passed the micropores, tumor cells showed distinctive current blockade pulses, when compared to leukocytes. The ionic current across each micropore channel was continuously monitored and recorded. The measurement system not only increased throughput but also provided on-chip cross-relation. The whole blood was lysed to get rid of red blood cells, so the blood dilution was not needed. The approach facilitated faster processing of blood samples with tumor cell detection efficiency of about 70%. The design provided a simple and inexpensive method for rapid and reliable detection of tumor cells without any cell staining or surface functionalization. The device can also be used for high throughput electrophysiological analysis of other cell types. PMID:25038540

  13. Scripts for Scalable Monitoring of Parallel Filesystem Infrastructure

    Energy Science and Technology Software Center (ESTSC)

    2014-02-27

    Scripts for scalable monitoring of parallel filesystem infrastructure provide frameworks for monitoring the health of block storage arrays and large InfiniBand fabrics. The block storage framework uses Python multiprocessing to scale the number of monitored arrays with the number of processors in the system. This enables live monitoring of HPC-scale filesystems with 10-50 storage arrays. For InfiniBand monitoring, there are scripts included that monitor the InfiniBand health of each host, along with visualization tools for mapping the topology of complex fabrics.

  14. Scripts for Scalable Monitoring of Parallel Filesystem Infrastructure

    SciTech Connect

    2014-02-27

    Scripts for scalable monitoring of parallel filesystem infrastructure provide frameworks for monitoring the health of block storage arrays and large InfiniBand fabrics. The block storage framework uses Python multiprocessing to scale the number of monitored arrays with the number of processors in the system. This enables live monitoring of HPC-scale filesystems with 10-50 storage arrays. For InfiniBand monitoring, there are scripts included that monitor the InfiniBand health of each host, along with visualization tools for mapping the topology of complex fabrics.
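
    The record does not show the scripts themselves, so the following is only a sketch of how a pool sized to the processor count might fan health checks out over storage arrays. It uses `multiprocessing.pool.ThreadPool`, the thread-backed pool from Python's multiprocessing package, so that it runs without a `__main__` guard; `multiprocessing.Pool` offers the same `map` interface with worker processes. The probe function and the array names are hypothetical placeholders.

```python
import os
from multiprocessing.pool import ThreadPool


def check_array(name):
    """Hypothetical health probe; a real one would query the array controller."""
    return name, "ok"


def monitor(array_names, workers=None):
    """Check every array concurrently; workers defaults to the CPU count."""
    workers = workers or os.cpu_count() or 1
    with ThreadPool(processes=workers) as pool:
        return dict(pool.map(check_array, array_names))


# Twelve placeholder arrays, within the 10-50 range mentioned in the record.
status = monitor(["array%02d" % i for i in range(12)])
print(len(status), "arrays checked")
```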

  15. Massively parallel information processing systems for space applications

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1979-01-01

    NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.

  16. Opto-electronic morphological processor

    NASA Technical Reports Server (NTRS)

    Yu, Jeffrey W. (Inventor); Chao, Tien-Hsin (Inventor); Cheng, Li J. (Inventor); Psaltis, Demetri (Inventor)

    1993-01-01

    The opto-electronic morphological processor of the present invention is capable of receiving optical inputs and emitting optical outputs. The use of optics allows implementation of parallel input/output, thereby overcoming a major bottleneck in prior art image processing systems. The processor consists of three components, namely, detectors, morphological operators and modulators. The detectors and operators are fabricated on a silicon VLSI chip and implement the optical input and morphological operations. A layer of ferro-electric liquid crystals is integrated with a silicon chip to provide the optical modulation. The implementation of the image processing operators in electronics leads to a wide range of applications and the use of optical connections allows cascadability of these parallel opto-electronic image processing components and high speed operation. Such an opto-electronic morphological processor may be used as the pre-processing stage in an image recognition system. In one example disclosed herein, the optical input/optical output morphological processor of the invention is interfaced with a binary phase-only correlator to produce an image recognition system.
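
    The record does not list which morphological operators the chip implements. As background, the sketch below shows the two basic ones, binary dilation and erosion, in software with an assumed 3 x 3 square structuring element; neighbours that fall outside the image are simply ignored. It illustrates the operations only and says nothing about the VLSI design.

```python
def _neighbourhood(image, r, c):
    """Values of the in-bounds pixels in the 3 x 3 window centred on (r, c)."""
    rows, cols = len(image), len(image[0])
    return [
        image[rr][cc]
        for rr in range(max(0, r - 1), min(rows, r + 2))
        for cc in range(max(0, c - 1), min(cols, c + 2))
    ]


def dilate(image):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is set."""
    return [
        [int(any(_neighbourhood(image, r, c))) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def erode(image):
    """Keep a pixel only if every pixel in its 3 x 3 neighbourhood is set."""
    return [
        [int(all(_neighbourhood(image, r, c))) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# A single set pixel grows to a 3 x 3 block under dilation;
# eroding that block shrinks it back to the single pixel.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 1
for row in dilate(image):
    print(row)
```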

  17. Programmable pipelined image processor

    NASA Technical Reports Server (NTRS)

    Gennery, Donald B. (inventor); Wilcox, Brian (inventor)

    1988-01-01

    A pipelined image processor selectively interconnects modules in a column of a two-dimensional array to modules of the next column of the array of modules 1,1 through M,N, where M is the number of modules in one dimension and N is the number of modules in the other direction. Each module includes two input selectors for A and B inputs, two convolvers, a binary function operator, a neighborhood comparison operator which produces an A output and an output selector which may select as a B output the output of any one of the components in the module, including the A output of the neighborhood comparison operator. Each module may be connected to as many as eight modules in the next column, preferably with the majority always in a different row that is up (or down) in the array for a generally spiral data path around the torus thus formed. The binary function operator is implemented as a look-up table addressed by the most significant 8 bits of each 12-bit argument. The table output includes a function value and the slopes for interpolation of the two arguments by multiplying the 4 least significant bits in multipliers and adding the products to the function value through adders.
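
    The look-up scheme for the binary function operator can be sketched in software: the top 8 bits of each 12-bit argument select a table entry holding a function value and two slopes, and the low 4 bits of each argument scale those slopes. How the slopes are derived is not stated in the record; forward differences over one table cell are assumed here, and all names are illustrative.

```python
LOW_BITS = 4             # bits of each argument used for interpolation
HIGH_BITS = 8            # bits of each argument used to address the table
STEP = 1 << LOW_BITS     # 16 input codes per table cell


def build_table(f):
    """Tabulate f at every (a_hi, b_hi) cell origin, with per-cell slopes."""
    table = {}
    for a_hi in range(1 << HIGH_BITS):
        for b_hi in range(1 << HIGH_BITS):
            a0, b0 = a_hi << LOW_BITS, b_hi << LOW_BITS
            value = f(a0, b0)
            # Assumed slope estimate: forward difference across the cell.
            slope_a = (f(a0 + STEP, b0) - value) / STEP
            slope_b = (f(a0, b0 + STEP) - value) / STEP
            table[a_hi, b_hi] = (value, slope_a, slope_b)
    return table


def lookup(table, a, b):
    """Approximate f(a, b) for 12-bit a and b from the table plus interpolation."""
    a_hi, a_lo = a >> LOW_BITS, a & (STEP - 1)
    b_hi, b_lo = b >> LOW_BITS, b & (STEP - 1)
    value, slope_a, slope_b = table[a_hi, b_hi]
    return value + slope_a * a_lo + slope_b * b_lo


# Example with f(a, b) = a * b; the only error is the dropped a_lo * b_lo term.
product_table = build_table(lambda a, b: a * b)
print(lookup(product_table, 0xABC, 0x123), 0xABC * 0x123)  # -> 799632.0 799668
```

    For the product function the approximation error is exactly the neglected cross term, which is at most 15 x 15 = 225.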

  18. Design and implementation of highly parallel pipelined VLSI systems

    NASA Astrophysics Data System (ADS)

    Delange, Alphonsus Anthonius Jozef

    A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.

  19. Massively parallel electron beam direct writing (MPEBDW) system based on micro-electro-mechanical system (MEMS)/nanocrystalline Si emitter array

    NASA Astrophysics Data System (ADS)

    Kojima, A.; Ikegami, N.; Yoshida, T.; Miyaguchi, H.; Muroyama, M.; Nishino, H.; Yoshida, S.; Sugata, M.; Ohyi, H.; Koshida, N.; Esashi, M.

    2014-03-01

    The characteristics of a prototype massively parallel electron beam direct writing (MPEBDW) system are demonstrated. The electron optics consist of an emitter array, a micro-electro-mechanical system (MEMS) condenser lens array, auxiliary lenses, a stigmator, three-stage deflectors to align and scan the parallel beams, and an objective lens acting as a reduction lens. The emitter array produces 10000 programmable 10 μm square beams. The electron emitter is a nanocrystalline silicon (nc-Si) ballistic electron emitter array integrated with an active matrix driver LSI for high-speed emission current control. Because the LSI also has a field curvature correction function, the system can use a large electron emitter array. In this system, beams that are incident on the outside of the paraxial region of the reduction lens can also be used through use of the optical aberration correction functions. The exposure pattern is stored in the active matrix LSI's memory. Alignment between the emitter array and the condenser lens array is performed by moving the emitter stage that slides along the x- and y-axes, and rotates around the z-theta axis. The electrons of all beams are accelerated, and pass through the anode array. The stigmator and the two-stage deflectors perform fine adjustments to the beam positions. The other deflector simultaneously scans all parallel beams to synchronize the moving target stage. Exposure is carried out by moving the target stage that holds the wafer. The reduction lens focuses all beams on the target wafer surface, and the electron optics of the column reduces the electron image to 0.1% of its original size.

  20. Photorefractive processing for large adaptive phased arrays.

    PubMed

    Weverka, R T; Wagner, K; Sarto, A

    1996-03-10

    An adaptive null-steering phased-array optical processor that utilizes a photorefractive crystal to time integrate the adaptive weights and null out correlated jammers is described. This is a beam-steering processor in which the temporal waveform of the desired signal is known but the look direction is not. The processor computes the angle(s) of arrival of the desired signal and steers the array to look in that direction while rotating the nulls of the antenna pattern toward any narrow-band jammers that may be present. We have experimentally demonstrated a simplified version of this adaptive phased-array-radar processor that nulls out the narrow-band jammers by using feedback-correlation detection. In this processor it is assumed that we know a priori only that the signal is broadband and the jammers are narrow band. These are examples of a class of optical processors that use the angular selectivity of volume holograms to form the nulls and look directions in an adaptive phased-array-radar pattern and thereby to harness the computational abilities of three-dimensional parallelism in the volume of photorefractive crystals. The development of this processing in volume holographic system has led to a new algorithm for phased-array-radar processing that uses fewer tapped-delay lines than does the classic time-domain beam former. The optical implementation of the new algorithm has the further advantage of utilization of a single photorefractive crystal to implement as many as a million adaptive weights, allowing the radar system to scale to large size with no increase in processing hardware. PMID:21085246

  1. Digital optical processor based on symbolic substitution using holographic matched filtering

    NASA Astrophysics Data System (ADS)

    Jeon, Ho-In; Abushagur, Mustafa A. G.; Sawchuk, Alexander A.; Jenkins, B. Keith

    1990-05-01

    A digital optical arithmetic processor design based on symbolic substitution using holographic matched and space-invariant filters is proposed. The system performs Boolean logic, binary addition, and subtraction in a highly parallel manner; i.e., the processing time depends on word size but not array size. Algorithms for performing binary addition and subtraction in parallel are presented. A skew problem occurring when symbolic substitution is applied to binary addition and subtraction with space-invariant systems is addressed, and its solution is suggested. Crosstalk in symbolic substitution is described, and new symbols which can prevent the crosstalk are introduced. System analysis and fundamental limitations of the proposed system are also presented in terms of processing time, overall light efficiency, and the maximum array size of the input data plane. The performance of the proposed system is compared with that of current electronic supercomputers by combining information about the processing time and maximum array size.
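
    The claim that processing time depends on word size but not on array size can be illustrated with a software analogy of substitution-based addition. The sketch below is not the optical design from the abstract; `parallel_add` is a hypothetical name, and the rule used (replace each bit pair by its sum bit and a carry bit shifted one position left) is the textbook sum/carry substitution, assumed here for illustration.

```python
def parallel_add(xs, ys, word_size):
    """Add two arrays of word_size-bit integers by repeated sum/carry substitution.

    Every pass applies the same local rule to all bit positions of all words,
    so the number of passes is bounded by word_size, not by len(xs).
    """
    mask = (1 << word_size) - 1
    sums, carries = list(xs), list(ys)
    for _ in range(word_size):
        new_sums = [s ^ c for s, c in zip(sums, carries)]                     # sum bits
        new_carries = [((s & c) << 1) & mask for s, c in zip(sums, carries)]  # carry bits
        sums, carries = new_sums, new_carries
    return sums  # carries are all zero after word_size passes
```

    Results are taken modulo 2**word_size, so a carry out of the top bit is dropped.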

  2. Calculating electronic tunnel currents in networks of disordered irregularly shaped nanoparticles by mapping networks to arrays of parallel nonlinear resistors

    SciTech Connect

    Aghili Yajadda, Mir Massoud

    2014-10-21

    We have shown both theoretically and experimentally that tunnel currents in networks of disordered irregularly shaped nanoparticles (NPs) can be calculated by considering the networks as arrays of parallel nonlinear resistors. Each resistor is described by a one-dimensional or a two-dimensional array of equal size nanoparticles that the tunnel junction gaps between nanoparticles in each resistor is assumed to be equal. The number of tunnel junctions between two contact electrodes and the tunnel junction gaps between nanoparticles are found to be functions of Coulomb blockade energies. In addition, the tunnel barriers between nanoparticles were considered to be tilted at high voltages. Furthermore, the role of thermal expansion coefficient of the tunnel junction gaps on the tunnel current is taken into account. The model calculations fit very well to the experimental data of a network of disordered gold nanoparticles, a forest of multi-wall carbon nanotubes, and a network of few-layer graphene nanoplates over a wide temperature range (5-300 K) at low and high DC bias voltages (0.001 mV-50 V). Our investigations indicate, although electron cotunneling in networks of disordered irregularly shaped NPs may occur, non-Arrhenius behavior at low temperatures cannot be described by the cotunneling model due to size distribution in the networks and irregular shape of nanoparticles. Non-Arrhenius behavior of the samples at zero bias voltage limit was attributed to the disorder in the samples. Unlike the electron cotunneling model, we found that the crossover from Arrhenius to non-Arrhenius behavior occurs at two temperatures, one at a high temperature and the other at a low temperature.

  3. Calculating electronic tunnel currents in networks of disordered irregularly shaped nanoparticles by mapping networks to arrays of parallel nonlinear resistors

    NASA Astrophysics Data System (ADS)

    Aghili Yajadda, Mir Massoud

    2014-10-01

    We have shown both theoretically and experimentally that tunnel currents in networks of disordered irregularly shaped nanoparticles (NPs) can be calculated by considering the networks as arrays of parallel nonlinear resistors. Each resistor is described by a one-dimensional or a two-dimensional array of equal size nanoparticles that the tunnel junction gaps between nanoparticles in each resistor is assumed to be equal. The number of tunnel junctions between two contact electrodes and the tunnel junction gaps between nanoparticles are found to be functions of Coulomb blockade energies. In addition, the tunnel barriers between nanoparticles were considered to be tilted at high voltages. Furthermore, the role of thermal expansion coefficient of the tunnel junction gaps on the tunnel current is taken into account. The model calculations fit very well to the experimental data of a network of disordered gold nanoparticles, a forest of multi-wall carbon nanotubes, and a network of few-layer graphene nanoplates over a wide temperature range (5-300 K) at low and high DC bias voltages (0.001 mV-50 V). Our investigations indicate, although electron cotunneling in networks of disordered irregularly shaped NPs may occur, non-Arrhenius behavior at low temperatures cannot be described by the cotunneling model due to size distribution in the networks and irregular shape of nanoparticles. Non-Arrhenius behavior of the samples at zero bias voltage limit was attributed to the disorder in the samples. Unlike the electron cotunneling model, we found that the crossover from Arrhenius to non-Arrhenius behavior occurs at two temperatures, one at a high temperature and the other at a low temperature.

  4. Development of a bench-top device for parallel climate-controlled recordings of neuronal cultures activity with microelectrode arrays.

    PubMed

    Regalia, Giulia; Biffi, Emilia; Achilli, Silvia; Ferrigno, Giancarlo; Menegon, Andrea; Pedrocchi, Alessandra

    2016-02-01

    Two binding requirements for in vitro studies on long-term neuronal networks dynamics are (i) finely controlled environmental conditions to keep neuronal cultures viable and provide reliable data for more than a few hours and (ii) parallel operation on multiple neuronal cultures to shorten experimental time scales and enhance data reproducibility. In order to fulfill these needs with a Microelectrode Arrays (MEA)-based system, we designed a stand-alone device that permits uninterrupted monitoring of neuronal culture activity over long periods, overcoming drawbacks of existing MEA platforms. We integrated in a single device: (i) a closed chamber housing four MEAs equipped with access for chemical manipulations, (ii) environmental control systems and embedded sensors to reproduce and remotely monitor the standard in vitro culture environment on the lab bench (i.e. in terms of temperature, air CO2 and relative humidity), and (iii) a modular MEA interface analog front-end for reliable and parallel recordings. The system has been shown to keep environmental conditions stable, physiological and homogeneous across different cultures. Prolonged recordings (up to 10 days) of spontaneous and pharmacologically stimulated neuronal culture activity have not shown signs of rundown thanks to the environmental stability and have not required withdrawing the cells from the chamber for culture medium manipulations. This system represents an effective MEA-based solution to elucidate neuronal network phenomena with slow dynamics, such as long-term plasticity, effects of chronic pharmacological stimulations or late-onset pathological mechanisms. Biotechnol. Bioeng. 2016;113: 403-413. 2015 Wiley Periodicals, Inc. PMID:26301335

  5. Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Toomarian, Benny N.

    2000-01-01

    There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years and this trend may also continue in the near future. However, it is a well known fact that there are major obstacles, i.e., physical limitation of feature size reduction and ever increasing cost of foundry, that would prevent the long term continuation of this trend. This has motivated the exploration of some fundamentally new technologies that are not dependent on the conventional feature size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum dot-based computing, DNA based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum dot-based computing by using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum dot-based computing and memory in general and QCA specifically, are intriguing to NASA due to their high packing density (10(exp 11) - 10(exp 12) per square cm ) and low power consumption (no transfer of current) and potentially higher radiation tolerance. Under the Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for co-planar (i.e., single layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor. Building upon these circuits, we have developed novel algorithms and QCA-based architectures for highly parallel and systolic computation of signal/image processing applications, such as FFT and Wavelet and Walsh-Hadamard Transforms.

  6. Algorithmically specialized parallel computers

    SciTech Connect

    Snyder, L.; Jamieson, L.H.; Gannon, D.B.; Siegel, H.J.

    1985-01-01

    This book is based on a workshop which dealt with array processors. Topics considered include algorithmic specialization using VLSI, innovative architectures, signal processing, speech recognition, image processing, specialized architectures for numerical computations, and general-purpose computers.

  7. Global image processing operations on parallel architectures

    NASA Astrophysics Data System (ADS)

    Webb, Jon A.

    1990-09-01

    Image processing operations fall into two classes: local and global. Local operations affect only a small corresponding area in the output image, and include edge detection, smoothing, and point operations. In global operations any input pixel can affect any or a large number of output data. Global operations include histogram, image warping, Hough transform, and connected components. Parallel architectures offer a promising method for speeding up these image processing operations. Local operations are easy to parallelize, because the input data can be divided among processors, processed in parallel separately, then the outputs can be combined by concatenation. Global operations are harder to parallelize. In fact, some global operations cannot be executed in parallel; it is possible for a global operation to require serial execution for correct computation of the result. However, an important class of global operations, namely those that are reversible (that is, those that can be computed in forward or reverse order on a data structure), can be computed in parallel using a restricted form of divide and conquer called split and merge. These reversible operations include the global operations mentioned above, and many more besides, even such non-image processing operations as parsing, string search, and sorting. The split and merge method will be illustrated by application of it to these algorithms. Performance analysis of the method on different architectures (one-dimensional, two-dimensional, and binary tree processor arrays) will be demonstrated.
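
    A minimal sketch of split and merge applied to the histogram, one of the global operations named in the abstract. The function names are hypothetical and the partial results are computed in an ordinary loop; on a processor array each chunk would be handled by its own processor before the merge step.

```python
from collections import Counter
from functools import reduce


def split(pixels, parts):
    """Divide the pixel sequence into at most `parts` contiguous chunks."""
    size = max(1, -(-len(pixels) // parts))  # ceiling division
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def local_histogram(chunk):
    """Histogram of one chunk; independent of every other chunk."""
    return Counter(chunk)


def merge(left, right):
    """Combine two partial histograms by adding their counts."""
    return left + right


def histogram_split_merge(pixels, parts):
    partials = [local_histogram(chunk) for chunk in split(pixels, parts)]
    return reduce(merge, partials, Counter())
```

    Because the merge step is associative, the partial histograms can be combined pairwise in any grouping, which is what allows the merge to run along a line, a mesh, or a tree of processors.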

  8. Online track processor for the CDF upgrade

    SciTech Connect

    E. J. Thomson et al.

    2002-07-17

    A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented.

  9. Reconfigurable VLSI architecture for a database processor

    SciTech Connect

    Oflazer, K.

    1983-01-01

    This work brings together the processing potential offered by regularly structured VLSI processing units and the architecture of a database processor, the relational associative processor (RAP). The main motivations are to integrate a RAP cell processor on a few VLSI chips and improve performance by employing procedures exploiting these VLSI chips and the system level reconfigurability of processing resources. The resulting VLSI database processor consists of parallel processing cells that can be reconfigured into a large processor to execute the hard operations of projection and semijoin efficiently. It is shown that such a configuration can provide 2 to 3 orders of magnitude of performance improvement over previous implementations of the RAP system in the execution of such operations. 27 refs.

  10. Implementing Access to Data Distributed on Many Processors

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

    A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets with the low-level implementation details being beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow for the arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy is flexible, so its behavior can change depending on the needs of the application. At the same time, high productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.
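
    The idea of a single point of access over fragments held by different processors can be sketched as below. This is Python rather than Chapel and is not the framework described in the abstract: the class name `BlockDistributedArray`, the block distribution, and the in-process lists standing in for per-processor memory are all assumptions made for illustration.

```python
class BlockDistributedArray:
    """A 1-D array split into contiguous blocks, one block per (simulated) processor."""

    def __init__(self, n, processors):
        self.n = n
        self.block = max(1, -(-n // processors))  # ceiling of n / processors
        # fragments[k] stands in for the memory local to processor k.
        self.fragments = [
            [0] * max(0, min(self.block, n - k * self.block))
            for k in range(processors)
        ]

    def owner(self, i):
        """Map a global index to (processor, local index)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i // self.block, i % self.block

    def __getitem__(self, i):
        proc, local = self.owner(i)
        return self.fragments[proc][local]

    def __setitem__(self, i, value):
        proc, local = self.owner(i)
        self.fragments[proc][local] = value
```

    Callers index the array with global indices only; swapping `owner` for a cyclic or block-cyclic mapping would change where the data lives without changing the calling code, which is the kind of flexibility the abstract asks of a distribution strategy.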

  11. Parallel asynchronous systems and image processing algorithms

    NASA Technical Reports Server (NTRS)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

    A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  12. Detection and Classification of Low Probability of Intercept Radar Signals Using Parallel Filter Arrays and Higher Order Statistics

    NASA Astrophysics Data System (ADS)

    Taboada, Fernando L.

    2002-09-01

    Low probability of intercept (LPI) is that property of an emitter that, because of its low power, wide bandwidth, frequency variability, or other design attributes, makes it difficult to detect or identify by means of passive intercept devices such as radar warning, electronic support and electronic intelligence receivers. In order to detect LPI radar waveforms new signal processing techniques are required. This thesis first develops a MATLAB toolbox to generate important types of LPI waveforms based on frequency and phase modulation. The power spectral density and the periodic ambiguity function are examined for each waveform. These signals are then used to test a novel signal processing technique that detects the waveform parameters and classifies the intercepted signal in various degrees of noise. The technique is based on the use of parallel filter (sub-band) arrays and higher order statistics (third-order cumulant estimator). Each sub-band signal is treated individually and is followed by the third-order estimator in order to suppress any symmetrical noise that might be present. The significance of this technique is that it separates the LPI waveforms in small frequency bands, providing a detailed time-frequency description of the unknown signal. Finally, the resulting output matrix is processed by a feature extraction routine to detect the waveform parameters. Identification of the signal is based on the modulation parameters detected.
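
    Why a third-order estimator suppresses symmetrical noise can be seen from a small sketch. It is written in Python rather than the thesis's MATLAB, the function name is hypothetical, and the standard biased sample estimate of the third-order cumulant is assumed, since the abstract does not give the exact estimator.

```python
def third_order_cumulant(x, lag1=0, lag2=0):
    """Biased sample estimate of C3(lag1, lag2) = E[x(t) x(t+lag1) x(t+lag2)] for mean-removed x."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    lo = -min(lag1, lag2, 0)       # first t for which all three indices are valid
    hi = n - max(lag1, lag2, 0)    # one past the last valid t
    total = sum(centered[t] * centered[t + lag1] * centered[t + lag2]
                for t in range(lo, hi))
    return total / n
```

    For samples distributed symmetrically about their mean, the positive and negative triple products cancel and the estimate tends to zero, while a skewed component leaves a nonzero value.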

  13. Infrared laser transillumination CT imaging system using parallel fiber arrays and optical switches for finger joint imaging

    NASA Astrophysics Data System (ADS)

    Sasaki, Yoshiaki; Emori, Ryota; Inage, Hiroki; Goto, Masaki; Takahashi, Ryo; Yuasa, Tetsuya; Taniguchi, Hiroshi; Devaraj, Balasigamani; Akatsuka, Takao

    2004-05-01

    The heterodyne detection technique, on which the coherent detection imaging (CDI) method is founded, can discriminate and select very weak, highly directional forward scattered, and coherence-retaining photons that emerge from scattering media in spite of their complex and highly scattering nature. That property enables us to reconstruct tomographic images using the same reconstruction technique as that of X-Ray CT, i.e., the filtered backprojection method. Our group had so far developed a transillumination laser CT imaging method based on the CDI method in the visible and near-infrared regions and reconstruction from projections, and reported a variety of tomographic images both in vitro and in vivo of biological objects to demonstrate its effectiveness for biomedical use. Since the previous system was not optimized, it took several hours to obtain a single image. For practical use, we developed a prototype CDI-based imaging system using a parallel fiber array and optical switches to reduce the measurement time significantly. Here, we describe a prototype transillumination laser CT imaging system using fiber optics and optical heterodyne detection for early diagnosis of rheumatoid arthritis (RA), by demonstrating the tomographic imaging of an acrylic phantom as well as the fundamental imaging properties. We expect that further refinements of the fiber-optic-based laser CT imaging system could lead to a novel and practical diagnostic tool for rheumatoid arthritis and other joint- and bone-related diseases in the human finger.

  14. Quantitative analysis of RNA-protein interactions on a massively parallel array for mapping biophysical and evolutionary landscapes

    PubMed Central

    Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.

    2015-01-01

    RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10^7 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714

  15. Fault-tolerant computer architecture based on INMOS transputer processor

    NASA Technical Reports Server (NTRS)

    Ortiz, Jorge L.

    1987-01-01

    Redundant processing was used for several years in mission flight systems. In these systems, more than one processor performs the same task at the same time but only one processor is actually in real use. A fault-tolerant computer architecture based on the features provided by INMOS Transputers is presented. The Transputer architecture provides several communication links that allow data and command communication with other Transputers without the use of a bus. Additionally, the Transputer allows the use of parallel processing to increase the system speed considerably. The processor architecture consists of three processors working in parallel keeping all the processors at the same operational level but only one processor is in real control of the process. The design allows each Transputer to perform a test on the other two Transputers and report the operating condition of the neighboring processors. A graphic display was developed to facilitate the identification of any problem by the user.
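
    One way the mutual test reports of three processors could be turned into a choice of controlling processor is sketched below. The decision rule is an assumption, not taken from the abstract: a processor is treated as failed only when every peer reports it failed, which tolerates a single faulty processor whose own reports may be wrong. The function names are hypothetical.

```python
def diagnose(reports):
    """reports[i][j] is True when processor i judged processor j healthy (i != j).

    Returns a list of booleans: healthy[j] is False only if all peers of j
    reported it as failed.
    """
    n = len(reports)
    return [any(reports[i][j] for i in range(n) if i != j) for j in range(n)]


def select_controller(reports, current):
    """Keep the current controller if it is healthy, else hand over to the first healthy one."""
    healthy = diagnose(reports)
    if healthy[current]:
        return current
    for j, ok in enumerate(healthy):
        if ok:
            return j
    return None  # no healthy processor left
```

    With three processors and at most one fault, a healthy processor always has at least one healthy peer vouching for it, so it is never discarded on the word of the faulty one.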

  16. VLIW processor architecture adapted to FPAs

    NASA Astrophysics Data System (ADS)

    Petit, Laurent; Legat, Jean-Didier

    1998-09-01

    A new processor architecture intended to be integrated with a CMOS image sensor is presented. This association makes it possible to design an intelligent camera that can perform on-chip image processing tasks. The processor is based on a VLIW architecture with a reduced instruction bus, able to execute multiple instructions in parallel without any loss of performance. In addition, an instruction cache is no longer required, which decreases the hardware complexity.

  17. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

  18. NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

    PubMed Central

    Cheung, Kit; Schultz, Simon R.; Luk, Wayne

    2016-01-01

    NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
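
    For readers unfamiliar with the Izhikevich model named above, a minimal software sketch of a single neuron follows. It has nothing to do with NeuroFlow's FPGA implementation or with PyNN; the function name, the forward-Euler integration, the 0.5 ms step, and the regular-spiking parameter values are assumptions chosen for illustration.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Integrate one Izhikevich neuron with forward Euler; return spike times in ms.

    v' = 0.04 v^2 + 5 v + 140 - u + I,   u' = a (b v - u),
    and when v reaches 30 mV: v <- c, u <- u + d.
    """
    v = c            # membrane potential (mV)
    u = b * v        # recovery variable
    spike_times = []
    for step in range(int(duration_ms / dt)):
        if v >= 30.0:                      # spike detected: record and reset
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
    return spike_times
```

    With zero input current the neuron settles to rest and produces no spikes, while a constant input of 10 drives repetitive firing. A network simulator repeats this update for every neuron at every time step, which is the part that parallel hardware accelerates.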

  19. NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

    PubMed

    Cheung, Kit; Schultz, Simon R; Luk, Wayne

    2015-01-01

    NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542

  20. Throughput-Oriented Multicore Processors

    NASA Astrophysics Data System (ADS)

    Laudon, James; Golla, Robert; Grohoski, Greg

    Many important commercial server applications are throughput-oriented. Chip multiprocessors (CMPs) are ideally suited to handle these workloads, as the multiple processors on the chip can independently service incoming requests. To date, most CMPs have been built using a small number of high-performance superscalar processor cores. However, the majority of commercial applications exhibit high cache miss rates, larger memory footprints, and low instruction-level parallelism, which leads to poor utilization on these CMPs. An alternative approach is to build a throughput-oriented, multithreaded CMP from a much larger number of simpler processor cores. This chapter explores the tradeoffs involved in building such a simple-core CMP. Two case studies, the Niagara and Niagara 2 CMPs from Sun Microsystems, are used to illustrate how simple-core CMPs are built in practice and how they compare to CMPs built from more traditional high-performance superscalar processor cores. The case studies show that simple-core CMPs can have a significant performance/watt advantage over complex-core CMPs.

  1. Parallel processing ITS

    SciTech Connect

    Fan, W.C.; Halbleib, J.A. Sr.

    1996-09-01

    This report provides a users' guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.

  2. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    SciTech Connect

    Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee; Buckner, Mark A

    2008-01-01

    High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm^2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 TeraOPS (tera-operations per second). IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
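
    The peak rate quoted for the EnLight 256 can be checked with a few lines of arithmetic. The sketch below assumes that every element of the 256x256 matrix contributes one multiply and one add per clock cycle; the variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the EnLight 256 peak rate quoted in the abstract.
# Assumption: each matrix element contributes one multiply and one add per clock.
MATRIX_DIM = 256     # nominal matrix size is 256x256
CLOCK_HZ = 125e6     # system clock of 125 MHz

macs_per_cycle = MATRIX_DIM * MATRIX_DIM    # 65,536 multiply-and-add pairs
ops_per_cycle = 2 * macs_per_cycle          # 131,072 = 128K operations
peak_ops_per_second = ops_per_cycle * CLOCK_HZ

print(f"{ops_per_cycle // 1024}K operations per cycle")
print(f"{peak_ops_per_second / 1e12:.2f} TeraOPS peak")
```

    Under that assumption the product comes to roughly 16.4 x 10^12 operations per second, consistent with the rounded 16 TeraOPS figure.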

  3. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    SciTech Connect

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a program to provide access to the co-array model through access to the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing. A separate collaborative project with LANL and PNNL showed how to extend the co-array model to other languages in a small experimental version of Co-array Python. Another collaborative project defined a Fortran 95 interface to ARMCI to encourage Fortran programmers to use the one-sided communication model in anticipation of their conversion to the co-array model later. A collaborative project with the Earth Sciences community at NASA Goddard and GFDL experimented with the co-array model within computational kernels related to their climate models, first using CafLib and then extending the co-array model to use design patterns. Future work will build on the design-pattern idea with a redesign of CafLib as a true object-oriented library using Fortran 2003 and as a parallel numerical library using Fortran 2008.

  4. FFT Computation with Systolic Arrays, A New Architecture

    NASA Technical Reports Server (NTRS)

    Boriakoff, Valentin

    1994-01-01

    The use of the Cooley-Tukey algorithm for computing the 1-D FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
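
    For readers unfamiliar with the decimation-in-frequency form of the Cooley-Tukey algorithm, a minimal software sketch follows. It is a plain recursive radix-2 version written for illustration, not the systolic architecture of the paper; the function names `fft_dif` and `dft` are assumptions made here.

```python
import cmath

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly stage: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
              for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bottom)
    return out

def dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

sample = [1, 2, 3, 4, 0, -1, -2, -3]
max_error = max(abs(a - b) for a, b in zip(fft_dif(sample), dft(sample)))
print(max_error < 1e-9)
```

    Each recursion level corresponds to one butterfly stage, which is the regular structure that makes a mapping onto linearly connected processing cells plausible.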

  5. Architectural approaches for multimedia processors

    NASA Astrophysics Data System (ADS)

    Pirsch, Peter; Freimann, Achim; Berekovic, Mladen

    1997-01-01

    This paper presents an overview of architectures for multimedia purposes. Emphasis is placed on flexible, programmable processors to enable processing of different standardized or proprietary multimedia applications. Several parallelization strategies to enhance performance especially for video coding are described. This includes architectures like SIMD, MIMD and associative controlling. Exploitation of instruction-level parallelism by use of techniques like VLIW and packed-arithmetic extends this discussion. Reference to design examples from the literature is given. To help develop cost-effective architectures for a set of applications, two methods for modeling hardware and algorithms are explained. Instrumentation of algorithms implemented in software is discussed as a method to determine characteristics and features of given algorithms. Additionally, a more general approach is presented that analyzes different parallelization potentials for a class of algorithms. These are mapped on a simple hardware model using only a few parameters for performance evaluation. Limitations of software instrumentation and the presented modeling approach are discussed.

  6. Sandia secure processor : a native Java processor.

    SciTech Connect

    Wickstrom, Gregory Lloyd; Gale, Jason Carl; Ma, Kwok Kee

    2003-08-01

    The Sandia Secure Processor (SSP) is a new native Java processor that has been specifically designed for embedded applications. The SSP's design is a system composed of a core Java processor that directly executes Java bytecodes, on-chip intelligent IO modules, and a suite of software tools for simulation and compiling executable binary files. The SSP is unique in that it provides a way to control real-time IO modules for embedded applications. The system software for the SSP is a 'class loader' that takes Java .class files (created with your favorite Java compiler), links them together, and compiles a binary. The complete SSP system provides very powerful functionality with very light hardware requirements with the potential to be used in a wide variety of small-system embedded applications. This paper gives a detailed description of the Sandia Secure Processor and its unique features.

  7. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one-to-one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be no regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  8. Periodic parallel array of nanopillars and nanoholes resulting from colloidal stripes patterned by geometrically confined evaporative self-assembly for unique anisotropic wetting.

    PubMed

    Li, Xiangmeng; Wang, Chunhui; Shao, Jinyou; Ding, Yucheng; Tian, Hongmiao; Li, Xiangming; Wang, Li

    2014-11-26

    In this paper we present an economical process to create anisotropic microtextures based on periodic parallel stripes of monolayer silica nanoparticles (NPs) patterned by geometrically confined evaporative self-assembly (GCESA). In the GCESA process, a straight meniscus of a colloidal dispersion is initially formed in an opened enclosure, which is composed of two parallel plates bounded by a U-shaped spacer sidewall on three sides with an evaporating outlet on the fourth side. Lateral evaporation of the colloidal dispersion leads to periodic "stick-slip" receding of the meniscus (evaporative front), as triggered by the "coffee-ring" effect, promoting the assembly of silica NPs into periodic parallel stripes. The morphology of stripes can be well controlled by tailoring process variables such as substrate wettability, NP concentration, temperature, and gap height, etc. Furthermore, arrayed patterns of nanopillars or nanoholes are generated on a silicon wafer using the as-prepared colloidal stripes as an etching mask or template. Such arrayed patterns can reveal unique anisotropic wetting properties, which have a large contact angle hysteresis viewing from both the parallel and perpendicular directions in addition to a large wetting anisotropy. PMID:25353399

  9. Magnetic arrays

    DOEpatents

    Trumper, David L. (Plaistow, NH); Kim, Won-jong (Cambridge, MA); Williams, Mark E. (Pelham, NH)

    1997-05-20

    Electromagnet arrays which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness.

  10. Magnetic arrays

    DOEpatents

    Trumper, D.L.; Kim, W.; Williams, M.E.

    1997-05-20

    Electromagnet arrays are disclosed which can provide selected field patterns in either two or three dimensions, and in particular, which can provide single-sided field patterns in two or three dimensions. These features are achieved by providing arrays which have current densities that vary in the windings both parallel to the array and in the direction of array thickness. 12 figs.

  11. DFVLR's intelligent SAR-processor - ISAR

    NASA Astrophysics Data System (ADS)

    Noack, W.; Runge, H.

    The fact that future SAR sensors like ERS-1 and X-SAR will be operational systems requires a processor system design which is significantly different from existing SAR correlators. Future systems require highest throughput and reliability. In addition, more attention must be paid to the user community needs in terms of various product levels and adequate production and organization schemes. This paper presents the design of the ISAR system which is identified by a distributed processor architecture using a high speed array processor, enhanced by a two-dimensional accessible memory, a front-end processor and a knowledge engineering workstation. An expert system will support a human system operator for the mass production of SAR images and the detection and correction of system malfunctions. As a result the system will be accessible and comprehensive for both experts and operators.

  12. Transitive closure on the imagine stream processor

    SciTech Connect

    Griem, Gorden; Oliker, Leonid

    2003-11-11

    The increasing gap between processor and memory speeds is a well-known problem in modern computer architecture. The Imagine system is designed to address the processor-memory gap through streaming technology. Stream processors are best-suited for computationally intensive applications characterized by high data parallelism and producer-consumer locality with minimal data dependencies. This work examines an efficient streaming implementation of the computationally intensive Transitive Closure (TC) algorithm on the Imagine platform. We develop a tiled TC algorithm specifically for the Imagine environment, which efficiently reuses streams to minimize expensive off-chip data transfers. The implementation requires complex stream programming since the memory hierarchy and cluster organization of the underlying architecture are exposed to the Imagine programmer. Results demonstrate that limited performance of TC is achieved primarily due to the complicated data-dependencies of the blocked algorithm. This work is an ongoing effort to identify classes of scientific problems well-suited for streaming processors.

  13. Graph-Based Dynamic Assignment Of Multiple Processors

    NASA Technical Reports Server (NTRS)

    Hayes, Paul J.; Andrews, Asa M.

    1994-01-01

    Algorithm-to-architecture mapping model (ATAMM) is strategy minimizing time needed to periodically execute graphically described, data-driven application algorithm on multiple data processors. Implemented as operating system managing flow of data and dynamically assigning nodes of graph to processors. Predicts throughput versus number of processors available to execute given application algorithm. Includes rules ensuring application algorithm represented by graph executed periodically without deadlock and in shortest possible repetition time. ATAMM proves useful in maximizing effectiveness of parallel computing systems.

  14. Processor-Group Aware Runtime Support for Shared- and Global-Address Space Models

    SciTech Connect

    Krishnan, Manoj Kumar; Tipparaju, Vinod; Palmer, Bruce; Nieplocha, Jarek

    2004-12-07

    Exploiting multilevel parallelism using processor groups is becoming increasingly important for programming on high-end systems. This paper describes a group-aware run-time support for shared-/global-address space programming models. The current effort has been undertaken in the context of the Aggregate Remote Memory Copy Interface (ARMCI) [5], a portable runtime system used as a communication layer for Global Arrays [6], Co-Array Fortran (CAF) [9], GPSHMEM [10], Co-Array Python [11], and also end-user applications. The paper describes the management of shared memory, integration of shared memory communication and RDMA on clusters with SMP nodes, and registration. These are all required for efficient multi-method and multi-protocol communication on modern systems. Focus is placed on techniques for supporting process groups while maximizing communication performance and efficiently managing global memory system-wide.

  15. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
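
    To make the mapping problem concrete, the sketch below treats one simple formulation: a chain of m modules with integer costs is cut into at most n contiguous blocks, one per processor, so that the heaviest block is as light as possible. This formulation, the binary-search method, and the name `min_bottleneck` are assumptions chosen for illustration; the abstract does not describe the paper's algorithms in enough detail to reproduce them.

```python
def min_bottleneck(weights, n_procs):
    """Smallest achievable maximum load when a chain of modules is split
    into at most n_procs contiguous blocks (positive integer weights)."""
    def fits(cap):
        # Greedily pack modules left to right without exceeding cap.
        blocks, load = 1, 0
        for w in weights:
            if load + w > cap:
                blocks += 1
                load = w
            else:
                load += w
        return blocks <= n_procs

    lo, hi = max(weights), sum(weights)
    while lo < hi:                  # binary search on the bottleneck value
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_bottleneck([4, 2, 7, 3, 5, 1], 3))
```

    For the example chain the best three-way split is [4, 2], [7], [3, 5, 1], so the call prints 9.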

  16. Unstructured Adaptive Grid Computations on an Array of SMPs

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

    1996-01-01

    Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We present such a dynamic load balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.

  17. Rapid geodesic mapping of brain functional connectivity: implementation of a dedicated co-processor in a field-programmable gate array (FPGA) and application to resting state functional MRI.

    PubMed

    Minati, Ludovico; Cercignani, Mara; Chan, Dennis

    2013-10-01

    Graph theory-based analyses of brain network topology can be used to model the spatiotemporal correlations in neural activity detected through fMRI, and such approaches have wide-ranging potential, from detection of alterations in preclinical Alzheimer's disease through to command identification in brain-machine interfaces. However, due to prohibitive computational costs, graph-based analyses to date have principally focused on measuring connection density rather than mapping the topological architecture in full by exhaustive shortest-path determination. This paper outlines a solution to this problem through parallel implementation of Dijkstra's algorithm in programmable logic. The processor design is optimized for large, sparse graphs and provided in full as synthesizable VHDL code. An acceleration factor between 15 and 18 is obtained on a representative resting-state fMRI dataset, and maps of Euclidean path length reveal the anticipated heterogeneous cortical involvement in long-range integrative processing. These results enable high-resolution geodesic connectivity mapping for resting-state fMRI in patient populations and real-time geodesic mapping to support identification of imagined actions for fMRI-based brain-machine interfaces. PMID:23746911
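
    As a software reference point for the exhaustive shortest-path determination described above, here is a minimal heap-based Dijkstra sketch. It is not the VHDL co-processor design; the graph, node labels and function names are invented for illustration.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path lengths from source.
    adj maps node -> list of (neighbour, non-negative weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, already improved
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_pairs(adj):
    # Every source can be processed independently of the others,
    # which is what makes the exhaustive mapping amenable to parallel hardware.
    return {s: dijkstra(adj, s) for s in adj}

graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 4), ("b", 2), ("d", 1)],
    "d": [("c", 1)],
}
print(all_pairs(graph)["a"])
```

    An adjacency-list representation is used because it stores only the edges that exist, which suits the large, sparse graphs mentioned in the abstract.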

  18. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
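
    The idea behind a future, a placeholder for a value that a parallel task is still computing, can be sketched with Python's standard `concurrent.futures` module. This is only a loose analogue of the Multilisp construct, and `fire_rule` is a hypothetical stand-in for evaluating one rule.

```python
from concurrent.futures import ThreadPoolExecutor

def fire_rule(rule_id):
    # Hypothetical placeholder for evaluating a single rule.
    return rule_id * rule_id

with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a Future; the task runs alongside the caller.
    pending = [pool.submit(fire_rule, r) for r in range(5)]
    # result() blocks only when the value is actually needed.
    results = [f.result() for f in pending]

print(results)
```

    The spawning code continues past `submit` without waiting, and synchronization happens only at the point where a result is consumed.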

  19. A novel picoliter droplet array for parallel real-time polymerase chain reaction based on double-inkjet printing.

    PubMed

    Sun, Yingnan; Zhou, Xiaoguang; Yu, Yude

    2014-09-21

    We developed and characterized a novel picoliter droplet-in-oil array generated by a double-inkjet printing method on a uniform hydrophobic silicon chip specifically designed for quantitative polymerase chain reaction (qPCR) analysis. Double-inkjet printing was proposed to efficiently address the evaporation issues of picoliter droplets during array generation on a planar substrate without the assistance of a humidifier or glycerol. The method utilizes piezoelectric inkjet printing equipment to precisely eject a reagent droplet into an oil droplet, which had first been dispensed on a hydrophobic and oleophobic substrate. No evaporation, random movement, or cross-contamination was observed during array fabrication and thermal cycling. We demonstrated the feasibility and effectiveness of this novel double-inkjet method for real-time PCR analysis. This method can readily produce multivolume droplet-in-oil arrays with volume variations ranging from picoliters to nanoliters. This feature would be useful for simultaneous multivolume PCR experiments aimed at wide and tunable dynamic ranges. These double-inkjet-based picoliter droplet arrays may have potential for multiplexed applications that require isolated containers for single-cell cultures, single molecular enzymatic assays, or digital PCR and provide an alternative option for generating droplet arrays on planar substrates without chemical patterning. PMID:25070461

  20. An Experimental Digital Image Processor

    NASA Astrophysics Data System (ADS)

    Cok, Ronald S.

    1986-12-01

    A prototype digital image processor for enhancing photographic images has been built in the Research Laboratories at Kodak. This image processor implements a particular version of each of the following algorithms: photographic grain and noise removal, edge sharpening, multidimensional image-segmentation, image-tone reproduction adjustment, and image-color saturation adjustment. All processing, except for segmentation and analysis, is performed by massively parallel and pipelined special-purpose hardware. This hardware runs at 10 MHz and can be adjusted to handle any size digital image. The segmentation circuits run at 30 MHz. The segmentation data are used by three single-board computers for calculating the tonescale adjustment curves. The system, as a whole, has the capability of completely processing 10 million three-color pixels per second. The grain removal and edge enhancement algorithms represent the largest part of the pipelined hardware, operating at over 8 billion integer operations per second. The edge enhancement is performed by unsharp masking, and the grain removal is done using a collapsed Walsh-Hadamard transform filtering technique (U.S. Patent No. 4549212). These two algorithms can be realized using four basic processing elements, some of which have been implemented as VLSI semicustom integrated circuits. These circuits implement the algorithms with a high degree of efficiency, modularity, and testability. The digital processor is controlled by a Digital Equipment Corporation (DEC) PDP 11 minicomputer and can be interfaced to electronic printing and/or electronic scanning devices. The processor has been used to process over a thousand diagnostic images.
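
    Unsharp masking, the edge-enhancement technique named above, adds back a scaled difference between a signal and a blurred copy of itself. The one-dimensional sketch below uses a 3-tap box blur and an `amount` parameter; both are assumptions made for illustration and say nothing about the filter actually built into the Kodak hardware.

```python
def box_blur(signal):
    """3-tap moving average with the edge samples replicated."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]

def unsharp_mask(signal, amount=1.0):
    """Sharpen by adding amount * (original - blurred) to the original."""
    blurred = box_blur(signal)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

step_edge = [0, 0, 0, 9, 9, 9]
print(unsharp_mask(step_edge))
```

    On the step edge the output undershoots just before the transition and overshoots just after it ([0.0, 0.0, -3.0, 12.0, 9.0, 9.0]), which is the local contrast boost that makes edges look sharper.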

  1. Broadcasting collective operation contributions throughout a parallel computer

    SciTech Connect

    Faraj, Ahmad

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.

  2. Adjunct processors in embedded medical imaging systems

    NASA Astrophysics Data System (ADS)

    Trepanier, Marc; Goddard, Iain

    2002-05-01

    Adjunct processors have traditionally been used for certain tasks in medical imaging systems. Often based on application-specific integrated circuits (ASICs), these processors formed X-ray image-processing pipelines or constituted the backprojectors in computed tomography (CT) systems. We examine appropriate functions to perform with adjunct processing and draw some conclusions about system design trade-offs. These trade-offs have traditionally focused on the required performance and flexibility of individual system components, with increasing emphasis on time-to-market impact. Typically, front-end processing close to the sensor has the most intensive processing requirements. However, the performance capabilities of each level are dynamic and the system architect must keep abreast of the current capabilities of all options to remain competitive. Designers are searching for the most efficient implementation of their particular system requirements. We cite algorithm characteristics that point to effective solutions by adjunct processors. We have developed a field-programmable gate array (FPGA) adjunct-processor solution for a Cone-Beam Reconstruction (CBR) algorithm that offers significant performance improvements over a general-purpose processor implementation. The same hardware could efficiently perform other image processing functions such as two-dimensional (2D) convolution. The potential performance, price, operating power, and flexibility advantages of an FPGA adjunct processor over an ASIC, DSP or general-purpose processing solutions are compelling.

  3. Architecture and data processing alternatives for the TSE computer. Volume 3: Execution of a parallel counting algorithm using array logic (Tse) devices

    NASA Technical Reports Server (NTRS)

    Metcalfe, A. G.; Bodenheimer, R. E.

    1976-01-01

    A parallel algorithm for counting the number of logic-1 elements in a binary array or image, developed during preliminary investigation of the Tse concept, is described. The counting algorithm is implemented using a basic combinational structure. Modifications which improve the efficiency of the basic structure are also presented. A programmable Tse computer structure is proposed, along with a hardware control unit, Tse instruction set, and software program for execution of the counting algorithm. Finally, a comparison is made between the different structures in terms of their more important characteristics.
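
    As a rough illustration of what counting the 1-valued elements of a binary image in parallel can look like, the sketch below uses a generic pairwise (adder-tree) reduction. This is an assumption made for illustration only: the abstract does not describe the Tse combinational structure, and the function name `count_ones_tree` and the sample image are hypothetical.

```python
def count_ones_tree(bits):
    """Count the 1-valued elements of a flat binary sequence by pairwise reduction.

    Each pass adds neighbouring pairs. The pair sums within one pass do not
    depend on each other, so parallel hardware could compute them all at once;
    here they are simply evaluated in a list comprehension.
    """
    level = [int(b) for b in bits]
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:          # pad odd-length levels with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical 4 x 4 binary image, flattened row by row before counting.
image = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
flat = [b for row in image for b in row]
total = count_ones_tree(flat)
```

    For an input of n bits the reduction needs about log2(n) passes, which is the usual argument for a tree over a serial scan when the pair sums really do run concurrently.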

  4. Online track processor for the CDF upgrade

    SciTech Connect

    Ciobanu, C.; Gertenslager, J.; Hoftiezer, J.

    1999-08-01

    A trigger track processor is being designed for the CDF upgrade. This processor identifies high momentum (P_T > 1.5 GeV/c) charged tracks in the new central outer tracking chamber for CDF II. The track processor is called the Extremely Fast Tracker (XFT). The XFT design is highly parallel to handle the input rate of 183 Gbits/sec and output rate of 44 Gbits/sec. The processor is pipelined and reports the results for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow for in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. Prototypes of each of these modules have been designed and built, and are presently undergoing testing. An overview of the track processor and results of testing are presented.

  5. FPGA realization of a split radix FFT processor

    NASA Astrophysics Data System (ADS)

    García, Jesús; Michell, Juan A.; Ruiz, Gustavo; Burón, Angel M.

    2007-05-01

    Applications based on Fast Fourier Transform (FFT) such as signal and image processing require high computational power, plus the ability to choose the algorithm and architecture to implement it. This paper explains the realization of a Split Radix FFT (SRFFT) processor based on a pipeline architecture reported before by the same authors. This architecture has as basic building blocks a Complex Butterfly and a Delay Commutator. The main advantages of this architecture are: (1) it combines the higher parallelism of the 4r-FFTs with the possibility of processing sequences whose length is any power of two; and (2) the simultaneous operation of multipliers and adder-subtracters implicit in the SRFFT leads to faster operation at the same degree of pipeline. The implementation has been made on a Field Programmable Gate Array (FPGA) as a way of obtaining high performance at economical price and a short time of realization. The Delay Commutator has been designed to be customized for even and odd SRFFT computation levels. It can be used with segmented arithmetic of any level of pipeline in order to speed up the operating frequency. The processor has been simulated up to 350 MHz, with an EP2S15F672C3 Altera Stratix II as a target device, for a transform length of 256 complex points.
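
    For readers unfamiliar with the split-radix decomposition itself, the following is a minimal software sketch of the textbook recursive form, which splits a length-N transform into one length-N/2 transform of the even samples and two length-N/4 transforms of the samples at indices 4n+1 and 4n+3. It says nothing about the pipelined Complex Butterfly / Delay Commutator hardware of the paper; the function names are hypothetical, and `naive_dft` is included only as a cross-check.

```python
import cmath


def split_radix_fft(x):
    """Recursive split-radix FFT for sequences whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]

    u = split_radix_fft(x[0::2])    # even samples, length n/2 (radix-2 part)
    z = split_radix_fft(x[1::4])    # samples 4m+1, length n/4 (radix-4 part)
    zp = split_radix_fft(x[3::4])   # samples 4m+3, length n/4 (radix-4 part)

    q = n // 4
    out = [0j] * n
    for k in range(q):
        a = cmath.exp(-2j * cmath.pi * k / n) * z[k]        # W^k  * Z[k]
        b = cmath.exp(-2j * cmath.pi * 3 * k / n) * zp[k]   # W^3k * Z'[k]
        out[k] = u[k] + (a + b)
        out[k + 2 * q] = u[k] - (a + b)
        out[k + q] = u[k + q] - 1j * (a - b)
        out[k + 3 * q] = u[k + q] + 1j * (a - b)
    return out


def naive_dft(x):
    """Direct O(n^2) DFT, used only to check the sketch above."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
```

    The sum `a + b` and the difference `a - b` in each butterfly can be formed at the same time as the next pair of twiddle multiplications, which is the kind of multiplier/adder overlap the abstract refers to; how that is scheduled in the actual processor is not shown here.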

  6. Parallel asynchronous hardware implementation of image processing algorithms

    NASA Technical Reports Server (NTRS)

    Coon, Darryl D.; Perera, A. G. U.

    1990-01-01

    Research is being carried out on hardware for a new approach to focal plane processing. The hardware involves silicon injection mode devices. These devices provide a natural basis for parallel asynchronous focal plane image preprocessing. The simplicity and novel properties of the devices would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture built from arrays of the devices would form a two-dimensional (2-D) array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuron-like asynchronous pulse-coded form through the laminar processor. No multiplexing, digitization, or serial processing would occur in the preprocessing state. High performance is expected, based on pulse coding of input currents down to one picoampere with noise referred to input of about 10 femtoamperes. Linear pulse coding has been observed for input currents ranging up to seven orders of magnitude. Low power requirements suggest utility in space and in conjunction with very large arrays. Very low dark current and multispectral capability are possible because of hardware compatibility with the cryogenic environment of high performance detector arrays. The aforementioned hardware development effort is aimed at systems which would integrate image acquisition and image processing.

  7. SPROC: A multiple-processor DSP IC

    NASA Technical Reports Server (NTRS)

    Davis, R.

    1991-01-01

    A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.

  8. Switch for serial or parallel communication networks

    DOEpatents

    Crosette, Dario B.

    1994-01-01

    A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random bursts of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.

  9. Switch for serial or parallel communication networks

    DOEpatents

    Crosette, D.B.

    1994-07-19

    A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random bursts of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network is coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.

  10. Adaptive fusion processor

    NASA Astrophysics Data System (ADS)

    Dasarathy, Belur V.

    1995-07-01

    An adaptive learning fusion processor, capable of fusion of a mix of information at the data, feature, and decision levels, acquired from multiple sources (sensors as well as feature extractors and/or decision processors) is presented. Four alternative approaches: a self-partitioning neural net, an adaptive fusion process, an evidential reasoning approach, and a concurrence seeking approach were initially evaluated from a conceptual viewpoint followed by some limited simulation and testing. Based on this assessment, an adaptive fusion processor employing innovative advances of the nearest neighbor concept was selected for detailed implementation and testing using real-world field data. Results show the benefits of fusion in terms of improved performance as compared to those obtainable from the individual component information streams being input to the fusion processor and clearly bring out the feasibility and effectiveness of the new multi-level fusion concepts.

  11. Approximate programmable quantum processors

    SciTech Connect

    Hillery, Mark; Ziman, Mario; Buzek, Vladimir

    2006-02-15

    A quantum processor is a programmable quantum circuit in which both the data and the program, which specifies the operation that is carried out on the data, are quantum states. We study the situation in which we want to use such a processor to approximate a set of unitary operators to a specified level of precision. We measure how well an operation is performed by the process fidelity between the desired operation and the operation produced by the processor. We show how to find the program for a given processor that produces the best approximation of a particular unitary operation. We also place bounds on the dimension of the program space that is necessary to approximate a set of unitary operators to a specified level of precision.

  12. Liquid sample processor

    NASA Technical Reports Server (NTRS)

    Jahnsen, V. J.; Campen, C. F., Jr.

    1975-01-01

    Processor is automatic and includes series of extraction tubes packed with fibrous absorbent material of large surface area. When introduced into these tubes, liquid test samples become completely absorbed by packing material as thin film.

  13. Reconfigurable computer array: The bridge between high speed sensors and low speed computing

    SciTech Connect

    Robinson, S.H.; Caffrey, M.P.; Dunham, M.E.

    1998-06-16

    A universal limitation of RF and imaging front-end sensors is that they easily produce data at a higher rate than any general-purpose computer can continuously handle. Therefore, Los Alamos National Laboratory has developed a custom Reconfigurable Computing Array board to support a large variety of processing applications including wideband RF signals, LIDAR and multi-dimensional imaging. The board's design exploits three key features to achieve its performance. First, there are large banks of fast memory dedicated to each reconfigurable processor and also shared between pairs of processors. Second, there are dedicated data paths between processors, and from a processor to flexible I/O interfaces. Third, the design provides the ability to link multiple boards into a serial and/or parallel structure.

  14. Comparison of simulated parallel transmit body arrays at 3 T using excitation uniformity, global SAR, local SAR and power efficiency metrics

    PubMed Central

    Guérin, Bastien; Gebhardt, Matthias; Serano, Peter; Adalsteinsson, Elfar; Hamm, Michael; Pfeuffer, Josef; Nistler, Juergen; Wald, Lawrence L.

    2014-01-01

    Purpose: We compare the performance of 8 parallel transmit (pTx) body arrays with up to 32 channels and a standard birdcage design. Excitation uniformity, local SAR, global SAR and power metrics are analyzed in the torso at 3 T for RF-shimming and 2-spoke excitations. Methods: We used a fast co-simulation strategy for field calculation in the presence of coupling between transmit channels. We designed spoke pulses using magnitude least squares (MLS) optimization with explicit constraint of SAR and power and compared the performance of the different pTx coils using the L-curve method. Results: PTx arrays outperformed the conventional birdcage coil in all metrics except peak and average power efficiency. The presence of coupling exacerbated this power efficiency problem. At constant excitation fidelity, the pTx array with 24 channels arranged in 3 z-rows could decrease local SAR more than 4-fold (2-fold) for RF-shimming (2-spoke) compared to the birdcage coil for pulses of equal duration. Multi-row pTx coils had a marked performance advantage compared to single row designs, especially for coronal imaging. Conclusion: PTx coils can simultaneously improve the excitation uniformity and reduce SAR compared to a birdcage coil when SAR metrics are explicitly constrained in the pulse design. PMID:24752979

  15. Non-numerical methods on parallel computers

    NASA Astrophysics Data System (ADS)

    Flanders, P. M.

    1982-06-01

    Analysis of computation on parallel computers reveals an interleaving of local (often numeric) processing and data re-organisation. The local processing is readily handled since it is contained within individual processors and easily expressed in terms of element-by-element operations on whole arrays. The intervening data re-organisation accounts for most of the complexity and interest in parallel processing; it is important that operations and techniques are developed for these non-numeric tasks which permit the natural and concise description of algorithms and are readily implemented on parallel hardware. Basic techniques for data organisation and movement are described and illustrated in some numeric and non-numeric problems. Various aspects of matching problems on to arrays of parallel hardware, such as the ICL DAP, are considered. An approach is outlined whereby more sophisticated solutions to certain problems, such as the fast Fourier transform and sorting, are obtained by working with a specification of the mapping of data on to the store rather than with the physical data organisation.

  16. Programmable DNA-Mediated Multitasking Processor.

    PubMed

    Shu, Jian-Jun; Wang, Qi-Wen; Yong, Kian-Yan; Shao, Fangwei; Lee, Kee Jin

    2015-04-30

    Because of DNA's appealing features as a perfect material, including minuscule size, defined structural repeat and rigidity, programmable DNA-mediated processing is a promising computing paradigm, which employs DNAs as information storing and processing substrates to tackle computational problems. The massive parallelism of DNA hybridization exhibits transcendent potential to improve multitasking capabilities and yield a tremendous speed-up over the conventional electronic processors with stepwise signal cascade. As an example of multitasking capability, we present an in vitro programmable DNA-mediated optimal route planning processor as a functional unit embedded in contemporary navigation systems. The novel programmable DNA-mediated processor has several advantages over the existing silicon-mediated methods, such as conducting massive data storage and simultaneous processing via much fewer materials than conventional silicon devices. PMID:25874653

  17. Onboard processor technology review

    NASA Technical Reports Server (NTRS)

    Benz, Harry F.

    1990-01-01

    The general need and requirements for the onboard embedded processors necessary to control and manipulate data in spacecraft systems are discussed. The current known requirements are reviewed from a user perspective, based on current practices in the spacecraft development process. The current capabilities of available processor technologies are then discussed, and these are projected to the generation of spacecraft computers currently under identified, funded development. An appraisal is provided for the current national developmental effort.

  18. Gang scheduling a parallel machine

    SciTech Connect

    Gorda, B.C.; Brooks, E.D. III.

    1991-03-01

    Program development on parallel machines can be a nightmare of scheduling headaches. We have developed a portable time sharing mechanism to handle the problem of scheduling gangs of processors. User programs and their gangs of processors are put to sleep and awakened by the gang scheduler to provide a time sharing environment. Time quantums are adjusted according to priority queues and a system of fair share accounting. The initial platform for this software is the 128 processor BBN TC2000 in use in the Massively Parallel Computing Initiative at the Lawrence Livermore National Laboratory. 2 refs., 1 fig.

  19. A structural approach to the photonic processor

    NASA Astrophysics Data System (ADS)

    Jackson, Deborah

    In the early 1990s, photonics, the confluence of electronics and optics technologies to improve net processing efficiency, was advanced to the highest priority ranking on the DoD critical technologies list. Currently, photonics is considered a high-leverage technology because it is believed that photonic processors could potentially circumvent the serial processor limitation, or von Neumann bottleneck, which limits the throughput capacity of most electronic processors. Indeed, the realtime solutions to current military problems, such as high-accuracy missile guidance, sensor fusion, automatic target recognition, automated guidance of remotely piloted vehicles, etc., are consistently crippled by information processing bottlenecks. Such bottlenecks are particularly endemic to image-formatted data bases. An image-formatted data base is defined as a data base where, besides the information contained in each pixel, there is also information imparted by the spatial relationship among the data in the pixels. Thus, in image data, variations in grey scale are used to define edges and corners. To extract the spatially imparted information, it is often necessary to compare N x N pixels in the input image with the N x N pixels in a model image; this process takes N^4 comparison calculations. As the demand for higher resolution imagery increases and N gets larger, it becomes increasingly more difficult to make the image comparisons in realtime. Currently, digital electronic processor designs are optimized for numerical processing, which is an intrinsically serial operation. It is this serial nature that causes the limitation; the photonic processor, which can be designed with a more parallel architecture, has potential for circumventing this bottleneck. It is, therefore, anticipated that the intrinsic parallelism of optics will enable the photonic processor to solve problems in realtime that were previously considered unsolvable or only marginally solvable.
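
    One way to see where a fourth-power comparison count comes from is an exhaustive shift-and-compare search: the model image is tested at each of the N x N relative shifts, and each test compares N x N pixels. The sketch below counts those comparisons on a serial processor. It is an illustration under assumptions, not the method of the paper: the cyclic (wrap-around) shifts, the equality score, and the name `match_all_shifts` are all hypothetical choices.

```python
def match_all_shifts(image, model):
    """Compare an N x N model against an N x N image at every cyclic shift.

    Returns ((best_score, dy, dx), comparisons), where `comparisons` is the
    number of pixel-to-pixel comparisons performed by this serial loop nest.
    """
    n = len(image)
    comparisons = 0
    best = None
    for dy in range(n):                 # N vertical shifts
        for dx in range(n):             # N horizontal shifts
            score = 0
            for y in range(n):          # N rows per shift
                for x in range(n):      # N columns per row
                    if image[(y + dy) % n][(x + dx) % n] == model[y][x]:
                        score += 1
                    comparisons += 1
            if best is None or score > best[0]:
                best = (score, dy, dx)
    return best, comparisons


# Hypothetical 4 x 4 test image; the model is the image shifted by (1, 2).
image = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
model = [[image[(y + 1) % 4][(x + 2) % 4] for x in range(4)] for y in range(4)]
best, comparisons = match_all_shifts(image, model)
```

    With N = 4 the loop nest performs 4^4 = 256 comparisons; doubling N multiplies the count by 16, which is why the serial cost becomes hard to sustain in realtime as resolution grows.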

  20. Parallel nearest neighbor calculations

    NASA Astrophysics Data System (ADS)

    Trease, Harold

    We are just starting to parallelize the nearest neighbor portion of our free-Lagrange code. Our implementation of the nearest neighbor reconnection algorithm has not been parallelizable (i.e., we just flip one connection at a time). In this paper we consider what sort of nearest neighbor algorithms lend themselves to being parallelized. For example, the construction of the Voronoi mesh can be parallelized, but the construction of the Delaunay mesh (dual to the Voronoi mesh) cannot because of degenerate connections. We will show our most recent attempt to tessellate space with triangles or tetrahedrons with a new nearest neighbor construction algorithm called DAM (Dial-A-Mesh). This method has the characteristics of a parallel algorithm and produces a better tessellation of space than the Delaunay mesh. Parallel processing is becoming an everyday reality for us at Los Alamos. Our current production machines are Cray YMPs with 8 processors that can run independently or combined to work on one job. We are also exploring massive parallelism through the use of two 64K processor Connection Machines (CM2), where all the processors run in lock step mode. The effective application of 3-D computer models requires the use of parallel processing to achieve reasonable "turn around" times for our calculations.

  1. Methodology for the qualitative screening of parallel arrays of potential Am3+ ligands using a photographic film.

    PubMed

    Dam, Henk H; Tomasberger, Tanja; Reinhoudt, David N; Verboom, Willem

    2009-03-01

    A screening method for parallel Am(3+) ligand libraries is presented. The method makes use of alpha-radiation in combination with a photographic film to detect the complexed Am(3+). After screening and development of the film, spots of varying intensities are obtained. The intensities of the spots correspond with the amount of complexed Am(3+). This allows a fast discrimination between the Am(3+) complexation efficiencies of ligands from large libraries. Depending on the exposure time of the film, activities as small as 5 Bq (241)Am can be detected. Using internal standards a semi-quantitative assessment can be performed. PMID:19200482

  2. The oblique muscle organizer in Hirudo medicinalis, an identified embryonic cell projecting multiple parallel growth cones in an orderly array.

    PubMed

    Jellies, J; Kristan, W B

    1991-11-01

    The oblique muscle layer in the leech body wall is built upon the processes of a unique identified embryonic cell, the Comb- or C-cell. Each C-cell is composed of a spindle-shaped soma that projects approximately 70 parallel processes through the developing body wall at an angle oblique to the long axis. The morphogenesis of this cell and the navigation of its growth cones were examined by intracellular dye filling and antibody staining. At the earliest stages described each C-cell had about six processes, with those near the center of the cell oriented obliquely. As processes were added at the axial ends of the soma they often projected along previously developed longitudinal or circular muscle founder cells and then secondarily aligned themselves parallel to the older processes from the same C-cell. All growth cones initially extended to a particular location in the body wall, where they ceased growing until all 70 processes had been added (over the course of about 5 days). As adjacent segmental homologs met, their growth cones intermingled, eventually sorting out to align parallel. When one of these cells was ablated early--but not later--in development, the remaining adjacent segmental homologs expanded into the vacant territory, consistent with a hypothesis of mutual avoidance between segmental homologs. Most processes that expanded into the experimentally induced vacancy remained correctly oriented and parallel; the few exceptions projected instead along the mirror-image trajectory. Thus, expression of specific avoidance between adjacent C-cell processes is developmentally regulated and functions as a guidance mechanism in vivo, in that it serves to restrict possible trajectories. After aligning its growth cones, each cell stopped adding processes and the processes rapidly extended in concert along relatively precise trajectories. 
    Processes of contralateral homologs cross to form the orthogonal grid used as a scaffold by myocytes to form the oblique muscles. The advancing fronts of growth cones reached the dorsal midline at about the same time as body closure occurs (at about Embryonic Day 20) at which time the C-cells became granular, lost processes, and presumably died. This sequence of developmental events is consistent with temporal and spatial regulation of different morphogenetic strategies, including--but not limited to--specific avoidance, and further suggests testable hypotheses of mechanisms of growth cone navigation in the intact embryo. PMID:1936570

  3. Model-driven mapping onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The author addresses the problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine in the context of building a mapping compiler for a distributed memory parallel machine. He demonstrates the effectiveness of using execution models to select the best mapping technique from among those available for a given program segment on a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs on a linear processor array, it is shown that selecting the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from a mapping compiler for the Warp systolic array machine show that the execution models considered are accurate enough to select the best mapping technique for a given program.

  4. Configurable Multi-Purpose Processor

    NASA Technical Reports Server (NTRS)

    Valencia, J. Emilio; Forney, Christopher; Morrison, Robert; Birr, Richard

    2010-01-01

    Advancements in technology have allowed the miniaturization of systems used in aerospace vehicles. This technology is driven by the need for next-generation systems that provide reliable, responsive, and cost-effective range operations while providing increased capabilities such as simultaneous mission support, increased launch trajectories, improved launch and landing opportunities, etc. Leveraging the newest technologies, the command and telemetry processor (CTP) concept provides for a compact, flexible, and integrated solution for flight command and telemetry systems and range systems. The CTP is a relatively small circuit board that serves as a processing platform for high dynamic, high vibration environments. The CTP can be reconfigured and reprogrammed, allowing it to be adapted for many different applications. The design is centered around a configurable field-programmable gate array (FPGA) device that contains numerous logic cells that can be used to implement traditional integrated circuits. The FPGA contains two PowerPC processors that run the VxWorks real-time operating system and are used to execute software programs specific to each application. The CTP was designed and developed specifically to provide telemetry functions; namely, the command processing, telemetry processing, and GPS metric tracking of a flight vehicle. However, it can be used as a general-purpose processor board to perform numerous functions implemented in either hardware or software using the FPGA's processors and/or logic cells. Functionally, the CTP was designed for range safety applications where it would ultimately become part of a vehicle's flight termination system. Consequently, the major functions of the CTP are to perform the forward link command processing, GPS metric tracking, return link telemetry data processing, error detection and correction, and data encryption/decryption, and to initiate flight termination action commands.
    Also, the CTP had to be designed to survive and operate in a launch environment. Additionally, the CTP was designed to interface with the WFF (Wallops Flight Facility) custom-designed transceiver board which is used in the Low Cost TDRSS Transceiver (LCT2) also developed by WFF. The LCT2's transceiver board demodulates commands received from the ground via the forward link and sends them to the CTP, where they are processed. The CTP inputs and processes data from the inertial measurement unit (IMU) and the GPS receiver board, generates status data, and then sends the data to the transceiver board where it is modulated and sent to the ground via the return link. Overall, the CTP has combined processing with the ability to interface to a GPS receiver, an IMU, and a pulse code modulation (PCM) communication link, while providing the capability to support common interfaces including Ethernet and serial interfaces aboard a relatively small-sized, lightweight package.

  5. Stochastic propagation of an array of parallel cracks: Exploratory work on matrix fatigue damage in composite laminates

    SciTech Connect

    Williford, R.E.

    1989-09-01

    Transverse cracking of polymeric matrix materials is an important fatigue damage mechanism in continuous-fiber composite laminates. The propagation of an array of these cracks is a stochastic problem usually treated by Monte Carlo methods. However, this exploratory work proposes an alternative approach wherein the Monte Carlo method is replaced by a more closed-form recursion relation based on "fractional Brownian motion." A fractal scaling equation is also proposed as a substitute for the more empirical Paris equation describing individual crack growth in this approach. Preliminary calculations indicate that the new recursion relation is capable of reproducing the primary features of transverse matrix fatigue cracking behavior. Although not yet fully tested or verified, this recursion relation may eventually be useful for real-time applications such as monitoring damage in aircraft structures.

  6. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    PubMed

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity relative to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR. PMID:26451814
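
    To make the suffix-array step concrete, the sketch below shows the basic idea of exact read lookup: sort all suffixes of the reference once, then binary-search for the range of suffixes that start with the read. It is a toy under stated assumptions, not HPG Aligner SA's implementation: the construction is the naive sort-based one (far too slow for a genome), only exact matches are found, the Smith-Waterman and splice-junction stages are not shown, and the function names and the short reference string are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions where `pattern` occurs exactly in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


reference = "ACGTACGTGACG"             # hypothetical toy reference sequence
sa = build_suffix_array(reference)
hits = find_occurrences(reference, sa, "ACG")
```

    Because the index is read-only once built, independent reads can be looked up concurrently by separate threads or processes, which is one reason index-based mapping suits multicore servers.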

  7. Detector defect correction of medical images on graphics processors

    NASA Astrophysics Data System (ADS)

    Membarth, Richard; Hannig, Frank; Teich, Jürgen; Litz, Gerhard; Hornegger, Heinz

    2011-03-01

    The ever increasing complexity and power dissipation of computer architectures in the last decade blazed the trail for more power efficient parallel architectures. Hence, such architectures like field-programmable gate arrays (FPGAs) and particular graphics cards attained great interest and are consequently adopted for parallel execution of many number crunching loop programs from fields like image processing or linear algebra. However, there is little effort to deploy barely computational, but memory intensive applications to graphics hardware. This paper considers a memory intensive detector defect correction pipeline for medical imaging with strict latency requirements. The image pipeline compensates for different effects caused by the detector during exposure of X-ray images and calculates parameters to control the subsequent dosage. So far, dedicated hardware setups with special processors like DSPs were used for such critical processing. We show that this is today feasible with commodity graphics hardware. Using CUDA as programming model, it is demonstrated that the detector defect correction pipeline consisting of more than ten algorithms is significantly accelerated and that a speedup of 20x can be achieved on NVIDIA's Quadro FX 5800 compared to our reference implementation. For deployment in a streaming application with steadily new incoming data, it is shown that the memory transfer overhead of successive images to the graphics card memory is reduced by 83% using double buffering.

  8. Fabrication and Evaluation of a Micro(Bio)Sensor Array Chip for Multiple Parallel Measurements of Important Cell Biomarkers

    PubMed Central

    Pemberton, Roy M.; Cox, Timothy; Tuffin, Rachel; Drago, Guido A.; Griffiths, John; Pittson, Robin; Johnson, Graham; Xu, Jinsheng; Sage, Ian C.; Davies, Rhodri; Jackson, Simon K.; Kenna, Gerry; Luxton, Richard; Hart, John P.

    2014-01-01

    This report describes the design and development of an integrated electrochemical cell culture monitoring system, based on enzyme-biosensors and chemical sensors, for monitoring indicators of mammalian cell metabolic status. MEMS technology was used to fabricate a microwell-format silicon platform including a thermometer, onto which chemical sensors (pH, O2) and screen-printed biosensors (glucose, lactate), were grafted/deposited. Microwells were formed over the fabricated sensors to give 5-well sensor strips which were interfaced with a multipotentiostat via a bespoke connector box interface. The operation of each sensor/biosensor type was examined individually, and examples of operating devices in five microwells in parallel, in either potentiometric (pH sensing) or amperometric (glucose biosensing) mode are shown. The performance characteristics of the sensors/biosensors indicate that the system could readily be applied to cell culture/toxicity studies. PMID:25360580

  9. Fabrication and evaluation of a micro(bio)sensor array chip for multiple parallel measurements of important cell biomarkers.

    PubMed

    Pemberton, Roy M; Cox, Timothy; Tuffin, Rachel; Drago, Guido A; Griffiths, John; Pittson, Robin; Johnson, Graham; Xu, Jinsheng; Sage, Ian C; Davies, Rhodri; Jackson, Simon K; Kenna, Gerry; Luxton, Richard; Hart, John P

    2014-01-01

    This report describes the design and development of an integrated electrochemical cell culture monitoring system, based on enzyme-biosensors and chemical sensors, for monitoring indicators of mammalian cell metabolic status. MEMS technology was used to fabricate a microwell-format silicon platform including a thermometer, onto which chemical sensors (pH, O2) and screen-printed biosensors (glucose, lactate), were grafted/deposited. Microwells were formed over the fabricated sensors to give 5-well sensor strips which were interfaced with a multipotentiostat via a bespoke connector box interface. The operation of each sensor/biosensor type was examined individually, and examples of operating devices in five microwells in parallel, in either potentiometric (pH sensing) or amperometric (glucose biosensing) mode are shown. The performance characteristics of the sensors/biosensors indicate that the system could readily be applied to cell culture/toxicity studies. PMID:25360580

  10. Development of a prototype PET scanner with depth-of-interaction measurement using solid-state photomultiplier arrays and parallel readout electronics

    PubMed Central

    Shao, Yiping; Sun, Xishan; Lan, Kejian A.; Bircher, Chad; Lou, Kai; Deng, Zhi

    2014-01-01

    In this study, we developed a prototype animal PET by applying several novel technologies to use the solid-state photomultiplier (SSPM) arrays for measuring the depth-of-interaction (DOI) and improving imaging performance. Each PET detector has an 8 × 8 array of about 1.9 × 1.9 × 30.0 mm³ lutetium-yttrium-oxyorthosilicate (LYSO) scintillators, with each end optically connected to an SSPM array (16-channel in a 4 × 4 matrix) through a light guide to enable continuous DOI measurement. Each SSPM has an active area of about 3 × 3 mm², and its output is read by a custom-developed application-specific-integrated-circuit (ASIC) to directly convert analog signals to digital timing pulses that encode the interaction information. These pulses are transferred to and decoded by a field-programmable-gate-array (FPGA) based time-to-digital convertor for coincident event selection and data acquisition. The independent readout of each SSPM and the parallel signal process can significantly improve the signal-to-noise ratio and enable using flexible algorithms for different data processes. The prototype PET consists of two rotating detector panels on a portable gantry with four detectors in each panel to provide 16 mm axial and variable transaxial field-of-view (FOV) sizes. List-mode ordered-subset-expectation-maximization image reconstruction was implemented. The measured mean energy, coincidence timing, and DOI resolution for a crystal were about 17.6%, 2.8 ns, and 5.6 mm, respectively. The measured transaxial resolutions at the center of the FOV were 2.0 mm and 2.3 mm for images reconstructed with and without DOI, respectively. In addition, the resolutions across the FOV with DOI were substantially better than those without DOI. The quality of PET images of both a hot-rod phantom and mouse acquired with DOI was much higher than that of images obtained without DOI. This study demonstrates that SSPM arrays and advanced readout/processing electronics can be used to develop a practical DOI-measureable PET scanner. PMID:24556629

  11. Development of a prototype PET scanner with depth-of-interaction measurement using solid-state photomultiplier arrays and parallel readout electronics

    NASA Astrophysics Data System (ADS)

    Shao, Yiping; Sun, Xishan; Lan, Kejian A.; Bircher, Chad; Lou, Kai; Deng, Zhi

    2014-03-01

    In this study, we developed a prototype animal PET by applying several novel technologies to use solid-state photomultiplier (SSPM) arrays to measure the depth of interaction (DOI) and improve imaging performance. Each PET detector has an 8 × 8 array of about 1.9 × 1.9 × 30.0 mm³ lutetium-yttrium-oxyorthosilicate scintillators, with each end optically connected to an SSPM array (16 channels in a 4 × 4 matrix) through a light guide to enable continuous DOI measurement. Each SSPM has an active area of about 3 × 3 mm², and its output is read by a custom-developed application-specific integrated circuit to directly convert analogue signals to digital timing pulses that encode the interaction information. These pulses are transferred to and are decoded by a field-programmable gate array-based time-to-digital convertor for coincident event selection and data acquisition. The independent readout of each SSPM and the parallel signal process can significantly improve the signal-to-noise ratio and enable the use of flexible algorithms for different data processes. The prototype PET consists of two rotating detector panels on a portable gantry with four detectors in each panel to provide 16 mm axial and variable transaxial field-of-view (FOV) sizes. List-mode ordered subset expectation maximization image reconstruction was implemented. The measured mean energy, coincidence timing and DOI resolution for a crystal were about 17.6%, 2.8 ns and 5.6 mm, respectively. The measured transaxial resolutions at the center of the FOV were 2.0 mm and 2.3 mm for images reconstructed with and without DOI, respectively. In addition, the resolutions across the FOV with DOI were substantially better than those without DOI. The quality of PET images of both a hot-rod phantom and mouse acquired with DOI was much higher than that of images obtained without DOI. This study demonstrates that SSPM arrays and advanced readout/processing electronics can be used to develop a practical DOI-measureable PET scanner.

  12. Hypercluster - Parallel processing for computational mechanics

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1988-01-01

    An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.

  13. Multiple Embedded Processors for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
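
    The error-detection scheme of the two-processor prototype can be modeled in software as follows. This is a minimal sketch under stated assumptions: `healthy_core`, `upset_core`, and `lockstep` are hypothetical names, the computation is arbitrary, and a single-event upset is imitated by flipping one output bit. It does not represent the actual FPGA logic.

```python
from typing import Callable, Tuple


def healthy_core(x: int) -> int:
    """Stand-in for the computation both redundant cores execute."""
    return (x * 3 + 1) & 0xFFFFFFFF


def upset_core(x: int) -> int:
    """Same computation, with bit 7 of the result flipped to imitate
    a single-event upset (hypothetical fault model)."""
    return healthy_core(x) ^ (1 << 7)


def lockstep(core_a: Callable[[int], int],
             core_b: Callable[[int], int],
             x: int) -> Tuple[int, bool]:
    """Run both cores on the same input; the comparator reports agreement."""
    out_a = core_a(x)
    out_b = core_b(x)
    return out_a, out_a == out_b


if __name__ == "__main__":
    print(lockstep(healthy_core, healthy_core, 5))  # outputs agree
    print(lockstep(healthy_core, upset_core, 5))    # mismatch is flagged
```

    With only two cores, a mismatch shows that an error occurred but not which core is wrong, which is consistent with the prototype being described as an error-detection scheme.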

  14. Rapid, Single-Molecule Assays in Nano/Micro-Fluidic Chips with Arrays of Closely Spaced Parallel Channels Fabricated by Femtosecond Laser Machining

    PubMed Central

    Canfield, Brian K.; King, Jason K.; Robinson, William N.; Hofmeister, William H.; Davis, Lloyd M.

    2014-01-01

    Cost-effective pharmaceutical drug discovery depends on increasing assay throughput while reducing reagent needs. To this end, we are developing an ultrasensitive, fluorescence-based platform that incorporates a nano/micro-fluidic chip with an array of closely spaced channels for parallelized optical readout of single-molecule assays. Here we describe the use of direct femtosecond laser machining to fabricate several hundred closely spaced channels on the surfaces of fused silica substrates. The channels are sealed by bonding to a microscope cover slip spin-coated with a thin film of poly(dimethylsiloxane). Single-molecule detection experiments are conducted using a custom-built, wide-field microscope. The array of channels is epi-illuminated by a line-generating red diode laser, resulting in a line focus just a few microns thick across a 500 micron field of view. A dilute aqueous solution of fluorescently labeled biomolecules is loaded into the device and fluorescence is detected with an electron-multiplying CCD camera, allowing acquisition rates up to 7 kHz for each microchannel. Matched digital filtering based on experimental parameters is used to perform an initial, rapid assessment of detected fluorescence. More detailed analysis is obtained through fluorescence correlation spectroscopy. Simulated fluorescence data is shown to agree well with experimental values. PMID:25140634

  15. Rapid, single-molecule assays in nano/micro-fluidic chips with arrays of closely spaced parallel channels fabricated by femtosecond laser machining.

    PubMed

    Canfield, Brian K; King, Jason K; Robinson, William N; Hofmeister, William H; Davis, Lloyd M

    2014-01-01

    Cost-effective pharmaceutical drug discovery depends on increasing assay throughput while reducing reagent needs. To this end, we are developing an ultrasensitive, fluorescence-based platform that incorporates a nano/micro-fluidic chip with an array of closely spaced channels for parallelized optical readout of single-molecule assays. Here we describe the use of direct femtosecond laser machining to fabricate several hundred closely spaced channels on the surfaces of fused silica substrates. The channels are sealed by bonding to a microscope cover slip spin-coated with a thin film of poly(dimethylsiloxane). Single-molecule detection experiments are conducted using a custom-built, wide-field microscope. The array of channels is epi-illuminated by a line-generating red diode laser, resulting in a line focus just a few microns thick across a 500 micron field of view. A dilute aqueous solution of fluorescently labeled biomolecules is loaded into the device and fluorescence is detected with an electron-multiplying CCD camera, allowing acquisition rates up to 7 kHz for each microchannel. Matched digital filtering based on experimental parameters is used to perform an initial, rapid assessment of detected fluorescence. More detailed analysis is obtained through fluorescence correlation spectroscopy. Simulated fluorescence data is shown to agree well with experimental values. PMID:25140634

  16. Power processor design considerations for a solar electric propulsion spacecraft

    NASA Technical Reports Server (NTRS)

    Costogue, E. N.; Gardner, J. A.

    1974-01-01

    Propulsion power processor design options are described. The propulsion power processor generated the regulated dc voltages and currents from a solar array source of a solar electric propelled spacecraft. The power processor consisted of 12 power supplies that provide the regulated voltages and currents necessary to power a 30-cm mercury ion thruster. The design options for processing unregulated solar array power and for generating the regulated power required by each supply are studied. The technical approaches utilized in the developed design and the technological limitation of the identified design options are discussed. Alternate approaches for delivering power to a number of mercury ion thrusters and methods of optimizing are described. It was concluded that this power processor design should be considered for application in solar electric propulsion missions of the future.

  17. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S. (Ridgefield, CT); Strip, David R. (Albuquerque, NM)

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
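
    The idea of each processor building one directed-edge record from its own label can be sketched as follows. This is an illustrative sketch only: the tetrahedron, the shared `FACES` table, the `DEdge` fields, and the label-to-edge mapping in `build_dedge` are assumptions and are not taken from the patent.

```python
from dataclasses import dataclass

# A tetrahedron with consistently oriented triangular faces (assumed model).
FACES = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]


@dataclass(frozen=True)
class DEdge:
    tail: int  # vertex where the directed edge starts
    head: int  # vertex where the directed edge ends
    face: int  # the one face this directed edge belongs to


def build_dedge(label: int) -> DEdge:
    """Derive a d-edge purely from the processor's unique label."""
    face_id, k = divmod(label, 3)  # three directed edges per triangular face
    verts = FACES[face_id]
    return DEdge(tail=verts[k], head=verts[(k + 1) % 3], face=face_id)


# Each of the 12 simulated processors works independently of the others.
dedges = [build_dedge(label) for label in range(3 * len(FACES))]

if __name__ == "__main__":
    for d in dedges:
        print(d)
```

    Because every record depends only on the label and the broadcast model, no processor needs data from another, which mirrors the absence of processor-to-processor communication stated in the abstract.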

  18. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  19. Some systolic array developments in the United Kingdom

    SciTech Connect

    McCanny, J.V.; McWhirter, J.G.

    1987-07-01

    For the last five or six years, there has been an active program of research on systolic array processors (SAPs) in the United Kingdom. In this article, the authors describe two key projects initiated at the Royal Signals and Radar Establishment (RSRE) and carried out in collaboration with a number of major electronics companies and several universities. The success of these projects demonstrates clearly that the systolic approach to system design is not just an academic concept but a practical means of exploiting large amounts of parallelism and hence achieving orders of magnitude improvement in performance for digital signal processing (DSP). This type of application not only demands but also fully utilizes the SAP architecture. The first project was aimed at developing an electronic processor capable of computing the vector of complex weights required to form the receive beam for an adaptive antenna array.

  20. High-speed packet filtering utilizing stream processors

    NASA Astrophysics Data System (ADS)

    Hummel, Richard J.; Fulp, Errin W.

    2009-04-01

    Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency less than 82 μ-seconds. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example, the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.

  1. Buffered coscheduling for parallel programming and enhanced fault tolerance

    DOEpatents

    Petrini, Fabrizio (Los Alamos, NM); Feng, Wu-chun (Los Alamos, NM)

    2006-01-31

    A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.
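
    The global exchange performed during a strobe interval can be sketched as a simple counting step. This is a minimal single-process simulation under stated assumptions: `strobe_exchange` is a hypothetical name, each buffered control message is reduced to its destination rank, and no real network communication takes place.

```python
from collections import Counter
from typing import List


def strobe_exchange(outgoing: List[List[int]]) -> List[Counter]:
    """outgoing[p] lists the destination ranks that processor p buffered
    during the time interval. The result gives, for each processor, how
    many jobs it should expect from each sender in the next interval."""
    incoming = [Counter() for _ in outgoing]
    for sender, destinations in enumerate(outgoing):
        for dest in destinations:
            incoming[dest][sender] += 1
    return incoming


if __name__ == "__main__":
    # Hypothetical buffers for three processors.
    buffers = [[1, 1, 2], [0], [0, 1]]
    for rank, expected in enumerate(strobe_exchange(buffers)):
        print(rank, dict(expected), "total:", sum(expected.values()))
```

    After the exchange every processor knows its incoming load before the next interval starts, which is the information the abstract says the strobe interval provides.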

  2. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.

  3. Dedicated hardware processor and corresponding system-on-chip design for real-time laser speckle imaging.

    PubMed

    Jiang, Chao; Zhang, Hongyan; Wang, Jia; Wang, Yaru; He, Heng; Liu, Rui; Zhou, Fangyuan; Deng, Jialiang; Li, Pengcheng; Luo, Qingming

    2011-11-01

    Laser speckle imaging (LSI) is a noninvasive and full-field optical imaging technique which produces two-dimensional blood flow maps of tissues from the raw laser speckle images captured by a CCD camera without scanning. We present a hardware-friendly algorithm for the real-time processing of laser speckle imaging. The algorithm is developed and optimized specifically for LSI processing in the field programmable gate array (FPGA). Based on this algorithm, we designed a dedicated hardware processor for real-time LSI in FPGA. The pipeline processing scheme and parallel computing architecture are introduced into the design of this LSI hardware processor. When the LSI hardware processor is implemented in the FPGA running at the maximum frequency of 130 MHz, up to 85 raw images with the resolution of 640 × 480 pixels can be processed per second. Meanwhile, we also present a system on chip (SOC) solution for LSI processing by integrating the CCD controller, memory controller, LSI hardware processor, and LCD display controller into a single FPGA chip. This SOC solution also can be used to produce an application specific integrated circuit for LSI processing. PMID:22112113

  4. Accelerating parallel transmit array B1 mapping in high field MRI with slice undersampling and interpolation by kriging.

    PubMed

    Ferrand, Guillaume; Luong, Michel; Cloos, Martijn A; Amadon, Alexis; Wackernagel, Hans

    2014-08-01

    Transmit arrays have been developed to mitigate the RF field inhomogeneity commonly observed in high field magnetic resonance imaging (MRI), typically above 3T. To this end, the knowledge of the RF complex-valued B1 transmit-sensitivities of each independent radiating element has become essential. This paper details a method to speed up a currently available B1-calibration method. The principle relies on slice undersampling, slice and channel interleaving and kriging, an interpolation method developed in geostatistics and applicable in many domains. It has been demonstrated that, under certain conditions, kriging gives the best estimator of a field in a region of interest. The resulting accelerated sequence allows mapping a complete set of eight volumetric field maps of the human head in about 1 min. For validation, the accuracy of kriging is first evaluated against a well-known interpolation technique based on Fourier transform as well as to a B1-maps interpolation method presented in the literature. This analysis is carried out on simulated and decimated experimental B1 maps. Finally, the accelerated sequence is compared to the standard sequence on a phantom and a volunteer. The new sequence provides B1 maps three times faster with a loss of accuracy limited potentially to about 5%. PMID:24816550

  5. Hybrid photomultiplier tube and photodiode parallel detection array for wideband optical spectroscopy of the breast guided by magnetic resonance imaging

    PubMed Central

    Mastanduno, Michael A.; Jiang, Shudong; Pogue, Brian W.; Paulsen, Keith D.

    2013-01-01

    A new optical parallel detection system of hybrid frequency and continuous-wave domains was developed to improve the data quality and accuracy in recovery of all breast optical properties. This new system was deployed in a previously existing system for magnetic resonance imaging (MRI)-guided spectroscopy, and allows incorporation of additional near-infrared wavelengths beyond 850 nm, with interlaced channels of photomultiplier tubes (PMTs) and silicon photodiodes (PDs). The acquisition time for obtaining frequency-domain data at six wavelengths (660, 735, 785, 808, 826, and 849 nm) and continuous-wave data at three wavelengths (903, 912, and 948 nm) is 12 min. The dynamic ranges of the detected signal are 10⁵ and 10⁶ for PMT and PD detectors, respectively. Compared to the previous detection system, the SNR of frequency-domain detection was improved by nearly 10³ through the addition of an RF amplifier and the utilization of programmable gain. The current system is being utilized in a clinical trial imaging suspected breast cancer tumors as detected by contrast MRI scans. PMID:23979460

  6. Hybrid photomultiplier tube and photodiode parallel detection array for wideband optical spectroscopy of the breast guided by magnetic resonance imaging.

    PubMed

    El-Ghussein, Fadi; Mastanduno, Michael A; Jiang, Shudong; Pogue, Brian W; Paulsen, Keith D

    2014-01-01

    A new optical parallel detection system of hybrid frequency and continuous-wave domains was developed to improve the data quality and accuracy in recovery of all breast optical properties. This new system was deployed in a previously existing system for magnetic resonance imaging (MRI)-guided spectroscopy, and allows incorporation of additional near-infrared wavelengths beyond 850 nm, with interlaced channels of photomultiplier tubes (PMTs) and silicon photodiodes (PDs). The acquisition time for obtaining frequency-domain data at six wavelengths (660, 735, 785, 808, 826, and 849 nm) and continuous-wave data at three wavelengths (903, 912, and 948 nm) is 12 min. The dynamic ranges of the detected signal are 10⁵ and 10⁶ for PMT and PD detectors, respectively. Compared to the previous detection system, the SNR of frequency-domain detection was improved by nearly 10³ through the addition of an RF amplifier and the utilization of programmable gain. The current system is being utilized in a clinical trial imaging suspected breast cancer tumors as detected by contrast MRI scans. PMID:23979460

  7. Hybrid photomultiplier tube and photodiode parallel detection array for wideband optical spectroscopy of the breast guided by magnetic resonance imaging

    NASA Astrophysics Data System (ADS)

    El-Ghussein, Fadi; Mastanduno, Michael A.; Jiang, Shudong; Pogue, Brian W.; Paulsen, Keith D.

    2014-01-01

    A new optical parallel detection system of hybrid frequency and continuous-wave domains was developed to improve the data quality and accuracy in recovery of all breast optical properties. This new system was deployed in a previously existing system for magnetic resonance imaging (MRI)-guided spectroscopy, and allows incorporation of additional near-infrared wavelengths beyond 850 nm, with interlaced channels of photomultiplier tubes (PMTs) and silicon photodiodes (PDs). The acquisition time for obtaining frequency-domain data at six wavelengths (660, 735, 785, 808, 826, and 849 nm) and continuous-wave data at three wavelengths (903, 912, and 948 nm) is 12 min. The dynamic ranges of the detected signal are 10⁵ and 10⁶ for PMT and PD detectors, respectively. Compared to the previous detection system, the SNR of frequency-domain detection was improved by nearly 10³ through the addition of an RF amplifier and the utilization of programmable gain. The current system is being utilized in a clinical trial imaging suspected breast cancer tumors as detected by contrast MRI scans.

  8. Ultra Dependable Processor

    NASA Astrophysics Data System (ADS)

    Sakai, Shuichi; Goshima, Masahiro; Irie, Hidetsugu

    This paper presents a processor architecture that provides a much higher level of dependability than current ones. Its features are: (1) fault tolerance and secure processing are integrated into a modern superscalar VLSI processor; (2) light-weight, effective soft-error tolerant mechanisms are proposed and evaluated; (3) timing errors on random logic and registers are prevented by low-overhead mechanisms; (4) program behavior is hidden from the outer world by the proposed address translation methods; (5) information leakage can be avoided by attaching policy tags to all data and monitoring them for each instruction execution; (6) injection attacks are avoided with much higher accuracy than in current systems, by providing tag tracking; (7) the overall structure of the dependable processor is proposed with a dependability manager which controls the detection of illegal conditions and recovers to the normal mode; and (8) an FPGA-based testbed system is developed in which the system clock and the voltage are intentionally varied for experiments. The paper presents the fundamental scheme for dependability, the elemental technologies for dependability, and the whole architecture of the ultra dependable processor, and concludes with future work.

  9. Interactive Digital Signal Processor

    NASA Technical Reports Server (NTRS)

    Mish, W. H.

    1985-01-01

    Interactive Digital Signal Processor, IDSP, consists of set of time series analysis "operators" based on various algorithms commonly used for digital signal analysis. Processing of digital signal time series to extract information usually achieved by applications of number of fairly standard operations. IDSP excellent teaching tool for demonstrating application for time series operators to artificially generated signals.

  10. Recorder/processor apparatus

    NASA Technical Reports Server (NTRS)

    Shim, I. H.; Stelben, J. J.

    1974-01-01

    Laser beam is intensity modulated in response to incoming video signals. Latent image is recorded on rotating drum which generates raster in conjunction with incrementally-driven lens carriage. Image is fed automatically to thermal processor; actual image is developed by controlled application of heat onto medium containing latent image.

  11. Universal voice processor development

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The development of a universal voice processor is discussed. The device is based on several circuit configurations using hybrid techniques to satisfy the electrical specifications. The steps taken during the design process are described. Circuit diagrams of the final design are presented. Mathematical models are included to support the theoretical aspects.

  12. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement, either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load imbalance at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference: Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195.
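
    The nested division of processors and tasks described in this abstract can be illustrated with a short Python sketch. This is not PM code and not the PM implementation: the function name `distribute` and the representation of a processor set as a plain list are assumptions made for illustration, and only the default even distribution is modelled.

```python
def distribute(num_tasks, processors):
    """Return, for each new task, the list of processors it may use.

    Hypothetical helper: models only the default (even) distribution.
    """
    n_proc = len(processors)
    if num_tasks <= n_proc:
        # Fewer tasks than processors: divide the processors among the tasks.
        base, extra = divmod(n_proc, num_tasks)
        shares, start = [], 0
        for task in range(num_tasks):
            size = base + (1 if task < extra else 0)
            shares.append(processors[start:start + size])
            start += size
        return shares
    # More tasks than processors: divide the tasks among the processors.
    base, extra = divmod(num_tasks, n_proc)
    shares = []
    for index, proc in enumerate(processors):
        count = base + (1 if index < extra else 0)
        shares.extend([proc] for _ in range(count))
    return shares


# The main program owns all processors; a nested parallel statement
# subdivides the set owned by one of its tasks.
outer = distribute(2, list(range(8)))
inner = distribute(2, outer[0])
```

    In this sketch a task never receives a processor outside the set owned by its parent, which mirrors the rule that tasks do not move between processors without programmer intervention.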

  13. QSpike tools: a generic framework for parallel batch preprocessing of extracellular neuronal signals recorded by substrate microelectrode arrays.

    PubMed

    Mahmud, Mufti; Pulizzi, Rocco; Vasilaki, Eleni; Giugliano, Michele

    2014-01-01

    Micro-Electrode Arrays (MEAs) have emerged as a mature technique to investigate brain (dys)functions in vivo and in in vitro animal models. Often referred to as "smart" Petri dishes, MEAs have demonstrated great potential, particularly for medium-throughput studies in vitro, both in academic and pharmaceutical industrial contexts. Enabling rapid comparison of ionic/pharmacological/genetic manipulations with control conditions, MEAs are employed to screen compounds by monitoring non-invasively the spontaneous and evoked neuronal electrical activity in longitudinal studies, with relatively inexpensive equipment. However, in order to acquire sufficient statistical significance, recordings last up to tens of minutes and generate large amounts of raw data (e.g., 60 channels/MEA, 16-bit A/D conversion, 20 kHz sampling rate: approximately 8 GB per MEA per hour, uncompressed). Thus, when the experimental conditions to be tested are numerous, the availability of fast, standardized, and automated signal preprocessing becomes pivotal for any subsequent analysis and data archiving. To this aim, we developed an in-house cloud-computing system, named QSpike Tools, where CPU-intensive operations, required for preprocessing of each recorded channel (e.g., filtering, multi-unit activity detection, spike-sorting, etc.), are decomposed and batch-queued to a multi-core architecture or to a computer cluster. With the commercial availability of new and inexpensive high-density MEAs, we believe that disseminating QSpike Tools might facilitate its wide adoption and customization, and inspire the creation of community-supported cloud-computing facilities for MEA users. PMID:24678297
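
    The raw-data figure quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that every channel is sampled continuously and stored without headers or compression; the function name is hypothetical.

```python
def raw_data_rate_gb_per_hour(channels=60, bits_per_sample=16,
                              sampling_rate_hz=20_000):
    """Uncompressed recording rate of one MEA in gigabytes (10**9 bytes) per hour."""
    bytes_per_second = channels * (bits_per_sample / 8) * sampling_rate_hz
    return bytes_per_second * 3600 / 1e9


rate = raw_data_rate_gb_per_hour()   # 60 * 2 bytes * 20,000 Hz * 3600 s
```

    With the quoted parameters this gives 8.64 GB per hour, consistent with the abstract's "approximately 8 GB".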

  14. QSpike tools: a generic framework for parallel batch preprocessing of extracellular neuronal signals recorded by substrate microelectrode arrays

    PubMed Central

    Mahmud, Mufti; Pulizzi, Rocco; Vasilaki, Eleni; Giugliano, Michele

    2014-01-01

    Micro-Electrode Arrays (MEAs) have emerged as a mature technique to investigate brain (dys)functions in vivo and in in vitro animal models. Often referred to as “smart” Petri dishes, MEAs have demonstrated great potential, particularly for medium-throughput studies in vitro, both in academic and pharmaceutical industrial contexts. Enabling rapid comparison of ionic/pharmacological/genetic manipulations with control conditions, MEAs are employed to screen compounds by monitoring non-invasively the spontaneous and evoked neuronal electrical activity in longitudinal studies, with relatively inexpensive equipment. However, in order to acquire sufficient statistical significance, recordings last up to tens of minutes and generate large amounts of raw data (e.g., 60 channels/MEA, 16-bit A/D conversion, 20 kHz sampling rate: approximately 8 GB per MEA per hour, uncompressed). Thus, when the experimental conditions to be tested are numerous, the availability of fast, standardized, and automated signal preprocessing becomes pivotal for any subsequent analysis and data archiving. To this aim, we developed an in-house cloud-computing system, named QSpike Tools, where CPU-intensive operations, required for preprocessing of each recorded channel (e.g., filtering, multi-unit activity detection, spike-sorting, etc.), are decomposed and batch-queued to a multi-core architecture or to a computer cluster. With the commercial availability of new and inexpensive high-density MEAs, we believe that disseminating QSpike Tools might facilitate its wide adoption and customization, and inspire the creation of community-supported cloud-computing facilities for MEA users. PMID:24678297

  15. National Resource for Computation in Chemistry (NRCC). Attached scientific processors for chemical computations: a report to the chemistry community

    SciTech Connect

    Ostlund, N.S.

    1980-01-01

    The demands of chemists for computational resources are well known and have been amply documented. The best and most cost-effective means of providing these resources is still open to discussion, however. This report surveys the field of attached scientific processors (array processors) and attempts to indicate their present and possible future use in computational chemistry. Array processors have the possibility of providing very cost-effective computation. This report attempts to provide information that will assist chemists who might be considering the use of an array processor for their computations. It describes the general ideas and concepts involved in using array processors, the commercial products that are available, and the experiences reported by those currently using them. In surveying the field of array processors, the author makes certain recommendations regarding their use in computational chemistry. 5 figures, 1 table (RWR)

  16. Soft-core processor study for node-based architectures.

    SciTech Connect

    Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James; Gallegos, Daniel E.; Learn, Mark Walter

    2008-09-01

    Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.

  17. Extended VLIW processor for real-time imaging

    NASA Astrophysics Data System (ADS)

    Sakai, Keiichi; Fujiwara, Itaru; Ae, Tadashi

    2001-04-01

    We propose EVLIW as a new processor architecture which is designed for general purpose processing and is suitable especially for real-time image processing. The processor architecture is a VLIW, but it has more functional units than the generic VLIW processor has. The EVLIW consists of an interconnection network for connecting neighbors and of functional units, which are more primitive than in the generic VLIW processor. Some general-purpose processors on the market include several processing units, e.g. four identical single-precision floating-point units or four 16-bit-word integer units in Intel processors with SSE/MMX, where the four units perform the same operation on four different data items. In image processing, the data are processed in parallel, where the operations are not complicated and only high-speed processing is usually required. We have tried simple image processing using Intel's processor with SSE/MMX and summarize the results. In this paper, we describe a new architecture for real-time imaging and its design, comparing it with Intel's processor with SSE/MMX.

  18. Generating local addresses and communication sets for data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Long, Fred J. E.; Schreiber, Robert; Teng, Shang-Hua

    1993-01-01

    Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance FORTRAN. We show that, for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution and a computation involving the regular section A(l:h:s), the local memory access sequence for any processor is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little run-time overhead and acceptable preprocessing time.
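
    The claim that the local access sequence is governed by a state machine of at most k states can be explored with a brute-force Python sketch. It is not the paper's fast algorithm: it simply enumerates the section and records, for each offset within a block (the state), the gap to the next local address. Identity alignment, 0-based indices, an inclusive upper bound h, a positive stride s, and the function names are all assumptions made for illustration.

```python
def local_access_sequence(p, k, m, l, h, s):
    """Local addresses on processor m touched by A(l:h:s) under cyclic(k)."""
    addresses = []
    for i in range(l, h + 1, s):
        block, offset = divmod(i, k)
        if block % p == m:                       # processor m owns this block
            addresses.append((block // p) * k + offset)
    return addresses


def transition_table(p, k, m, l, h, s):
    """Map each block offset (state) to the observed local-address gaps."""
    seq = local_access_sequence(p, k, m, l, h, s)
    table = {}
    for cur, nxt in zip(seq, seq[1:]):
        table.setdefault(cur % k, set()).add(nxt - cur)
    return table


# Example: 4 processors, block size 4, stride 5, viewed from processor 0.
seq = local_access_sequence(4, 4, 0, 0, 200, 5)
table = transition_table(4, 4, 0, 0, 200, 5)
```

    In this example each of the four states has exactly one outgoing gap, so processor 0 can walk its local memory with a small lookup table instead of testing every global index.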

  19. Generating local addresses and communication sets for data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Long, Fred J. E.; Schreiber, Robert; Teng, Shang-Hua

    1993-01-01

    Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution, and a computation involving the regular section A, the local memory access sequence for any processor is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.

  20. Is Monte Carlo embarrassingly parallel?

    SciTech Connect

    Hoogenboom, J. E.

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  1. Dielectrophoretic cell trapping and parallel one-to-one fusion based on field constriction created by a micro-orifice array

    PubMed Central

    Gel, Murat; Kimura, Yuji; Kurosawa, Osamu; Oana, Hidehiro; Kotera, Hidetoshi; Washizu, Masao

    2010-01-01

    Micro-orifice based cell fusion assures high-yield fusion without compromising the cell viability. This paper examines feasibility of a dielectrophoresis (DEP) assisted cell trapping method for parallel fusion with a micro-orifice array. The goal is to create viable fusants for studying postfusion cell behavior. We fabricated a microfluidic chip that contained a chamber and partition. The partition divided the chamber into two compartments and it had a number of embedded micro-orifices. The voltage applied to the electrodes located at each compartment generated an electric field distribution concentrating in micro-orifices. Cells introduced into each compartment moved toward the micro-orifice array by manipulation of hydrostatic pressure. DEP assisted trapping was used to keep the cells in micro-orifice and to establish cell to cell contact through orifice. By applying a pulse, cell fusion was initiated to form a neck between cells. The neck passing through the orifice resulted in immobilization of the fused cell pair at micro-orifice. After washing away the unfused cells, the chip was loaded to a microscope with stage top incubator for time lapse imaging of the selected fusants. The viable fusants were successfully generated by fusion of mouse fibroblast cells (L929). Time lapse observation of the fusants showed that fused cell pairs escaping from micro-orifice became one tetraploid cell. The generated tetraploid cells divided into three daughter cells. The fusants generated with a smaller micro-orifice (diameter ~2 μm) were kept immobilized at micro-orifice until cell division phase. After observation of two synchronized cell divisions, the fusant divided into four daughter cells. We conclude that the presented method of cell pairing and fusion is suitable for high-yield generation of viable fusants and furthermore, subsequent study of postfusion phenomena. PMID:20697592

  2. Highly parallel computer architecture for robotic computation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (inventor); Bejczy, Anta K. (inventor)

    1991-01-01

    In a computer having a large number of single instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.

  3. Wavelength-encoded OCDMA system using opto-VLSI processors

    NASA Astrophysics Data System (ADS)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/s per user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  4. Problem size, parallel architecture and optimal speedup

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Willard, Frank H.

    1987-01-01

    The communication and synchronization overhead inherent in parallel processing can lead to situations where adding processors to the solution method actually increases execution time. Problem type, problem size, and architecture type all affect the optimal number of processors to employ. The numerical solution of an elliptic partial differential equation is examined in order to study the relationship between problem size and architecture. The equation's domain is discretized into n^2 grid points, which are divided into partitions and mapped onto the individual processor memories. The relationships between grid size, stencil type, partitioning strategy, processor execution time, and communication network type are analytically quantified. In so doing, the optimal number of processors to assign to the solution was determined, and the analysis identified (1) the smallest grid size which fully benefits from using all available processors, (2) the leverage on performance given by increasing processor speed or communication network speed, and (3) the suitability of various architectures for large numerical problems.
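
    The trade-off in this abstract can be made concrete with a toy cost model. The model below is an illustrative assumption, not the one derived in the report: computation is taken to scale as n^2/p, and communication is assumed to grow linearly with both n and p, as it might on a shared bus where boundary exchanges are serialized. The function names and the cost constants `t_point` and `t_word` are hypothetical.

```python
def execution_time(n, p, t_point=1.0, t_word=4.0):
    """Toy model: time for one sweep over an n-by-n grid on p processors."""
    compute = t_point * n * n / p
    communicate = t_word * n * p if p > 1 else 0.0   # no exchange on 1 processor
    return compute + communicate


def optimal_processors(n, max_p, **costs):
    """Processor count in 1..max_p that minimizes the modelled time."""
    return min(range(1, max_p + 1),
               key=lambda p: execution_time(n, p, **costs))


best_small = optimal_processors(64, 16)     # small grid: few processors pay off
best_large = optimal_processors(1024, 16)   # large grid: all 16 are worthwhile
```

    Under these assumed costs a 64 x 64 grid runs fastest on 4 processors and is slower on 16 processors than on one, while a 1024 x 1024 grid benefits from all 16.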

  5. Stereoscopic Optical Signal Processor

    NASA Technical Reports Server (NTRS)

    Graig, Glenn D.

    1988-01-01

    Optical signal processor produces two-dimensional cross correlation of images from stereoscopic video camera in real time. Cross correlation is used to identify an object, determine distance, or measure movement. Left and right cameras modulate beams from light source for correlation in video detector. Switch in position 1 produces information about range of object viewed by cameras. Position 2 gives information about movement. Position 3 helps to identify object.

  6. Highly integrated pulse processor

    NASA Astrophysics Data System (ADS)

    Bordessoule, Michel; Bosshard, Roger

    1995-02-01

    A digital pulse processor for a multichannel solid-state detector [1,2] has been designed, using highly integrated analog and digital signal processing circuits. The detector preamplifier output is digitized and a parametrizable synchro is elaborated by a dedicated chipset. The digitized signal is then processed by a finite impulse response filter chip (FIR) whose programming is discussed, after which a pile-up protected peak detector sorts it into a histogram. All parameters are digitally controlled.

  7. VLSI Processor For Vector Quantization

    NASA Technical Reports Server (NTRS)

    Tawel, Raoul

    1995-01-01

    Pixel intensities in each kernel compared simultaneously with all code vectors. Prototype high-performance, low-power, very-large-scale integrated (VLSI) circuit designed to perform compression of image data by vector-quantization method. Contains relatively simple analog computational cells operating on direct or buffered outputs of photodetectors grouped into blocks in imaging array, yielding vector-quantization code word for each such block in sequence. Scheme exploits parallel-processing nature of vector-quantization architecture, with consequent increase in speed.

  8. Conversion via software of a simd processor into a mimd processor

    SciTech Connect

    Guzman, A.; Gerzso, M.; Norkin, K.B.; Vilenkin, S.Y.

    1983-01-01

    A method is described which takes a pure LISP program and automatically decomposes it via automatic parallelization into several parts, one for each processor of an SIMD architecture. Each of these parts is a different execution flow, i.e., a different program. The execution of these different programs by an SIMD architecture is examined. The method has been developed in some detail for the PS-2000, an SIMD Soviet multiprocessor, making it behave like AHR, a Mexican MIMD multi-microprocessor. Both the PS-2000 and AHR execute a pure LISP program in parallel; its decomposition into n pieces, their synchronization, scheduling, etc., are performed by the system (hardware and software). In order to achieve simultaneous execution of different programs in an SIMD processor, the method uses a scheme of node scheduling and node exportation. 14 references.

  9. Algorithmic commonalities in the parallel environment

    NASA Technical Reports Server (NTRS)

    Mcanulty, Michael A.; Wainer, Michael S.

    1987-01-01

    The ultimate aim of this project was to analyze procedures from substantially different application areas to discover what is either common or peculiar in the process of conversion to the Massively Parallel Processor (MPP). Three areas were identified: molecular dynamic simulation, production systems (rule systems), and various graphics and vision algorithms. To date, only selected graphics procedures have been investigated. They are the most readily available, and produce the most visible results. These include simple polygon patch rendering, raycasting against a constructive solid geometric model, and stochastic or fractal based textured surface algorithms. Only the simplest of conversion strategies, mapping a major loop to the array, has been investigated so far. It is not entirely satisfactory.

  10. MAP3D: a media processor approach for high-end 3D graphics

    NASA Astrophysics Data System (ADS)

    Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

    1999-12-01

    Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given application. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.

  11. NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Subhash, Saini; Bailey, David H.; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a pencil-and-paper fashion, i.e., the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. In this paper, we present new NPB performance results for the following systems: (a) Parallel-Vector Processors: Cray C90, Cray T90 and Fujitsu VPP500; (b) Highly Parallel Processors: Cray T3D, IBM SP2 and IBM SP-TN2 (Thin Nodes 2); (c) Symmetric Multiprocessing Processors: Convex Exemplar SPP1000, Cray J90, DEC Alpha Server 8400 5/300, and SGI Power Challenge XL. We also present sustained performance per dollar for Class B LU, SP and BT benchmarks. We also mention NAS's future plans for NPB.

  12. Scaling and Graphical Transport-Map Analysis of Ambipolar Schottky-Barrier Thin-Film Transistors Based on a Parallel Array of Si Nanowires.

    PubMed

    Jeon, Dae-Young; Pregl, Sebastian; Park, So Jeong; Baraban, Larysa; Cuniberti, Gianaurelio; Mikolajick, Thomas; Weber, Walter M

    2015-07-01

    Si nanowire (Si-NW) based thin-film transistors (TFTs) have been considered as a promising candidate for next-generation flexible and wearable electronics as well as sensor applications with high performance. Here, we have fabricated ambipolar Schottky-barrier (SB) TFTs consisting of a parallel array of Si-NWs and performed an in-depth study related to their electrical performance and operation mechanism through several electrical parameters extracted from the channel length scaling based method. In particular, the newly suggested current-voltage (I-V) contour map clearly elucidates the unique operation mechanism of the ambipolar SB-TFTs, governed by the Schottky junction between NiSi2 and Si-NW. Further, it reveals for the first time in SB based FETs the important internal electrostatic coupling between the channel and externally applied voltages. This work provides helpful information for the realization of practical circuits with ambipolar SB-TFTs that can be transferred to different substrate technologies and applications. PMID:26087437

  13. Multi-microprocessor that executes pure LISP in parallel

    SciTech Connect

    Guzman, A.

    1982-01-01

    The architecture presented allows parallel computation of high level languages, with some advantages: (1) the programmer is unaware that he is writing programs for a parallel computer; (2) the processors communicate little with each other, so that interconnection problems are minimised; (3) a given processor is unaware of how many other processors there are, or what they are doing; (4) a processor never waits for another process to have finished, nor does it awake or interrupt another processor. The machine processes in parallel programs written in high level languages capable of being expressed in the lambda notation (applicative languages). It is formed by a collection of general purpose processors which are weakly coupled and without hierarchy. Asynchronous computation is permitted by each processor evaluating a part of a program. 17 references.

  14. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. 
Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline, the prototypical non-data-parallel algorithm, can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.

  15. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
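
    A decomposed sieve of the kind discussed in this abstract can be sketched in Python. The sketch interprets "scattered decomposition" as a cyclic assignment in which worker r owns the integers i with i % workers == r; that reading, the function names, and the serial loop standing in for the processing elements are all assumptions, and the code is not the report's implementation.

```python
from math import gcd, isqrt


def base_primes(limit):
    """Ordinary serial sieve for the primes up to limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False] * min(2, limit + 1)
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]


def worker_primes(rank, workers, n):
    """Primes <= n among the integers owned by one worker."""
    flags = {i: True for i in range(2, n + 1) if i % workers == rank}
    for q in base_primes(isqrt(n)):
        stride = q * workers // gcd(q, workers)   # gap between owned multiples of q
        # First owned multiple of q at or after q*q, if this worker owns any.
        first = next((m for m in range(q * q, q * q + stride, q)
                      if m % workers == rank), None)
        if first is None:
            continue
        for m in range(first, n + 1, stride):
            flags[m] = False
    return [i for i, is_prime in flags.items() if is_prime]


def parallel_sieve(n, workers):
    """Combine the per-worker results; each rank would run on its own processor."""
    found = []
    for rank in range(workers):
        found.extend(worker_primes(rank, workers, n))
    return sorted(found)
```

    Each worker strikes only the multiples it owns, so no communication is needed until the results are gathered; every worker recomputes the small set of base primes up to the square root of n.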

  16. LEWICE droplet trajectory calculations on a parallel computer

    NASA Technical Reports Server (NTRS)

    Caruso, Steven C.

    1993-01-01

    A parallel computer implementation (128 processors) of LEWICE, a NASA Lewis code used to predict the time-dependent ice accretion process for two-dimensional aerodynamic bodies of simple geometries, is described. Two-dimensional parallel droplet trajectory calculations are performed to demonstrate the potential benefits of applying parallel processing to ice accretion analysis. Parallel performance is evaluated as a function of the number of trajectories and the number of processors. For comparison, similar trajectory calculations are performed on single-processor Cray computers, and the best parallel results are found to be 33 and 23 times faster, respectively, than those of the Cray XMP and YMP.

  17. First level trigger processor for the ZEUS calorimeter

    SciTech Connect

    Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

    1990-01-01

    This paper discusses the design of the first level trigger processor for the ZEUS calorimeter. This processor accepts data from the 13,000 photomultipliers of the calorimeter which is topologically divided into 16 regions, and after regional preprocessing, performs logical and numerical operations which cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K ECL, Advanced CMOS discrete devices, and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2GB/s, and processed data flows from the processor to the Global First-Level Trigger at a rate of 700MB/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor. 2 refs., 3 figs.

  18. Parallel VLSI architecture emulation and the organization of APSA/MPP

    NASA Technical Reports Server (NTRS)

    Odonnell, John T.

    1987-01-01

    The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.

  19. Fault detection and bypass in a sequence information signal processor

    NASA Technical Reports Server (NTRS)

    Peterson, John C. (Inventor); Chow, Edward T. (Inventor)

    1992-01-01

    The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.

  20. Optical backplane interconnect switch for data processors and computers

    NASA Technical Reports Server (NTRS)

    Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

    1989-01-01

    An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

  1. Finite element computation with parallel VLSI

    NASA Technical Reports Server (NTRS)

    Mcgregor, J.; Salama, M.

    1983-01-01

    This paper describes a parallel processing computer consisting of a 16-bit microcomputer as a master processor which controls and coordinates the activities of 8086/8087 VLSI chip set slave processors working in parallel. The hardware is inexpensive and can be flexibly configured and programmed to perform various functions. This makes it a useful research tool for the development of, and experimentation with parallel mathematical algorithms. Application of the hardware to computational tasks involved in the finite element analysis method is demonstrated by the generation and assembly of beam finite element stiffness matrices. A number of possible schemes for the implementation of N-elements on N- or n-processors (N is greater than n) are described, and the speedup factors of their time consumption are determined as a function of the number of available parallel processors.

  2. Finite Element Modeling on Scalable Parallel Computers

    NASA Technical Reports Server (NTRS)

    Cwik, T.; Zuffada, C.; Jamnejad, V.; Katz, D.

    1995-01-01

    A coupled finite element-integral equation technique was developed to model fields scattered from inhomogeneous, three-dimensional objects of arbitrary shape. This paper outlines how to implement the software on a scalable parallel processor.

  3. Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

    SciTech Connect

    Liao, C; Quinlan, D J; Willcock, J J; Panas, T

    2008-12-12

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.

  4. Extended Parallelism Models for Optimization on Massively Parallel Computers

    SciTech Connect

    Eldred, M.S.; Schimel, B.D.

    1999-05-24

    Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitations that prevent effective scaling with the thousands of processors available in massively parallel supercomputers. In more recent work, a capability has been developed for multilevel parallelism in which multiple instances of multiprocessor simulations are coordinated simultaneously. This implementation employs a master-slave approach using the Message Passing Interface (MPI) within the DAKOTA software toolkit. Mathematical analysis on achieving peak efficiency in multilevel parallelism has shown that the most effective processor partitioning scheme is the one that limits the size of multiprocessor simulations in favor of concurrent execution of multiple simulations. That is, if both coarse-grained and fine-grained parallelism can be exploited, then preference should be given to the coarse-grained parallelism. This analysis was verified in multilevel parallel computational experiments on networks of workstations (NOWs) and on the Intel TeraFLOPS massively parallel supercomputer. In current work, methods for exploiting additional coarse-grained parallelism in optimization are being investigated so that fine-grained efficiency losses can be further minimized. These activities are focusing on both algorithmic coarse-grained parallelism (multiple independent function evaluations) through the development of speculative gradient methods and concurrent iterator strategies and on function evaluation coarse-grained parallelism (multiple separable simulations within a function evaluation) through the development of general partitioning and nested synchronization facilities.
The net result is a total of four separate levels of parallelism which can minimize efficiency losses and achieve near linear scaling on massively parallel computers.

  5. An Optical Tomography System Using a Digital Signal Processor

    PubMed Central

    Rahim, Ruzairi Abdul; Thiam, Chiam Kok; Fazalul Rahiman, Mohd Hafiz

    2008-01-01

    The use of a personal computer together with a Data Acquisition System (DAQ) as the processing tool in optical tomography systems has been the norm ever since the beginning of process tomography. However, advancements in silicon fabrication technology now allow the fabrication of powerful Digital Signal Processors (DSP) at a reasonable cost. This allows the technology to be used in an optical tomography system, since data acquisition and processing can be performed within the DSP. Thus, the dependency on a personal computer and a DAQ to sample and process the external signals can be reduced or even eliminated. The DSP system was customized to control the data acquisition process of a 16x16 optical sensor array, arranged in parallel beam projection. The data collected was used to reconstruct the cross sectional image of the pipeline conveyor. For image display purposes, the reconstructed image was sent to a personal computer via serial communication. This allows the use of a laptop to display the tomogram image as well as to perform any other offline analysis.

  6. Parallel computation and computers for artificial intelligence

    SciTech Connect

    Kowalik, J.S.

    1988-01-01

    This book discusses Parallel Processing in Artificial Intelligence; Parallel Computing using Multilisp; Execution of Common Lisp in a Parallel Environment; Qlisp; Restricted AND-Parallel Execution of Logic Programs; PARLOG: Parallel Programming in Logic; and Data-driven Processing of Semantic Nets. Attention is also given to: Application of the Butterfly Parallel Processor in Artificial Intelligence; On the Range of Applicability of an Artificial Intelligence Machine; Low-level Vision on Warp and the Apply Programming Model; AHR: A Parallel Computer for Pure Lisp; FAIM-1: An Architecture for Symbolic Multi-processing; and Overview of AI Application Oriented Parallel Processing Research in Japan.

  7. Optimizing Vector-Quantization Processor Architecture for Intelligent Query-Search Applications

    NASA Astrophysics Data System (ADS)

    Xu, Huaiyu; Mita, Yoshio; Shibata, Tadashi

    2002-04-01

    The architecture of a very large scale integration (VLSI) vector-quantization processor (VQP) has been optimized to develop a general-purpose intelligent query-search agent. The agent performs a similarity-based search in a large-volume database. Although similarity-based search processing is computationally very expensive, latency-free searches have become possible due to the highly parallel maximum-likelihood search architecture of the VQP chip. Three architectures of the VQP chip have been studied and their performances are compared. In order to give reasonable searching results according to the different policies, the concept of penalty function has been introduced into the VQP. An E-commerce real-estate agency system has been developed using the VQP chip implemented in a field-programmable gate array (FPGA) and the effectiveness of such an agency system has been demonstrated.

  8. Distributed processor allocation for launching applications in a massively connected processors complex

    DOEpatents

    Pedretti, Kevin (Goleta, CA)

    2008-11-18

    A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.

  9. Parallel algorithms for interactive manipulation of digital terrain models

    NASA Technical Reports Server (NTRS)

    Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

    1988-01-01

    Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid connected architectures provide a natural mapping for grid based terrain models. Presented here are algorithms for data movement on the Massively Parallel Processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.

  10. Parallel Modem Architectures for High-Data-Rate Space Modems

    NASA Astrophysics Data System (ADS)

    Satorius, E.

    2014-08-01

    Existing software-defined radios (SDRs) for space are limited in data volume by several factors, including bandwidth, space-qualified analog-to-digital converter (ADC) technology, and processor throughput, e.g., the throughput of a space-qualified field-programmable gate array (FPGA). In an attempt to further improve the throughput of space-based SDRs and to fully exploit the newer and more capable space-qualified technology (ADCs, FPGAs), we are evaluating parallel transmitter/receiver architectures for space SDRs. These architectures would improve data volume for both deep-space and particularly proximity (e.g., relay) links. In this article, designs for FPGA implementation of a high-rate parallel modem are presented as well as both fixed- and floating-point simulated performance results based on a functional design that is suitable for FPGA implementation.

  11. Dual-scale topology optoelectronic processor.

    PubMed

    Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

    1991-12-15

    The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements. PMID:19784198

  12. Simulation of an array-based neural net model

    NASA Technical Reports Server (NTRS)

    Barnden, John A.

    1987-01-01

    Research in cognitive science suggests that much of cognition involves the rapid manipulation of complex data structures. However, it is very unclear how this could be realized in neural networks or connectionist systems. A core question is: how could the interconnectivity of items in an abstract-level data structure be neurally encoded? The answer appeals mainly to positional relationships between activity patterns within neural arrays, rather than directly to neural connections in the traditional way. The new method was initially devised to account for abstract symbolic data structures, but it also supports cognitively useful spatial analogue, image-like representations. As the neural model is based on massive, uniform, parallel computations over 2D arrays, the massively parallel processor is a convenient tool for simulation work, although there are complications in using the machine to the fullest advantage. An MPP Pascal simulation program for a small pilot version of the model is running.

  13. Reconfigurable data path processor

    NASA Technical Reports Server (NTRS)

    Donohoe, Gregory (Inventor)

    2005-01-01

    A reconfigurable data path processor comprises a plurality of independent processing elements. Each of the processing elements advantageously comprises an identical architecture. Each processing element comprises a plurality of data processing means for generating a potential output. Each processor is also capable of through-putting an input as a potential output with little or no processing. Each processing element comprises a conditional multiplexer having a first conditional multiplexer input, a second conditional multiplexer input and a conditional multiplexer output. A first potential output value is transmitted to the first conditional multiplexer input, and a second potential output value is transmitted to the second conditional multiplexer input. The conditional multiplexer couples either the first conditional multiplexer input or the second conditional multiplexer input to the conditional multiplexer output, according to an output control command. The output control command is generated by processing a set of arithmetic status bits through a logical mask. The conditional multiplexer output is coupled to a first processing element output. A first set of arithmetic status bits is generated according to the processing of the first processable value. A second set of arithmetic status bits may be generated from a second processing operation. The selection of the arithmetic status bits is performed by an arithmetic status bit multiplexer that selects the desired set of arithmetic status bits from among the first and second sets of arithmetic status bits. The conditional multiplexer evaluates the selected arithmetic status bits according to a logical mask defining an algorithm for evaluating the arithmetic status bits.
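
    The masked selection described in this record can be sketched in software. The status-bit layout (ZERO, NEGATIVE), the rule that the control command is asserted when any masked bit is set, and all function names below are assumptions for illustration; the record does not specify them.

```python
# Hypothetical status-bit layout, chosen only for this example.
ZERO = 0b01
NEGATIVE = 0b10


def subtract_status(a, b):
    """Return illustrative arithmetic status bits for the operation a - b."""
    diff = a - b
    bits = 0
    if diff == 0:
        bits |= ZERO
    if diff < 0:
        bits |= NEGATIVE
    return bits


def conditional_mux(first_input, second_input, status_bits, mask):
    """Couple one of two potential outputs to the output.

    Assumption: the output control command is asserted when any status bit
    selected by the logical mask is set; it then selects the second input.
    """
    control = (status_bits & mask) != 0
    return second_input if control else first_input


def select_max(a, b):
    """Use the NEGATIVE flag of a - b to pass through the larger value."""
    return conditional_mux(a, b, subtract_status(a, b), NEGATIVE)


if __name__ == "__main__":
    print(select_max(3, 7), select_max(7, 3))
```

    Changing only the mask changes the selection rule (for example, masking ZERO instead of NEGATIVE), which is one way to read the statement that the mask defines the evaluation algorithm.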

  14. On-chip CMOS-compatible optical signal processor.

    PubMed

    Yang, Lin; Ji, Ruiqiang; Zhang, Lei; Ding, Jianfeng; Xu, Qianfan

    2012-06-01

    We propose and demonstrate an optical signal processor performing matrix-vector multiplication, which is composed of a laser-modulator array, multiplexer, splitter, microring modulator matrix and photodetector array. 8 × 10^7 multiplications and accumulations (MACs) per second is implemented at a clock frequency of 10 MHz. All functional units can be ultimately monolithically integrated on a chip with the development of silicon photonics and an efficient high-performance computing system is expected in the future. PMID:22714383

  15. Programable Pipelined-Image Processor

    NASA Technical Reports Server (NTRS)

    Gennery, D. B.; Wilcox, B.

    1986-01-01

    Computer serves as pipelined processor for imagery or other two-dimensional digital data. Processor does feature extraction, smoothing, edge detection, texture measurement, and stereoscopic area correlation. Also plans routes for obstacle avoidance by robots and solves two-dimensional partial differential equations. Image processor consists of modular units: each includes set of computing elements of types particularly useful in pipelined-image processing. Flexible interconnection scheme used to route data to subsequent stages of pipeline.

  16. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.

  17. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.

  18. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratories, which showed speedup on a 1024-node hypercube of over 500 for three fixed-size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
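
    The contrast between fixed-size and scalable problems can be made concrete with two standard formulas. The record itself gives neither, so this is an added illustration: Amdahl's law for a fixed-size problem, and the scaled-speedup form usually attributed to Gustafson for a problem that grows with the machine. The function names and the 1% serial fraction are assumptions chosen for the example.

```python
def amdahl_speedup(serial_fraction, processors):
    """Fixed-size speedup: the serial part of the work does not shrink."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def scaled_speedup(serial_fraction, processors):
    """Scaled speedup: the parallel part grows with the processor count.

    Here serial_fraction is measured on the parallel machine.
    """
    return processors - serial_fraction * (processors - 1)


if __name__ == "__main__":
    n = 1024
    s = 0.01  # illustrative serial fraction, not a value from the record
    print("fixed size:", amdahl_speedup(s, n))
    print("scaled    :", scaled_speedup(s, n))
```

    With the same serial fraction, the fixed-size estimate stays below 100 on 1024 processors while the scaled estimate stays above 1000, which is the kind of gap between the two problem classes that the record reports.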

  19. Efficacy of Code Optimization on Cache-based Processors

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system.
It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
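
    The effect of stride on cache use can be illustrated with a toy model. It is only a sketch: the 8-byte element size, 64-byte cache line, 64-set direct-mapped cache, and function names are assumptions, not parameters from the report, and the model counts lines and sets touched without simulating timing.

```python
def cache_lines_touched(num_accesses, stride, element_bytes=8, line_bytes=64):
    """Count distinct cache lines touched by a strided walk over an array."""
    lines = {
        (i * stride * element_bytes) // line_bytes for i in range(num_accesses)
    }
    return len(lines)


def cache_sets_used(num_accesses, stride, element_bytes=8, line_bytes=64,
                    num_sets=64):
    """Count distinct sets used in a hypothetical direct-mapped cache."""
    sets = {
        ((i * stride * element_bytes) // line_bytes) % num_sets
        for i in range(num_accesses)
    }
    return len(sets)


if __name__ == "__main__":
    for stride in (0, 1, 8, 512, 513):
        print(stride,
              cache_lines_touched(64, stride),
              cache_sets_used(64, stride))
```

    In this model a unit-stride walk reuses each fetched line for eight consecutive elements, a stride of eight elements fetches a new line on every access, and a power-of-two stride of 512 elements maps every access to the same set, which is the pathological case the report warns about.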

  20. Software-Reconfigurable Processors for Spacecraft

    NASA Technical Reports Server (NTRS)

    Farrington, Allen; Gray, Andrew; Bell, Bryan; Stanton, Valerie; Chong, Yong; Peters, Kenneth; Lee, Clement; Srinivasan, Jeffrey

    2005-01-01

    A report presents an overview of an architecture for a software-reconfigurable network data processor for a spacecraft engaged in scientific exploration. When executed on suitable electronic hardware, the software performs the functions of a physical layer (in effect, acts as a software radio in that it performs modulation, demodulation, pulse-shaping, error correction, coding, and decoding), a data-link layer, a network layer, a transport layer, and application-layer processing of scientific data. The software-reconfigurable network processor is undergoing development to enable rapid prototyping and rapid implementation of communication, navigation, and scientific signal-processing functions; to provide a long-lived communication infrastructure; and to provide greatly improved scientific-instrumentation and scientific-data-processing functions by enabling science-driven in-flight reconfiguration of computing resources devoted to these functions. This development is an extension of terrestrial radio and network developments (e.g., in the cellular-telephone industry) implemented in software running on such hardware as field-programmable gate arrays, digital signal processors, traditional digital circuits, and mixed-signal application-specific integrated circuits (ASICs).

  1. Parallel Genetic Algorithm for Alpha Spectra Fitting

    NASA Astrophysics Data System (ADS)

    García-Orellana, Carlos J.; Rubio-Montero, Pilar; González-Velasco, Horacio

    2005-01-01

    We present a performance study of alpha-particle spectra fitting using a parallel Genetic Algorithm (GA). The method uses a two-step approach. In the first step we run the parallel GA to find an initial solution for the second step, in which we use the Levenberg-Marquardt (LM) method for a precise final fit. GA is a highly resource-demanding method, so we use a Beowulf cluster for parallel simulation. The relationship between simulation time (and parallel efficiency) and the number of processors is studied using several alpha spectra, with the aim of obtaining a method to estimate the optimal number of processors for a simulation.

  2. Parallel machine architecture for production rule systems

    DOEpatents

    Allen, Jr., John D. (Knoxville, TN); Butler, Philip L. (Knoxville, TN)

    1989-01-01

    A parallel processing system for production rule programs utilizes a host processor for storing production rule right hand sides (RHS) and a plurality of rule processors for storing left hand sides (LHS). The rule processors operate in parallel in the recognize phase of the system Recognize-Act cycle to match their respective LHS's against a stored list of working memory elements (WME) in order to find a self-consistent set of WME's. The list of WME's is dynamically varied during the Act phase of the system, in which the host executes or fires rule RHS's for those rules for which a self-consistent set has been found by the rule processors. The host transmits instructions for creating or deleting working memory elements as dictated by the rule firings until the rule processors are unable to find any further self-consistent working memory element sets, at which time the production rule system is halted.
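
    A minimal, sequential sketch of a recognize-act loop may help make the cycle concrete. It is not the patented design: rules are reduced to hypothetical (name, LHS, add, delete) tuples over string-valued working memory elements, the first matching rule is always the one fired, and the matching that the rule processors would do in parallel is written as an ordinary loop.

```python
def run_production_system(rules, working_memory, max_cycles=100):
    """Run a recognize-act loop until no rule matches.

    rules: list of (name, lhs, add, delete), where lhs, add and delete are
    sets of working memory elements (plain strings in this sketch).
    """
    wm = set(working_memory)
    fired = []
    for _ in range(max_cycles):
        # Recognize phase: each LHS is matched independently, so this is the
        # step that separate rule processors could evaluate in parallel.
        # A rule is skipped if firing it would leave working memory unchanged.
        matches = [
            rule for rule in rules
            if rule[1] <= wm and ((rule[2] - wm) or (rule[3] & wm))
        ]
        if not matches:
            break  # nothing matches any more: halt
        # Act phase: the host fires one matching rule and updates the WME list.
        name, _, add, delete = matches[0]
        wm = (wm - delete) | add
        fired.append(name)
    return wm, fired


# Hypothetical rule set, invented for illustration.
example_rules = [
    ("boil", {"kettle-full", "power-on"}, {"water-hot"}, set()),
    ("brew", {"water-hot", "tea-bag"}, {"tea-ready"}, {"tea-bag"}),
]
```

    Starting from the elements kettle-full, power-on and tea-bag, the loop fires "boil", then "brew", and halts on the third cycle because no rule can change working memory any further.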

  3. Linear array implementation of the EM algorithm for PET image reconstruction

    SciTech Connect

    Rajan, K.; Patnaik, L.M.; Ramakrishna, J.

    1995-08-01

    The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution back projection algorithms. However, the PET image reconstruction based on the EM algorithm is computationally burdensome for today's single processor systems. In addition, a large memory is required for the storage of the image, projection data, and the probability matrix. Since the computations are easily divided into tasks executable in parallel, multiprocessor configurations are the ideal choice for fast execution of the EM algorithms. In this study, the authors attempt to overcome these two problems by parallelizing the EM algorithm on a multiprocessor system. The parallel EM algorithm on a linear array topology using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs) has been implemented. The performance of the EM algorithm on a 386/387 machine, IBM 6000 RISC workstation, and on the linear array system is discussed and compared. The results show that the computational speed performance of a linear array using 8 DSP chips as PEs executing the EM image reconstruction algorithm is about 15.5 times better than that of the IBM 6000 RISC workstation. The novelty of the scheme is its simplicity. The linear array topology is expandable with a larger number of PEs. The architecture is not dependent on the DSP chip chosen, and the substitution of the latest DSP chip is straightforward and could yield better speed performance.
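
    To show why the computation divides easily into parallel tasks, here is a small sketch of one iteration of the standard maximum-likelihood EM (MLEM) update for emission tomography. It assumes the abstract refers to this standard form; the function name and the list-of-lists layout of the probability matrix are illustrative choices, and nothing here reflects the authors' DSP implementation.

```python
def mlem_step(image, system, counts):
    """One EM (MLEM) update for emission tomography.

    image:  current estimate, one value per pixel j
    system: system[i][j] = probability that an emission in pixel j
            is detected in projection bin i
    counts: measured counts, one value per bin i
    """
    n_bins, n_pix = len(system), len(image)
    # Forward projection: expected counts in each bin for the current image.
    expected = [sum(system[i][j] * image[j] for j in range(n_pix))
                for i in range(n_bins)]
    updated = []
    for j in range(n_pix):
        # Each pixel update is independent of the others within an iteration.
        sensitivity = sum(system[i][j] for i in range(n_bins))
        # Back-project the measured/expected ratio into pixel j.
        ratio = sum(system[i][j] * counts[i] / expected[i]
                    for i in range(n_bins) if expected[i] > 0)
        updated.append(image[j] * ratio / sensitivity if sensitivity > 0 else 0.0)
    return updated
```

    Both the forward projection (one sum per bin) and the pixel updates (one sum per pixel) are independent within an iteration, which is the kind of work that could be split across processing elements.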

  4. 3D-Flow processor for a programmable Level-1 trigger (feasibility study)

    SciTech Connect

    Crosetto, D.

    1992-10-01

    A feasibility study has been made to use the 3D-Flow processor in a pipelined programmable parallel processing architecture to identify particles such as electrons, jets, muons, etc., in high-energy physics experiments.

  5. Multiple processor version of a Monte Carlo code for photon transport in turbid media

    NASA Astrophysics Data System (ADS)

    Colasanti, Alberto; Guida, Giovanni; Kisslinger, Annamaria; Liuzzi, Raffaele; Quarto, Maria; Riccio, Patrizia; Roberti, Giuseppe; Villani, Fulvia

    2000-10-01

    Although Monte Carlo (MC) simulations represent an accurate and flexible tool to study the photon transport in strongly scattering media with complex geometrical topologies, they are very often infeasible because of their very high computation times. Parallel computing, in principle very suitable for the MC approach because it consists of the repeated application of the same calculations to unrelated and superposing events, offers a possible approach to overcome this problem. An MC multiple-processor code for optical and IR photon transport was developed and run on the parallel processor computer CRAY-T3E (128 DEC Alpha EV5 nodes, 600 Mflops) at CINECA (Bologna, Italy). The comparison between single-processor and multiple-processor runs for the same tissue models shows that the parallelization reduces the computation time by a factor of about N, where N is the number of processors used. This means a computation time reduction by a factor ranging from about 10^2 (as in our case, where 128 processors are available) up to about 10^3 (with the most powerful parallel computers, with 1024 processors). This reduction could make feasible MC simulations that have until now been impracticable. The scaling of the execution time of the parallel code, as a function of the values of the main input parameters, is also evaluated.
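
    The reason photon histories parallelize so well can be illustrated with a deliberately simplified sketch. It assumes a purely absorbing slab (no scattering), which is far simpler than the turbid-media model in the abstract. Each worker simulates its own share of photons with an independent random stream, and the tallies are simply summed. Threads are used only to keep the example self-contained; they do not give a real speedup in CPython, and a production run would use separate processes or nodes. All names and parameters are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def transmitted_count(n_photons, mu, thickness, seed):
    """Count photons that cross a purely absorbing slab without interacting."""
    rng = random.Random(seed)  # independent random stream per worker
    hits = 0
    for _ in range(n_photons):
        # Free path length sampled from an exponential distribution.
        path = -math.log(1.0 - rng.random()) / mu
        if path > thickness:
            hits += 1
    return hits


def parallel_transmission(total_photons, mu, thickness, workers):
    """Split the photons evenly across workers and merge the tallies."""
    share = total_photons // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tallies = pool.map(
            lambda seed: transmitted_count(share, mu, thickness, seed),
            range(workers),
        )
        return sum(tallies) / (share * workers)
```

    Because the workers never exchange data until the final sum, the work per worker shrinks in proportion to the number of workers, which is consistent with the roughly N-fold reduction reported above.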

  6. Interactive digital signal processor

    NASA Technical Reports Server (NTRS)

    Mish, W. H.; Wenger, R. M.; Behannon, K. W.; Byrnes, J. B.

    1982-01-01

    The Interactive Digital Signal Processor (IDSP) is examined. It consists of a set of time series analysis Operators each of which operates on an input file to produce an output file. The operators can be executed in any order that makes sense and recursively, if desired. The operators are the various algorithms used in digital time series analysis work. User written operators can be easily interfaced to the system. The system can be operated both interactively and in batch mode. In IDSP a file can consist of up to n (currently n=8) simultaneous time series. IDSP currently includes over thirty standard operators that range from Fourier transform operations, design and application of digital filters, eigenvalue analysis, to operators that provide graphical output, allow batch operation, editing and display information.

  7. CoNNeCT Baseband Processor Module

    NASA Technical Reports Server (NTRS)

    Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.

    2011-01-01

    A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25-Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.

  8. Implementing CLIPS on a parallel computer

    NASA Technical Reports Server (NTRS)

    Riley, Gary

    1987-01-01

    The C language integrated production system (CLIPS) is a forward chaining rule based language to provide training and delivery for expert systems. Conceptually, rule based languages have great potential for benefiting from the inherent parallelism of the algorithms that they employ. During each cycle of execution, a knowledge base of information is compared against a set of rules to determine if any rules are applicable. Parallelism also can be employed for use with multiple cooperating expert systems. To investigate the potential benefits of using a parallel computer to speed up the comparison of facts to rules in expert systems, a parallel version of CLIPS was developed for the FLEX/32, a large grain parallel computer. The FLEX implementation takes a macroscopic approach in achieving parallelism by splitting whole sets of rules among several processors rather than by splitting the components of an individual rule among processors. The parallel CLIPS prototype demonstrates the potential advantages of integrating expert system tools with parallel computers.

  9. FY 2006 Accomplishment Colony - "Services and Interfaces to Support Large Numbers of Processors"

    SciTech Connect

    Jones, T; Kale, L; Moreira, J; Mendes, C; Chakravorty, S; Tauferner, A; Inglett, T

    2006-06-30

    The Colony Project is developing operating system and runtime system technology to enable efficient general purpose environments on tens of thousands of processors. To accomplish this, we are investigating memory management techniques, fault management strategies, and parallel resource management schemes. Recent results show promising findings for scalable strategies based on processor virtualization, in-memory checkpointing, and parallel aware modifications to full featured operating systems.

  10. Parallelization of adaptive MC integrators

    NASA Astrophysics Data System (ADS)

    Kreckel, Richard

    1997-11-01

    Monte Carlo (MC) methods for numerical integration seem to be embarrassingly parallel at first sight. When adaptive schemes are applied in order to enhance convergence, however, the seemingly most natural way of replicating the whole job on each processor can potentially ruin the adaptive behaviour. Using the popular VEGAS algorithm as an example, an economic method of semi-micro parallelization with variable grain-size is presented and contrasted with another straightforward approach of macro-parallelization. A portable implementation of this semi-micro parallelization is used in the xloops-project and is made publicly available.

  11. Never Trust Your Word Processor

    ERIC Educational Resources Information Center

    Linke, Dirk

    2009-01-01

    In this article, the author talks about the auto correction mode of word processors that leads to a number of problems and describes an example in biochemistry exams that shows how word processors can lead to mistakes in databases and in papers. The author contends that, where this system is applied, spell checking should not be left to a word…

  13. Parallel algorithms for message decomposition

    SciTech Connect

    Teng, S.H.; Wang, B.

    1987-06-01

    The authors consider the deterministic and random parallel complexity (time and processor) of message decoding: an essential problem in communications systems and translation systems. They present an optimal parallel algorithm to decompose prefix-coded messages and uniquely decipherable-coded messages in O(n/P) time, using O(P) processors (for all P: 1 ≤ P ≤ n/log n) deterministically as well as randomly on the weakest version of parallel random access machines in which concurrent read and concurrent write to a cell in the common memory are not allowed. This is done by reducing decoding to parallel finite-state automata simulation and the prefix sums.
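
    For readers unfamiliar with prefix sums, the following sketch shows a data-parallel inclusive scan simulated sequentially. It uses the simple doubling (Hillis-Steele style) scheme, which performs O(n log n) additions and is therefore not the work-optimal algorithm the abstract relies on; it is included only to show how a running sum can be computed in a logarithmic number of rounds. The function name is an illustrative choice.

```python
def inclusive_scan(values):
    """Inclusive prefix sums in about log2(n) data-parallel rounds."""
    result = list(values)
    offset = 1
    while offset < len(result):
        previous = result[:]  # every "processor" reads the previous round
        for i in range(offset, len(result)):
            # On a parallel machine all positions i are updated at once.
            result[i] = previous[i] + previous[i - offset]
        offset *= 2
    return result
```

    For the input [3, 1, 4, 1, 5] the rounds with offsets 1, 2 and 4 produce [3, 4, 8, 9, 14].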

  14. Design of free space interconnected signal processor

    NASA Astrophysics Data System (ADS)

    Murdocca, Miles; Stone, Thomas

    1993-12-01

    Progress is described on a collaborative effort between the Photonics Center at Rome Laboratory (RL), Griffiss AFB and Rutgers University, through the RL Expert Science and Engineering (ES&E) program. The goal of the effort is to develop a prototype random access memory (RAM) that can be used in a signal processor for a computing model that consists of cascaded arrays of optical logic gates interconnected in free space with regular patterns. The effort involved the optical and architectural development of a cascadable optical logic system in which microlaser pumped S-SEED devices serve as logic gates. At the completion of the contract, two gate-level layouts of the module were completed which were created in collaboration with RL personnel. The basic layout of the optical system has been developed, and key components have been tested. The delayed delivery of microlaser arrays precluded completion of the processor during the contract period, but preliminary testing was made possible through the use of other microlaser devices.

  15. A parallel algorithm for channel routing on a hypercube

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall; Banerjee, Prithviraj

    1987-01-01

    A new parallel simulated annealing algorithm for channel routing on a P-processor hypercube is presented. The basic idea used is to partition a set of tracks equally among processors in the hypercube. In parallel, P/2 pairs of processors perform displacements and exchanges of nets between tracks, compute the changes in cost functions, and accept moves using a parallel annealing criterion. Through the use of a unique distributed data structure, it is possible to minimize message traffic and add versatility and efficiency in a parallel routing tool. The algorithm has been implemented and is being tested on some of the popular channel problems from the literature.
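
    The abstract does not spell out its acceptance rule. As a point of reference, the sketch below shows the standard sequential Metropolis criterion that simulated annealing commonly uses. It is a stand-in, not the parallel criterion of the paper, and the function name and the optional rng argument are illustrative.

```python
import math
import random


def accept_move(delta_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_cost <= 0:
        return True
    # Worse moves are accepted with probability exp(-delta / T).
    return rng.random() < math.exp(-delta_cost / temperature)
```

    At high temperature almost every displacement or exchange is accepted; as the temperature falls, cost-increasing moves become increasingly unlikely.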

  16. Introduction to the POKER parallel programming environment

    SciTech Connect

    Snyder, L.

    1983-01-01

    The POKER parallel programming environment is a graphics-based, interactive system for programming the configurable, highly parallel (CHIP) computer. Designed to support nearly all aspects of parallel programming in one integrated system, POKER has been implemented as a (approximately 35,000-line) C program on the VAX 11/780 under UNIX. It provides a number of novel features including graphics programming of parallel processor communication. 4 references.

  17. Electrostatically focused addressable field emission array chips (AFEA's) for high-speed massively parallel maskless digital E-beam direct write lithography and scanning electron microscopy

    DOEpatents

    Thomas, Clarence E.; Baylor, Larry R.; Voelkl, Edgar; Simpson, Michael L.; Paulus, Michael J.; Lowndes, Douglas H.; Whealton, John H.; Whitson, John C.; Wilgen, John B.

    2002-12-24

    Systems and methods are described for addressable field emission array (AFEA) chips. A method of operating an addressable field-emission array includes: generating a plurality of electron beams from a plurality of emitters that compose the addressable field-emission array; and focusing at least one of the plurality of electron beams with an on-chip electrostatic focusing stack. The systems and methods provide advantages including the avoidance of space-charge blow-up.

  18. Matrix preconditioning: a robust operation for optical linear algebra processors.

    PubMed

    Ghosh, A; Paparao, P

    1987-07-15

    Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system. PMID:20489953

  19. CCD/CID Processors Would Offer Greater Precision

    NASA Technical Reports Server (NTRS)

    Barhen, Jacob; Toomarian, Nikzad; Fijany, Amir

    1995-01-01

    Charge-coupled-device/charge-injection-device (CCD/CID) data processors of proposed type offer advantages of massively parallel computational architecture and high computational speed typical of older CCD/CID data processors, but with increased precision. Useful in performing matrix vector multiplications in variety of applications, including solving partial differential equations, processing signal and image data, control computations, and neural-network simulations. Greater precision of proposed devices helps to ensure accuracy in CCD/CID implementations of pseudospectral neural networks - particular class of artificial neural networks especially suited to solving nonlinear differential equations.

  20. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  1. Highly scalable linear solvers on thousands of processors.

    SciTech Connect

    Domino, Stefan Paul; Karlin, Ian; Siefert, Christopher; Hu, Jonathan Joseph; Robinson, Allen Conrad; Tuminaro, Raymond Stephen

    2009-09-01

    In this report we summarize research into new parallel algebraic multigrid (AMG) methods. We first provide an introduction to parallel AMG. We then discuss our research in parallel AMG algorithms for very large scale platforms. We detail significant improvements in the AMG setup phase to a matrix-matrix multiplication kernel. We present a smoothed aggregation AMG algorithm with fewer communication synchronization points, and discuss its links to domain decomposition methods. Finally, we discuss a multigrid smoothing technique that utilizes two message passing layers for use on multicore processors.

  2. Automated anomaly detection processor

    NASA Astrophysics Data System (ADS)

    Kraiman, James B.; Arouh, Scott L.; Webb, Michael L.

    2002-07-01

    Robust exploitation of tracking and surveillance data will provide an early warning and cueing capability for military and civilian Law Enforcement Agency operations. This will improve dynamic tasking of limited resources and hence operational efficiency. The challenge is to rapidly identify threat activity within a huge background of noncombatant traffic. We discuss development of an Automated Anomaly Detection Processor (AADP) that exploits multi-INT, multi-sensor tracking and surveillance data to rapidly identify and characterize events and/or objects of military interest, without requiring operators to specify threat behaviors or templates. The AADP has successfully detected an anomaly in traffic patterns in Los Angeles, analyzed ship track data collected during a Fleet Battle Experiment to detect simulated mine laying behavior amongst maritime noncombatants, and is currently under development for surface vessel tracking within the Coast Guard's Vessel Traffic Service to support port security, ship inspection, and harbor traffic control missions, and to monitor medical surveillance databases for early alert of a bioterrorist attack. The AADP can also be integrated into combat simulations to enhance model fidelity of multi-sensor fusion effects in military operations.

  3. Fast multipole methods on graphics processors

    NASA Astrophysics Data System (ADS)

    Gumerov, Nail A.; Duraiswami, Ramani

    2008-09-01

    The fast multipole method allows the rapid approximate evaluation of sums of radial basis functions. For a specified accuracy, ε, the method scales as O(N) in both time and memory compared to the direct method with complexity O(N^2), which allows the solution of larger problems with given resources. Graphical processing units (GPU) are now increasingly viewed as data parallel compute coprocessors that can provide significant computational performance at low price. We describe acceleration of the FMM using the data parallel GPU architecture. The FMM has a complex hierarchical (adaptive) structure, which is not easily implemented on data-parallel processors. We describe strategies for parallelization of all components of the FMM, develop a model to explain the performance of the algorithm on the GPU architecture, and determine optimal settings for the FMM on the GPU. These optimal settings are different from those on usual CPUs. Some innovations in the FMM algorithm, including the use of modified stencils, real polynomial basis functions for the Laplace kernel, and decompositions of the translation operators, are also described. We obtained accelerations of the Laplace kernel FMM on a single NVIDIA GeForce 8800 GTX GPU in the range of 30-60 compared to a serial CPU FMM implementation. For a problem with a million sources, the summations involved are performed in approximately one second. This performance is equivalent to solving the same problem at a 43 Teraflop rate if we use straightforward summation.

  4. Survey of new vector computers: The CRAY 1S from CRAY research; the CYBER 205 from CDC and the parallel computer from ICL - architecture and programming

    NASA Technical Reports Server (NTRS)

    Gentzsch, W.

    1982-01-01

    Problems which can arise with vector and parallel computers are discussed in a user oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The performance of the systems is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.

  5. Multigrid and Krylov Solvers for Large Scale Finite Element Groundwater Flow Simulations on Distributed Memory Parallel Platforms

    SciTech Connect

    Mahinthakumar, K.

    1997-01-01

    In this report we present parallel solvers for large linear systems arising from the finite-element discretization of the three-dimensional steady-state groundwater flow problem. Our solvers are based on multigrid and Krylov subspace methods. The parallel implementation is based on a domain decomposition strategy with explicit message passing using NX and MPI libraries. We have tested our parallel implementations on the Intel Paragon XP/S 150 supercomputer using up to 1024 parallel processors and on other parallel platforms such as SGI/Power Challenge Array, Cray/SGI Origin 2000, Convex Exemplar SPP-1200, and IBM SP using up to 64 processors. We show that multigrid can be a scalable algorithm on distributed memory machines. We demonstrate the effectiveness of parallel multigrid based solvers by solving problems requiring more than 70 million nodes in less than a minute. This is more than 25 times faster than the diagonally preconditioned conjugate gradient method, which is one of the more popular methods for large sparse linear systems. Our results also show that multigrid as a stand-alone solver works best for problems with smooth coefficients, but for rough coefficients it is best used as a preconditioner for a Krylov subspace method such as the conjugate gradient method. We show that even for extremely heterogeneous systems the multigrid preconditioned conjugate gradient method is at least 10 times faster than the diagonally preconditioned conjugate gradient method.

  6. A scalable parallel open architecture data acquisition system for low to high rate experiments, test beams and all SSC (Superconducting Super Collider) detectors

    SciTech Connect

    Barsotti, E.; Booth, A.; Bowden, M.; Swoboda, C.; Lockyer, N.; VanBerg, R.

    1989-12-01

    A new era of high-energy physics research is beginning requiring accelerators with much higher luminosities and interaction rates in order to discover new elementary particles. As a consequence, both orders of magnitude higher data rates from the detector and online processing power, well beyond the capabilities of current high energy physics data acquisition systems, are required. This paper describes a new data acquisition system architecture which draws heavily from the communications industry, is totally parallel (i.e., without any bottlenecks), is capable of data rates of hundreds of GigaBytes per second from the detector and into an array of online processors (i.e., processor farm), and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms. The main features of the system architecture are standard interface ICs to detector subsystems wherever possible, fiber optic digital data transmission from the near-detector electronics, a self-routing parallel event builder, and the use of industry-supported and high-level language programmable processors in the proposed BCD system for both triggers and online filters. A brief status report of an ongoing project at Fermilab to build the self-routing parallel event builder will also be given in the paper. 3 figs., 1 tab.

  7. Recursive star-tree parallel data structure. Technical report

    SciTech Connect

    Berkman, O.; Vishkin, U.

    1990-03-01

    The model of parallel computation that is used in this paper is the concurrent-read concurrent-write (CRCW) parallel random access machine (PRAM). We assume that several processors may attempt to write at the same memory location only if they are seeking to write the same value (the so-called Common CRCW PRAM). We use the weakest Common CRCW PRAM model, in which only concurrent writes of the value one are allowed. Given two parallel algorithms for the same problem, one is more efficient than the other if: (1) primarily, its time-processor product is smaller, and (2) secondarily (but important), its parallel time is smaller. Optimal parallel algorithms are those with a linear time-processor product. A fully-parallel algorithm is a parallel algorithm that runs in constant time using an optimal number of processors. An almost fully-parallel algorithm is a parallel algorithm that runs in alpha(n) (the inverse of Ackermann function) time using an optimal number of processors. The notion of fully-parallel algorithm represents an ultimate theoretical goal for designers of parallel algorithms. Research on lower bounds for parallel computation (see references later) indicates that for nearly any interesting problem this goal is unachievable. These same results also preclude almost fully-parallel algorithms for the same problems. Therefore, any result that approaches this goal is somewhat surprising.

  8. Fully automatic telemetry data processor

    NASA Technical Reports Server (NTRS)

    Cox, F. B.; Keipert, F. A.; Lee, R. C.

    1968-01-01

    Satellite Telemetry Automatic Reduction System (STARS 2), a fully automatic computer-controlled telemetry data processor, maximizes data recovery, reduces turnaround time, increases flexibility, and improves operational efficiency. The system incorporates a CDC 3200 computer as its central element.

  9. Acousto-optic/CCD real-time SAR data processor

    NASA Technical Reports Server (NTRS)

    Psaltis, D.

    1983-01-01

    A SAR processor is presented that uses an acousto-optic device as the input electronic-to-optical transducer and a 2-D CCD image sensor operated in the time-delay-and-integrate (TDI) mode. The CCD serves as the optical detector, and it simultaneously operates as an array of optically addressed correlators. The lines of the focused SAR image form continuously (at the radar PRF) at the final row of the CCD. The principles of operation of this processor, its performance characteristics, the state-of-the-art of the devices used and experimental results are outlined. The methods by which this processor can be made flexible so that it can be dynamically adapted to changing SAR geometries are discussed.

  10. Miniaturization of a micro-optics array for highly sensitive and parallel detection on an injection moulded lab-on-a-chip.

    PubMed

    Hung, Tran Quang; Sun, Yi; Poulsen, Carl Esben; Linh-Quyen, Than; Chin, Wai Hoe; Bang, Dang Duong; Wolff, Anders

    2015-06-01

    A miniaturised array of supercritical angle fluorescence (SAF) micro-optics embedded in a microfluidic chamber was fabricated by injection moulding. The fabricated chip could enhance the fluorescence signal around 46 times compared to a conventional microscope. Collection of the fluorescence signal from the SAF array is almost independent of the numerical aperture, and the limit of detection was improved 36-fold using a simple and inexpensive optical detection system. PMID:25912610

  11. Benchmarking NWP Kernels on Multi- and Many-core Processors

    NASA Astrophysics Data System (ADS)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc.; (2) enumerate and classify effective strategies for coding and optimizing for these new processors; (3) assess difficulties and opportunities for tool or higher-level language support; and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.

  12. Parallel Mapping Approaches for GNUMAP

    PubMed Central

    Clement, Nathan L.; Clement, Mark J.; Snell, Quinn; Johnson, W. Evan

    2013-01-01

    Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the large memory requirement of the algorithm and the long run-time necessary for accurate studies. Several parallel implementations have been developed to distribute memory across processors and to share the processing load equally. These approaches are compared with respect to their memory footprint, load balancing, and accuracy. When using MPI with multi-threading, linear speedup can be achieved for up to 256 processors. PMID:23396612

  13. Asynchronous parallel status comparator

    DOEpatents

    Arnold, Jeffrey W. (828 Hickory Ridge Rd., Aiken, SC 29801); Hart, Mark M. (223 Limerick Dr., Aiken, SC 29803)

    1992-01-01

    Apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals correspond to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition.
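
    The comparator step described in this record (checking whether a user-specified number of decoder outputs, two or more, report the same location) can be illustrated with a minimal sketch. This is not the patented circuit: it is a software analogy that assumes each decoder output is simply an integer location ID, and the names `matching_locations`, `system_output`, and `threshold` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from typing import Iterable, List


def matching_locations(decoded: Iterable[int], threshold: int = 2) -> List[int]:
    """Return the location IDs reported by at least `threshold` decoder outputs.

    Assumption: each decoder output is an integer identifying a sensor location.
    `threshold` stands in for the user-specified number of matches (two or more).
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    counts = Counter(decoded)  # number of decoder outputs per location
    return sorted(loc for loc, n in counts.items() if n >= threshold)


def system_output(decoded: Iterable[int], threshold: int = 2) -> bool:
    """True when enough outputs agree on a location to raise the system output."""
    return bool(matching_locations(decoded, threshold))


if __name__ == "__main__":
    # Two receivers report location 3; locations 7 and 5 are reported once each.
    print(matching_locations([3, 7, 3, 5]))
    print(system_output([3, 7, 3, 5]))
```

    Counting occurrences per location keeps the check independent of the order in which the asynchronous signals arrive, which is the property the abstract emphasizes.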

  14. Asynchronous parallel status comparator

    DOEpatents

    Arnold, J.W.; Hart, M.M.

    1992-12-15

    Disclosed is an apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals correspond to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition. 4 figs.

  15. Parallel distributed free-space optoelectronic computer engine using flat plug-on-top optics package

    NASA Astrophysics Data System (ADS)

    Berger, Christoph; Ekman, Jeremy T.; Wang, Xiaoqing; Marchand, Philippe J.; Spaanenburg, Henk; Kiamilev, Fouad E.; Esener, Sadik C.

    2000-05-01

    We report on ongoing work on a free-space optical interconnect system, which will demonstrate a Fast Fourier Transformation calculation distributed among six processor chips. Logically, the processors are arranged in two linear chains, where each element communicates optically with its nearest neighbors. Physically, the setup consists of a large motherboard, several multi-chip carrier modules, which hold the processor/driver chips and the optoelectronic chips (arrays of lasers and detectors), and several plug-on-top optics modules, which provide the optical links between the chip carrier modules. The system design tries to satisfy numerous constraints, such as compact size, potential for mass-production, suitability for large arrays (up to 1024 parallel channels), compatibility with standard electronics fabrication and packaging technology, potential for active misalignment compensation by integrating MEMS technology, and suitability for testing different imaging topologies. We present the system architecture together with details of key components and modules, and report on first experiences with prototype modules of the setup.

  16. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 32 2014-07-01 2014-07-01 false Processors. 791.45 Section 791.45...) DATA REIMBURSEMENT Basis for Proposed Order 791.45 Processors. (a) Generally, processors will be... processors will have a responsibility to provide reimbursement directly to those paying for the testing:...

  17. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 33 2013-07-01 2013-07-01 false Processors. 791.45 Section 791.45...) DATA REIMBURSEMENT Basis for Proposed Order 791.45 Processors. (a) Generally, processors will be... processors will have a responsibility to provide reimbursement directly to those paying for the testing:...

  18. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 33 2012-07-01 2012-07-01 false Processors. 791.45 Section 791.45...) DATA REIMBURSEMENT Basis for Proposed Order 791.45 Processors. (a) Generally, processors will be... processors will have a responsibility to provide reimbursement directly to those paying for the testing:...

  19. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 32 2011-07-01 2011-07-01 false Processors. 791.45 Section 791.45...) DATA REIMBURSEMENT Basis for Proposed Order 791.45 Processors. (a) Generally, processors will be... processors will have a responsibility to provide reimbursement directly to those paying for the testing:...

  20. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 31 2010-07-01 2010-07-01 true Processors. 791.45 Section 791.45...) DATA REIMBURSEMENT Basis for Proposed Order 791.45 Processors. (a) Generally, processors will be... processors will have a responsibility to provide reimbursement directly to those paying for the testing:...