mesh-connected processor array: Topics by Science.gov

Sample records for mesh-connected processor array

Method and structure for skewed block-cyclic distribution of lower-dimensional data arrays in higher-dimensional processor grids

DOEpatents

Chatterjee, Siddhartha [Yorktown Heights, NY; Gunnels, John A [Brewster, NY

2011-11-08

A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.
Solving very large, sparse linear systems on mesh-connected parallel computers

NASA Technical Reports Server (NTRS)

Opsahl, Torstein; Reif, John

1987-01-01

The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
Array processor architecture connection network

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1982-01-01

A connection network is disclosed for use between a parallel array of processors and a parallel array of memory modules for establishing non-conflicting data communications paths between requested memory modules and requesting processors. The connection network includes a plurality of switching elements interposed between the processor array and the memory modules array in an Omega networking architecture. Each switching element includes a first and a second processor side port, a first and a second memory module side port, and control logic circuitry for providing data connections between the first and second processor ports and the first and second memory module ports. The control logic circuitry includes strobe logic for examining data arriving at the first and the second processor ports to indicate when the data arriving is requesting data from a requesting processor to a requested memory module. Further, connection circuitry is associated with the strobe logic for examining requesting data arriving at the first and the second processor ports for providing a data connection therefrom to the first and the second memory module ports in response thereto when the data connection so provided does not conflict with a pre-established data connection currently in use.
Digital system for structural dynamics simulation

NASA Technical Reports Server (NTRS)

Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.

1982-01-01

State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
Interconnection arrangement of routers of processor boards in array of cabinets supporting secure physical partition

DOEpatents

Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

2007-07-17

A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure includes routers in service or compute processor boards distributed in an array of cabinets connected in series on each board and to respective routers in neighboring row cabinet boards with the routers in series connection coupled to routers in series connection in respective neighboring column cabinet boards. The array can include disconnect cabinets or respective routers in all boards in each cabinet connected in a toroid. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
A Navier-Strokes Chimera Code on the Connection Machine CM-5: Design and Performance

NASA Technical Reports Server (NTRS)

Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

1994-01-01

We have implemented a three-dimensional compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the 'chimera' approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. A parallel machine like the CM-5 is well-suited for finite-difference methods on structured grids. The regular pattern of connections of a structured mesh maps well onto the architecture of the machine. So the first design choice, finite differences on a structured mesh, is natural. We use centered differences in space, with added artificial dissipation terms. When numerically solving the Navier-Stokes equations, there are liable to be some mesh cells near a solid body that are small in at least one direction. This mesh cell geometry can impose a very severe CFL (Courant-Friedrichs-Lewy) condition on the time step for explicit time-stepping methods. Thus, though explicit time-stepping is well-suited to the architecture of the machine, we have adopted implicit time-stepping. We have further taken the approximate factorization approach. This creates the need to solve large banded linear systems and creates the first possible barrier to an efficient algorithm. To overcome this first possible barrier we have considered two options. The first is just to solve the banded linear systems with data spread over the whole machine, using whatever fast method is available. This option is adequate for solving scalar tridiagonal systems, but for scalar pentadiagonal or block tridiagonal systems it is somewhat slower than desired. The second option is to 'transpose' the flow and geometry variables as part of the time-stepping process: Start with x-lines of data in-processor. Form explicit terms in x, then transpose so y-lines of data are in-processor. Form explicit terms in y, then transpose so z-lines are in processor. Form explicit terms in z, then solve linear systems in the z-direction. Transpose to the y-direction, then solve linear systems in the y-direction. Finally transpose to the x direction and solve linear systems in the x-direction. This strategy avoids inter-processor communication when differencing and solving linear systems, but requires a large amount of communication when doing the transposes. The transpose method is more efficient than the non-transpose strategy when dealing with scalar pentadiagonal or block tridiagonal systems. For handling geometrically complex problems the chimera strategy was adopted. For multiple zone cases we compute on each zone sequentially (using the whole parallel machine), then send the chimera interpolation data to a distributed data structure (array) laid out over the whole machine. This information transfer implies an irregular communication pattern, and is the second possible barrier to an efficient algorithm. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran. We make use of the Connection Machine Scientific Software Library (CMSSL) for the linear solver and array transpose operations.
Fault-tolerant onboard digital information switching and routing for communications satellites

NASA Technical Reports Server (NTRS)

Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.; Kim, Heechul

1993-01-01

The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed.
Unstructured Adaptive Grid Computations on an Array of SMPs

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

1996-01-01

Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We have presented such a dynamic load balancing framework called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENCEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPS' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
A partitioning strategy for nonuniform problems on multiprocessors

NASA Technical Reports Server (NTRS)

Berger, M. J.; Bokhari, S.

1985-01-01

The partitioning of a problem on a domain with unequal work estimates in different subddomains is considered in a way that balances the work load across multiple processors. Such a problem arises for example in solving partial differential equations using an adaptive method that places extra grid points in certain subregions of the domain. A binary decomposition of the domain is used to partition it into rectangles requiring equal computational effort. The communication costs of mapping this partitioning onto different microprocessors: a mesh-connected array, a tree machine and a hypercube is then studied. The communication cost expressions can be used to determine the optimal depth of the above partitioning.
Preliminary study on the potential usefulness of array processor techniques for structural synthesis

NASA Technical Reports Server (NTRS)

Feeser, L. J.

1980-01-01

The effects of the use of array processor techniques within the structural analyzer program, SPAR, are simulated in order to evaluate the potential analysis speedups which may result. In particular the connection of a Floating Point System AP120 processor to the PRIME computer is discussed. Measurements of execution, input/output, and data transfer times are given. Using these data estimates are made as to the relative speedups that can be executed in a more complete implementation on an array processor maxi-mini computer system.
DFT algorithms for bit-serial GaAs array processor architectures

NASA Technical Reports Server (NTRS)

Mcmillan, Gary B.

1988-01-01

Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
The Extension of Wireless Mesh Networks Via Vertical Takeoff and Landing Unmanned Aerial Vehicles

DTIC Science & Technology

2007-12-01

development. When connected to Crossbow’s Stargate Processor Board (SPB400) (See Figure 19), via the standard 51-pin connector, the MNAV100CA combines with...also be connected and processed by the Stargate to support intelligent robotics applications.22 22 UAV
Analog hardware for learning neural networks

NASA Technical Reports Server (NTRS)

Eberhardt, Silvio P. (Inventor)

1991-01-01

This is a recurrent or feedforward analog neural network processor having a multi-level neuron array and a synaptic matrix for storing weighted analog values of synaptic connection strengths which is characterized by temporarily changing one connection strength at a time to determine its effect on system output relative to the desired target. That connection strength is then adjusted based on the effect, whereby the processor is taught the correct response to training examples connection by connection.
Multipurpose silicon photonics signal processor core.

PubMed

Pérez, Daniel; Gasulla, Ivana; Crudgington, Lee; Thomson, David J; Khokhar, Ali Z; Li, Ke; Cao, Wei; Mashanovich, Goran Z; Capmany, José

2017-09-21

Integrated photonics changes the scaling laws of information and communication systems offering architectural choices that combine photonics with electronics to optimize performance, power, footprint, and cost. Application-specific photonic integrated circuits, where particular circuits/chips are designed to optimally perform particular functionalities, require a considerable number of design and fabrication iterations leading to long development times. A different approach inspired by electronic Field Programmable Gate Arrays is the programmable photonic processor, where a common hardware implemented by a two-dimensional photonic waveguide mesh realizes different functionalities through programming. Here, we report the demonstration of such reconfigurable waveguide mesh in silicon. We demonstrate over 20 different functionalities with a simple seven hexagonal cell structure, which can be applied to different fields including communications, chemical and biomedical sensing, signal processing, multiprocessor networks, and quantum information systems. Our work is an important step toward this paradigm.Integrated optical circuits today are typically designed for a few special functionalities and require complex design and development procedures. Here, the authors demonstrate a reconfigurable but simple silicon waveguide mesh with different functionalities.
Experimental demonstration of the optical multi-mesh hypercube: scaleable interconnection network for multiprocessors and multicomputers.

PubMed

Louri, A; Furlonge, S; Neocleous, C

1996-12-10

A prototype of a novel topology for scaleable optical interconnection networks called the optical multi-mesh hypercube (OMMH) is experimentally demonstrated to as high as a 150-Mbit/s data rate (2(7) - 1 nonreturn-to-zero pseudo-random data pattern) at a bit error rate of 10(-13)/link by the use of commercially available devices. OMMH is a scaleable network [Appl. Opt. 33, 7558 (1994); J. Lightwave Technol. 12, 704 (1994)] architecture that combines the positive features of the hypercube (small diameter, connectivity, symmetry, simple routing, and fault tolerance) and the mesh (constant node degree and size scaleability). The optical implementation method is divided into two levels: high-density local connections for the hypercube modules, and high-bit-rate, low-density, long connections for the mesh links connecting the hypercube modules. Free-space imaging systems utilizing vertical-cavity surface-emitting laser (VCSEL) arrays, lenslet arrays, space-invariant holographic techniques, and photodiode arrays are demonstrated for the local connections. Optobus fiber interconnects from Motorola are used for the long-distance connections. The OMMH was optimized to operate at the data rate of Motorola's Optobus (10-bit-wide, VCSEL-based bidirectional data interconnects at 150 Mbits/s). Difficulties encountered included the varying fan-out efficiencies of the different orders of the hologram, misalignment sensitivity of the free-space links, low power (1 mW) of the individual VCSEL's, and noise.
The AIS-5000 parallel processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmitt, L.A.; Wilson, S.S.

1988-05-01

The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has superior cost/performance characteristics than two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In thismore » paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance bench marks are given for certain simple and complex functions.« less
Design of a MIMD neural network processor

NASA Astrophysics Data System (ADS)

Saeks, Richard E.; Priddy, Kevin L.; Pap, Robert M.; Stowell, S.

1994-03-01

The Accurate Automation Corporation (AAC) neural network processor (NNP) module is a fully programmable multiple instruction multiple data (MIMD) parallel processor optimized for the implementation of neural networks. The AAC NNP design fully exploits the intrinsic sparseness of neural network topologies. Moreover, by using a MIMD parallel processing architecture one can update multiple neurons in parallel with efficiency approaching 100 percent as the size of the network increases. Each AAC NNP module has 8 K neurons and 32 K interconnections and is capable of 140,000,000 connections per second with an eight processor array capable of over one billion connections per second.
Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1994-01-01

In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
Sequence information signal processor

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1999-01-01

An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.

Multitasking for flows about multiple body configurations using the chimera grid scheme

NASA Technical Reports Server (NTRS)

Dougherty, F. C.; Morgan, R. L.

1987-01-01

The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh approach, a multiple body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two dimensional computations are run on parallel processors on a CRAY-X/MP 48, usually with one mesh per processor. Flow field results are compared with single processor results to demonstrate the feasibility of running multiple mesh codes on parallel processors and to show the increase in efficiency.
Reconfigurable lattice mesh designs for programmable photonic processors.

PubMed

Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A

2016-05-30

We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.
Optimal expression evaluation for data parallel architectures

NASA Technical Reports Server (NTRS)

Gilbert, John R.; Schreiber, Robert

1990-01-01

A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
FFT Computation with Systolic Arrays, A New Architecture

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The use of the Cooley-Tukey algorithm for computing the l-d FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

1996-01-01

Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
A new procedure for dynamic adaption of three-dimensional unstructured grids

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Strawn, Roger

1993-01-01

A new procedure is presented for the simultaneous coarsening and refinement of three-dimensional unstructured tetrahedral meshes. This algorithm allows for localized grid adaption that is used to capture aerodynamic flow features such as vortices and shock waves in helicopter flowfield simulations. The mesh-adaption algorithm is implemented in the C programming language and uses a data structure consisting of a series of dynamically-allocated linked lists. These lists allow the mesh connectivity to be rapidly reconstructed when individual mesh points are added and/or deleted. The algorithm allows the mesh to change in an anisotropic manner in order to efficiently resolve directional flow features. The procedure has been successfully implemented on a single processor of a Cray Y-MP computer. Two sample cases are presented involving three-dimensional transonic flow. Computed results show good agreement with conventional structured-grid solutions for the Euler equations.
CTF Preprocessor User's Manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avramova, Maria; Salko, Robert K.

2016-05-26

This document describes how a user should go about using the CTF pre- processor tool to create an input deck for modeling rod-bundle geometry in CTF. The tool was designed to generate input decks in a quick and less error-prone manner for CTF. The pre-processor is a completely independent utility, written in Fortran, that takes a reduced amount of input from the user. The information that the user must supply is basic information on bundle geometry, such as rod pitch, clad thickness, and axial location of spacer grids--the pre-processor takes this basic information and determines channel placement and connection informationmore » to be written to the input deck, which is the most time-consuming and error-prone segment of creating a deck. Creation of the model is also more intuitive, as the user can specify assembly and water-tube placement using visual maps instead of having to place them by determining channel/channel and rod/channel connections. As an example of the benefit of the pre-processor, a quarter-core model that contains 500,000 scalar-mesh cells was read into CTF from an input deck containing 200,000 lines of data. This 200,000 line input deck was produced automatically from a set of pre-processor decks that contained only 300 lines of data.« less
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

1996-01-01

Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load inbalances among processors on a parallel machine. This paper described the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution coast is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remappier yields processor assignments that are less than 3 percent of the optimal solutions, but requires only 1 percent of the computational time.
Load Balancing Unstructured Adaptive Grids for CFD Problems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid

1996-01-01

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
High-Order Methods for Computational Physics

DTIC Science & Technology

1999-03-01

computation is running in 278 Ronald D. Henderson parallel. Instead we use the concept of a voxel database (VDB) of geometric positions in the mesh [85...processor 0 Fig. 4.19. Connectivity and communications axe established by building a voxel database (VDB) of positions. A VDB maps each position to a...studies such as the highly accurate stability computations considered help expand the database for this benchmark problem. The two-dimensional linear
Silicon-fiber blanket solar-cell array concept

NASA Technical Reports Server (NTRS)

Eliason, J. T.

1973-01-01

Proposed economical manufacture of solar-cell arrays involves parallel, planar weaving of filaments made of doped silicon fibers with diffused radial junction. Each filament is a solar cell connected either in series or parallel with others to form a blanket of deposited grids or attached electrode wire mesh screens.
Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak

1997-01-01

Mesh adaption is a powerful tool for efficient unstructured- grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

NASA Astrophysics Data System (ADS)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
Array architectures for iterative algorithms

NASA Technical Reports Server (NTRS)

Jagadish, Hosagrahar V.; Rao, Sailesh K.; Kailath, Thomas

1987-01-01

Regular mesh-connected arrays are shown to be isomorphic to a class of so-called regular iterative algorithms. For a wide variety of problems it is shown how to obtain appropriate iterative algorithms and then how to translate these algorithms into arrays in a systematic fashion. Several 'systolic' arrays presented in the literature are shown to be specific cases of the variety of architectures that can be derived by the techniques presented here. These include arrays for Fourier Transform, Matrix Multiplication, and Sorting.
Runtime support and compilation methods for user-specified data distributions

NASA Technical Reports Server (NTRS)

Ponnusamy, Ravi; Saltz, Joel; Choudhury, Alok; Hwang, Yuan-Shin; Fox, Geoffrey

1993-01-01

This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Saini, Subhash (Technical Monitor)

1998-01-01

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper presents the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2

NASA Technical Reports Server (NTRS)

Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

1996-01-01

Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.

1996-01-01

Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.OX speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
Fluid leakage detector for vacuum applications

NASA Technical Reports Server (NTRS)

Nguyen, Bich Ngoc (Inventor); Farkas, Tibor (Inventor); Kim, Brian Byungkyu (Inventor)

2002-01-01

A leak detection system for use with a fluid conducting system in a vacuum environment, such as space, is described. The system preferably includes a mesh-like member substantially disposed about the fluid conducting system, and at least one sensor disposed within the mesh-like member. The sensor is capable of detecting a decrease in temperature of the mesh-like member when a leak condition causes the fluid of the fluid conducting system to freeze when exposed to the vacuum environment. Additionally, a signal processor in preferably in communication with the sensor. The sensor transmits an electrical signal to the signal processor such that the signal processor is capable of indicating the location of the fluid leak in the fluid conducting system.

Performance characteristics of a nanoscale double-gate reconfigurable array

NASA Astrophysics Data System (ADS)

Beckett, Paul

2008-12-01

The double gate transistor is a promising device applicable to deep sub-micron design due to its inherent resistance to short-channel effects and superior subthreshold performance. Using both TCAD and SPICE circuit simulation, it is shown that the characteristics of fully depleted dual-gate thin-body Schottky barrier silicon transistors will not only uncouple the conflicting requirements of high performance and low standby power in digital logic, but will also allow the development of a locally-connected reconfigurable computing mesh. The magnitude of the threshold shift effect will scale with device dimensions and will remain compatible with oxide reliability constraints. A field-programmable architecture based on the double gate transistor is described in which the operating point of the circuit is biased via one gate while the other gate is used to form the logic array, such that complex heterogeneous computing functions may be developed from this homogeneous, mesh-connected organization.
Large-N in Volcano Settings: Volcanosri

NASA Astrophysics Data System (ADS)

Lees, J. M.; Song, W.; Xing, G.; Vick, S.; Phillips, D.

2014-12-01

We seek a paradigm shift in the approach we take on volcano monitoring where the compromise from high fidelity to large numbers of sensors is used to increase coverage and resolution. Accessibility, danger and the risk of equipment loss requires that we develop systems that are independent and inexpensive. Furthermore, rather than simply record data on hard disk for later analysis we desire a system that will work autonomously, capitalizing on wireless technology and in field network analysis. To this end we are currently producing a low cost seismic array which will incorporate, at the very basic level, seismological tools for first cut analysis of a volcano in crises mode. At the advanced end we expect to perform tomographic inversions in the network in near real time. Geophone (4 Hz) sensors connected to a low cost recording system will be installed on an active volcano where triggering earthquake location and velocity analysis will take place independent of human interaction. Stations are designed to be inexpensive and possibly disposable. In one of the first implementations the seismic nodes consist of an Arduino Due processor board with an attached Seismic Shield. The Arduino Due processor board contains an Atmel SAM3X8E ARM Cortex-M3 CPU. This 32 bit 84 MHz processor can filter and perform coarse seismic event detection on a 1600 sample signal in fewer than 200 milliseconds. The Seismic Shield contains a GPS module, 900 MHz high power mesh network radio, SD card, seismic amplifier, and 24 bit ADC. External sensors can be attached to either this 24-bit ADC or to the internal multichannel 12 bit ADC contained on the Arduino Due processor board. This allows the node to support attachment of multiple sensors. By utilizing a high-speed 32 bit processor complex signal processing tasks can be performed simultaneously on multiple sensors. Using a 10 W solar panel, second system being developed can run autonomously and collect data on 3 channels at 100Hz for 6 months with the installed 16Gb SD card. Initial designs and test results will be presented and discussed.
A Novel Coarsening Method for Scalable and Efficient Mesh Generation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoo, A; Hysom, D; Gunney, B

2010-12-02

In this paper, we propose a novel mesh coarsening method called brick coarsening method. The proposed method can be used in conjunction with any graph partitioners and scales to very large meshes. This method reduces problem space by decomposing the original mesh into fixed-size blocks of nodes called bricks, layered in a similar way to conventional brick laying, and then assigning each node of the original mesh to appropriate brick. Our experiments indicate that the proposed method scales to very large meshes while allowing simple RCB partitioner to produce higher-quality partitions with significantly less edge cuts. Our results further indicatemore » that the proposed brick-coarsening method allows more complicated partitioners like PT-Scotch to scale to very large problem size while still maintaining good partitioning performance with relatively good edge-cut metric. Graph partitioning is an important problem that has many scientific and engineering applications in such areas as VLSI design, scientific computing, and resource management. Given a graph G = (V,E), where V is the set of vertices and E is the set of edges, (k-way) graph partitioning problem is to partition the vertices of the graph (V) into k disjoint groups such that each group contains roughly equal number of vertices and the number of edges connecting vertices in different groups is minimized. Graph partitioning plays a key role in large scientific computing, especially in mesh-based computations, as it is used as a tool to minimize the volume of communication and to ensure well-balanced load across computing nodes. The impact of graph partitioning on the reduction of communication can be easily seen, for example, in different iterative methods to solve a sparse system of linear equation. Here, a graph partitioning technique is applied to the matrix, which is basically a graph in which each edge is a non-zero entry in the matrix, to allocate groups of vertices to processors in such a way that many of matrix-vector multiplication can be performed locally on each processor and hence to minimize communication. Furthermore, a good graph partitioning scheme ensures the equal amount of computation performed on each processor. Graph partitioning is a well known NP-complete problem, and thus the most commonly used graph partitioning algorithms employ some forms of heuristics. These algorithms vary in terms of their complexity, partition generation time, and the quality of partitions, and they tend to trade off these factors. A significant challenge we are currently facing at the Lawrence Livermore National Laboratory is how to partition very large meshes on massive-size distributed memory machines like IBM BlueGene/P, where scalability becomes a big issue. For example, we have found that the ParMetis, a very popular graph partitioning tool, can only scale to 16K processors. An ideal graph partitioning method on such an environment should be fast and scale to very large meshes, while producing high quality partitions. This is an extremely challenging task, as to scale to that level, the partitioning algorithm should be simple and be able to produce partitions that minimize inter-processor communications and balance the load imposed on the processors. Our goals in this work are two-fold: (1) To develop a new scalable graph partitioning method with good load balancing and communication reduction capability. (2) To study the performance of the proposed partitioning method on very large parallel machines using actual data sets and compare the performance to that of existing methods. The proposed method achieves the desired scalability by reducing the mesh size. For this, it coarsens an input mesh into a smaller size mesh by coalescing the vertices and edges of the original mesh into a set of mega-vertices and mega-edges. A new coarsening method called brick algorithm is developed in this research. In the brick algorithm, the zones in a given mesh are first grouped into fixed size blocks called bricks. These brick are then laid in a way similar to conventional brick laying technique, which reduces the number of neighboring blocks each block needs to communicate. Contributions of this research are as follows: (1) We have developed a novel method that scales to a really large problem size while producing high quality mesh partitions; (2) We measured the performance and scalability of the proposed method on a machine of massive size using a set of actual large complex data sets, where we have scaled to a mesh with 110 million zones using our method. To the best of our knowledge, this is the largest complex mesh that a partitioning method is successfully applied to; and (3) We have shown that proposed method can reduce the number of edge cuts by as much as 65%.« less
Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

1996-01-01

The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer

1997-01-01

A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
Turbine component cooling channel mesh with intersection chambers

DOEpatents

Lee, Ching-Pang; Marra, John J

2014-05-06

A mesh (35) of cooling channels (35A, 35B) with an array of cooling channel intersections (42) in a wall (21, 22) of a turbine component. A mixing chamber (42A-C) at each intersection is wider (W1, W2)) than a width (W) of each of the cooling channels connected to the mixing chamber. The mixing chamber promotes swirl, and slows the coolant for more efficient and uniform cooling. A series of cooling meshes (M1, M2) may be separated by mixing manifolds (44), which may have film cooling holes (46) and/or coolant refresher holes (48).
Grouper: A Compact, Streamable Triangle Mesh Data Structure.

PubMed

Luffel, Mark; Gurung, Topraj; Lindstrom, Peter; Rossignac, Jarek

2013-05-08

We present Grouper: an all-in-one compact file format, random-access data structure, and streamable representation for large triangle meshes. Similarly to the recently published SQuad representation, Grouper represents the geometry and connectivity of a mesh by grouping vertices and triangles into fixed-size records, most of which store two adjacent triangles and a shared vertex. Unlike SQuad, however, Grouper interleaves geometry with connectivity and uses a new connectivity representation to ensure that vertices and triangles can be stored in a coherent order that enables memory-efficient sequential stream processing. We present a linear-time construction algorithm that allows streaming out Grouper meshes using a small memory footprint while preserving the initial ordering of vertices. As part of this construction, we show how the problem of assigning vertices and triangles to groups reduces to a well-known NP-hard optimization problem, and present a simple yet effective heuristic solution that performs well in practice. Our array-based Grouper representation also doubles as a triangle mesh data structure that allows direct access to vertices and triangles. Storing only about two integer references per triangle, Grouper answers both incidence and adjacency queries in amortized constant time. Our compact representation enables data-parallel processing on multicore computers, instant partitioning and fast transmission for distributed processing, as well as efficient out-of-core access.
Sequence information signal processor for local and global string comparisons

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1997-01-01

A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's compliment operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

NASA Astrophysics Data System (ADS)

Barr, David R. W.; Dudek, Piotr

2009-12-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
A comparative study of an ABC and an artificial absorber for truncating finite element meshes

NASA Technical Reports Server (NTRS)

Oezdemir, T.; Volakis, John L.

1993-01-01

The type of mesh termination used in the context of finite element formulations plays a major role on the efficiency and accuracy of the field solution. The performance of an absorbing boundary condition (ABC) and an artificial absorber (a new concept) for terminating the finite element mesh was evaluated. This analysis is done in connection with the problem of scattering by a finite slot array in a thick ground plane. The two approximate mesh truncation schemes are compared with the exact finite element-boundary integral (FEM-BI) method in terms of accuracy and efficiency. It is demonstrated that both approximate truncation schemes yield reasonably accurate results even when the mesh is extended only 0.3 wavelengths away from the array aperture. However, the artificial absorber termination method leads to a substantially more efficient solution. Moreover, it is shown that the FEM-BI method remains quite competitive with the FEM-artificial absorber method when the FFT is used for computing the matrix-vector products in the iterative solution algorithm. These conclusions are indeed surprising and of major importance in electromagnetic simulations based on the finite element method.
JPRS Report, Science & Technology, China, High-Performance Computer Systems

DTIC Science & Technology

1992-10-28

microprocessor array The microprocessor array in the AP85 system is com- posed of 16 completely identical array element micro - processors . Each array element...microprocessors and capable of host machine reading and writing. The memory capacity of the array element micro - processors as a whole can be expanded...transmission functions to carry out data transmission from array element micro - processor to array element microprocessor, from array element
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak

2001-01-01

The Information Power Grid (IPG) concept developed by NASA is aimed to provide a metacomputing platform for large-scale distributed computations, by hiding the intricacies of highly heterogeneous environment and yet maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the.IPG, and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnected speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solution are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
The Use of a Microcomputer Based Array Processor for Real Time Laser Velocimeter Data Processing

NASA Technical Reports Server (NTRS)

Meyers, James F.

1990-01-01

The application of an array processor to laser velocimeter data processing is presented. The hardware is described along with the method of parallel programming required by the array processor. A portion of the data processing program is described in detail. The increase in computational speed of a microcomputer equipped with an array processor is illustrated by comparative testing with a minicomputer.
Implementing Access to Data Distributed on Many Processors

NASA Technical Reports Server (NTRS)

James, Mark

2006-01-01

A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets with the low-level implementation details being beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow for the arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy is flexible, so its behavior can change depending on the needs of the application. At the same time, high productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.
Grouper: a compact, streamable triangle mesh data structure.

PubMed

Luffel, Mark; Gurung, Topraj; Lindstrom, Peter; Rossignac, Jarek

2014-01-01

We present Grouper: an all-in-one compact file format, random-access data structure, and streamable representation for large triangle meshes. Similarly to the recently published SQuad representation, Grouper represents the geometry and connectivity of a mesh by grouping vertices and triangles into fixed-size records, most of which store two adjacent triangles and a shared vertex. Unlike SQuad, however, Grouper interleaves geometry with connectivity and uses a new connectivity representation to ensure that vertices and triangles can be stored in a coherent order that enables memory-efficient sequential stream processing. We present a linear-time construction algorithm that allows streaming out Grouper meshes using a small memory footprint while preserving the initial ordering of vertices. As a part of this construction, we show how the problem of assigning vertices and triangles to groups reduces to a well-known NP-hard optimization problem, and present a simple yet effective heuristic solution that performs well in practice. Our array-based Grouper representation also doubles as a triangle mesh data structure that allows direct access to vertices and triangles. Storing only about two integer references per triangle--i.e., less than the three vertex references stored with each triangle in a conventional indexed mesh format--Grouper answers both incidence and adjacency queries in amortized constant time. Our compact representation enables data-parallel processing on multicore computers, instant partitioning and fast transmission for distributed processing, as well as efficient out-of-core access. We demonstrate the versatility and performance benefits of Grouper using a suite of example meshes and processing kernels.
Serial multiplier arrays for parallel computation

NASA Technical Reports Server (NTRS)

Winters, Kel

1990-01-01

Arrays of systolic serial-parallel multiplier elements are proposed as an alternative to conventional SIMD mesh serial adder arrays for applications that are multiplication intensive and require few stored operands. The design and operation of a number of multiplier and array configurations featuring locality of connection, modularity, and regularity of structure are discussed. A design methodology combining top-down and bottom-up techniques is described to facilitate development of custom high-performance CMOS multiplier element arrays as well as rapid synthesis of simulation models and semicustom prototype CMOS components. Finally, a differential version of NORA dynamic circuits requiring a single-phase uncomplemented clock signal introduced for this application.
High-performance ultra-low power VLSI analog processor for data compression

NASA Technical Reports Server (NTRS)

Tawel, Raoul (Inventor)

1996-01-01

An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
ALEGRA -- A massively parallel h-adaptive code for solid dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Summers, R.M.; Wong, M.K.; Boucheron, E.A.

1997-12-31

ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Usingmore » this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.« less
Potential of minicomputer/array-processor system for nonlinear finite-element analysis

NASA Technical Reports Server (NTRS)

Strohkorb, G. A.; Noor, A. K.

1983-01-01

The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.
A new parallelization scheme for adaptive mesh refinement

DOE PAGES

Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.; ...

2016-05-06

Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e. wall time x processor count) of subcycling in time, but with the runtime performance (i.e. smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has less parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered inmore » this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.« less

A new parallelization scheme for adaptive mesh refinement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.

Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e. wall time x processor count) of subcycling in time, but with the runtime performance (i.e. smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has less parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered inmore » this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.« less
Phase space simulation of collisionless stellar systems on the massively parallel processor

NASA Technical Reports Server (NTRS)

White, Richard L.

1987-01-01

A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.
Grouper: A Compact, Streamable Triangle Mesh Data Structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luffel, Mark; Gurung, Topraj; Lindstrom, Peter

2014-01-01

Here, we present Grouper: an all-in-one compact file format, random-access data structure, and streamable representation for large triangle meshes. Similarly to the recently published SQuad representation, Grouper represents the geometry and connectivity of a mesh by grouping vertices and triangles into fixed-size records, most of which store two adjacent triangles and a shared vertex. Unlike SQuad, however, Grouper interleaves geometry with connectivity and uses a new connectivity representation to ensure that vertices and triangles can be stored in a coherent order that enables memory-efficient sequential stream processing. We also present a linear-time construction algorithm that allows streaming out Grouper meshesmore » using a small memory footprint while preserving the initial ordering of vertices. In this construction, we show how the problem of assigning vertices and triangles to groups reduces to a well-known NP-hard optimization problem, and present a simple yet effective heuristic solution that performs well in practice. Our array-based Grouper representation also doubles as a triangle mesh data structure that allows direct access to vertices and triangles. Storing only about two integer references per triangle-i.e., less than the three vertex references stored with each triangle in a conventional indexed mesh format-Grouper answers both incidence and adjacency queries in amortized constant time. Our compact representation enables data-parallel processing on multicore computers, instant partitioning and fast transmission for distributed processing, as well as efficient out-of-core access. We demonstrate the versatility and performance benefits of Grouper using a suite of example meshes and processing kernels.« less
Finite element computation on nearest neighbor connected machines

NASA Technical Reports Server (NTRS)

Mcaulay, A. D.

1984-01-01

Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Gao, Guang R.

1999-01-01

Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Lesoinne, Michel

1993-01-01

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
CoNNeCT Baseband Processor Module

NASA Technical Reports Server (NTRS)

Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.

2011-01-01

A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.
Silicon nanodisk array with a fin field-effect transistor for time-domain weighted sum calculation toward massively parallel spiking neural networks

NASA Astrophysics Data System (ADS)

Tohara, Takashi; Liang, Haichao; Tanaka, Hirofumi; Igarashi, Makoto; Samukawa, Seiji; Endo, Kazuhiko; Takahashi, Yasuo; Morie, Takashi

2016-03-01

A nanodisk array connected with a fin field-effect transistor is fabricated and analyzed for spiking neural network applications. This nanodevice performs weighted sums in the time domain using rising slopes of responses triggered by input spike pulses. The nanodisk arrays, which act as a resistance of several giga-ohms, are fabricated using a self-assembly bio-nano-template technique. Weighted sums are achieved with an energy dissipation on the order of 1 fJ, where the number of inputs can be more than one hundred. This amount of energy is several orders of magnitude lower than that of conventional digital processors.
A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs).

PubMed

Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo

2018-02-01

Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
Rectangular Array Of Digital Processors For Planning Paths

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

1993-01-01

Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Electrically reconfigurable logic array

NASA Technical Reports Server (NTRS)

Agarwal, R. K.

1982-01-01

To compose the complicated systems using algorithmically specialized logic circuits or processors, one solution is to perform relational computations such as union, division and intersection directly on hardware. These relations can be pipelined efficiently on a network of processors having an array configuration. These processors can be designed and implemented with a few simple cells. In order to determine the state-of-the-art in Electrically Reconfigurable Logic Array (ERLA), a survey of the available programmable logic array (PLA) and the logic circuit elements used in such arrays was conducted. Based on this survey some recommendations are made for ERLA devices.
A programmable computational image sensor for high-speed vision

NASA Astrophysics Data System (ADS)

Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian

2013-08-01

In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array and a RISC core. The pixel-parallel PE is responsible for transferring, storing and processing image raw data in a SIMD fashion with its own programming language. The RPs are one dimensional array of simplified RISC cores, it can carry out complex arithmetic and logic operations. The PE array and RP array can finish great amount of computation with few instruction cycles and therefore satisfy the low- and middle-level high-speed image processing requirement. The RISC core controls the whole system operation and finishes some high-level image processing algorithms. We utilize a simplified AHB bus as the system bus to connect our major components. Programming language and corresponding tool chain for this computational image sensor are also developed.
ATCA digital controller hardware for vertical stabilization of plasmas in tokamaks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Batista, A. J. N.; Sousa, J.; Varandas, C. A. F.

2006-10-15

The efficient vertical stabilization (VS) of plasmas in tokamaks requires a fast reaction of the VS controller, for example, after detection of edge localized modes (ELM). For controlling the effects of very large ELMs a new digital control hardware, based on the Advanced Telecommunications Computing Architecture trade mark sign (ATCA), is being developed aiming to reduce the VS digital control loop cycle (down to an optimal value of 10 {mu}s) and improve the algorithm performance. The system has 1 ATCA trade mark sign processor module and up to 12 ATCA trade mark sign control modules, each one with 32 analogmore » input channels (12 bit resolution), 4 analog output channels (12 bit resolution), and 8 digital input/output channels. The Aurora trade mark sign and PCI Express trade mark sign communication protocols will be used for data transport, between modules, with expected latencies below 2 {mu}s. Control algorithms are implemented on a ix86 based processor with 6 Gflops and on field programmable gate arrays with 80 GMACS, interconnected by serial gigabit links in a full mesh topology.« less
A novel VLSI processor architecture for supercomputing arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

1993-01-01

Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
Optical Interconnections for VLSI Computational Systems Using Computer-Generated Holography.

NASA Astrophysics Data System (ADS)

Feldman, Michael Robert

Optical interconnects for VLSI computational systems using computer generated holograms are evaluated in theory and experiment. It is shown that by replacing particular electronic connections with free-space optical communication paths, connection of devices on a single chip or wafer and between chips or modules can be improved. Optical and electrical interconnects are compared in terms of power dissipation, communication bandwidth, and connection density. Conditions are determined for which optical interconnects are advantageous. Based on this analysis, it is shown that by applying computer generated holographic optical interconnects to wafer scale fine grain parallel processing systems, dramatic increases in system performance can be expected. Some new interconnection networks, designed to take full advantage of optical interconnect technology, have been developed. Experimental Computer Generated Holograms (CGH's) have been designed, fabricated and subsequently tested in prototype optical interconnected computational systems. Several new CGH encoding methods have been developed to provide efficient high performance CGH's. One CGH was used to decrease the access time of a 1 kilobit CMOS RAM chip. Another was produced to implement the inter-processor communication paths in a shared memory SIMD parallel processor array.
Reduction of solar vector magnetograph data using a microMSP array processor

NASA Technical Reports Server (NTRS)

Kineke, Jack

1990-01-01

The processing of raw data obtained by the solar vector magnetograph at NASA-Marshall requires extensive arithmetic operations on large arrays of real numbers. The objectives of this summer faculty fellowship study are to: (1) learn the programming language of the MicroMSP Array Processor and adapt some existing data reduction routines to exploit its capabilities; and (2) identify other applications and/or existing programs which lend themselves to array processor utilization which can be developed by undergraduate student programmers under the provisions of project JOVE.
Parallel processing in a host plus multiple array processor system for radar

NASA Technical Reports Server (NTRS)

Barkan, B. Z.

1983-01-01

Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data

NASA Technical Reports Server (NTRS)

Smith, B. W.; Siegel, H. J.; Swain, P. H.

1981-01-01

A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
3-D inversion of airborne electromagnetic data parallelized and accelerated by local mesh and adaptive soundings

NASA Astrophysics Data System (ADS)

Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad

2014-03-01

Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures in a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes the rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small with fine cells near the sounding location and coarse cells far away in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells on a global mesh that are far away from the sounding location. Since local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration only uses a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered to be too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution. Any existing modelling technique can be included into our framework of mesh decoupling and adaptive sampling to accelerate large-scale 3-D EM inversions.
Lenslet array processors.

PubMed

Glaser, I

1982-04-01

By combining a lenslet array with masks it is possible to obtain a noncoherent optical processor capable of computing in parallel generalized 2-D discrete linear transformations. We present here an analysis of such lenslet array processors (LAP). The effect of several errors, including optical aberrations, diffraction, vignetting, and geometrical and mask errors, are calculated, and guidelines to optical design of LAP are derived. Using these results, both ultimate and practical performances of LAP are compared with those of competing techniques.

PLUM: Parallel Load Balancing for Unstructured Adaptive Meshes. Degree awarded by Colorado Univ.

NASA Technical Reports Server (NTRS)

Oliker, Leonid

1998-01-01

Dynamic mesh adaption on unstructured grids is a powerful tool for computing large-scale problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture physical phenomena of interest, such procedures make standard computational methods more cost effective. Unfortunately, an efficient parallel implementation of these adaptive methods is rather difficult to achieve, primarily due to the load imbalance created by the dynamically-changing nonuniform grid. This requires significant communication at runtime, leading to idle processors and adversely affecting the total execution time. Nonetheless, it is generally thought that unstructured adaptive- grid techniques will constitute a significant fraction of future high-performance supercomputing. Various dynamic load balancing methods have been reported to date; however, most of them either lack a global view of loads across processors or do not apply their techniques to realistic large-scale applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chrisochoides, N.; Sukup, F.

In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputes using the traditional data-parallel model has been proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency in the processor level.more » Our preliminary performance data indicate that the task- parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the {open_quotes}best{close_quotes} sequential implementation of the BW algorithm.« less
Reconfiguration Schemes for Fault-Tolerant Processor Arrays

DTIC Science & Technology

1992-10-15

partially notion of linear schedule are easily related to similar ordered subset of a multidimensional integer lattice models and concepts used in [11-[131...and several other (called indec set). The points of this lattice correspond works. to (i.e.. are the indices of) computations, and the partial There are...These data dependencies are represented as vectors that of all computations of the algorithm is to be minimized. connect points of the lattice . If a
Implicit, nonswitching, vector-oriented algorithm for steady transonic flow

NASA Technical Reports Server (NTRS)

Lottati, I.

1983-01-01

A rapid computation of a sequence of transonic flow solutions has to be performed in many areas of aerodynamic technology. The employment of low-cost vector array processors makes the conduction of such calculations economically feasible. However, for a full utilization of the new hardware, the developed algorithms must take advantage of the special characteristics of the vector array processor. The present investigation has the objective to develop an efficient algorithm for solving transonic flow problems governed by mixed partial differential equations on an array processor.
High performance Python for direct numerical simulations of turbulent flows

NASA Astrophysics Data System (ADS)

Mortensen, Mikael; Langtangen, Hans Petter

2016-06-01

Direct Numerical Simulations (DNS) of the Navier Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython, that is found to match the speed of C++. The solvers are written from scratch in Python, both the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than similar routines provided through the FFTW library. For a pencil mesh decomposition 7 lines of code is required to execute a transform.
Efficient Use of Distributed Systems for Scientific Applications

NASA Technical Reports Server (NTRS)

Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques

2000-01-01

Distributed computing has been regarded as the future of high performance computing. Nationwide high speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency by up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results are given in Figure 1 for four irregular meshes with number of elements ranging from 30,269 elements for the Barth5 mesh to 11,451 elements for the Barth4 mesh. Future work with PART entails using the tool with an integrated application requiring distributed systems. In particular this application, illustrated in the document entails an integration of finite element and fluid dynamic simulations to address the cooling of turbine blades of a gas turbine engine design. It is not uncommon to encounter high-temperature, film-cooled turbine airfoils with 1,000,000s of degrees of freedom. This results because of the complexity of the various components of the airfoils, requiring fine-grain meshing for accuracy. Additional information is contained in the original.
Reconfigurable signal processor designs for advanced digital array radar systems

NASA Astrophysics Data System (ADS)

Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

2017-05-01

The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
A wideband software reconfigurable modem

NASA Astrophysics Data System (ADS)

Turner, J. H., Jr.; Vickers, H.

A wideband modem is described which provides signal processing capability for four Lx-band signals employing QPSK, MSK and PPM waveforms and employs a software reconfigurable architecture for maximum system flexibility and graceful degradation. The current processor uses a 2901 and two 8086 microprocessors per channel and performs acquisition, tracking, and data demodulation for JITDS, GPS, IFF and TACAN systems. The next generation processor will be implemented using a VHSIC chip set employing a programmable complex array vector processor module, a GP computer module, customized gate array modules, and a digital array correlator. This integrated processor has application to a wide number of diverse system waveforms, and will bring the benefits of VHSIC technology insertion into avionic antijam communications systems.
Integrated circuit for SAW and MEMS sensors

NASA Astrophysics Data System (ADS)

Fischer, Wolf-Joachim; Koenig, Peter; Ploetner, Matthias; Hermann, Rudiger; Stab, Helmut

2001-11-01

The sensor processor circuit has been developed for hand-held devices used in industrial and environmental applications, such as on-line process monitoring. Thereby devices with SAW sensors or MEMS resonators will benefit from this processor especially. Up to 8 sensors can be connected to the circuit as multisensors or sensor arrays. Two sensor processors SP1 and SP2 for different applications are presented in this paper. The SP-1 chip has a PCMCIA interface which can be used for the program and data transfer. SAW sensors which are working in the frequency range from 80 MHz to 160 MHz can be connected to the processor directly. It is possible to use the new SP-2 chip fabricated in a 0.5(mu) CMOS process for SAW devices with a maximum frequency of 600 MHz. An on-chip analog-digital-converter (ADC) and 6 PWM modules support the development of high-miniaturized intelligent sensor systems We have developed a multi-SAW sensor system with this ASIC that manages the requirements on control as well as signal generation and storage and provides an interface to the PC and electronic devices on the board. Its low power consumption and its PCMCIA plug fulfil the requirements of small size and mobility. For this application sensors have been developed to detect hazardous gases in ambient air. Sensors with differently modified copper-phthalocyanine films are capable of detecting NO2 and O3, whereas those with a hyperbranched polyester film respond to NH3.
Developing infrared array controller with software real time operating system

NASA Astrophysics Data System (ADS)

Sako, Shigeyuki; Miyata, Takashi; Nakamura, Tomohiko; Motohara, Kentaro; Uchimoto, Yuka Katsuno; Onaka, Takashi; Kataza, Hirokazu

2008-07-01

Real-time capabilities are required for a controller of a large format array to reduce a dead-time attributed by readout and data transfer. The real-time processing has been achieved by dedicated processors including DSP, CPLD, and FPGA devices. However, the dedicated processors have problems with memory resources, inflexibility, and high cost. Meanwhile, a recent PC has sufficient resources of CPUs and memories to control the infrared array and to process a large amount of frame data in real-time. In this study, we have developed an infrared array controller with a software real-time operating system (RTOS) instead of the dedicated processors. A Linux PC equipped with a RTAI extension and a dual-core CPU is used as a main computer, and one of the CPU cores is allocated to the real-time processing. A digital I/O board with DMA functions is used for an I/O interface. The signal-processing cores are integrated in the OS kernel as a real-time driver module, which is composed of two virtual devices of the clock processor and the frame processor tasks. The array controller with the RTOS realizes complicated operations easily, flexibly, and at a low cost.
A digital retina-like low-level vision processor.

PubMed

Mertoguno, S; Bourbakis, N G

2003-01-01

This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
Ring-array processor distribution topology for optical interconnects

NASA Technical Reports Server (NTRS)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

DOE PAGES

Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

2013-01-01

Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in amore » 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less
Towards implementation of cellular automata in Microbial Fuel Cells.

PubMed

Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis

2017-01-01

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions-compared to silicon circuitry-between the different states during computation.
Towards implementation of cellular automata in Microbial Fuel Cells

PubMed Central

Adamatzky, Andrew; Sirakoulis, Georgios Ch.; Greenman, John; Ieropoulos, Ioannis

2017-01-01

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway’s Game of Life as the ‘benchmark’ CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions—compared to silicon circuitry—between the different states during computation. PMID:28498871
Parallel computing on Unix workstation arrays

NASA Astrophysics Data System (ADS)

Reale, F.; Bocchino, F.; Sciortino, S.

1994-12-01

We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
3D Voronoi grid dedicated software for modeling gas migration in deep layered sedimentary formations with TOUGH2-TMGAS

NASA Astrophysics Data System (ADS)

Bonduà, Stefano; Battistelli, Alfredo; Berry, Paolo; Bortolotti, Villiam; Consonni, Alberto; Cormio, Carlo; Geloni, Claudio; Vasini, Ester Maria

2017-11-01

As is known, a full three-dimensional (3D) unstructured grid permits a great degree of flexibility when performing accurate numerical reservoir simulations. However, when the Integral Finite Difference Method (IFDM) is used for spatial discretization, constraints (arising from the required orthogonality between the segment connecting the blocks nodes and the interface area between blocks) pose difficulties in the creation of grids with irregular shaped blocks. The full 3D Voronoi approach guarantees the respect of IFDM constraints and allows generation of grids conforming to geological formations and structural objects and at the same time higher grid resolution in volumes of interest. In this work, we present dedicated pre- and post-processing gridding software tools for the TOUGH family of numerical reservoir simulators, developed by the Geothermal Research Group of the DICAM Department, University of Bologna. VORO2MESH is a new software coded in C++, based on the voro++ library, allowing computation of the 3D Voronoi tessellation for a given domain and the creation of a ready to use TOUGH2 MESH file. If a set of geological surfaces is available, the software can directly generate the set of Voronoi seed points used for tessellation. In order to reduce the number of connections and so to decrease computation time, VORO2MESH can produce a mixed grid with regular blocks (orthogonal prisms) and irregular blocks (polyhedron Voronoi blocks) at the point of contact between different geological formations. In order to visualize 3D Voronoi grids together with the results of numerical simulations, the functionality of the TOUGH2Viewer post-processor has been extended. We describe an application of VORO2MESH and TOUGH2Viewer to validate the two tools. The case study deals with the simulation of the migration of gases in deep layered sedimentary formations at basin scale using TOUGH2-TMGAS. A comparison between the simulation performances of unstructured and structured grids is presented.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processors with scanning CCD photodetectors

NASA Astrophysics Data System (ADS)

Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.

1995-10-01

Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete—analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.
The P-Mesh: A Commodity-based Scalable Network Architecture for Clusters

NASA Technical Reports Server (NTRS)

Nitzberg, Bill; Kuszmaul, Chris; Stockdale, Ian; Becker, Jeff; Jiang, John; Wong, Parkson; Tweten, David (Technical Monitor)

1998-01-01

We designed a new network architecture, the P-Mesh which combines the scalability and fault resilience of a torus with the performance of a switch. We compare the scalability, performance, and cost of the hub, switch, torus, tree, and P-Mesh architectures. The latter three are capable of scaling to thousands of nodes, however, the torus has severe performance limitations with that many processors. The tree and P-Mesh have similar latency, bandwidth, and bisection bandwidth, but the P-Mesh outperforms the switch architecture (a lower bound for tree performance) on 16-node NAB Parallel Benchmark tests by up to 23%, and costs 40% less. Further, the P-Mesh has better fault resilience characteristics. The P-Mesh architecture trades increased management overhead for lower cost, and is a good bridging technology while the price of tree uplinks is expensive.
Shift-, rotation-, and scale-invariant shape recognition system using an optical Hough transform

NASA Astrophysics Data System (ADS)

Schmid, Volker R.; Bader, Gerhard; Lueder, Ernst H.

1998-02-01

We present a hybrid shape recognition system with an optical Hough transform processor. The features of the Hough space offer a separate cancellation of distortions caused by translations and rotations. Scale invariance is also provided by suitable normalization. The proposed system extends the capabilities of Hough transform based detection from only straight lines to areas bounded by edges. A very compact optical design is achieved by a microlens array processor accepting incoherent light as direct optical input and realizing the computationally expensive connections massively parallel. Our newly developed algorithm extracts rotation and translation invariant normalized patterns of bright spots on a 2D grid. A neural network classifier maps the 2D features via a nonlinear hidden layer onto the classification output vector. We propose initialization of the connection weights according to regions of activity specifically assigned to each neuron in the hidden layer using a competitive network. The presented system is designed for industry inspection applications. Presently we have demonstrated detection of six different machined parts in real-time. Our method yields very promising detection results of more than 96% correctly classified parts.

Connectivity-based, all-hexahedral mesh generation method and apparatus

DOEpatents

Tautges, T.J.; Mitchell, S.A.; Blacker, T.D.; Murdoch, P.

1998-06-16

The present invention is a computer-based method and apparatus for constructing all-hexahedral finite element meshes for finite element analysis. The present invention begins with a three-dimensional geometry and an all-quadrilateral surface mesh, then constructs hexahedral element connectivity from the outer boundary inward, and then resolves invalid connectivity. The result of the present invention is a complete representation of hex mesh connectivity only; actual mesh node locations are determined later. The basic method of the present invention comprises the step of forming hexahedral elements by making crossings of entities referred to as ``whisker chords.`` This step, combined with a seaming operation in space, is shown to be sufficient for meshing simple block problems. Entities that appear when meshing more complex geometries, namely blind chords, merged sheets, and self-intersecting chords, are described. A method for detecting invalid connectivity in space, based on repeated edges, is also described, along with its application to various cases of invalid connectivity introduced and resolved by the method. 79 figs.
Connectivity-based, all-hexahedral mesh generation method and apparatus

DOEpatents

Tautges, Timothy James; Mitchell, Scott A.; Blacker, Ted D.; Murdoch, Peter

1998-01-01

The present invention is a computer-based method and apparatus for constructing all-hexahedral finite element meshes for finite element analysis. The present invention begins with a three-dimensional geometry and an all-quadrilateral surface mesh, then constructs hexahedral element connectivity from the outer boundary inward, and then resolves invalid connectivity. The result of the present invention is a complete representation of hex mesh connectivity only; actual mesh node locations are determined later. The basic method of the present invention comprises the step of forming hexahedral elements by making crossings of entities referred to as "whisker chords." This step, combined with a seaming operation in space, is shown to be sufficient for meshing simple block problems. Entities that appear when meshing more complex geometries, namely blind chords, merged sheets, and self-intersecting chords, are described. A method for detecting invalid connectivity in space, based on repeated edges, is also described, along with its application to various cases of invalid connectivity introduced and resolved by the method.
Single bus star connected reluctance drive and method

DOEpatents

Fahimi, Babak; Shamsi, Pourya

2016-05-10

A system and methods for operating a switched reluctance machine includes a controller, an inverter connected to the controller and to the switched reluctance machine, a hysteresis control connected to the controller and to the inverter, a set of sensors connected to the switched reluctance machine and to the controller, the switched reluctance machine further including a set of phases the controller further comprising a processor and a memory connected to the processor, wherein the processor programmed to execute a control process and a generation process.
Software Defined GPS Receiver for International Space Station

NASA Technical Reports Server (NTRS)

Duncan, Courtney B.; Robison, David E.; Koelewyn, Cynthia Lee

2011-01-01

JPL is providing a software defined radio (SDR) that will fly on the International Space Station (ISS) as part of the CoNNeCT project under NASA's SCaN program. The SDR consists of several modules including a Baseband Processor Module (BPM) and a GPS Module (GPSM). The BPM executes applications (waveforms) consisting of software components for the embedded SPARC processor and logic for two Virtex II Field Programmable Gate Arrays (FPGAs) that operate on data received from the GPSM. GPS waveforms on the SDR are enabled by an L-Band antenna, low noise amplifier (LNA), and the GPSM that performs quadrature downconversion at L1, L2, and L5. The GPS waveform for the JPL SDR will acquire and track L1 C/A, L2C, and L5 GPS signals from a CoNNeCT platform on ISS, providing the best GPS-based positioning of ISS achieved to date, the first use of multiple frequency GPS on ISS, and potentially the first L5 signal tracking from space. The system will also enable various radiometric investigations on ISS such as local multipath or ISS dynamic behavior characterization. In following the software-defined model, this work will create a highly portable GPS software and firmware package that can be adapted to another platform with the necessary processor and FPGA capability. This paper also describes ISS applications for the JPL CoNNeCT SDR GPS waveform, possibilities for future global navigation satellite system (GNSS) tracking development, and the applicability of the waveform components to other space navigation applications.
The Tera Multithreaded Architecture and Unstructured Meshes

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.; Mavriplis, Dimitri J.

1998-01-01

The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
Compute Element and Interface Box for the Hazard Detection System

NASA Technical Reports Server (NTRS)

Villalpando, Carlos Y.; Khanoyan, Garen; Stern, Ryan A.; Some, Raphael R.; Bailey, Erik S.; Carson, John M.; Vaughan, Geoffrey M.; Werner, Robert A.; Salomon, Phil M.; Martin, Keith E.;

2013-01-01

The Autonomous Landing and Hazard Avoidance Technology (ALHAT) program is building a sensor that enables a spacecraft to evaluate autonomously a potential landing area to generate a list of hazardous and safe landing sites. It will also provide navigation inputs relative to those safe sites. The Hazard Detection System Compute Element (HDS-CE) box combines a field-programmable gate array (FPGA) board for sensor integration and timing, with a multicore computer board for processing. The FPGA does system-level timing and data aggregation, and acts as a go-between, removing the real-time requirements from the processor and labeling events with a high resolution time. The processor manages the behavior of the system, controls the instruments connected to the HDS-CE, and services the "heavy lifting" computational requirements for analyzing the potential landing spots.

Fast neural net simulation with a DSP processor array.

PubMed

Muller, U A; Gunzinger, A; Guggenbuhl, W

1995-01-01

This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.
Implementation of context independent code on a new array processor: The Super-65

NASA Technical Reports Server (NTRS)

Colbert, R. O.; Bowhill, S. A.

1981-01-01

The feasibility of rewriting standard uniprocessor programs into code which contains no context-dependent branches is explored. Context independent code (CIC) would contain no branches that might require different processing elements to branch different ways. In order to investigate the possibilities and restrictions of CIC, several programs were recoded into CIC and a four-element array processor was built. This processor (the Super-65) consisted of three 6502 microprocessors and the Apple II microcomputer. The results obtained were somewhat dependent upon the specific architecture of the Super-65 but within bounds, the throughput of the array processor was found to increase linearly with the number of processing elements (PEs). The slope of throughput versus PEs is highly dependent on the program and varied from 0.33 to 1.00 for the sample programs.
Optical systolic array processor using residue arithmetic

NASA Technical Reports Server (NTRS)

Jackson, J.; Casasent, D.

1983-01-01

The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
Array-based Hierarchical Mesh Generation in Parallel

DOE PAGES

Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...

2015-11-03

In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDE's using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh that can be used for a number of purposes such as multi-level methods to generating large meshes. The capability is developed under the parallel mesh framework “Mesh Oriented dAtaBase” a.k.a MOAB. We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. Inmore » conclusion, we also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.« less
50 CFR 648.206 - Framework provisions.

Code of Federal Regulations, 2011 CFR

2011-10-01

... deducted when determining specifications; (8) Distribution of the ACL; (9) Gear restrictions (such as mesh... and bycatch monitoring; (27) Requirements for a herring processor survey; (28) ACL set-aside amounts...
50 CFR 648.206 - Framework provisions.

Code of Federal Regulations, 2013 CFR

2013-10-01

... deducted when determining specifications; (8) Distribution of the ACL; (9) Gear restrictions (such as mesh... and bycatch monitoring; (27) Requirements for a herring processor survey; (28) ACL set-aside amounts...
50 CFR 648.206 - Framework provisions.

Code of Federal Regulations, 2014 CFR

2014-10-01

... deducted when determining specifications; (8) Distribution of the ACL; (9) Gear restrictions (such as mesh... and bycatch monitoring; (27) Requirements for a herring processor survey; (28) ACL set-aside amounts...
50 CFR 648.206 - Framework provisions.

Code of Federal Regulations, 2012 CFR

2012-10-01

... deducted when determining specifications; (8) Distribution of the ACL; (9) Gear restrictions (such as mesh... and bycatch monitoring; (27) Requirements for a herring processor survey; (28) ACL set-aside amounts...
Digital Parallel Processor Array for Optimum Path Planning

NASA Technical Reports Server (NTRS)

Kremeny, Sabrina E. (Inventor); Fossum, Eric R. (Inventor); Nixon, Robert H. (Inventor)

1996-01-01

The invention computes the optimum path across a terrain or topology represented by an array of parallel processor cells interconnected between neighboring cells by links extending along different directions to the neighboring cells. Such an array is preferably implemented as a high-speed integrated circuit. The computation of the optimum path is accomplished by, in each cell, receiving stimulus signals from neighboring cells along corresponding directions, determining and storing the identity of a direction along which the first stimulus signal is received, broadcasting a subsequent stimulus signal to the neighboring cells after a predetermined delay time, whereby stimulus signals propagate throughout the array from a starting one of the cells. After propagation of the stimulus signal throughout the array, a master processor traces back from a selected destination cell to the starting cell along an optimum path of the cells in accordance with the identity of the directions stored in each of the cells.
Rapid geodesic mapping of brain functional connectivity: implementation of a dedicated co-processor in a field-programmable gate array (FPGA) and application to resting state functional MRI.

PubMed

Minati, Ludovico; Cercignani, Mara; Chan, Dennis

2013-10-01

Graph theory-based analyses of brain network topology can be used to model the spatiotemporal correlations in neural activity detected through fMRI, and such approaches have wide-ranging potential, from detection of alterations in preclinical Alzheimer's disease through to command identification in brain-machine interfaces. However, due to prohibitive computational costs, graph-based analyses to date have principally focused on measuring connection density rather than mapping the topological architecture in full by exhaustive shortest-path determination. This paper outlines a solution to this problem through parallel implementation of Dijkstra's algorithm in programmable logic. The processor design is optimized for large, sparse graphs and provided in full as synthesizable VHDL code. An acceleration factor between 15 and 18 is obtained on a representative resting-state fMRI dataset, and maps of Euclidean path length reveal the anticipated heterogeneous cortical involvement in long-range integrative processing. These results enable high-resolution geodesic connectivity mapping for resting-state fMRI in patient populations and real-time geodesic mapping to support identification of imagined actions for fMRI-based brain-machine interfaces. Copyright © 2013 IPEM. Published by Elsevier Ltd. All rights reserved.
Stereo matching and view interpolation based on image domain triangulation.

PubMed

Fickel, Guilherme Pinto; Jung, Claudio R; Malzbender, Tom; Samadani, Ramin; Culbertson, Bruce

2013-09-01

This paper presents a new approach for stereo matching and view interpolation problems based on triangular tessellations suitable for a linear array of rectified cameras. The domain of the reference image is initially partitioned into triangular regions using edge and scale information, aiming to place vertices along image edges and increase the number of triangles in textured regions. A region-based matching algorithm is then used to find an initial disparity for each triangle, and a refinement stage is applied to change the disparity at the vertices of the triangles, generating a piecewise linear disparity map. A simple post-processing procedure is applied to connect triangles with similar disparities generating a full 3D mesh related to each camera (view), which are used to generate new synthesized views along the linear camera array. With the proposed framework, view interpolation reduces to the trivial task of rendering polygonal meshes, which can be done very fast, particularly when GPUs are employed. Furthermore, the generated views are hole-free, unlike most point-based view interpolation schemes that require some kind of post-processing procedures to fill holes.
Real time processor for array speckle interferometry

NASA Astrophysics Data System (ADS)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-02-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
Real time processor for array speckle interferometry

NASA Technical Reports Server (NTRS)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-01-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
S-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Simon, H.

1998-01-01

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Messagemore » Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.« less

Method and apparatus for connecting finite element meshes and performing simulations therewith

DOEpatents

Dohrmann, Clark R.; Key, Samuel W.; Heinstein, Martin W.

2003-05-06

The present invention provides a method of connecting dissimilar finite element meshes. A first mesh, designated the master mesh, and a second mesh, designated the slave mesh, each have interface surfaces proximal the other. Each interface surface has a corresponding interface mesh comprising a plurality of interface nodes. Each slave interface node is assigned new coordinates locating the interface node on the interface surface of the master mesh. The slave interface surface is further redefined to be the projection of the slave interface mesh onto the master interface surface.
Balancing Contention and Synchronization on the Intel Paragon

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.; Nicol, David M.

1996-01-01

The Intel Paragon is a mesh-connected distributed memory parallel computer. It uses an oblivious and deterministic message routing algorithm: this permits us to develop highly optimized schedules for frequently needed communication patterns. The complete exchange is one such pattern. Several approaches are available for carrying it out on the mesh. We study an algorithm developed by Scott. This algorithm assumes that a communication link can carry one message at a time and that a node can only transmit one message at a time. It requires global synchronization to enforce a schedule of transmissions. Unfortunately global synchronization has substantial overhead on the Paragon. At the same time the powerful interconnection mechanism of this machine permits 2 or 3 messages to share a communication link with minor overhead. It can also overlap multiple message transmission from the same node to some extent. We develop a generalization of Scott's algorithm that executes complete exchange with a prescribed contention. Schedules that incur greater contention require fewer synchronization steps. This permits us to tradeoff contention against synchronization overhead. We describe the performance of this algorithm and compare it with Scott's original algorithm as well as with a naive algorithm that does not take interconnection structure into account. The Bounded contention algorithm is always better than Scott's algorithm and outperforms the naive algorithm for all but the smallest message sizes. The naive algorithm fails to work on meshes larger than 12 x 12. These results show that due consideration of processor interconnect and machine performance parameters is necessary to obtain peak performance from the Paragon and its successor mesh machines.
Simulation of an array-based neural net model

NASA Technical Reports Server (NTRS)

Barnden, John A.

1987-01-01

Research in cognitive science suggests that much of cognition involves the rapid manipulation of complex data structures. However, it is very unclear how this could be realized in neural networks or connectionist systems. A core question is: how could the interconnectivity of items in an abstract-level data structure be neurally encoded? The answer appeals mainly to positional relationships between activity patterns within neural arrays, rather than directly to neural connections in the traditional way. The new method was initially devised to account for abstract symbolic data structures, but it also supports cognitively useful spatial analogue, image-like representations. As the neural model is based on massive, uniform, parallel computations over 2D arrays, the massively parallel processor is a convenient tool for simulation work, although there are complications in using the machine to the fullest advantage. An MPP Pascal simulation program for a small pilot version of the model is running.
Proceedings of the 14th International Conference on the Numerical Simulation of Plasmas

NASA Astrophysics Data System (ADS)

Partial Contents are as follows: Numerical Simulations of the Vlasov-Maxwell Equations by Coupled Particle-Finite Element Methods on Unstructured Meshes; Electromagnetic PIC Simulations Using Finite Elements on Unstructured Grids; Modelling Travelling Wave Output Structures with the Particle-in-Cell Code CONDOR; SST--A Single-Slice Particle Simulation Code; Graphical Display and Animation of Data Produced by Electromagnetic, Particle-in-Cell Codes; A Post-Processor for the PEST Code; Gray Scale Rendering of Beam Profile Data; A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers; 3-D Electromagnetic PIC Simulation on the NRL Connection Machine; Plasma PIC Simulations on MIMD Computers; Vlasov-Maxwell Algorithm for Electromagnetic Plasma Simulation on Distributed Architectures; MHD Boundary Layer Calculation Using the Vortex Method; and Eulerian Codes for Plasma Simulations.
Sub-nanosecond clock synchronization and trigger management in the nuclear physics experiment AGATA

NASA Astrophysics Data System (ADS)

Bellato, M.; Bortolato, D.; Chavas, J.; Isocrate, R.; Rampazzo, G.; Triossi, A.; Bazzacco, D.; Mengoni, D.; Recchia, F.

2013-07-01

The new-generation spectrometer AGATA, the Advanced GAmma Tracking Array, requires sub-nanosecond clock synchronization among readout and front-end electronics modules that may lie hundred meters apart. We call GTS (Global Trigger and Synchronization System) the infrastructure responsible for precise clock synchronization and for the trigger management of AGATA. It is made of a central trigger processor and nodes, connected in a tree structure by means of optical fibers operated at 2Gb/s. The GTS tree handles the synchronization and the trigger data flow, whereas the trigger processor analyses and eventually validates the trigger primitives centrally. Sub-nanosecond synchronization is achieved by measuring two different types of round-trip times and by automatically correcting for phase-shift differences. For a tree of depth two, the peak-to-peak clock jitter at each leaf is 70 ps; the mean phase difference is 180 ps, while the standard deviation over such phase difference, namely the phase equalization repeatability, is 20 ps. The GTS system has run flawlessly for the two-year long AGATA campaign, held at the INFN Legnaro National Laboratories, Italy, where five triple clusters of the AGATA sub-array were coupled with a variety of ancillary detectors.
A Parallel Cartesian Approach for External Aerodynamics of Vehicles with Complex Geometry

NASA Technical Reports Server (NTRS)

Aftosmis, M. J.; Berger, M. J.; Adomavicius, G.

2001-01-01

This workshop paper presents the current status in the development of a new approach for the solution of the Euler equations on Cartesian meshes with embedded boundaries in three dimensions on distributed and shared memory architectures. The approach uses adaptively refined Cartesian hexahedra to fill the computational domain. Where these cells intersect the geometry, they are cut by the boundary into arbitrarily shaped polyhedra which receive special treatment by the solver. The presentation documents a newly developed multilevel upwind solver based on a flexible domain-decomposition strategy. One novel aspect of the work is its use of space-filling curves (SFC) for memory efficient on-the-fly parallelization, dynamic re-partitioning and automatic coarse mesh generation. Within each subdomain the approach employs a variety reordering techniques so that relevant data are on the same page in memory permitting high-performance on cache-based processors. Details of the on-the-fly SFC based partitioning are presented as are construction rules for the automatic coarse mesh generation. After describing the approach, the paper uses model problems and 3- D configurations to both verify and validate the solver. The model problems demonstrate that second-order accuracy is maintained despite the presence of the irregular cut-cells in the mesh. In addition, it examines both parallel efficiency and convergence behavior. These investigations demonstrate a parallel speed-up in excess of 28 on 32 processors of an SGI Origin 2000 system and confirm that mesh partitioning has no effect on convergence behavior.
An Efficient Solution Method for Multibody Systems with Loops Using Multiple Processors

NASA Technical Reports Server (NTRS)

Ghosh, Tushar K.; Nguyen, Luong A.; Quiocho, Leslie J.

2015-01-01

This paper describes a multibody dynamics algorithm formulated for parallel implementation on multiprocessor computing platforms using the divide-and-conquer approach. The system of interest is a general topology of rigid and elastic articulated bodies with or without loops. The algorithm divides the multibody system into a number of smaller sets of bodies in chain or tree structures, called "branches" at convenient joints called "connection points", and uses an Order-N (O (N)) approach to formulate the dynamics of each branch in terms of the unknown spatial connection forces. The equations of motion for the branches, leaving the connection forces as unknowns, are implemented in separate processors in parallel for computational efficiency, and the equations for all the unknown connection forces are synthesized and solved in one or several processors. The performances of two implementations of this divide-and-conquer algorithm in multiple processors are compared with an existing method implemented on a single processor.
Performance Analysis and Portability of the PLUM Load Balancing System

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

1998-01-01

The ability to dynamically adapt an unstructured mesh is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult. To address this problem, we have developed PLUM, an automatic portable framework for performing adaptive numerical computations in a message-passing environment. PLUM requires that all data be globally redistributed after each mesh adaption to achieve load balance. We present an algorithm for minimizing this remapping overhead by guaranteeing an optimal processor reassignment. We also show that the data redistribution cost can be significantly reduced by applying our heuristic processor reassignment algorithm to the default mapping of the parallel partitioner. Portability is examined by comparing performance on a SP2, an Origin2000, and a T3E. Results show that PLUM can be successfully ported to different platforms without any code modifications.
Design of the SLAC RCE Platform: A General Purpose ATCA Based Data Acquisition System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Herbst, R.; Claus, R.; Freytag, M.

2015-01-23

The SLAC RCE platform is a general purpose clustered data acquisition system implemented on a custom ATCA compliant blade, called the Cluster On Board (COB). The core of the system is the Reconfigurable Cluster Element (RCE), which is a system-on-chip design based upon the Xilinx Zynq family of FPGAs, mounted on custom COB daughter-boards. The Zynq architecture couples a dual core ARM Cortex A9 based processor with a high performance 28nm FPGA. The RCE has 12 external general purpose bi-directional high speed links, each supporting serial rates of up to 12Gbps. 8 RCE nodes are included on a COB, eachmore » with a 10Gbps connection to an on-board 24-port Ethernet switch integrated circuit. The COB is designed to be used with a standard full-mesh ATCA backplane allowing multiple RCE nodes to be tightly interconnected with minimal interconnect latency. Multiple shelves can be clustered using the front panel 10-gbps connections. The COB also supports local and inter-blade timing and trigger distribution. An experiment specific Rear Transition Module adapts the 96 high speed serial links to specific experiments and allows an experiment-specific timing and busy feedback connection. This coupling of processors with a high performance FPGA fabric in a low latency, multiple node cluster allows high speed data processing that can be easily adapted to any physics experiment. RTEMS and Linux are both ported to the module. The RCE has been used or is the baseline for several current and proposed experiments (LCLS, HPS, LSST, ATLAS-CSC, LBNE, DarkSide, ILC-SiD, etc).« less
System and method for cognitive processing for data fusion

NASA Technical Reports Server (NTRS)

Duong, Tuan A. (Inventor); Duong, Vu A. (Inventor)

2012-01-01

A system and method for cognitive processing of sensor data. A processor array receiving analog sensor data and having programmable interconnects, multiplication weights, and filters provides for adaptive learning in real-time. A static random access memory contains the programmable data for the processor array and the stored data is modified to provide for adaptive learning.
Iterative color-multiplexed, electro-optical processor.

PubMed

Psaltis, D; Casasent, D; Carlotto, M

1979-11-01

A noncoherent optical vector-matrix multiplier using a linear LED source array and a linear P-I-N photodiode detector array has been combined with a 1-D adder in a feedback loop. The resultant iterative optical processor and its use in solving simultaneous linear equations are described. Operation on complex data is provided by a novel color-multiplexing system.
Apparatus for and method of testing an electrical ground fault circuit interrupt device

DOEpatents

Andrews, L.B.

1998-08-18

An apparatus for testing a ground fault circuit interrupt device includes a processor, an input device connected to the processor for receiving input from an operator, a storage media connected to the processor for storing test data, an output device connected to the processor for outputting information corresponding to the test data to the operator, and a calibrated variable load circuit connected between the processor and the ground fault circuit interrupt device. The ground fault circuit interrupt device is configured to trip a corresponding circuit breaker. The processor is configured to receive signals from the calibrated variable load circuit and to process the signals to determine a trip threshold current and/or a trip time. A method of testing the ground fault circuit interrupt device includes a first step of providing an identification for the ground fault circuit interrupt device. Test data is then recorded in accordance with the identification. By comparing test data from an initial test with test data from a subsequent test, a trend of performance for the ground fault circuit interrupt device is determined. 17 figs.
Apparatus for and method of testing an electrical ground fault circuit interrupt device

DOEpatents

Andrews, Lowell B.

1998-01-01

An apparatus for testing a ground fault circuit interrupt device includes a processor, an input device connected to the processor for receiving input from an operator, a storage media connected to the processor for storing test data, an output device connected to the processor for outputting information corresponding to the test data to the operator, and a calibrated variable load circuit connected between the processor and the ground fault circuit interrupt device. The ground fault circuit interrupt device is configured to trip a corresponding circuit breaker. The processor is configured to receive signals from the calibrated variable load circuit and to process the signals to determine a trip threshold current and/or a trip time. A method of testing the ground fault circuit interrupt device includes a first step of providing an identification for the ground fault circuit interrupt device. Test data is then recorded in accordance with the identification. By comparing test data from an initial test with test data from a subsequent test, a trend of performance for the ground fault circuit interrupt device is determined.
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

NASA Technical Reports Server (NTRS)

O'Connell, Matthew; Karman, Steve L.

2015-01-01

Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
Six-port optical switch for cluster-mesh photonic network-on-chip

NASA Astrophysics Data System (ADS)

Jia, Hao; Zhou, Ting; Zhao, Yunchou; Xia, Yuhao; Dai, Jincheng; Zhang, Lei; Ding, Jianfeng; Fu, Xin; Yang, Lin

2018-05-01

Photonic network-on-chip for high-performance multi-core processors has attracted substantial interest in recent years as it offers a systematic method to meet the demand of large bandwidth, low latency and low power dissipation. In this paper we demonstrate a non-blocking six-port optical switch for cluster-mesh photonic network-on-chip. The architecture is constructed by substituting three optical switching units of typical Spanke-Benes network to optical waveguide crossings. Compared with Spanke-Benes network, the number of optical switching units is reduced by 20%, while the connectivity of routing path is maintained. By this way the footprint and power consumption can be reduced at the expense of sacrificing the network latency performance in some cases. The device is realized by 12 thermally tuned silicon Mach-Zehnder optical switching units. Its theoretical spectral responses are evaluated by establishing a numerical model. The experimental spectral responses are also characterized, which indicates that the optical signal-to-noise ratios of the optical switch are larger than 13.5 dB in the wavelength range from 1525 nm to 1565 nm. Data transmission experiment with the data rate of 32 Gbps is implemented for each optical link.
Parallel deterministic transport sweeps of structured and unstructured meshes with overloaded mesh decompositions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pautz, Shawn D.; Bailey, Teresa S.

Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amountmore » of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined on up to 10 5 processor cores.« less
Parallel deterministic transport sweeps of structured and unstructured meshes with overloaded mesh decompositions

DOE PAGES

Pautz, Shawn D.; Bailey, Teresa S.

2016-11-29

Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amountmore » of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined on up to 10 5 processor cores.« less
Distributed processor allocation for launching applications in a massively connected processors complex

DOEpatents

Pedretti, Kevin

2008-11-18

A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.
LR: Compact connectivity representation for triangle meshes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gurung, T; Luffel, M; Lindstrom, P

2011-01-28

We propose LR (Laced Ring) - a simple data structure for representing the connectivity of manifold triangle meshes. LR provides the option to store on average either 1.08 references per triangle or 26.2 bits per triangle. Its construction, from an input mesh that supports constant-time adjacency queries, has linear space and time complexity, and involves ordering most vertices along a nearly-Hamiltonian cycle. LR is best suited for applications that process meshes with fixed connectivity, as any changes to the connectivity require the data structure to be rebuilt. We provide an implementation of the set of standard random-access, constant-time operators formore » traversing a mesh, and show that LR often saves both space and traversal time over competing representations.« less
Design and Evaluation of Fault-Tolerant VLSI/WSI Processor Arrays.

DTIC Science & Technology

1987-12-31

studies reported in this paper. In Section .3, the reliabuility characteristics of single-level FTPA’s are discusseri. Four different type of FTPA’s are...for processor arrays are proposed and studied . Stu- dies on algorithmic and software aspects relevant to systems are reported in items 4, 5, 8, 12 and...O’Keefe M., and Fortes, J. A. B., "A Comparative Study of Two Systematic Design Methodologies for Systolic Arrays," (Long Version) International Workshop on

SWARM: A 32 GHz Correlator and VLBI Beamformer for the Submillimeter Array

NASA Astrophysics Data System (ADS)

Primiani, Rurik A.; Young, Kenneth H.; Young, André; Patel, Nimesh; Wilson, Robert W.; Vertatschitsch, Laura; Chitwood, Billie B.; Srinivasan, Ranjani; MacMahon, David; Weintroub, Jonathan

2016-03-01

A 32GHz bandwidth VLBI capable correlator and phased array has been designed and deployeda at the Smithsonian Astrophysical Observatory’s Submillimeter Array (SMA). The SMA Wideband Astronomical ROACH2 Machine (SWARM) integrates two instruments: a correlator with 140kHz spectral resolution across its full 32GHz band, used for connected interferometric observations, and a phased array summer used when the SMA participates as a station in the Event Horizon Telescope (EHT) very long baseline interferometry (VLBI) array. For each SWARM quadrant, Reconfigurable Open Architecture Computing Hardware (ROACH2) units shared under open-source from the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER) are equipped with a pair of ultra-fast analog-to-digital converters (ADCs), a field programmable gate array (FPGA) processor, and eight 10 Gigabit Ethernet (GbE) ports. A VLBI data recorder interface designated the SWARM digital back end, or SDBE, is implemented with a ninth ROACH2 per quadrant, feeding four Mark6 VLBI recorders with an aggregate recording rate of 64 Gbps. This paper describes the design and implementation of SWARM, as well as its deployment at SMA with reference to verification and science data.
A Full Mesh ATCA-based General Purpose Data Processing Board (Pulsar II)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ajuha, S.

The Pulsar II is a custom ATCA full mesh enabled FPGA-based processor board which has been designed with the goal of creating a scalable architecture abundant in flexible, non-blocking, high bandwidth interconnections. The design has been motivated by silicon-based tracking trigger needs for LHC experiments. In this technical memo we describe the Pulsar II hardware and its performance, such as the performance test results with full mesh backplanes from different vendors, how the backplane is used for the development of low-latency time-multiplexed data transfer schemes and how the inter-shelf and intra-shelf synchronization works.
Optimization of multiple turbine arrays in a channel with tidally reversing flow by numerical modelling with adaptive mesh.

PubMed

Divett, T; Vennell, R; Stevens, C

2013-02-28

At tidal energy sites, large arrays of hundreds of turbines will be required to generate economically significant amounts of energy. Owing to wake effects within the array, the placement of turbines within will be vital to capturing the maximum energy from the resource. This study presents preliminary results using Gerris, an adaptive mesh flow solver, to investigate the flow through four different arrays of 15 turbines each. The goal is to optimize the position of turbines within an array in an idealized channel. The turbines are represented as areas of increased bottom friction in an adaptive mesh model so that the flow and power capture in tidally reversing flow through large arrays can be studied. The effect of oscillating tides is studied, with interesting dynamics generated as the tidal current reverses direction, forcing turbulent flow through the array. The energy removed from the flow by each of the four arrays is compared over a tidal cycle. A staggered array is found to extract 54 per cent more energy than a non-staggered array. Furthermore, an array positioned to one side of the channel is found to remove a similar amount of energy compared with an array in the centre of the channel.
Interprocessor bus switching system for simultaneous communication in plural bus parallel processing system

DOEpatents

Atac, R.; Fischler, M.S.; Husby, D.E.

1991-01-15

A bus switching apparatus and method for multiple processor computer systems comprises a plurality of bus switches interconnected by branch buses. Each processor or other module of the system is connected to a spigot of a bus switch. Each bus switch also serves as part of a backplane of a modular crate hardware package. A processor initiates communication with another processor by identifying that other processor. The bus switch to which the initiating processor is connected identifies and secures, if possible, a path to that other processor, either directly or via one or more other bus switches which operate similarly. If a particular desired path through a given bus switch is not available to be used, an alternate path is considered, identified and secured. 11 figures.
Interprocessor bus switching system for simultaneous communication in plural bus parallel processing system

DOEpatents

Atac, Robert; Fischler, Mark S.; Husby, Donald E.

1991-01-01

A bus switching apparatus and method for multiple processor computer systems comprises a plurality of bus switches interconnected by branch buses. Each processor or other module of the system is connected to a spigot of a bus switch. Each bus switch also serves as part of a backplane of a modular crate hardware package. A processor initiates communication with another processor by identifying that other processor. The bus switch to which the initiating processor is connected identifies and secures, if possible, a path to that other processor, either directly or via one or more other bus switches which operate similarly. If a particular desired path through a given bus switch is not available to be used, an alternate path is considered, identified and secured.
Radiation Hardened Electronics for Extreme Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Watson, Michael D.

2007-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project consists of a series of tasks designed to develop and mature a broad spectrum of radiation hardened and low temperature electronics technologies. Three approaches are being taken to address radiation hardening: improved material hardness, design techniques to improve radiation tolerance, and software methods to improve radiation tolerance. Within these approaches various technology products are being addressed including Field Programmable Gate Arrays (FPGA), Field Programmable Analog Arrays (FPAA), MEMS Serial Processors, Reconfigurable Processors, and Parallel Processors. In addition to radiation hardening, low temperature extremes are addressed with a focus on material and design approaches.
Options for Parallelizing a Planning and Scheduling Algorithm

NASA Technical Reports Server (NTRS)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

2011-01-01

Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Multitask neurovision processor with extensive feedback and feedforward connections

NASA Astrophysics Data System (ADS)

Gupta, Madan M.; Knopf, George K.

1991-11-01

A multi-task neuro-vision parameter which performs a variety of information processing operations associated with the early stages of biological vision is presented. The network architecture of this neuro-vision processor, called the positive-negative (PN) neural processor, is loosely based on the neural activity fields exhibited by thalamic and cortical nervous tissue layers. The computational operation performed by the processor arises from the strength of the recurrent feedback among the numerous positive and negative neural computing units. By adjusting the feedback connections it is possible to generate diverse dynamic behavior that may be used for short-term visual memory (STVM), spatio-temporal filtering (STF), and pulse frequency modulation (PFM). The information attributes that are to be processes may be regulated by modifying the feedforward connections from the signal space to the neural processor.
Methods of Real Time Image Enhancement of Flash LIDAR Data and Navigating a Vehicle Using Flash LIDAR Data

NASA Technical Reports Server (NTRS)

Vanek, Michael D. (Inventor)

2014-01-01

A method for creating a digital elevation map ("DEM") from frames of flash LIDAR data includes generating a first distance R(sub i) from a first detector i to a first point on a surface S(sub i). After defining a map with a mesh THETA having cells k, a first array S(k), a second array M(k), and a third array D(k) are initialized. The first array corresponds to the surface, the second array corresponds to the elevation map, and the third array D(k) receives an output for the DEM. The surface is projected onto the mesh THETA, so that a second distance R(sub k) from a second point on the mesh THETA to the detector can be found. From this, a height may be calculated, which permits the generation of a digital elevation map. Also, using sequential frames of flash LIDAR data, vehicle control is possible using an offset between successive frames.
Method for Enhancing a Three Dimensional Image from a Plurality of Frames of Flash LIDAR Data

NASA Technical Reports Server (NTRS)

Bulyshev, Alexander (Inventor); Vanek, Michael D. (Inventor); Amzajerdian, Farzin (Inventor)

2013-01-01

A method for enhancing a three dimensional image from frames of flash LIDAR data includes generating a first distance R(sub i) from a first detector i to a first point on a surface S(sub i). After defining a map with a mesh theta having cells k, a first array S(k), a second array M(k), and a third array D(k) are initialized. The first array corresponds to the surface, the second array corresponds to the elevation map, and the third array D(k) receives an output for the DEM. The surface is projected onto the mesh theta, so that a second distance R(sub k) from a second point on the mesh theta to the detector can be found. From this, a height may be calculated, which permits the generation of a digital elevation map. Also, using sequential frames of flash LIDAR data, vehicle control is possible using an offset between successive frames.
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.

PubMed

Glaser, I

1980-10-01

We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.
Free Mesh Method: fundamental conception, algorithms and accuracy study

PubMed Central

YAGAWA, Genki

2011-01-01

The finite element method (FEM) has been commonly employed in a variety of fields as a computer simulation method to solve such problems as solid, fluid, electro-magnetic phenomena and so on. However, creation of a quality mesh for the problem domain is a prerequisite when using FEM, which becomes a major part of the cost of a simulation. It is natural that the concept of meshless method has evolved. The free mesh method (FMM) is among the typical meshless methods intended for particle-like finite element analysis of problems that are difficult to handle using global mesh generation, especially on parallel processors. FMM is an efficient node-based finite element method that employs a local mesh generation technique and a node-by-node algorithm for the finite element calculations. In this paper, FMM and its variation are reviewed focusing on their fundamental conception, algorithms and accuracy. PMID:21558752
A unified approach to VLSI layout automation and algorithm mapping on processor arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.

1993-01-01

Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.
Transparent thin shield for radio frequency transmit coils.

PubMed

Rivera, Debra S; Schulz, Jessica; Siegert, Thomas; Zuber, Verena; Turner, Robert

2015-02-01

To identify a shielding material compatible with optical head-motion tracking for prospective motion correction and which minimizes radio frequency (RF) radiation losses at 7 T without sacrificing line-of-sight to an imaging target. We evaluated a polyamide mesh coated with silver. The thickness of the coating was approximated from the composition ratio provided by the material vendor and validated by an estimate derived from electrical conductivity and light transmission measurements. The performance of the shield is compared to a split-copper shield in the context of a four-channel transmit-only loop array. The mesh contains less than a skin-depth of silver coating (300 MHz) and attenuates light by 15 %. Elements of the array vary less in the presence of the mesh shield as compared to the split-copper shield indicating that the array behaves more symmetrically with the mesh shield. No degradation of transmit efficiency was observed for the mesh as compared to the split-copper shield. We present a shield compatible with future integration of camera-based motion-tracking systems. Based on transmit performance and eddy-current evaluations the mesh shield is appropriate for use at 7 T.
Fault-Tolerant, Real-Time, Multi-Core Computer System

NASA Technical Reports Server (NTRS)

Gostelow, Kim P.

2012-01-01

A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.
Toshiba TDF-500 High Resolution Viewing And Analysis System

NASA Astrophysics Data System (ADS)

Roberts, Barry; Kakegawa, M.; Nishikawa, M.; Oikawa, D.

1988-06-01

A high resolution, operator interactive, medical viewing and analysis system has been developed by Toshiba and Bio-Imaging Research. This system provides many advanced features including high resolution displays, a very large image memory and advanced image processing capability. In particular, the system provides CRT frame buffers capable of update in one frame period, an array processor capable of image processing at operator interactive speeds, and a memory system capable of updating multiple frame buffers at frame rates whilst supporting multiple array processors. The display system provides 1024 x 1536 display resolution at 40Hz frame and 80Hz field rates. In particular, the ability to provide whole or partial update of the screen at the scanning rate is a key feature. This allows multiple viewports or windows in the display buffer with both fixed and cine capability. To support image processing features such as windowing, pan, zoom, minification, filtering, ROI analysis, multiplanar and 3D reconstruction, a high performance CPU is integrated into the system. This CPU is an array processor capable of up to 400 million instructions per second. To support the multiple viewer and array processors' instantaneous high memory bandwidth requirement, an ultra fast memory system is used. This memory system has a bandwidth capability of 400MB/sec and a total capacity of 256MB. This bandwidth is more than adequate to support several high resolution CRT's and also the fast processing unit. This fully integrated approach allows effective real time image processing. The integrated design of viewing system, memory system and array processor are key to the imaging system. It is the intention to describe the architecture of the image system in this paper.
Systems and methods for performing wireless financial transactions

DOE Office of Scientific and Technical Information (OSTI.GOV)

McCown, Steven Harvey

2012-07-03

A secure computing module (SCM) is configured for connection with a host device. The SCM includes a processor for performing secure processing operations, a host interface for coupling the processor to the host device, and a memory connected to the processor wherein the processor logically isolates at least some of the memory from access by the host device. The SCM also includes a proximate-field wireless communicator connected to the processor to communicate with another SCM associated with another host device. The SCM generates a secure digital signature for a financial transaction package and communicates the package and the signature tomore » the other SCM using the proximate-field wireless communicator. Financial transactions are performed from person to person using the secure digital signature of each person's SCM and possibly message encryption. The digital signatures and transaction details are communicated to appropriate financial organizations to authenticate the transaction parties and complete the transaction.« less
Introduction to Parallel Computing

DTIC Science & Technology

1992-05-01

Instruction Stream, Multiple Data Stream Machines .................... 19 2.4 Networks of M achines...independent memory units and connecting them to the processors by an interconnection network . Many different interconnection schemes have been considered, and...connected to the same processor at the same time. Crossbar switching networks are still too expensive to be practical for connecting large numbers of
A Domain-Decomposed Multilevel Method for Adaptively Refined Cartesian Grids with Embedded Boundaries

NASA Technical Reports Server (NTRS)

Aftosmis, M. J.; Berger, M. J.; Adomavicius, G.

2000-01-01

Preliminary verification and validation of an efficient Euler solver for adaptively refined Cartesian meshes with embedded boundaries is presented. The parallel, multilevel method makes use of a new on-the-fly parallel domain decomposition strategy based upon the use of space-filling curves, and automatically generates a sequence of coarse meshes for processing by the multigrid smoother. The coarse mesh generation algorithm produces grids which completely cover the computational domain at every level in the mesh hierarchy. A series of examples on realistically complex three-dimensional configurations demonstrate that this new coarsening algorithm reliably achieves mesh coarsening ratios in excess of 7 on adaptively refined meshes. Numerical investigations of the scheme's local truncation error demonstrate an achieved order of accuracy between 1.82 and 1.88. Convergence results for the multigrid scheme are presented for both subsonic and transonic test cases and demonstrate W-cycle multigrid convergence rates between 0.84 and 0.94. Preliminary parallel scalability tests on both simple wing and complex complete aircraft geometries shows a computational speedup of 52 on 64 processors using the run-time mesh partitioner.
Frequency-multiplexed and pipelined iterative optical systolic array processors

NASA Technical Reports Server (NTRS)

Casasent, D.; Jackson, J.; Neuman, C.

1983-01-01

Optical matrix processors using acoustooptic transducers are described, with emphasis on new systolic array architectures using frequency multiplexing in addition to space and time multiplexing. A Kalman filtering application is considered in a case study from which the operations required on such a system can be defined. This also serves as a new and powerful application for iterative optical processors. The importance of pipelining the data flow and the ordering of the operations performed in a specific application of such a system are also noted. Several examples of how to effectively achieve this are included. A new technique for handling bipolar data on such architectures is also described.

Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, F.; Morel, M.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
Mechanism to support generic collective communication across a variety of programming models

DOEpatents

Almasi, Gheorghe [Ardsley, NY; Dozsa, Gabor [Ardsley, NY; Kumar, Sameer [White Plains, NY

2011-07-19

A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
Multichannel signal enhancement

DOEpatents

Lewis, Paul S.

1990-01-01

A mixed adaptive filter is formulated for the signal processing problem where desired a priori signal information is not available. The formulation generates a least squares problem which enables the filter output to be calculated directly from an input data matrix. In one embodiment, a folded processor array enables bidirectional data flow to solve the recursive problem by back substitution without global communications. In another embodiment, a balanced processor array solves the recursive problem by forward elimination through the array. In a particular application to magnetoencephalography, the mixed adaptive filter enables an evoked response to an auditory stimulus to be identified from only a single trial.
Earth As An Unstructured Mesh and Its Recovery from Seismic Waveform Data

NASA Astrophysics Data System (ADS)

De Hoop, M. V.

2015-12-01

We consider multi-scale representations of Earth's interior from thepoint of view of their possible recovery from multi- andhigh-frequency seismic waveform data. These representations areintrinsically connected to (geologic, tectonic) structures, that is,geometric parametrizations of Earth's interior. Indeed, we address theconstruction and recovery of such parametrizations using localiterative methods with appropriately designed data misfits andguaranteed convergence. The geometric parametrizations containinterior boundaries (defining, for example, faults, salt bodies,tectonic blocks, slabs) which can, in principle, be obtained fromsuccessive segmentation. We make use of unstructured meshes. For the adaptation and recovery of an unstructured mesh we introducean energy functional which is derived from the Hausdorff distance. Viaan augmented Lagrangian method, we incorporate the mentioned datamisfit. The recovery is constrained by shape optimization of theinterior boundaries, and is reminiscent of Hausdorff warping. We useelastic deformation via finite elements as a regularization whilefollowing a two-step procedure. The first step is an update determinedby the energy functional; in the second step, we modify the outcome ofthe first step where necessary to ensure that the new mesh isregular. This modification entails an array of techniques includingtopology correction involving interior boundary contacting andbreakup, edge warping and edge removal. We implement this as afeed-back mechanism from volume to interior boundary meshesoptimization. We invoke and apply a criterion of mesh quality controlfor coarsening, and for dynamical local multi-scale refinement. Wepresent a novel (fluid-solid) numerical framework based on theDiscontinuous Galerkin method.
Representing and computing regular languages on massively parallel networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, M.I.; O'Sullivan, J.A.; Boysam, B.

1991-01-01

This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochasticmore » diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.« less
Prototype Focal-Plane-Array Optoelectronic Image Processor

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Shaw, Timothy; Yu, Jeffrey

1995-01-01

Prototype very-large-scale integrated (VLSI) planar array of optoelectronic processing elements combines speed of optical input and output with flexibility of reconfiguration (programmability) of electronic processing medium. Basic concept of processor described in "Optical-Input, Optical-Output Morphological Processor" (NPO-18174). Performs binary operations on binary (black and white) images. Each processing element corresponds to one picture element of image and located at that picture element. Includes input-plane photodetector in form of parasitic phototransistor part of processing circuit. Output of each processing circuit used to modulate one picture element in output-plane liquid-crystal display device. Intended to implement morphological processing algorithms that transform image into set of features suitable for high-level processing; e.g., recognition.
Technology Developments in Radiation-Hardened Electronics for Space Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Howell, Joe T.

2008-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project consists of a series of tasks designed to develop and mature a broad spectrum of radiation hardened and low temperature electronics technologies. Three approaches are being taken to address radiation hardening: improved material hardness, design techniques to improve radiation tolerance, and software methods to improve radiation tolerance. Within these approaches various technology products are being addressed including Field Programmable Gate Arrays (FPGA), Field Programmable Analog Arrays (FPAA), MEMS, Serial Processors, Reconfigurable Processors, and Parallel Processors. In addition to radiation hardening, low temperature extremes are addressed with a focus on material and design approaches. System level applications for the RHESE technology products are discussed.
Supercomputer modeling of flow past hypersonic flight vehicles

NASA Astrophysics Data System (ADS)

Ermakov, M. K.; Kryukov, I. A.

2017-02-01

A software platform for MPI-based parallel solution of the Navier-Stokes (Euler) equations for viscous heat-conductive compressible perfect gas on 3-D unstructured meshes is developed. The discretization and solution of the Navier-Stokes equations are constructed on generalized S.K. Godunov’s method and the second order approximation in space and time. Developed software platform allows to carry out effectively flow past hypersonic flight vehicles simulations for the Mach numbers 6 and higher, and numerical meshes with up to 1 billion numerical cells and with up to 128 processors.
The MasPar MP-1 As a Computer Arithmetic Laboratory

PubMed Central

Anuta, Michael A.; Lozier, Daniel W.; Turner, Peter R.

1996-01-01

This paper is a blueprint for the use of a massively parallel SIMD computer architecture for the simulation of various forms of computer arithmetic. The particular system used is a DEC/MasPar MP-1 with 4096 processors in a square array. This architecture has many advantages for such simulations due largely to the simplicity of the individual processors. Arithmetic operations can be spread across the processor array to simulate a hardware chip. Alternatively they may be performed on individual processors to allow simulation of a massively parallel implementation of the arithmetic. Compromises between these extremes permit speed-area tradeoffs to be examined. The paper includes a description of the architecture and its features. It then summarizes some of the arithmetic systems which have been, or are to be, implemented. The implementation of the level-index and symmetric level-index, LI and SLI, systems is described in some detail. An extensive bibliography is included. PMID:27805123
Novel processor architecture for onboard infrared sensors

NASA Astrophysics Data System (ADS)

Hihara, Hiroki; Iwasaki, Akira; Tamagawa, Nobuo; Kuribayashi, Mitsunobu; Hashimoto, Masanori; Mitsuyama, Yukio; Ochi, Hiroyuki; Onodera, Hidetoshi; Kanbara, Hiroyuki; Wakabayashi, Kazutoshi; Tada, Munehiro

2016-09-01

Infrared sensor system is a major concern for inter-planetary missions that investigate the nature and the formation processes of planets and asteroids. The infrared sensor system requires signal preprocessing functions that compensate for the intensity of infrared image sensors to get high quality data and high compression ratio through the limited capacity of transmission channels towards ground stations. For those implementations, combinations of Field Programmable Gate Arrays (FPGAs) and microprocessors are employed by AKATSUKI, the Venus Climate Orbiter, and HAYABUSA2, the asteroid probe. On the other hand, much smaller size and lower power consumption are demanded for future missions to accommodate more sensors. To fulfill this future demand, we developed a novel processor architecture which consists of reconfigurable cluster cores and programmable-logic cells with complementary atom switches. The complementary atom switches enable hardware programming without configuration memories, and thus soft-error on logic circuit connection is completely eliminated. This is a noteworthy advantage for space applications which cannot be found in conventional re-writable FPGAs. Almost one-tenth of lower power consumption is expected compared to conventional re-writable FPGAs because of the elimination of configuration memories. The proposed processor architecture can be reconfigured by behavioral synthesis with higher level language specification. Consequently, compensation functions are implemented in a single chip without accommodating program memories, which is accompanied with conventional microprocessors, while maintaining the comparable performance. This enables us to embed a processor element on each infrared signal detector output channel.
Microlens array processor with programmable weight mask and direct optical input

NASA Astrophysics Data System (ADS)

Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen

1999-03-01

We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich- like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 X 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
Smart Power Supply for Battery-Powered Systems

NASA Technical Reports Server (NTRS)

Krasowski, Michael J.; Greer, Lawrence; Prokop, Norman F.; Flatico, Joseph M.

2010-01-01

A power supply for battery-powered systems has been designed with an embedded controller that is capable of monitoring and maintaining batteries, charging hardware, while maintaining output power. The power supply is primarily designed for rovers and other remote science and engineering vehicles, but it can be used in any battery alone, or battery and charging source applications. The supply can function autonomously, or can be connected to a host processor through a serial communications link. It can be programmed a priori or on the fly to return current and voltage readings to a host. It has two output power busses: a constant 24-V direct current nominal bus, and a programmable bus for output from approximately 24 up to approximately 50 V. The programmable bus voltage level, and its output power limit, can be changed on the fly as well. The power supply also offers options to reduce the programmable bus to 24 V when the set power limit is reached, limiting output power in the case of a system fault detected in the system. The smart power supply is based on an embedded 8051-type single-chip microcontroller. This choice was made in that a credible progression to flight (radiation hard, high reliability) can be assumed as many 8051 processors or gate arrays capable of accepting 8051-type core presently exist and will continue to do so for some time. To solve the problem of centralized control, this innovation moves an embedded microcontroller to the power supply and assigns it the task of overseeing the operation and charging of the power supply assets. This embedded processor is connected to the application central processor via a serial data link such that the central processor can request updates of various parameters within the supply, such as battery current, bus voltage, remaining power in battery estimations, etc. This supply has a direct connection to the battery bus for common (quiescent) power application. Because components from multiple vendors may have differing power needs, this supply also has a secondary power bus, which can be programmed a priori or on-the-fly to boost the primary battery voltage level from 24 to 50 V to accommodate various loads as they are brought on line. Through voltage and current monitoring, the device can also shield the charging source from overloads, keep it within safe operating modes, and can meter available power to the application and maintain safe operations.
Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.; Pirzadeh, S.

1999-01-01

A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
Experience in highly parallel processing using DAP

NASA Technical Reports Server (NTRS)

Parkinson, D.

1987-01-01

Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
High precision computing with charge domain devices and a pseudo-spectral method therefor

NASA Technical Reports Server (NTRS)

Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor); Fijany, Amir (Inventor); Zak, Michail (Inventor)

1997-01-01

The present invention enhances the bit resolution of a CCD/CID MVM processor by storing each bit of each matrix element as a separate CCD charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. In another aspect of the invention, such arrays are employed in a pseudo-spectral method of the invention, in which partial differential equations are solved by expressing each derivative analytically as matrices, and the state function is updated at each computation cycle by multiplying it by the matrices. The matrices are treated as synaptic arrays of a neural network and the state function vector elements are treated as neurons. In a further aspect of the invention, moving target detection is performed by driving the soliton equation with a vector of detector outputs. The neural architecture consists of two synaptic arrays corresponding to the two differential terms of the soliton-equation and an adder connected to the output thereof and to the output of the detector array to drive the soliton equation.
The Microcode for the Control Processor of the ARO (Array Oriented Processor) Array Processor.

DTIC Science & Technology

1983-08-01

oiNi .TADDR=DBASE+MODE" 4CONT ŕWAfT’ FOR MEM, MORE", MOV) ,DRO BSX "S IGN EXT, MORE" SADD D FLDSEI,(6,3),IMN TADT)R=5+ 1 JMP I NDE-’XEI) "JU> IP ’ T1...JDTV1: YIP DIVI; TDIV2: Y,’ IP DIV2; JASHII: JMP ASHI; 4 JASH2: JMP AS112; JXOR1: YIP XDRI; JXOR2: YIP XOR2; JSOB: JMP SOB; JBPL: JMP BPL; JBMI: YIP BMI;0...JBHI: JMP BHill JBLOS: J! IP BLOS; JBVC: YIP BVC; JBWS: JMP BVS; JBCC: JMP BCC; JBCS: YIP BCS; JEMT: YIP EMT; JTRAP: YIP TRAPQ; JCLR6: YIP CLR6; JCOII
Array-based, parallel hierarchical mesh refinement algorithms for unstructured meshes

DOE PAGES

Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...

2016-08-18

In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDE's using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes such as in multigrid solvers/preconditioners, to do solution convergence and verification studies and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and in memory refinement). We also describe a high-order boundary reconstruction capability that can be used tomore » project the new points after refinement using high-order approximations instead of linear projection in order to minimize and provide more control on geometrical errors introduced by curved boundaries.The capability is developed under the parallel unstructured mesh framework "Mesh Oriented dAtaBase" (MOAB Tautges et al. (2004)). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. Furthermore, we also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.« less
ICE: A Scalable, Low-Cost FPGA-Based Telescope Signal Processing and Networking System

NASA Astrophysics Data System (ADS)

Bandura, K.; Bender, A. N.; Cliche, J. F.; de Haan, T.; Dobbs, M. A.; Gilbert, A. J.; Griffin, S.; Hsyu, G.; Ittah, D.; Parra, J. Mena; Montgomery, J.; Pinsonneault-Marotte, T.; Siegel, S.; Smecher, G.; Tang, Q. Y.; Vanderlinde, K.; Whitehorn, N.

2016-03-01

We present an overview of the ‘ICE’ hardware and software framework that implements large arrays of interconnected field-programmable gate array (FPGA)-based data acquisition, signal processing and networking nodes economically. The system was conceived for application to radio, millimeter and sub-millimeter telescope readout systems that have requirements beyond typical off-the-shelf processing systems, such as careful control of interference signals produced by the digital electronics, and clocking of all elements in the system from a single precise observatory-derived oscillator. A new generation of telescopes operating at these frequency bands and designed with a vastly increased emphasis on digital signal processing to support their detector multiplexing technology or high-bandwidth correlators — data rates exceeding a terabyte per second — are becoming common. The ICE system is built around a custom FPGA motherboard that makes use of an Xilinx Kintex-7 FPGA and ARM-based co-processor. The system is specialized for specific applications through software, firmware and custom mezzanine daughter boards that interface to the FPGA through the industry-standard FPGA mezzanine card (FMC) specifications. For high density applications, the motherboards are packaged in 16-slot crates with ICE backplanes that implement a low-cost passive full-mesh network between the motherboards in a crate, allow high bandwidth interconnection between crates and enable data offload to a computer cluster. A Python-based control software library automatically detects and operates the hardware in the array. Examples of specific telescope applications of the ICE framework are presented, namely the frequency-multiplexed bolometer readout systems used for the South Pole Telescope (SPT) and Simons Array and the digitizer, F-engine, and networking engine for the Canadian Hydrogen Intensity Mapping Experiment (CHIME) and Hydrogen Intensity and Real-time Analysis eXperiment (HIRAX) radio interferometers.
Development of a Receiver Processor For UAV Video Signal Acquisition and Tracking Using Digital Phased Array Antenna

DTIC Science & Technology

2010-09-01

53 Figure 26. Image of the phased array antenna...................................................................54...69 Figure 38. Computation of correction angle from array factor and sum/difference beams...71 Figure 39. Front panel of the tracking algorithm

A cost-effective methodology for the design of massively-parallel VLSI functional units

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Sriram, G.; Desouza, J.

1993-01-01

In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
Mesh Oriented datABase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tautges, Timothy J.

MOAB is a component for representing and evaluating mesh data. MOAB can store stuctured and unstructured mesh, consisting of elements in the finite element "zoo". The functional interface to MOAB is simple yet powerful, allowing the representation of many types of metadata commonly found on the mesh. MOAB is optimized for efficiency in space and time, based on access to mesh in chunks rather than through individual entities, while also versatile enough to support individual entity access. The MOAB data model consists of a mesh interface instance, mesh entities (vertices and elements), sets, and tags. Entities are addressed through handlesmore » rather than pointers, to allow the underlying representation of an entity to change without changing the handle to that entity. Sets are arbitrary groupings of mesh entities and other sets. Sets also support parent/child relationships as a relation distinct from sets containing other sets. The directed-graph provided by set parent/child relationships is useful for modeling topological relations from a geometric model or other metadata. Tags are named data which can be assigned to the mesh as a whole, individual entities, or sets. Tags are a mechanism for attaching data to individual entities and sets are a mechanism for describing relations between entities; the combination of these two mechanisms isa powerful yet simple interface for representing metadata or application-specific data. For example, sets and tags can be used together to describe geometric topology, boundary condition, and inter-processor interface groupings in a mesh. MOAB is used in several ways in various applications. MOAB serves as the underlying mesh data representation in the VERDE mesh verification code. MOAB can also be used as a mesh input mechanism, using mesh readers induded with MOAB, or as a tanslator between mesh formats, using readers and writers included with MOAB.« less
Optoelectronic switch matrix as a look-up table for residue arithmetic.

PubMed

Macdonald, R I

1987-10-01

The use of optoelectronic matrix switches to perform look-up table functions in residue arithmetic processors is proposed. In this application, switchable detector arrays give the advantage of a greatly reduced requirement for optical sources by comparison with previous optoelectronic residue processors.
Comparison of Microstructures and Mechanical Properties for Solid and Mesh Cobalt-Base Alloy Prototypes Fabricated by Electron Beam Melting

NASA Astrophysics Data System (ADS)

Gaytan, S. M.; Murr, L. E.; Martinez, E.; Martinez, J. L.; Machado, B. I.; Ramirez, D. A.; Medina, F.; Collins, S.; Wicker, R. B.

2010-12-01

The microstructures and mechanical behavior of simple, as-fabricated, solid geometries (with a density of 8.4 g/cm3), as-fabricated and fabricated and annealed femoral (knee) prototypes, and reticulated mesh components (with a density of 1.5 g/cm3) all produced by additive manufacturing (AM) using electron beam melting (EBM) of Co-26Cr-6Mo-0.2C powder are examined and compared in this study. Microstructures and microstructural issues are examined by optical metallography (OM), scanning electron microscopy (SEM), transmission electron microscopy (TEM), energy-dispersive X-ray spectrometry (EDS), and X-ray diffraction (XRD), while mechanical properties included selective specimen tensile testing and Vickers microindentation hardness (HV) and Rockwell C-scale hardness (HRC) measurements. Orthogonal (X-Y) melt scanning of the electron beam during AM produced unique, orthogonal and related Cr23C6 carbide (precipitate) arrays (a controlled microstructural architecture) with dimensions of 2 μm in the build plane perpendicular to the build direction, while connected carbide columns were formed in the vertical plane, parallel to the build direction, with microindentation hardnesses ranging from 4.4 to 5.9 GPa, corresponding to a yield stress and ultimate tensile strength (UTS) of 0.51 and 1.45 GPa with elongations ranging from 1.9 to 5.3 pct. Annealing produced an equiaxed fcc grain structure with some grain boundary carbides, frequent annealing twins, and often a high density of intrinsic {111} stacking faults within the grains. The reticulated mesh strut microstructure consisted of dense carbide arrays producing an average microindentation hardness of 6.2 GPa or roughly 25 pct higher than the fully dense components.
Resource and Performance Evaluations of Fixed Point QRD-RLS Systolic Array through FPGA Implementation

NASA Astrophysics Data System (ADS)

Yokoyama, Yoshiaki; Kim, Minseok; Arai, Hiroyuki

At present, when using space-time processing techniques with multiple antennas for mobile radio communication, real-time weight adaptation is necessary. Due to the progress of integrated circuit technology, dedicated processor implementation with ASIC or FPGA can be employed to implement various wireless applications. This paper presents a resource and performance evaluation of the QRD-RLS systolic array processor based on fixed-point CORDIC algorithm with FPGA. In this paper, to save hardware resources, we propose the shared architecture of a complex CORDIC processor. The required precision of internal calculation, the circuit area for the number of antenna elements and wordlength, and the processing speed will be evaluated. The resource estimation provides a possible processor configuration with a current FPGA on the market. Computer simulations assuming a fading channel will show a fast convergence property with a finite number of training symbols. The proposed architecture has also been implemented and its operation was verified by beamforming evaluation through a radio propagation experiment.
A universal computer control system for motors

NASA Technical Reports Server (NTRS)

Szakaly, Zoltan F. (Inventor)

1991-01-01

A control system for a multi-motor system such as a space telerobot, having a remote computational node and a local computational node interconnected with one another by a high speed data link is described. A Universal Computer Control System (UCCS) for the telerobot is located at each node. Each node is provided with a multibus computer system which is characterized by a plurality of processors with all processors being connected to a common bus, and including at least one command processor. The command processor communicates over the bus with a plurality of joint controller cards. A plurality of direct current torque motors, of the type used in telerobot joints and telerobot hand-held controllers, are connected to the controller cards and responds to digital control signals from the command processor. Essential motor operating parameters are sensed by analog sensing circuits and the sensed analog signals are converted to digital signals for storage at the controller cards where such signals can be read during an address read/write cycle of the command processing processor.
Microcomputer array processor system. [design for electronic warfare

NASA Technical Reports Server (NTRS)

Slezak, K. D.

1980-01-01

The microcomputer array system is discussed with specific attention given to its electronic warware applications. Several aspects of the system architecture are described as well as some of its distinctive characteristics.
Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

1999-01-01

The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG. using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However. performance deteriorates when the mesh is refined using a solution-based error indicator since mesh adaptation for practical problems occurs in a localized region., creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.
Integrated Reconfigurable Aperture, Digital Beam Forming, and Software GPS Receiver for UAV Navigation

DTIC Science & Technology

2007-12-11

Implemented both carrier and code phase tracking loop for performance evaluation of a minimum power beam forming algorithm and null steering algorithm...4 Antennal Antenna2 Antenna K RF RF RF ct, Ct~2 ChKx1 X2 ....... Xk A W ~ ~ =Z, x W ,=1 Fig. 5. Schematics of a K-element antenna array spatial...adaptive processor Antennal Antenna K A N-i V/ ( Vil= .i= VK Fig. 6. Schematics of a K-element antenna array space-time adaptive processor Two additional
Electronic neural network for solving traveling salesman and similar global optimization problems

NASA Technical Reports Server (NTRS)

Thakoor, Anilkumar P. (Inventor); Moopenn, Alexander W. (Inventor); Duong, Tuan A. (Inventor); Eberhardt, Silvio P. (Inventor)

1993-01-01

This invention is a novel high-speed neural network based processor for solving the 'traveling salesman' and other global optimization problems. It comprises a novel hybrid architecture employing a binary synaptic array whose embodiment incorporates the fixed rules of the problem, such as the number of cities to be visited. The array is prompted by analog voltages representing variables such as distances. The processor incorporates two interconnected feedback networks, each of which solves part of the problem independently and simultaneously, yet which exchange information dynamically.
Systolic Processor Array For Recognition Of Spectra

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Peterson, John C.

1995-01-01

Spectral signatures of materials detected and identified quickly. Spectral Analysis Systolic Processor Array (SPA2) relatively inexpensive and satisfies need to analyze large, complex volume of multispectral data generated by imaging spectrometers to extract desired information: computational performance needed to do this in real time exceeds that of current supercomputers. Locates highly similar segments or contiguous subsegments in two different spectra at time. Compares sampled spectra from instruments with data base of spectral signatures of known materials. Computes and reports scores that express degrees of similarity between sampled and data-base spectra.
Generating unstructured nuclear reactor core meshes in parallel

DOE PAGES

Jain, Rajeev; Tautges, Timothy J.

2014-10-24

Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulations problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate in a standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor coremore » examples including a very high temperature reactor, a full-core model of the Korean MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.« less
DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.

PubMed

Kim, Lok-Won

2018-05-01

Although there have been many decades of research and commercial presence on high performance general purpose processors, there are still many applications that require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been successfully used to learn in a wide variety of applications, but their heavy computation demand has considerably limited their practical applications. This paper proposes a fully pipelined acceleration architecture to alleviate high computational demand of an artificial neural network (ANN) which is restricted Boltzmann machine (RBM) ANNs. The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency) integrated in a state-of-the art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T) provides a computational performance of 301-billion connection-updates-per-second and about 193 times higher performance than a software solution running on general purpose processors. Most importantly, the architecture enables over 4 times (12 times in batch learning) higher performance compared with a previous work when both are implemented in an FPGA device (XC2VP70).
Real-time lens distortion correction: speed, accuracy and efficiency

NASA Astrophysics Data System (ADS)

Bax, Michael R.; Shahidi, Ramin

2014-11-01

Optical lens systems suffer from nonlinear geometrical distortion. Optical imaging applications such as image-enhanced endoscopy and image-based bronchoscope tracking require correction of this distortion for accurate localization, tracking, registration, and measurement of image features. Real-time capability is desirable for interactive systems and live video. The use of a texture-mapping graphics accelerator, which is standard hardware on current motherboard chipsets and add-in video graphics cards, to perform distortion correction is proposed. Mesh generation for image tessellation, an error analysis, and performance results are presented. It is shown that distortion correction using commodity graphics hardware is substantially faster than using the main processor and can be performed at video frame rates (faster than 30 frames per second), and that the polar-based method of mesh generation proposed here is more accurate than a conventional grid-based approach. Using graphics hardware to perform distortion correction is not only fast and accurate but also efficient as it frees the main processor for other tasks, which is an important issue in some real-time applications.
Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

NASA Technical Reports Server (NTRS)

Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

2002-01-01

The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to for time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerlarian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
A Versatile Multichannel Digital Signal Processing Module for Microcalorimeter Arrays

NASA Astrophysics Data System (ADS)

Tan, H.; Collins, J. W.; Walby, M.; Hennig, W.; Warburton, W. K.; Grudberg, P.

2012-06-01

Different techniques have been developed for reading out microcalorimeter sensor arrays: individual outputs for small arrays, and time-division or frequency-division or code-division multiplexing for large arrays. Typically, raw waveform data are first read out from the arrays using one of these techniques and then stored on computer hard drives for offline optimum filtering, leading not only to requirements for large storage space but also limitations on achievable count rate. Thus, a read-out module that is capable of processing microcalorimeter signals in real time will be highly desirable. We have developed multichannel digital signal processing electronics that are capable of on-board, real time processing of microcalorimeter sensor signals from multiplexed or individual pixel arrays. It is a 3U PXI module consisting of a standardized core processor board and a set of daughter boards. Each daughter board is designed to interface a specific type of microcalorimeter array to the core processor. The combination of the standardized core plus this set of easily designed and modified daughter boards results in a versatile data acquisition module that not only can easily expand to future detector systems, but is also low cost. In this paper, we first present the core processor/daughter board architecture, and then report the performance of an 8-channel daughter board, which digitizes individual pixel outputs at 1 MSPS with 16-bit precision. We will also introduce a time-division multiplexing type daughter board, which takes in time-division multiplexing signals through fiber-optic cables and then processes the digital signals to generate energy spectra in real time.
Using the GeoFEST Faulted Region Simulation System

NASA Technical Reports Server (NTRS)

Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

2004-01-01

GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load-balance and low communication, and careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than (order of) 4000 elements per processor. The source code and documentation for GeoFEST is available at no cost from Open Channel Foundation. In addition GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.
Numerical simulation of air hypersonic flows with equilibrium chemical reactions

NASA Astrophysics Data System (ADS)

Emelyanov, Vladislav; Karpenko, Anton; Volkov, Konstantin

2018-05-01

The finite volume method is applied to solve unsteady three-dimensional compressible Navier-Stokes equations on unstructured meshes. High-temperature gas effects altering the aerodynamics of vehicles are taken into account. Possibilities of the use of graphics processor units (GPUs) for the simulation of hypersonic flows are demonstrated. Solutions of some test cases on GPUs are reported, and a comparison between computational results of equilibrium chemically reacting and perfect air flowfields is performed. Speedup of solution on GPUs with respect to the solution on central processor units (CPUs) is compared. The results obtained provide promising perspective for designing a GPU-based software framework for practical applications.
Monte Carlo charged-particle tracking and energy deposition on a Lagrangian mesh.

PubMed

Yuan, J; Moses, G A; McKenty, P W

2005-10-01

A Monte Carlo algorithm for alpha particle tracking and energy deposition on a cylindrical computational mesh in a Lagrangian hydrodynamics code used for inertial confinement fusion (ICF) simulations is presented. The straight line approximation is used to follow propagation of "Monte Carlo particles" which represent collections of alpha particles generated from thermonuclear deuterium-tritium (DT) reactions. Energy deposition in the plasma is modeled by the continuous slowing down approximation. The scheme addresses various aspects arising in the coupling of Monte Carlo tracking with Lagrangian hydrodynamics; such as non-orthogonal severely distorted mesh cells, particle relocation on the moving mesh and particle relocation after rezoning. A comparison with the flux-limited multi-group diffusion transport method is presented for a polar direct drive target design for the National Ignition Facility. Simulations show the Monte Carlo transport method predicts about earlier ignition than predicted by the diffusion method, and generates higher hot spot temperature. Nearly linear speed-up is achieved for multi-processor parallel simulations.
Optical backplane interconnect switch for data processors and computers

NASA Technical Reports Server (NTRS)

Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

1989-01-01

An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

Low-Latency Embedded Vision Processor (LLEVS)

DTIC Science & Technology

2016-03-01

26 3.2.3 Task 3 Projected Performance Analysis of FPGA- based Vision Processor ........... 31 3.2.3.1 Algorithms Latency Analysis ...Programmable Gate Array Custom Hardware for Real- Time Multiresolution Analysis . ............................................... 35...conduct data analysis for performance projections. The data acquired through measurements , simulation and estimation provide the requisite platform for
Architecture and data processing alternatives for Tse computer. Volume 1: Tse logic design concepts and the development of image processing machine architectures

NASA Technical Reports Server (NTRS)

Rickard, D. A.; Bodenheimer, R. E.

1976-01-01

Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
Multi-Material ALE with AMR for Modeling Hot Plasmas and Cold Fragmenting Materials

NASA Astrophysics Data System (ADS)

Alice, Koniges; Nathan, Masters; Aaron, Fisher; David, Eder; Wangyi, Liu; Robert, Anderson; David, Benson; Andrea, Bertozzi

2015-02-01

We have developed a new 3D multi-physics multi-material code, ALE-AMR, which combines Arbitrary Lagrangian Eulerian (ALE) hydrodynamics with Adaptive Mesh Refinement (AMR) to connect the continuum to the microstructural regimes. The code is unique in its ability to model hot radiating plasmas and cold fragmenting solids. New numerical techniques were developed for many of the physics packages to work efficiently on a dynamically moving and adapting mesh. We use interface reconstruction based on volume fractions of the material components within mixed zones and reconstruct interfaces as needed. This interface reconstruction model is also used for void coalescence and fragmentation. A flexible strength/failure framework allows for pluggable material models, which may require material history arrays to determine the level of accumulated damage or the evolving yield stress in J2 plasticity models. For some applications laser rays are propagating through a virtual composite mesh consisting of the finest resolution representation of the modeled space. A new 2nd order accurate diffusion solver has been implemented for the thermal conduction and radiation transport packages. One application area is the modeling of laser/target effects including debris/shrapnel generation. Other application areas include warm dense matter, EUV lithography, and material wall interactions for fusion devices.
Multiprocessor switch with selective pairing

DOEpatents

Gara, Alan; Gschwind, Michael K; Salapura, Valentina

2014-03-11

System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory "nest" (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus
Method for simultaneous overlapped communications between neighboring processors in a multiple

DOEpatents

Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

1991-01-01

A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Computations of Unsteady Viscous Compressible Flows Using Adaptive Mesh Refinement in Curvilinear Body-fitted Grid Systems

NASA Technical Reports Server (NTRS)

Steinthorsson, E.; Modiano, David; Colella, Phillip

1994-01-01

A methodology for accurate and efficient simulation of unsteady, compressible flows is presented. The cornerstones of the methodology are a special discretization of the Navier-Stokes equations on structured body-fitted grid systems and an efficient solution-adaptive mesh refinement technique for structured grids. The discretization employs an explicit multidimensional upwind scheme for the inviscid fluxes and an implicit treatment of the viscous terms. The mesh refinement technique is based on the AMR algorithm of Berger and Colella. In this approach, cells on each level of refinement are organized into a small number of topologically rectangular blocks, each containing several thousand cells. The small number of blocks leads to small overhead in managing data, while their size and regular topology means that a high degree of optimization can be achieved on computers with vector processors.
Universal sensor interface module (USIM)

NASA Astrophysics Data System (ADS)

King, Don; Torres, A.; Wynn, John

1999-01-01

A universal sensor interface model (USIM) is being developed by the Raytheon-TI Systems Company for use with fields of unattended distributed sensors. In its production configuration, the USIM will be a multichip module consisting of a set of common modules. The common module USIM set consists of (1) a sensor adapter interface (SAI) module, (2) digital signal processor (DSP) and associated memory module, and (3) a RF transceiver model. The multispectral sensor interface is designed around a low-power A/D converted, whose input/output interface consists of: -8 buffered, sampled inputs from various devices including environmental, acoustic seismic and magnetic sensors. The eight sensor inputs are each high-impedance, low- capacitance, differential amplifiers. The inputs are ideally suited for interface with discrete or MEMS sensors, since the differential input will allow direct connection with high-impedance bridge sensors and capacitance voltage sources. Each amplifier is connected to a 22-bit (Delta) (Sigma) A/D converter to enable simultaneous samples. The low power (Delta) (Sigma) converter provides 22-bit resolution at sample frequencies up to 142 hertz (used for magnetic sensors) and 16-bit resolution at frequencies up to 1168 hertz (used for acoustic and seismic sensors). The video interface module is based around the TMS320C5410 DSP. It can provide sensor array addressing, video data input, data calibration and correction. The processor module is based upon a MPC555. It will be used for mode control, synchronization of complex sensors, sensor signal processing, array processing, target classification and tracking. Many functions of the A/D, DSP and transceiver can be powered down by using variable clock speeds under software command or chip power switches. They can be returned to intermediate or full operation by DSP command. Power management may be based on the USIM's internal timer, command from the USIM transceiver, or by sleep mode processing management. The low power detection mode is implemented by monitoring any of the sensor analog outputs at lower sample rates for detection over a software controllable threshold.
Computer animation of modal and transient vibrations

NASA Technical Reports Server (NTRS)

Lipman, Robert R.

1987-01-01

An interactive computer graphics processor is described that is capable of generating input to animate modal and transient vibrations of finite element models on an interactive graphics system. The results from NASTRAN can be postprocessed such that a three dimensional wire-frame picture, in perspective, of the finite element mesh is drawn on the graphics display. Modal vibrations of any mode shape or transient motions over any range of steps can be animated. The finite element mesh can be color-coded by any component of displacement. Viewing parameters and the rate of vibration of the finite element model can be interactively updated while the structure is vibrating.
Sculpt test problem analysis.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sweetser, John David

2013-10-01

This report details Sculpt's implementation from a user's perspective. Sculpt is an automatic hexahedral mesh generation tool developed at Sandia National Labs by Steve Owen. 54 predetermined test cases are studied while varying the input parameters (Laplace iterations, optimization iterations, optimization threshold, number of processors) and measuring the quality of the resultant mesh. This information is used to determine the optimal input parameters to use for an unknown input geometry. The overall characteristics are covered in Chapter 1. The speci c details of every case are then given in Appendix A. Finally, example Sculpt inputs are given in B.1 andmore » B.2.« less
Spaceborne Processor Array

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

2008-01-01

A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

1998-01-01

A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Design and implementation of highly parallel pipelined VLSI systems

NASA Astrophysics Data System (ADS)

Delange, Alphonsus Anthonius Jozef

A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
Automatic mesh adaptivity for hybrid Monte Carlo/deterministic neutronics modeling of difficult shielding problems

DOE PAGES

Ibrahim, Ahmad M.; Wilson, Paul P.H.; Sawan, Mohamed E.; ...

2015-06-30

The CADIS and FW-CADIS hybrid Monte Carlo/deterministic techniques dramatically increase the efficiency of neutronics modeling, but their use in the accurate design analysis of very large and geometrically complex nuclear systems has been limited by the large number of processors and memory requirements for their preliminary deterministic calculations and final Monte Carlo calculation. Three mesh adaptivity algorithms were developed to reduce the memory requirements of CADIS and FW-CADIS without sacrificing their efficiency improvement. First, a macromaterial approach enhances the fidelity of the deterministic models without changing the mesh. Second, a deterministic mesh refinement algorithm generates meshes that capture as muchmore » geometric detail as possible without exceeding a specified maximum number of mesh elements. Finally, a weight window coarsening algorithm decouples the weight window mesh and energy bins from the mesh and energy group structure of the deterministic calculations in order to remove the memory constraint of the weight window map from the deterministic mesh resolution. The three algorithms were used to enhance an FW-CADIS calculation of the prompt dose rate throughout the ITER experimental facility. Using these algorithms resulted in a 23.3% increase in the number of mesh tally elements in which the dose rates were calculated in a 10-day Monte Carlo calculation and, additionally, increased the efficiency of the Monte Carlo simulation by a factor of at least 3.4. The three algorithms enabled this difficult calculation to be accurately solved using an FW-CADIS simulation on a regular computer cluster, eliminating the need for a world-class super computer.« less
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS

PubMed Central

Fu, Zhisong; Kirby, Robert M.; Whitaker, Ross T.

2014-01-01

Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers. PMID:25221418
SutraPlot, a graphical post-processor for SUTRA, a model for ground-water flow with solute or energy transport

USGS Publications Warehouse

Souza, W.R.

1999-01-01

This report documents a graphical display post-processor (SutraPlot) for the U.S. Geological Survey Saturated-Unsaturated flow and solute or energy TRAnsport simulation model SUTRA, Version 2D3D.1. This version of SutraPlot is an upgrade to SutraPlot for the 2D-only SUTRA model (Souza, 1987). It has been modified to add 3D functionality, a graphical user interface (GUI), and enhanced graphic output options. Graphical options for 2D SUTRA (2-dimension) simulations include: drawing the 2D finite-element mesh, mesh boundary, and velocity vectors; plots of contours for pressure, saturation, concentration, and temperature within the model region; 2D finite-element based gridding and interpolation; and 2D gridded data export files. Graphical options for 3D SUTRA (3-dimension) simulations include: drawing the 3D finite-element mesh; plots of contours for pressure, saturation, concentration, and temperature in 2D sections of the 3D model region; 3D finite-element based gridding and interpolation; drawing selected regions of velocity vectors (projected on principal coordinate planes); and 3D gridded data export files. Installation instructions and a description of all graphic options are presented. A sample SUTRA problem is described and three step-by-step SutraPlot applications are provided. In addition, the methodology and numerical algorithms for the 2D and 3D finite-element based gridding and interpolation, developed for SutraPlot, are described. 1
Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Case for a field-programmable gate array multicore hybrid machine for an image-processing application

NASA Astrophysics Data System (ADS)

Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos

2011-01-01

General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.
Thinking Like a Human

NASA Technical Reports Server (NTRS)

1999-01-01

Accurate Automation Corporation (AAC) of Chattanooga, TN, developed a neural network processor (NNP) for use onboard the NASA- and Air Force-sponsored LoFLYTE aircraft. The processor is modeled after connections in the brain.
An optical/digital processor - Hardware and applications

NASA Technical Reports Server (NTRS)

Casasent, D.; Sterling, W. M.

1975-01-01

A real-time two-dimensional hybrid processor consisting of a coherent optical system, an optical/digital interface, and a PDP-11/15 control minicomputer is described. The input electrical-to-optical transducer is an electron-beam addressed potassium dideuterium phosphate (KD2PO4) light valve. The requirements and hardware for the output optical-to-digital interface, which is constructed from modular computer building blocks, are presented. Initial experimental results demonstrating the operation of this hybrid processor in phased-array radar data processing, synthetic-aperture image correlation, and text correlation are included. The applications chosen emphasize the role of the interface in the analysis of data from an optical processor and possible extensions to the digital feedback control of an optical processor.
Numerical simulation of terahertz transmission of bilayer metallic meshes with different thickness of substrates

NASA Astrophysics Data System (ADS)

Zhang, Gaohui; Zhao, Guozhong; Zhang, Shengbo

2012-12-01

The terahertz transmission characteristics of bilayer metallic meshes are studied based on the finite difference time domain method. The bilayer well-shaped grid, the array of complementary square metallic pill and the cross wire-hole array were investigated. The results show that the bilayer well-shaped grid achieves a high-pass of filter function, while the bilayer array of complementary square metallic pill achieves a low-pass of filter function, the bilayer cross wire-hole array achieves a band-pass of filter function. Between two metallic microstructures, the medium need to be deposited. Obviously, medium thicknesses have an influence on the terahertz transmission characteristics of metallic microstructures. Simulation results show that with increasing the thicknesses of the medium the cut-off frequency of high-pass filter and low-pass filter move to low frequency. But the bilayer cross wire-hole array possesses two transmission peaks which display competition effect.

All-digital radar architecture

NASA Astrophysics Data System (ADS)

Molchanov, Pavlo A.

2014-10-01

All digital radar architecture requires exclude mechanical scan system. The phase antenna array is necessarily large because the array elements must be co-located with very precise dimensions and will need high accuracy phase processing system for aggregate and distribute T/R modules data to/from antenna elements. Even phase array cannot provide wide field of view. New nature inspired all digital radar architecture proposed. The fly's eye consists of multiple angularly spaced sensors giving the fly simultaneously thee wide-area visual coverage it needs to detect and avoid the threats around him. Fly eye radar antenna array consist multiple directional antennas loose distributed along perimeter of ground vehicle or aircraft and coupled with receiving/transmitting front end modules connected by digital interface to central processor. Non-steering antenna array allows creating all-digital radar with extreme flexible architecture. Fly eye radar architecture provides wide possibility of digital modulation and different waveform generation. Simultaneous correlation and integration of thousands signals per second from each point of surveillance area allows not only detecting of low level signals ((low profile targets), but help to recognize and classify signals (targets) by using diversity signals, polarization modulation and intelligent processing. Proposed all digital radar architecture with distributed directional antenna array can provide a 3D space vector to the jammer by verification direction of arrival for signals sources and as result jam/spoof protection not only for radar systems, but for communication systems and any navigation constellation system, for both encrypted or unencrypted signals, for not limited number or close positioned jammers.
Periodic Application of Concurrent Error Detection in Processor Array Architectures. PhD. Thesis -

NASA Technical Reports Server (NTRS)

Chen, Paul Peichuan

1993-01-01

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
State recovery and lockstep execution restart in a system with multiprocessor pairing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gara, Alan; Gschwind, Michael K; Salapura, Valentina

System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory "nest" (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switchmore » or a bus. Each selectively paired processor core is includes a transactional execution facility, whereing the system is configured to enable processor rollback to a previous state and reinitialize lockstep execution in order to recover from an incorrect execution when an incorrect execution has been detected by the selective pairing facility.« less
Pulse-coupled neural network implementation in FPGA

NASA Astrophysics Data System (ADS)

Waldemark, Joakim T. A.; Lindblad, Thomas; Lindsey, Clark S.; Waldemark, Karina E.; Oberg, Johnny; Millberg, Mikael

1998-03-01

Pulse Coupled Neural Networks (PCNN) are biologically inspired neural networks, mainly based on studies of the visual cortex of small mammals. The PCNN is very well suited as a pre- processor for image processing, particularly in connection with object isolation, edge detection and segmentation. Several implementations of PCNN on von Neumann computers, as well as on special parallel processing hardware devices (e.g. SIMD), exist. However, these implementations are not as flexible as required for many applications. Here we present an implementation in Field Programmable Gate Arrays (FPGA) together with a performance analysis. The FPGA hardware implementation may be considered a platform for further, extended implementations and easily expanded into various applications. The latter may include advanced on-line image analysis with close to real-time performance.
Fully digital routing logic for single-photon avalanche diode arrays in highly efficient time-resolved imaging

NASA Astrophysics Data System (ADS)

Cominelli, Alessandro; Acconcia, Giulia; Ghioni, Massimo; Rech, Ivan

2018-03-01

Time-correlated single-photon counting (TCSPC) is a powerful optical technique, which permits recording fast luminous signals with picosecond precision. Unfortunately, given its repetitive nature, TCSPC is recognized as a relatively slow technique, especially when a large time-resolved image has to be recorded. In recent years, there has been a fast trend toward the development of TCPSC imagers. Unfortunately, present systems still suffer from a trade-off between number of channels and performance. Even worse, the overall measurement speed is still limited well below the saturation of the transfer bandwidth toward the external processor. We present a routing algorithm that enables a smart connection between a 32×32 detector array and five shared high-performance converters able to provide an overall conversion rate up to 10 Gbit/s. The proposed solution exploits a fully digital logic circuit distributed in a tree structure to limit the number and length of interconnections, which is a major issue in densely integrated circuits. The behavior of the logic has been validated by means of a field-programmable gate array, while a fully integrated prototype has been designed in 180-nm technology and analyzed by means of postlayout simulations.
A Wireless Electronic Nose System Using a Fe2O3 Gas Sensing Array and Least Squares Support Vector Regression

PubMed Central

Song, Kai; Wang, Qi; Liu, Qi; Zhang, Hongquan; Cheng, Yingguo

2011-01-01

This paper describes the design and implementation of a wireless electronic nose (WEN) system which can online detect the combustible gases methane and hydrogen (CH4/H2) and estimate their concentrations, either singly or in mixtures. The system is composed of two wireless sensor nodes—a slave node and a master node. The former comprises a Fe2O3 gas sensing array for the combustible gas detection, a digital signal processor (DSP) system for real-time sampling and processing the sensor array data and a wireless transceiver unit (WTU) by which the detection results can be transmitted to the master node connected with a computer. A type of Fe2O3 gas sensor insensitive to humidity is developed for resistance to environmental influences. A threshold-based least square support vector regression (LS-SVR)estimator is implemented on a DSP for classification and concentration measurements. Experimental results confirm that LS-SVR produces higher accuracy compared with artificial neural networks (ANNs) and a faster convergence rate than the standard support vector regression (SVR). The designed WEN system effectively achieves gas mixture analysis in a real-time process. PMID:22346587
PCI-based WILDFIRE reconfigurable computing engines

NASA Astrophysics Data System (ADS)

Fross, Bradley K.; Donaldson, Robert L.; Palmer, Douglas J.

1996-10-01

WILDFORCE is the first PCI-based custom reconfigurable computer that is based on the Splash 2 technology transferred from the National Security Agency and the Institute for Defense Analyses, Supercomputing Research Center (SRC). The WILDFORCE architecture has many of the features of the WILDFIRE computer, such as field- programmable gate array (FPGA) based processing elements, linear array and crossbar interconnection, and high- performance memory and I/O subsystems. New features introduced in the PCI-based WILDFIRE systems include memory/processor options that can be added to any processing element. These options include static and dynamic memory, digital signal processors (DSPs), FPGAs, and microprocessors. In addition to memory/processor options, many different application specific connectors can be used to extend the I/O capabilities of the system, including systolic I/O, camera input and video display output. This paper also discusses how this new PCI-based reconfigurable computing engine is used for rapid-prototyping, real-time video processing and other DSP applications.
Optical signal processing of spatially distributed sensor data in smart structures

NASA Technical Reports Server (NTRS)

Bennett, K. D.; Claus, R. O.; Murphy, K. A.; Goette, A. M.

1989-01-01

Smart structures which contain dense two- or three-dimensional arrays of attached or embedded sensor elements inherently require signal multiplexing and processing capabilities to permit good spatial data resolution as well as the adequately short calculation times demanded by real time active feedback actuator drive circuitry. This paper reports the implementation of an in-line optical signal processor and its application in a structural sensing system which incorporates multiple discrete optical fiber sensor elements. The signal processor consists of an array of optical fiber couplers having tailored s-parameters and arranged to allow gray code amplitude scaling of sensor inputs. The use of this signal processor in systems designed to indicate the location of distributed strain and damage in composite materials, as well as to quantitatively characterize that damage, is described. Extension of similar signal processing methods to more complicated smart materials and structures applications are discussed.
Image-Based Focusing

NASA Astrophysics Data System (ADS)

Selker, Ted

1983-05-01

Lens focusing using a hardware model of a retina (Reticon RL256 light sensitive array) with a low cost processor (8085 with 512 bytes of ROM and 512 bytes of RAM) was built. This system was developed and tested on a variety of visual stimuli to demonstrate that: a)an algorithm which moves a lens to maximize the sum of the difference of light level on adjacent light sensors will converge to best focus in all but contrived situations. This is a simpler algorithm than any previously suggested; b) it is feasible to use unmodified video sensor arrays with in-expensive processors to aid video camera use. In the future, software could be developed to extend the processor's usefulness, possibly to track an actor by panning and zooming to give a earners operator increased ease of framing; c) lateral inhibition is an adequate basis for determining best focus. This supports a simple anatomically motivated model of how our brain focuses our eyes.
An acceleration framework for synthetic aperture radar algorithms

NASA Astrophysics Data System (ADS)

Kim, Youngsoo; Gloster, Clay S.; Alexander, Winser E.

2017-04-01

Algorithms for radar signal processing, such as Synthetic Aperture Radar (SAR) are computationally intensive and require considerable execution time on a general purpose processor. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine in order to reduce execution time by an order of magnitude as compared to kernel execution on a general purpose processor. Specifically, Field Programmable Gate Arrays (FPGAs) can be used to accelerate these kernels using hardware-based custom logic implementations. In this paper, we demonstrate a framework for algorithm acceleration. We used SAR as a case study to illustrate the potential for algorithm acceleration offered by FPGAs. Initially, we profiled the SAR algorithm and implemented a homomorphic filter using a hardware implementation of the natural logarithm. Experimental results show a linear speedup by adding reasonably small processing elements in Field Programmable Gate Array (FPGA) as opposed to using a software implementation running on a typical general purpose processor.
Combination nickel foam expanded nickel screen electrical connection supports for solid oxide fuel cells

DOEpatents

Draper, Robert; Prevish, Thomas; Bronson, Angela; George, Raymond A.

2007-01-02

A solid oxide fuel assembly is made, wherein rows (14, 25) of fuel cells (17, 19, 21, 27, 29, 31), each having an outer interconnection (20) and an outer electrode (32), are disposed next to each other with corrugated, electrically conducting expanded metal mesh member (22) between each row of cells, the corrugated mesh (22) having top crown portions and bottom portions, where the top crown portion (40) have a top bonded open cell nickel foam (51) which contacts outer interconnections (20) of the fuel cells, said mesh and nickel foam electrically connecting each row of fuel cells, and where there are no more metal felt connections between any fuel cells.
Lithography-free fabrication of silicon nanowire and nanohole arrays by metal-assisted chemical etching

PubMed Central

2013-01-01

We demonstrated a novel, simple, and low-cost method to fabricate silicon nanowire (SiNW) arrays and silicon nanohole (SiNH) arrays based on thin silver (Ag) film dewetting process combined with metal-assisted chemical etching. Ag mesh with holes and semispherical Ag nanoparticles can be prepared by simple thermal annealing of Ag thin film on a silicon substrate. Both the diameter and the distribution of mesh holes as well as the nanoparticles can be manipulated by the film thickness and the annealing temperature. The silicon underneath Ag coverage was etched off with the catalysis of metal in an aqueous solution containing HF and an oxidant, which form silicon nanostructures (either SiNW or SiNH arrays). The morphologies of the corresponding etched SiNW and SiNH arrays matched well with that of Ag holes and nanoparticles. This novel method allows lithography-free fabrication of the SiNW and SiNH arrays with control of the size and distribution. PMID:23557325
Through-the-earth radio

DOEpatents

Reagor, David [Los Alamos, NM; Vasquez-Dominguez, Jose [Los Alamos, NM

2006-05-09

A method and apparatus for effective through-the-earth communication involves a signal input device connected to a transmitter operating at a predetermined frequency sufficiently low to effectively penetrate useful distances through-the earth, and having an analog to digital converter receiving the signal input and passing the signal input to a data compression circuit that is connected to an encoding processor, the encoding processor output being provided to a digital to analog converter. An amplifier receives the analog output from the digital to analog converter for amplifying said analog output and outputting said analog output to an antenna. A receiver having an antenna receives the analog output passes the analog signal to a band pass filter whose output is connected to an analog to digital converter that provides a digital signal to a decoding processor whose output is connected to an data decompressor, the data decompressor providing a decompressed digital signal to a digital to analog converter. An audio output device receives the analog output form the digital to analog converter for producing audible output.
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

NASA Technical Reports Server (NTRS)

Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

2009-01-01

A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
Functional response of osteoblasts in functionally gradient titanium alloy mesh arrays processed by 3D additive manufacturing.

PubMed

Nune, K C; Kumar, A; Misra, R D K; Li, S J; Hao, Y L; Yang, R

2017-02-01

We elucidate here the osteoblasts functions and cellular activity in 3D printed interconnected porous architecture of functionally gradient Ti-6Al-4V alloy mesh structures in terms of cell proliferation and growth, distribution of cell nuclei, synthesis of proteins (actin, vinculin, and fibronectin), and calcium deposition. Cell culture studies with pre-osteoblasts indicated that the interconnected porous architecture of functionally gradient mesh arrays was conducive to osteoblast functions. However, there were statistically significant differences in the cellular response depending on the pore size in the functionally gradient structure. The interconnected porous architecture contributed to the distribution of cells from the large pore size (G1) to the small pore size (G3), with consequent synthesis of extracellular matrix and calcium precipitation. The gradient mesh structure significantly impacted cell adhesion and influenced the proliferation stage, such that there was high distribution of cells on struts of the gradient mesh structure. Actin and vinculin showed a significant difference in normalized expression level of protein per cell, which was absent in the case of fibronectin. Osteoblasts present on mesh struts formed a confluent sheet, bridging the pores through numerous cytoplasmic extensions. The gradient mesh structure fabricated by electron beam melting was explored to obtain fundamental insights on cellular activity with respect to osteoblast functions. Copyright © 2016 Elsevier B.V. All rights reserved.
Amorphous Ni(Fe)OxHy-coated nanocone arrays self-supported on stainless steel mesh as a promising oxygen-evolving anode for large scale water splitting

NASA Astrophysics Data System (ADS)

Shen, Junyu; Wang, Mei; Zhao, Liang; Zhang, Peili; Jiang, Jian; Liu, Jinxuan

2018-06-01

The development of highly efficient, robust, and cheap water oxidation electrodes is a major challenge in constructing industrially applicable electrolyzers for large-scale production of hydrogen from water. Herein we report a hierarchical stainless steel mesh electrode which features Ni(Fe)OxHy-coated self-supported nanocone arrays. Through a facile, mild, low-cost and readily scalable two-step fabrication procedure, the electrochemically active area of the optimized electrode is enlarged by a factor of 3.1 and the specific activity is enhanced by a factor of 250 at 265 mV overpotential compared with that of a corresponding pristine stainless steel mesh electrode. Moreover, the charge-transfer resistance is reduced from 4.47 Ω for the stainless steel mesh electrode to 0.13 Ω for the Ni(Fe)OxHy-coated nanocone array stainless steel mesh electrode. As a result, the cheap and easily fabricated electrode displays 280 and 303 mV low overpotentials to achieve high current densities of 500 and 1000 mA cmgeo-2, respectively, for oxygen evolution reaction in 1 M KOH. More importantly, the electrode exhibits a good stability over 340 h of chronopotentiometric test at 50 mA cmgeo-2 and only a slight attenuation (4.2%, ∼15 mV) in catalytic activity over 82 h electrolysis at a constant current density of 500 mA cmgeo-2.
Broadcasting collective operation contributions throughout a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

NASA Astrophysics Data System (ADS)

Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

2017-06-01

Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
P-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Biswas, R.; Simon, H.D.

1997-05-01

Partitioning unstructured graphs is central to the parallel solution of problems in computational science and engineering. The authors have introduced earlier the sequential version of an inertial spectral partitioner called HARP which maintains the quality of recursive spectral bisection (RSB) while forming the partitions an order of magnitude faster than RSB. The serial HARP is known to be the fastest spectral partitioner to date, three to four times faster than similar partitioners on a variety of meshes. This paper presents a parallel version of HARP, called P-HARP. Two types of parallelism have been exploited: loop level parallelism and recursive parallelism.more » P-HARP has been implemented in MPI on the SGI/Cray T3E and the IBM SP2. Experimental results demonstrate that P-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.25 seconds on a 64-processor T3E. Experimental results further show that P-HARP can give nearly a 20-fold speedup on 64 processors. These results indicate that graph partitioning is no longer a major bottleneck that hinders the advancement of computational science and engineering for dynamically-changing real-world applications.« less
Large scale generation of micro-droplet array by vapor condensation on mesh screen piece

PubMed Central

Xie, Jian; Xu, Jinliang; He, Xiaotian; Liu, Qi

2017-01-01

We developed a novel micro-droplet array system, which is based on the distinct three dimensional mesh screen structure and sintering and oxidation induced thermal-fluid performance. Mesh screen was sintered on a copper substrate by bonding the two components. Non-uniform residue stress is generated along weft wires, with larger stress on weft wire top location than elsewhere. Oxidation of the sintered package forms micro pits with few nanograsses on weft wire top location, due to the stress corrosion mechanism. Nanograsses grow elsewhere to show hydrophobic behavior. Thus, surface-energy-gradient weft wires are formed. Cooling the structure in a wet air environment nucleates water droplets on weft wire top location, which is more “hydrophilic” than elsewhere. Droplet size is well controlled by substrate temperature, air humidity and cooling time. Because warp wires do not contact copper substrate and there is a larger conductive thermal resistance between warp wire and weft wire, warp wires contribute less to condensation but function as supporting structure. The surface energy analysis of drops along weft wires explains why droplet array can be generated on the mesh screen piece. Because the commercial material is used, the droplet system is cost effective and can be used for large scale utilization. PMID:28054635

Large scale generation of micro-droplet array by vapor condensation on mesh screen piece.

PubMed

Xie, Jian; Xu, Jinliang; He, Xiaotian; Liu, Qi

2017-01-05

We developed a novel micro-droplet array system, which is based on the distinct three dimensional mesh screen structure and sintering and oxidation induced thermal-fluid performance. Mesh screen was sintered on a copper substrate by bonding the two components. Non-uniform residue stress is generated along weft wires, with larger stress on weft wire top location than elsewhere. Oxidation of the sintered package forms micro pits with few nanograsses on weft wire top location, due to the stress corrosion mechanism. Nanograsses grow elsewhere to show hydrophobic behavior. Thus, surface-energy-gradient weft wires are formed. Cooling the structure in a wet air environment nucleates water droplets on weft wire top location, which is more "hydrophilic" than elsewhere. Droplet size is well controlled by substrate temperature, air humidity and cooling time. Because warp wires do not contact copper substrate and there is a larger conductive thermal resistance between warp wire and weft wire, warp wires contribute less to condensation but function as supporting structure. The surface energy analysis of drops along weft wires explains why droplet array can be generated on the mesh screen piece. Because the commercial material is used, the droplet system is cost effective and can be used for large scale utilization.
Large scale generation of micro-droplet array by vapor condensation on mesh screen piece

NASA Astrophysics Data System (ADS)

Xie, Jian; Xu, Jinliang; He, Xiaotian; Liu, Qi

2017-01-01

We developed a novel micro-droplet array system, which is based on the distinct three dimensional mesh screen structure and sintering and oxidation induced thermal-fluid performance. Mesh screen was sintered on a copper substrate by bonding the two components. Non-uniform residue stress is generated along weft wires, with larger stress on weft wire top location than elsewhere. Oxidation of the sintered package forms micro pits with few nanograsses on weft wire top location, due to the stress corrosion mechanism. Nanograsses grow elsewhere to show hydrophobic behavior. Thus, surface-energy-gradient weft wires are formed. Cooling the structure in a wet air environment nucleates water droplets on weft wire top location, which is more “hydrophilic” than elsewhere. Droplet size is well controlled by substrate temperature, air humidity and cooling time. Because warp wires do not contact copper substrate and there is a larger conductive thermal resistance between warp wire and weft wire, warp wires contribute less to condensation but function as supporting structure. The surface energy analysis of drops along weft wires explains why droplet array can be generated on the mesh screen piece. Because the commercial material is used, the droplet system is cost effective and can be used for large scale utilization.
Stroboscope Controller for Imaging Helicopter Rotors

NASA Technical Reports Server (NTRS)

Jensen, Scott; Marmie, John; Mai, Nghia

2004-01-01

A versatile electronic timing-and-control unit, denoted a rotorcraft strobe controller, has been developed for use in controlling stroboscopes, lasers, video cameras, and other instruments for capturing still images of rotating machine parts especially helicopter rotors. This unit is designed to be compatible with a variety of sources of input shaftangle or timing signals and to be capable of generating a variety of output signals suitable for triggering instruments characterized by different input-signal specifications. It is also designed to be flexible and reconfigurable in that it can be modified and updated through changes in its control software, without need to change its hardware. Figure 1 is a block diagram of the rotorcraft strobe controller. The control processor is a high-density complementary metal oxide semiconductor, singlechip 8-bit microcontroller. It is connected to a 32K x 8 nonvolatile static random-access memory (RAM) module. Also connected to the control processor is a 32K 8 electrically programmable read-only-memory (EPROM) module, which is used to store the control software. Digital logic support circuitry is implemented in a field-programmable gate array (FPGA). A 240 x 128-dot, 40- character 16-line liquid-crystal display (LCD) module serves as a graphical user interface; the user provides input through a 16-key keypad mounted next to the LCD. A 12-bit digital-to-analog converter (DAC) generates a 0-to-10-V ramp output signal used as part of a rotor-blade monitoring system, while the control processor generates all the appropriate strobing signals. Optocouplers are used to isolate all input and output digital signals, and optoisolators are used to isolate all analog signals. The unit is designed to fit inside a 19-in. (.48-cm) rack-mount enclosure. Electronic components are mounted on a custom printed-circuit board (see Figure 2). Two power-conversion modules on the printedcircuit board convert AC power to +5 VDC and 15 VDC, respectively.
Polyhedral meshing in numerical analysis of conjugate heat transfer

NASA Astrophysics Data System (ADS)

Sosnowski, Marcin; Krzywanski, Jaroslaw; Grabowska, Karolina; Gnatowska, Renata

2018-06-01

Computational methods have been widely applied in conjugate heat transfer analysis. The very first and crucial step in such research is the meshing process which consists in dividing the analysed geometry into numerous small control volumes (cells). In Computational Fluid Dynamics (CFD) applications it is desirable to use the hexahedral cells as the resulting mesh is characterized by low numerical diffusion. Unfortunately generating such mesh can be a very time-consuming task and in case of complicated geometry - it may not be possible to generate cells of good quality. Therefore tetrahedral cells have been implemented into commercial pre-processors. Their advantage is the ease of its generation even in case of very complex geometry. On the other hand tetrahedrons cannot be stretched excessively without decreasing the mesh quality factor, so significantly larger number of cells has to be used in comparison to hexahedral mesh in order to achieve a reasonable accuracy. Moreover the numerical diffusion of tetrahedral elements is significantly higher. Therefore the polyhedral cells are proposed within the paper in order to combine the advantages of hexahedrons (low numerical diffusion resulting in accurate solution) and tetrahedrons (rapid semi-automatic generation) as well as to overcome the disadvantages of both the above mentioned mesh types. The major benefit of polyhedral mesh is that each individual cell has many neighbours, so gradients can be well approximated. Polyhedrons are also less sensitive to stretching than tetrahedrons which results in better mesh quality leading to improved numerical stability of the model. In addition, numerical diffusion is reduced due to mass exchange over numerous faces. This leads to a more accurate solution achieved with a lower cell count. Therefore detailed comparison of numerical modelling results concerning conjugate heat transfer using tetrahedral and polyhedral meshes is presented in the paper.
NbN A/D Conversion of IR Focal Plane Sensor Signal at 10 K

NASA Technical Reports Server (NTRS)

Eaton, L.; Durand, D.; Sandell, R.; Spargo, J.; Krabach, T.

1994-01-01

We are implementing a 12 bit SFQ counting ADC with parallel-to-serial readout using our established 10 K NbN capability. This circuit provides a key element of the analog signal processor (ASP) used in large infrared focal plane arrays. The circuit processes the signal data stream from a Si:As BIB detector array. A 10 mega samples per second (MSPS) pixel data stream flows from the chip at a 120 megabit bit rate in a format that is compatible with other superconductive time dependent processor (TDP) circuits being developed. We will discuss our planned ASP demonstration, the circuit design, and test results.
20-GFLOPS QR processor on a Xilinx Virtex-E FPGA

NASA Astrophysics Data System (ADS)

Walke, Richard L.; Smith, Robert W. M.; Lightbody, Gaye

2000-11-01

Adaptive beamforming can play an important role in sensor array systems in countering directional interference. In high-sample rate systems, such as radar and comms, the calculation of adaptive weights is a very computational task that requires highly parallel solutions. For systems where low power consumption and volume are important the only viable implementation is as an Application Specific Integrated Circuit (ASIC). However, the rapid advancement of Field Programmable Gate Array (FPGA) technology is enabling highly credible re-programmable solutions. In this paper we present the implementation of a scalable linear array processor for weight calculation using QR decomposition. We employ floating-point arithmetic with mantissa size optimized to the target application to minimize component size, and implement them as relationally placed macros (RPMs) on Xilinx Virtex FPGAs to achieve predictable dense layout and high-speed operation. We present results that show that 20GFLOPS of sustained computation on a single XCV3200E-8 Virtex-E FPGA is possible. We also describe the parameterized implementation of the floating-point operators and QR-processor, and the design methodology that enables us to rapidly generate complex FPGA implementations using the industry standard hardware description language VHDL.
A Real-Time Capable Software-Defined Receiver Using GPU for Adaptive Anti-Jam GPS Sensors

PubMed Central

Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S.; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun

2011-01-01

Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities. PMID:22164116
A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.

PubMed

Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun

2011-01-01

Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities.
Multiple Embedded Processors for Fault-Tolerant Computing

NASA Technical Reports Server (NTRS)

Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

2005-01-01

A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Fault detection and bypass in a sequence information signal processor

NASA Technical Reports Server (NTRS)

Peterson, John C. (Inventor); Chow, Edward T. (Inventor)

1992-01-01

The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.
Implementation of a cone-beam backprojection algorithm on the cell broadband engine processor

NASA Astrophysics Data System (ADS)

Bockenbach, Olivier; Knaup, Michael; Kachelrieß, Marc

2007-03-01

Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operational count and due to the high demand put on the memory subsystem. In the past, solving this problem has lead to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high speed busses. More recently, there have also been attempt to use Graphic Processing Units (GPUs) to perform the backprojection step. Originally aimed at the gaming market, IBM, Toshiba and Sony have introduced the Cell Broadband Engine (CBE) processor, often considered as a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance indeed makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms. In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGAs based boards and high end GPUs. The cone-beam backprojection performance was assessed by backprojecting a full circle scan of 512 projections of 1024x1024 pixels into a volume of size 512x512x512 voxels. It took 3.2 minutes on the PC (single CPU) and is as fast as 13.6 seconds on the Cell.
Accelerating Climate Simulations Through Hybrid Computing

NASA Technical Reports Server (NTRS)

Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

2009-01-01

Unconventional multi-core processors (e.g., IBM Cell B/E and NYIDIDA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and; (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approx.10% network overhead.
Processor farming in two-level analysis of historical bridge

NASA Astrophysics Data System (ADS)

Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.

2017-11-01

This contribution presents a processor farming method in connection with a multi-scale analysis. In this method, each macro-scopic integration point or each finite element is connected with a certain meso-scopic problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem provides then effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor farming method to a real world masonry structure is illustrated by an analysis of Charles bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3. and it is a part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.
Vertical Scan (V-SCAN) for 3-D Grid Adaptive Mesh Refinement for an atmospheric Model Dynamical Core

NASA Astrophysics Data System (ADS)

Andronova, N. G.; Vandenberg, D.; Oehmke, R.; Stout, Q. F.; Penner, J. E.

2009-12-01

One of the major building blocks of a rigorous representation of cloud evolution in global atmospheric models is a parallel adaptive grid MPI-based communication library (an Adaptive Blocks for Locally Cartesian Topologies library -- ABLCarT), which manages the block-structured data layout, handles ghost cell updates among neighboring blocks and splits a block as refinements occur. The library has several modules that provide a layer of abstraction for adaptive refinement: blocks, which contain individual cells of user data; shells - the global geometry for the problem, including a sphere, reduced sphere, and now a 3D sphere; a load balancer for placement of blocks onto processors; and a communication support layer which encapsulates all data movement. A major performance concern with adaptive mesh refinement is how to represent calculations that have need to be sequenced in a particular order in a direction, such as calculating integrals along a specific path (e.g. atmospheric pressure or geopotential in the vertical dimension). This concern is compounded if the blocks have varying levels of refinement, or are scattered across different processors, as can be the case in parallel computing. In this paper we describe an implementation in ABLCarT of a vertical scan operation, which allows computing along vertical paths in the correct order across blocks transparent to their resolution and processor location. We test this functionality on a 2D and a 3D advection problem, which tests the performance of the model’s dynamics (transport) and physics (sources and sinks) for different model resolutions needed for inclusion of cloud formation.
Parallel matrix multiplication on the Connection Machine

NASA Technical Reports Server (NTRS)

Tichy, Walter F.

1988-01-01

Matrix multiplication is a computation and communication intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n to the 2nd power log n), O(n log n), O(n), and O(log n), and require n, n to the 2nd power, n to the 2nd power, and n to the 3rd power processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
A 15-MHz 1-3 Piezocomposite Concave Array Transducer for Ophthalmic Imaging.

PubMed

Cha, Jung Hyui; Kang, Byungwoo; Jang, Jihun; Chang, Jin Ho

2015-11-01

Because of the spherical shape of the human eye, the anterior segments of the eye, particularly the cornea and the lens, create high levels of refraction and reflection of ultrasound which negatively affect the performance of linear and convex arrays. To minimize the ultrasound energy loss, a 15-MHz concave array transducer was designed, fabricated, and characterized; its footprint is able to mesh well with the shape of the cornea. The concave array has a curvature with a radius of 15 mm and 128 elements with a 1.44- pitch. Its elevational focus and view angle are 30 mm and 72.3°, respectively, thus allowing the imaging area to cover the retinal region of interest in the posterior segment. As an active layer, a 1-3 piezocomposite was designed and fabricated in response to the bidirectional (i.e., azimuthal and elevational) curvature of the concave array and the high coupling coefficient. From the performance evaluation, it was found that the completed concave array is able to provide a center frequency of 15.95 MHz and a -6-dB fractional bandwidth of 67.8% after electrical tuning has been conducted. The crosstalk level was measured to be less than -25 dB. It was verified that the concave array is robust to the refraction and reflection from the cornea through pulse-echo testing using a custom-made eye-mimicking phantom. Furthermore, images of both the wire-target phantom and the ex vivo porcine eye were acquired by the finished concave array, which was connected to a commercial ultrasound scanner equipped with a research package. The evaluation results demonstrated that the developed concave array transducer is a possible alternative to conventional arrays for effectively imaging the posterior segment of the eye.
Parallel processor-based raster graphics system architecture

DOEpatents

Littlefield, Richard J.

1990-01-01

An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.
FPGA-based multiprocessor system for injection molding control.

PubMed

Muñoz-Barron, Benigno; Morales-Velazquez, Luis; Romero-Troncoso, Rene J; Rodriguez-Donate, Carlos; Trejo-Hernandez, Miguel; Benitez-Rangel, Juan P; Osornio-Rios, Roque A

2012-10-18

The plastic industry is a very important manufacturing sector and injection molding is a widely used forming method in that industry. The contribution of this work is the development of a strategy to retrofit control of an injection molding machine based on an embedded system microprocessors sensor network on a field programmable gate array (FPGA) device. Six types of embedded processors are included in the system: a smart-sensor processor, a micro fuzzy logic controller, a programmable logic controller, a system manager, an IO processor and a communication processor. Temperature, pressure and position are controlled by the proposed system and experimentation results show its feasibility and robustness. As validation of the present work, a particular sample was successfully injected.
System for routing messages in a vertex symmetric network by using addresses formed from permutations of the transmission line indicees

DOEpatents

Faber, Vance; Moore, James W.

1992-01-01

A network of interconnected processors is formed from a vertex symmetric graph selected from graphs .GAMMA..sub.d (k) with degree d, diameter k, and (d+1)!/(d-k+1)! processors for each d.gtoreq.k and .GAMMA..sub.d (k,-1) with degree 3-1, diameter k+1, and (d+1)!/(d-k+1)! processors for each d.gtoreq.k.gtoreq.4. Each processor has an address formed by one of the permutations from a predetermined sequence of letters chosen a selected number of letters at a time, and an extended address formed by appending to the address the remaining ones of the predetermined sequence of letters. A plurality of transmission channels is provided from each of the processors, where each processor has one less channel than the selected number of letters forming the sequence. Where a network .GAMMA..sub.d (k,-1) is provided, no processor has a channel connected to form an edge in a direction .delta..sub.1. Each of the channels has an identification number selected from the sequence of letters and connected from a first processor having a first extended address to a second processor having a second address formed from a second extended address defined by moving to the front of the first extended address the letter found in the position within the first extended address defined by the channel identification number. The second address is then formed by selecting the first elements of the second extended address corresponding to the selected number used to form the address permutations.

Benchmarking GNU Radio Kernels and Multi-Processor Scheduling

DTIC Science & Technology

2013-01-14

AMD E350 APU , comparable to Atom • ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110 The general testing procedure consists of • Build...Intel Atom, and the AMD E350 APU . 3.2 Multi-Processor Scheduling Figure 1: GFLOPs per second through an FFT array on an Intel i7. Example output from
Design and implementation of a high performance network security processor

NASA Astrophysics Data System (ADS)

Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

2010-03-01

The last few years have seen many significant progresses in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogenous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
HARP: A Dynamic Inertial Spectral Partitioner

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Sohn, Andrew; Biswas, Rupak

1997-01-01

Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such recursive spectral bisection (RSB), have proven effecfive in generating high-quality partitions of realistically-sized meshes. The major problem which hindered their wide-spread use was their long execution times. This paper presents a new inertial spectral partitioner, called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigerivectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2 which is several times faster than other spectral partitioners while maintaining the solution quality of the proven RSB method. A parallel WI version of HARP has also been implemented on IBM SP2 and Cray T3E. Parallel HARP, running on 64 processors SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.
New Modular Ultrasonic Signal Processing Building Blocks for Real-Time Data Acquisition and Post Processing

NASA Astrophysics Data System (ADS)

Weber, Walter H.; Mair, H. Douglas; Jansen, Dion

2003-03-01

A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Newman, G.A.; Commer, M.

Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/Lmore » supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.« less
Resistive-force theory for mesh-like superhydrophobic surfaces

NASA Astrophysics Data System (ADS)

Schnitzer, Ory; Yariv, Ehud

2018-03-01

A common realization of superhydrophobic surfaces makes use of a mesh-like geometry, where pockets of air are trapped in a periodic array of holes in a no-slip solid substrate. We consider the small-solid-fraction limit where the ribs of the mesh are narrow. In this limit, we obtain a simple leading-order approximation for the slip-length tensor of an arbitrary mesh geometry. This approximation scales as the solid-fraction logarithm, as anticipated by Ybert et al. [Phys. Fluids 19, 123601 (2007), 10.1063/1.2815730]; in the special case of a square mesh it agrees with the analytical results obtained by Davis and Lauga [Phys. Fluids 21, 113101 (2009), 10.1063/1.3250947].
DOE Office of Scientific and Technical Information (OSTI.GOV)

Reed, D.A.; Grunwald, D.C.

The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of thee processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At themore » other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.« less
Expanded nickel screen electrical connection supports for solid oxide fuel cells

DOEpatents

Draper, Robert; Antol, Ronald F.; Zafred, Paolo R.

2002-01-01

A solid oxide fuel assembly is made, wherein rows (14, 24) of fuel cells (16, 18, 20, 26, 28, 30), each having an outer interconnection (36) and an outer electrode (32), are disposed next to each other with corrugated, electrically conducting expanded metal mesh (22) between each row of cells, the corrugated mesh (22) having top crown portions (40) and bottom shoulder portions (42), where the top crown portion (40) contacts outer interconnections (36) of the fuel cells (16, 18, 20) in a first row (14), and the bottom shoulder portions (42) contacts outer electrodes (32) of the fuel cells in a second row (24), said mesh electrically connecting each row of fuel cells, and where there are no metal felt connections between any fuel cells.
Fast computation of voxel-level brain connectivity maps from resting-state functional MRI using l₁-norm as approximation of Pearson's temporal correlation: proof-of-concept and example vector hardware implementation.

PubMed

Minati, Ludovico; Zacà, Domenico; D'Incerti, Ludovico; Jovicich, Jorge

2014-09-01

An outstanding issue in graph-based analysis of resting-state functional MRI is choice of network nodes. Individual consideration of entire brain voxels may represent a less biased approach than parcellating the cortex according to pre-determined atlases, but entails establishing connectedness for 1(9)-1(11) links, with often prohibitive computational cost. Using a representative Human Connectome Project dataset, we show that, following appropriate time-series normalization, it may be possible to accelerate connectivity determination replacing Pearson correlation with l1-norm. Even though the adjacency matrices derived from correlation coefficients and l1-norms are not identical, their similarity is high. Further, we describe and provide in full an example vector hardware implementation of l1-norm on an array of 4096 zero instruction-set processors. Calculation times <1000 s are attainable, removing the major deterrent to voxel-based resting-sate network mapping and revealing fine-grained node degree heterogeneity. L1-norm should be given consideration as a substitute for correlation in very high-density resting-state functional connectivity analyses. Copyright © 2014 IPEM. Published by Elsevier Ltd. All rights reserved.
A fault-tolerant information processing concept for space vehicles.

NASA Technical Reports Server (NTRS)

Hopkins, A. L., Jr.

1971-01-01

A distributed fault-tolerant information processing system is proposed, comprising a central multiprocessor, dedicated local processors, and multiplexed input-output buses connecting them together. The processors in the multiprocessor are duplicated for error detection, which is felt to be less expensive than using coded redundancy of comparable effectiveness. Error recovery is made possible by a triplicated scratchpad memory in each processor. The main multiprocessor memory uses replicated memory for error detection and correction. Local processors use any of three conventional redundancy techniques: voting, duplex pairs with backup, and duplex pairs in independent subsystems.
Operational compatibility of 30-centimeter-diameter ion thruster with integrally regulated solar array power source

NASA Technical Reports Server (NTRS)

Gooder, S. T.

1977-01-01

System tests were performed in which Integrally Regulated Solar Arrays (IRSA's) were used to directly power the beam and accelerator loads of a 30-cm-diameter, electron bombardment, mercury ion thruster. The remaining thruster loads were supplied from conventional power-processing circuits. This combination of IRSA's and conventional circuits formed a hybrid power processor. Thruster performance was evaluated at 3/4- and 1-A beam currents with both the IRSA-hybrid and conventional power processors and was found to be identical for both systems. Power processing is significantly more efficient with the hybrid system. System dynamics and IRSA response to thruster arcs are also examined.
Arranging computer architectures to create higher-performance controllers

NASA Technical Reports Server (NTRS)

Jacklin, Stephen A.

1988-01-01

Techniques for integrating microprocessors, array processors, and other intelligent devices in control systems are reviewed, with an emphasis on the (re)arrangement of components to form distributed or parallel processing systems. Consideration is given to the selection of the host microprocessor, increasing the power and/or memory capacity of the host, multitasking software for the host, array processors to reduce computation time, the allocation of real-time and non-real-time events to different computer subsystems, intelligent devices to share the computational burden for real-time events, and intelligent interfaces to increase communication speeds. The case of a helicopter vibration-suppression and stabilization controller is analyzed as an example, and significant improvements in computation and throughput rates are demonstrated.
General linear codes for fault-tolerant matrix operations on processor arrays

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Abraham, J. A.

1988-01-01

Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
A programmable systolic array correlator as a trigger processor for electron pairs in rich (ring image Cherenkov) counters

NASA Astrophysics Data System (ADS)

Männer, R.

1989-12-01

This paper describes a systolic array processor for a ring image Cherenkov counter which is capable of identifying pairs of electron circles with a known radius and a certain minimum distance within 15 μs. The processor is a very flexible and fast device. It consists of 128 x 128 processing elements (PEs), where one PE is assigned to each pixel of the image. All PEs run synchronously at 40 MHz. The identification of electron circles is done by correlating the detector image with the proper circle circumference. Circle centers are found by peak detection in the correlation result. A second correlation with a circle disc allows circles of closed electron pairs to be rejected. The trigger decision is generated if a pseudo adder detects at least two remaining circles. The device is controlled by a freely programmable sequencer. A VLSI chip containing 8 x 8 PEs is being developed using a VENUS design system and will be produced in 2μ CMOS technology.
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, Fred A.; Morel, Michael R.

1989-01-01

A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
Study the effect of reservoir spatial heterogeneity on CO2 sequestration under an uncertainty quantification (UQ) software framework

NASA Astrophysics Data System (ADS)

Fang, Y.; Hou, J.; Engel, D.; Lin, G.; Yin, J.; Han, B.; Fang, Z.; Fountoulakis, V.

2011-12-01

In this study, we introduce an uncertainty quantification (UQ) software framework for carbon sequestration, with the focus of studying being the effect of spatial heterogeneity of reservoir properties on CO2 migration. We use a sequential Gaussian method (SGSIM) to generate realizations of permeability fields with various spatial statistical attributes. To deal with the computational difficulties, we integrate the following ideas/approaches: 1) firstly, we use three different sampling approaches (probabilistic collocation, quasi-Monte Carlo, and adaptive sampling approaches) to reduce the required forward calculations while trying to explore the parameter space and quantify the input uncertainty; 2) secondly, we use eSTOMP as the forward modeling simulator. eSTOMP is implemented using the Global Arrays toolkit (GA) that is based on one-sided inter-processor communication and supports a shared memory programming style on distributed memory platforms. It provides highly-scalable performance. It uses a data model to partition most of the large scale data structures into a relatively small number of distinct classes. The lower level simulator infrastructure (e.g. meshing support, associated data structures, and data mapping to processors) is separated from the higher level physics and chemistry algorithmic routines using a grid component interface; and 3) besides the faster model and more efficient algorithms to speed up the forward calculation, we built an adaptive system infrastructure to select the best possible data transfer mechanisms, to optimally allocate system resources to improve performance, and to integrate software packages and data for composing carbon sequestration simulation, computation, analysis, estimation and visualization. We will demonstrate the framework with a given CO2 injection scenario in a heterogeneous sandstone reservoir.
FPGA-Based Multiprocessor System for Injection Molding Control

PubMed Central

Muñoz-Barron, Benigno; Morales-Velazquez, Luis; Romero-Troncoso, Rene J.; Rodriguez-Donate, Carlos; Trejo-Hernandez, Miguel; Benitez-Rangel, Juan P.; Osornio-Rios, Roque A.

2012-01-01

The plastic industry is a very important manufacturing sector and injection molding is a widely used forming method in that industry. The contribution of this work is the development of a strategy to retrofit control of an injection molding machine based on an embedded system microprocessors sensor network on a field programmable gate array (FPGA) device. Six types of embedded processors are included in the system: a smart-sensor processor, a micro fuzzy logic controller, a programmable logic controller, a system manager, an IO processor and a communication processor. Temperature, pressure and position are controlled by the proposed system and experimentation results show its feasibility and robustness. As validation of the present work, a particular sample was successfully injected. PMID:23202036
Study of a programmable high speed processor for use on-board satellites

NASA Astrophysics Data System (ADS)

Degavre, J. Cl.; Okkes, R.; Gaillat, G.

The availability of VLSI programmable devices will significantly enhance satellite on-board data processing capabilities. A case study is presented which indicates that computation-intensive processing applications requiring the execution of 100 megainstructions/sec are within the CD power constraints of satellites. It is noted that the current progress in semicustom design technique development and in achievable gate array densities, together with the recent announcement of improved monochip processors, are encouraging the development of an on-board programmable processor architecture able to associate the devices that will appear in communication and military markets.
Interconnection networks

DOEpatents

Faber, V.; Moore, J.W.

1988-06-20

A network of interconnected processors is formed from a vertex symmetric graph selected from graphs GAMMA/sub d/(k) with degree d, diameter k, and (d + 1)exclamation/ (d /minus/ k + 1)exclamation processors for each d greater than or equal to k and GAMMA/sub d/(k, /minus/1) with degree d /minus/ 1, diameter k + 1, and (d + 1)exclamation/(d /minus/ k + 1)exclamation processors for each d greater than or equal to k greater than or equal to 4. Each processor has an address formed by one of the permutations from a predetermined sequence of letters chosen a selected number of letters at a time, and an extended address formed by appending to the address the remaining ones of the predetermined sequence of letters. A plurality of transmission channels is provided from each of the processors, where each processor has one less channel than the selected number of letters forming the sequence. Where a network GAMMA/sub d/(k, /minus/1) is provided, no processor has a channel connected to form an edge in a direction delta/sub 1/. Each of the channels has an identification number selected from the sequence of letters and connected from a first processor having a first extended address to a second processor having a second address formed from a second extended address defined by moving to the front of the first extended address the letter found in the position within the first extended address defined by the channel identification number. The second address is then formed by selecting the first elements of the second extended address corresponding to the selected number used to form the address permutations. 9 figs.
Implementation and Assessment of Advanced Analog Vector-Matrix Processor

NASA Technical Reports Server (NTRS)

Gary, Charles K.; Bualat, Maria G.; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

This paper discusses the design and implementation of an analog optical vecto-rmatrix coprocessor with a throughput of 128 Mops for a personal computer. Vector matrix calculations are inherently parallel, providing a promising domain for the use of optical calculators. However, to date, digital optical systems have proven too cumbersome to replace electronics, and analog processors have not demonstrated sufficient accuracy in large scale systems. The goal of the work described in this paper is to demonstrate a viable optical coprocessor for linear operations. The analog optical processor presented has been integrated with a personal computer to provide full functionality and is the first demonstration of an optical linear algebra processor with a throughput greater than 100 Mops. The optical vector matrix processor consists of a laser diode source, an acoustooptical modulator array to input the vector information, a liquid crystal spatial light modulator to input the matrix information, an avalanche photodiode array to read out the result vector of the vector matrix multiplication, as well as transport optics and the electronics necessary to drive the optical modulators and interface to the computer. The intent of this research is to provide a low cost, highly energy efficient coprocessor for linear operations. Measurements of the analog accuracy of the processor performing 128 Mops are presented along with an assessment of the implications for future systems. A range of noise sources, including cross-talk, source amplitude fluctuations, shot noise at the detector, and non-linearities of the optoelectronic components are measured and compared to determine the most significant source of error. The possibilities for reducing these sources of error are discussed. Also, the total error is compared with that expected from a statistical analysis of the individual components and their relation to the vector-matrix operation. The sufficiency of the measured accuracy of the processor is compared with that required for a range of typical problems. Calculations resolving alloy concentrations from spectral plume data of rocket engines are implemented on the optical processor, demonstrating its sufficiency for this problem. We also show how this technology can be easily extended to a 100 x 100 10 MHz (200 Cops) processor.

Automatic Mesh Generation of Hybrid Mesh on Valves in Multiple Positions in Feedline Systems

NASA Technical Reports Server (NTRS)

Ross, Douglass H.; Ito, Yasushi; Dorothy, Fredric W.; Shih, Alan M.; Peugeot, John

2010-01-01

Fluid flow simulations through a valve often require evaluation of the valve in multiple opening positions. A mesh has to be generated for the valve for each position and compounding. The problem is the fact that the valve is typically part of a larger feedline system. In this paper, we propose to develop a system to create meshes for feedline systems with parametrically controlled valve openings. Herein we outline two approaches to generate the meshes for a valve in a feedline system at multiple positions. There are two issues that must be addressed. The first is the creation of the mesh on the valve for multiple positions. The second is the generation of the mesh for the total feedline system including the valve. For generation of the mesh on the valve, we will describe the use of topology matching and mesh generation parameter transfer. For generation of the total feedline system, we will describe two solutions that we have implemented. In both cases the valve is treated as a component in the feedline system. In the first method the geometry of the valve in the feedline system is replaced with a valve at a different opening position. Geometry is created to connect the valve to the feedline system. Then topology for the valve is created and the portion of the topology for the valve is topology matched to the standard valve in a different position. The mesh generation parameters are transferred and then the volume mesh for the whole feedline system is generated. The second method enables the user to generate the volume mesh on the valve in multiple open positions external to the feedline system, to insert it into the volume mesh of the feedline system, and to reduce the amount of computer time required for mesh generation because only two small volume meshes connecting the valve to the feedline mesh need to be updated.
Fog Harvesting with Harps.

PubMed

Shi, Weiwei; Anderson, Mark J; Tulkoff, Joshua B; Kennedy, Brook S; Boreyko, Jonathan B

2018-04-11

Fog harvesting is a useful technique for obtaining fresh water in arid climates. The wire meshes currently utilized for fog harvesting suffer from dual constraints: coarse meshes cannot efficiently capture microscopic fog droplets, whereas fine meshes suffer from clogging issues. Here, we design and fabricate fog harvesters comprising an array of vertical wires, which we call "fog harps". Under controlled laboratory conditions, the fog-harvesting rates for fog harps with three different wire diameters were compared to conventional meshes of equivalent dimensions. As expected for the mesh structures, the mid-sized wires exhibited the largest fog collection rate, with a drop-off in performance for the fine or coarse meshes. In contrast, the fog-harvesting rate continually increased with decreasing wire diameter for the fog harps due to efficient droplet shedding that prevented clogging. This resulted in a 3-fold enhancement in the fog-harvesting rate for the harp design compared to an equivalent mesh.
Project PARAS: Phased array radio astronomy from space

NASA Technical Reports Server (NTRS)

Nuss, Kenneth; Hoffmann, Christopher; Dungan, Michael; Madden, Michael; Bendakhlia, Monia

1992-01-01

An orbiting radio telescope is proposed which, when operated in a very long baseline interferometry (VLBI) scheme, would allow higher than currently available angular resolution and dynamic range in the maps and the ability to observe rapidly changing astronomical sources. Using passive phased array technology, the proposed design consists of 656 hexagonal modules forming a 150-m diameter antenna dish. Each observatory module is largely autonomous, having its own photovoltaic power supply and low-noise receiver and processor for phase shifting. The signals received by the modules are channeled via fiber optics to the central control computer in the central bus module. After processing and multiplexing, the data are transmitted to telemetry stations on the ground. The truss frame supporting each observatory panel is a novel hybrid structure consisting of a bottom graphite/epoxy tubular triangle and rigidized inflatable Kevlar tubes connecting the top observatory panel and the bottom triangle. Attitude control and station keeping functions will be performed by a system of momentum wheels in the bus and four propulsion modules located at the compass points on the periphery of the observatory dish. Each propulsion module has four monopropellant thrusters and four hydrazine arcjets, the latter supported by either a photovoltaic array or a radioisotope thermoelectric generator. The total mass of the spacecraft is about 20,500 kg.
Direct drive options for electric propulsion systems

NASA Technical Reports Server (NTRS)

Hamley, John A.

1995-01-01

Power processing units (PPU's) in an electric propulsion system provide many challenging integration issues. The PPU must provide power to the electric thruster while maintaining compatibility with all of the spacecraft power and data systems. Inefficiencies in the power processor produce heat, which must be radiated to the environment in order to ensure reliable operation. Although PPU efficiencies are generally greater than 0.9, heat loads are often substantial. This heat must be rejected by thermal control systems which generally have specific masses of 15-30 kg/kW. PPU's also represent a large fraction of the electric propulsion system dry mass. Simplification or elimination of power processing in a propulsion system would reduce the electric propulsion system specific mass and improve the overall reliability and performance. A direct drive system would eliminate all or some of the power supplies required to operate a thruster by directly connecting the various thruster loads to the solar array. The development of concentrator solar arrays has enabled power bus voltages in excess of 300 V which is high enough for direct drive applications for Hall thrusters such as the Stationary Plasma Thruster (SPT). The option of solar array direct drive for SPT's is explored to provide a comparison between conventional and direct drive system mass.
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Parallel Unsteady Overset Mesh Methodology for Adaptive and Moving Grids with Multiple Solvers

DTIC Science & Technology

2010-01-01

Research Laboratory Hampton, Virginia Jayanarayanan Sitaraman National Institute of Aerospace Hampton, Virginia ABSTRACT This paper describes a new...Army Research Laboratory ,Hampton, VA, , , 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) NATO/RTO...results section ( 3.6 and 3.5). Good linear scalability was observed for all three cases up to 12 processors. Beyond that the scalability drops off
Limit characteristics of digital optoelectronic processor

NASA Astrophysics Data System (ADS)

Kolobrodov, V. G.; Tymchik, G. S.; Kolobrodov, M. S.

2018-01-01

In this article, the limiting characteristics of a digital optoelectronic processor are explored. The limits are defined by diffraction effects and a matrix structure of the devices for input and output of optical signals. The purpose of a present research is to optimize the parameters of the processor's components. The developed physical and mathematical model of DOEP allowed to establish the limit characteristics of the processor, restricted by diffraction effects and an array structure of the equipment for input and output of optical signals, as well as to optimize the parameters of the processor's components. The diameter of the entrance pupil of the Fourier lens is determined by the size of SLM and the pixel size of the modulator. To determine the spectral resolution, it is offered to use a concept of an optimum phase when the resolved diffraction maxima coincide with the pixel centers of the radiation detector.
Asynchronous Communication Scheme For Hypercube Computer

NASA Technical Reports Server (NTRS)

Madan, Herb S.

1988-01-01

Scheme devised for asynchronous-message communication system for Mark III hypercube concurrent-processor network. Network consists of up to 1,024 processing elements connected electrically as though were at corners of 10-dimensional cube. Each node contains two Motorola 68020 processors along with Motorola 68881 floating-point processor utilizing up to 4 megabytes of shared dynamic random-access memory. Scheme intended to support applications requiring passage of both polled or solicited and unsolicited messages.
Through-the-earth radio

DOEpatents

Reagor, David; Vasquez-Dominguez, Jose

2006-12-12

A through-the-earth communication system that includes a digital signal input device; a transmitter operating at a predetermined frequency sufficiently low to effectively penetrate useful distances through-the earth; a data compression circuit that is connected to an encoding processor; an amplifier that receives encoded output from the encoding processor for amplifying the output and transmitting the data to an antenna; and a receiver with an antenna, a band pass filter, a decoding processor, and a data decompressor.
Method of generating a surface mesh

DOEpatents

Shepherd, Jason F [Albuquerque, NM; Benzley, Steven [Provo, UT; Grover, Benjamin T [Tracy, CA

2008-03-04

A method and machine-readable medium provide a technique to generate and modify a quadrilateral finite element surface mesh using dual creation and modification. After generating a dual of a surface (mesh), a predetermined algorithm may be followed to generate and modify a surface mesh of quadrilateral elements. The predetermined algorithm may include the steps of generating two-dimensional cell regions in dual space, determining existing nodes in primal space, generating new nodes in the dual space, and connecting nodes to form the quadrilateral elements (faces) for the generated and modifiable surface mesh.
Implementation of a Configurable Fault Tolerant Processor (CFTP) Using Internal Triple Modular Redundancy (TMR)

DTIC Science & Technology

2005-12-01

Upsets in SRAM FPGAs,” Military and Aerospace Applications of Programmable Logic Devices, September 2002. 8. Wakerly , John F,. “Microcomputer...change. The goal of the Configurable Fault Tolerant Processor (CFTP) Project is to explore, develop and demonstrate the applicability of using off-the...develop and demonstrate the applicability of using commercial-of-the-shelf (COTS) Field Programmable Gate Arrays (FPGA) in the design of
An Environment IoT Sensor Network for Monitoring the Environment

NASA Astrophysics Data System (ADS)

Martinez, K.; Hart, J. K.; Bragg, O.; Black, A.; Bader, S.; Basford, P. J.; Bragg, G. M.; Fabre, A.

2016-12-01

The Internet of Things is a term which has emerged to describe the increase of Internet connectivity of everyday objects. While wireless sensor networks have developed highly energy efficient designs they need a step-change in their interoperability and usability to become more commonly used in Earth Science. IoT techniques can bring many of these advances while reusing some of the technologies developed for low power sensing. Here we concentrate on developing effective use of internet protocols throughout a low power sensor network. This includes 6LowPAN to provide a mesh IPv6 network, 40mW 868 MHz CC1120 radio transceivers to save power but provide kilometre range, a CC2538 ARM® Cortex®-M3 as main processor and CoAP to provide a binary HTTP-like interface to the nodes. We discuss in detail a system we deployed to monitor periglacial, peat and fluvial processes in the Scottish Highlands. The system linked initial nodes 3km away further up the mountain 2km away and used a CoAP GET sequence from a base station in the valley to gather the data. The IPv6 addressing and tunnelling allowed direct connectivity to desktops in Southampton. This provides insights into how the combination of low power techniques and emerging internet standards will bring advantages in interoperability, heterogeneity, usability and maintainability.
Dynamic load balance scheme for the DSMC algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Jin; Geng, Xiangren; Jiang, Dingwu

The direct simulation Monte Carlo (DSMC) algorithm, devised by Bird, has been used over a wide range of various rarified flow problems in the past 40 years. While the DSMC is suitable for the parallel implementation on powerful multi-processor architecture, it also introduces a large load imbalance across the processor array, even for small examples. The load imposed on a processor by a DSMC calculation is determined to a large extent by the total of simulator particles upon it. Since most flows are impulsively started with initial distribution of particles which is surely quite different from the steady state, themore » total of simulator particles will change dramatically. The load balance based upon an initial distribution of particles will break down as the steady state of flow is reached. The load imbalance and huge computational cost of DSMC has limited its application to rarefied or simple transitional flows. In this paper, by taking advantage of METIS, a software for partitioning unstructured graphs, and taking the total of simulator particles in each cell as a weight information, the repartitioning based upon the principle that each processor handles approximately the equal total of simulator particles has been achieved. The computation must pause several times to renew the total of simulator particles in each processor and repartition the whole domain again. Thus the load balance across the processors array holds in the duration of computation. The parallel efficiency can be improved effectively. The benchmark solution of a cylinder submerged in hypersonic flow has been simulated numerically. Besides, hypersonic flow past around a complex wing-body configuration has also been simulated. The results have displayed that, for both of cases, the computational time can be reduced by about 50%.« less
Analysis and simulation tools for solar array power systems

NASA Astrophysics Data System (ADS)

Pongratananukul, Nattorn

This dissertation presents simulation tools developed specifically for the design of solar array power systems. Contributions are made in several aspects of the system design phases, including solar source modeling, system simulation, and controller verification. A tool to automate the study of solar array configurations using general purpose circuit simulators has been developed based on the modeling of individual solar cells. Hierarchical structure of solar cell elements, including semiconductor properties, allows simulation of electrical properties as well as the evaluation of the impact of environmental conditions. A second developed tool provides a co-simulation platform with the capability to verify the performance of an actual digital controller implemented in programmable hardware such as a DSP processor, while the entire solar array including the DC-DC power converter is modeled in software algorithms running on a computer. This "virtual plant" allows developing and debugging code for the digital controller, and also to improve the control algorithm. One important task in solar arrays is to track the maximum power point on the array in order to maximize the power that can be delivered. Digital controllers implemented with programmable processors are particularly attractive for this task because sophisticated tracking algorithms can be implemented and revised when needed to optimize their performance. The proposed co-simulation tools are thus very valuable in developing and optimizing the control algorithm, before the system is built. Examples that demonstrate the effectiveness of the proposed methodologies are presented. The proposed simulation tools are also valuable in the design of multi-channel arrays. In the specific system that we have designed and tested, the control algorithm is implemented on a single digital signal processor. In each of the channels the maximum power point is tracked individually. In the prototype we built, off-the-shelf commercial DC-DC converters were utilized. At the end, the overall performance of the entire system was evaluated using solar array simulators capable of simulating various I-V characteristics, and also by using an electronic load. Experimental results are presented.
Binary mesh partitioning for cache-efficient visualization.

PubMed

Tchiboukdjian, Marc; Danjean, Vincent; Raffin, Bruno

2010-01-01

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but they lack of theoretical performance guarantees. We present in this paper a {\\schmi O}(N\\log N) algorithm to compute a CO layout for unstructured but well shaped meshes. We prove that a coherent traversal of a N-size mesh in dimension d induces less than N/B+{\\schmi O}(N/M;{1/d}) cache-misses where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and significantly less memory consuming than the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures.
Onboard Experiment Data Support Facility

NASA Technical Reports Server (NTRS)

1976-01-01

An onboard array structure has been devised for end to end processing of data from multiple spaceborne sensors. The array constitutes sets of programmable pipeline processors whose elements perform each assigned function in 0.25 microseconds. This space shuttle computer system can handle data rates from a few bits to over 100 megabits per second.
Improved Magnetic STAR Methods for Real-Time, Point-by-Point Localization of Unexploded Ordnance and Buried Mines

DTIC Science & Technology

2008-09-01

of magnetic UXO. The prototype STAR Sensor comprises: a) A cubic array of eight fluxgate magnetometers . b) A 24-channel data acquisition/signal...array (shaded boxes) of eight low noise Triaxial Fluxgate Magnetometers (TFM) develops 24 channels of vector B- field data. Processor hardware
Radial cold trap

DOEpatents

Grundy, Brian R.

1981-01-01

The radial cold trap comprises a housing having a plurality of mesh bands disposed therein. The mesh bands comprise concentrically arranged bands of mesh with the mesh specific surface area of each band increasing from the outermost mesh band to the innermost mesh band. An inlet nozzle is attached to the outside section of the housing while an outlet nozzle is attached to the inner portion of the housing so as to be concentrically connected to the innermost mesh band. An inlet baffle having orifices therein may be disposed around the outermost mesh band and within the housing for directing the flow of the fluid from the inlet nozzle to the outermost mesh band in a uniform manner. The flow of fluid passes through each consecutive mesh band and into the outlet nozzle. The circular pattern of the symmetrically arranged mesh packing allows for better utilization of the entire cold trap volume.
Radial cold trap

DOEpatents

Grundy, B.R.

1981-09-29

The radial cold trap comprises a housing having a plurality of mesh bands disposed therein. The mesh bands comprise concentrically arranged bands of mesh with the mesh specific surface area of each band increasing from the outermost mesh band to the innermost mesh band. An inlet nozzle is attached to the outside section of the housing while an outlet nozzle is attached to the inner portion of the housing so as to be concentrically connected to the innermost mesh band. An inlet baffle having orifices therein may be disposed around the outermost mesh band and within the housing for directing the flow of the fluid from the inlet nozzle to the outermost mesh band in a uniform manner. The flow of fluid passes through each consecutive mesh band and into the outlet nozzle. The circular pattern of the symmetrically arranged mesh packing allows for better utilization of the entire cold trap volume. 2 figs.
Scheduler for multiprocessor system switch with selective pairing

DOEpatents

Gara, Alan; Gschwind, Michael Karl; Salapura, Valentina

2015-01-06

System, method and computer program product for scheduling threads in a multiprocessing system with selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). The method configures the selective pairing facility to use checking provide one highly reliable thread for high-reliability and allocate threads to corresponding processor cores indicating need for hardware checking. The method configures the selective pairing facility to provide multiple independent cores and allocate threads to corresponding processor cores indicating inherent resilience.

Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

NASA Astrophysics Data System (ADS)

Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

2012-09-01

Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors, and produce huge amount of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses Octree adaptive mesh refinement. Compared to grid based AMR, the Octree AMR has the advantage to fit very precisely the adaptive resolution of the grid to the local problem complexity. However, this specific octree data type need some specific software to be visualized, as generic visualization tools works on Cartesian grid data type. This is why the PYMSES software has been also developed by our team. It relies on the python scripting language to ensure a modular and easy access to explore those specific data. In order to take advantage of the High Performance Computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We would like to present with more details our PYMSES software with some performance benchmarks. PYMSES has currently two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques with the python multiprocessing library versus the use of MPI run. The load balancing strategy has to be smartly defined in order to achieve a good speed up in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
An optimized routing algorithm for the automated assembly of standard multimode ribbon fibers in a full-mesh optical backplane

NASA Astrophysics Data System (ADS)

Basile, Vito; Guadagno, Gianluca; Ferrario, Maddalena; Fassi, Irene

2018-03-01

In this paper a parametric, modular and scalable algorithm allowing a fully automated assembly of a backplane fiber-optic interconnection circuit is presented. This approach guarantees the optimization of the optical fiber routing inside the backplane with respect to specific criteria (i.e. bending power losses), addressing both transmission performance and overall costs issues. Graph theory has been exploited to simplify the complexity of the NxN full-mesh backplane interconnection topology, firstly, into N independent sub-circuits and then, recursively, into a limited number of loops easier to be generated. Afterwards, the proposed algorithm selects a set of geometrical and architectural parameters whose optimization allows to identify the optimal fiber optic routing for each sub-circuit of the backplane. The topological and numerical information provided by the algorithm are then exploited to control a robot which performs the automated assembly of the backplane sub-circuits. The proposed routing algorithm can be extended to any array architecture and number of connections thanks to its modularity and scalability. Finally, the algorithm has been exploited for the automated assembly of an 8x8 optical backplane realized with standard multimode (MM) 12-fiber ribbons.
Dynamic and thermal response finite element models of multi-body space structural configurations

NASA Technical Reports Server (NTRS)

Edighoffer, Harold H.

1987-01-01

Presented is structural dynamics modeling of two multibody space structural configurations. The first configuration is a generic space station model of a cylindrical habitation module, two solar array panels, radiator panel, and central connecting tube. The second is a 15-m hoop-column antenna. Discussed is the special joint elimination sequence used for these large finite element models, so that eigenvalues could be extracted. The generic space station model aided test configuration design and analysis/test data correlation. The model consisted of six finite element models, one of each substructure and one of all substructures as a system. Static analysis and tests at the substructure level fine-tuned the finite element models. The 15-m hoop-column antenna is a truss column and structural ring interconnected with tension stabilizing cables. To the cables, pretensioned mesh membrane elements were attached to form four parabolic shaped antennae, one per quadrant. Imposing thermal preloads in the cables and mesh elements produced pretension in the finite element model. Thermal preload variation in the 96 control cables was adjusted to maintain antenna shape within the required tolerance and to give pointing accuracy.
Two-dimensional acousto-optic processor using circular antenna array with a Butler matrix

NASA Astrophysics Data System (ADS)

Lee, Jim P.

1992-09-01

A two-dimensional acousto-optic signal processor is shown to be useful for providing simultaneous spectrum analysis and direction finding of radar signals over an instantaneous field of view of 360 deg. A system analysis with emphasis on the direction-finding aspect of this new architecture is presented. The peak location of the optical pattern provides a direct measure of bearing, independent of signal frequency. In addition, the sidelobe levels of the pattern can be effectively reduced using amplitude weighting. Performance parameters, such as mainlobe beamwidth, peak-sidelobe level, and pointing error, are analyzed as a function of the Gaussian laser illumination profile and the number of channels. Finally, a comparison with a linear antenna array architecture is also discussed.
A fast adaptive convex hull algorithm on two-dimensional processor arrays with a reconfigurable BUS system

NASA Technical Reports Server (NTRS)

Olariu, S.; Schwing, J.; Zhang, J.

1991-01-01

A bus system that can change dynamically to suit computational needs is referred to as reconfigurable. We present a fast adaptive convex hull algorithm on a two-dimensional processor array with a reconfigurable bus system (2-D PARBS, for short). Specifically, we show that computing the convex hull of a planar set of n points taken O(log n/log m) time on a 2-D PARBS of size mn x n with 3 less than or equal to m less than or equal to n. Our result implies that the convex hull of n points in the plane can be computed in O(1) time in a 2-D PARBS of size n(exp 1.5) x n.
A site oriented supercomputer for theoretical physics: The Fermilab Advanced Computer Program Multi Array Processor System (ACMAPS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nash, T.; Atac, R.; Cook, A.

1989-03-06

The ACPMAPS multipocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL Chip set. Themore » system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.« less
Transport Protocols for Wireless Mesh Networks

NASA Astrophysics Data System (ADS)

Eddie Law, K. L.

Transmission control protocol (TCP) provides reliable connection-oriented services between any two end systems on the Internet. With TCP congestion control algorithm, multiple TCP connections can share network and link resources simultaneously. These TCP congestion control mechanisms have been operating effectively in wired networks. However, performance of TCP connections degrades rapidly in wireless and lossy networks. To sustain the throughput performance of TCP connections in wireless networks, design modifications may be required accordingly in the TCP flow control algorithm, and potentially, in association with other protocols in other layers for proper adaptations. In this chapter, we explain the limitations of the latest TCP congestion control algorithm, and then review some popular designs for TCP connections to operate effectively in wireless mesh network infrastructure.
Processing techniques for software based SAR processors

NASA Technical Reports Server (NTRS)

Leung, K.; Wu, C.

1983-01-01

Software SAR processing techniques defined to treat Shuttle Imaging Radar-B (SIR-B) data are reviewed. The algorithms are devised for the data processing procedure selection, SAR correlation function implementation, multiple array processors utilization, cornerturning, variable reference length azimuth processing, and range migration handling. The Interim Digital Processor (IDP) originally implemented for handling Seasat SAR data has been adapted for the SIR-B, and offers a resolution of 100 km using a processing procedure based on the Fast Fourier Transformation fast correlation approach. Peculiarities of the Seasat SAR data processing requirements are reviewed, along with modifications introduced for the SIR-B. An Advanced Digital SAR Processor (ADSP) is under development for use with the SIR-B in the 1986 time frame as an upgrade for the IDP, which will be in service in 1984-5.
Full-mesh T- and O-band wavelength router based on arrayed waveguide gratings.

PubMed

Idris, Nazirul A; Yoshizawa, Katsumi; Tomomatsu, Yasunori; Sudo, Makoto; Hajikano, Tadashi; Kubo, Ryogo; Zervas, Georgios; Tsuda, Hiroyuki

2016-01-11

We propose an ultra-broadband full-mesh wavelength router supporting the T- and O-bands using 3 stages of cascaded arrayed waveguide gratings (AWGs). The router architecture is based on a combination of waveband and channel routing by coarse and fine AWGs, respectively. We fabricated several T-band-specific silica-based AWGs and quantum dot semiconductor optical ampliers as part of the router, and demonstrated 10 Gbps data transmission for several wavelengths throughout a range of 7.4 THz. The power penalties were below 1 dB. Wavelength routing was also demonstrated, where tuning time within a 9.4-nm-wide waveband was below 400 ms.
Integral glass encapsulation for solar arrays

NASA Technical Reports Server (NTRS)

Landis, G. A.

1981-01-01

Electrostatic bonding technology, an encapsulation technique for terrestrial solar array was developed. The process produces full integral, hermetic bonds with no adhesives or pottants. Panels of six solar cells on a simple glass superstrate were produced. Electrostatic bonding for making the cell front contact was also developed. A metal mesh is trapped into contact with the cell front during the bonding process. Six cell panels using the bonded mesh as the only cell front contact were produced. The possibility of using lower cost glass, with a higher thermal expansion mismatch to silicon, by making lower temperature bonds is developed. However, this requires a planar surface cell.
Heat exchange apparatus

DOEpatents

Degtiarenko, Pavel V.

2003-08-12

A heat exchange apparatus comprising a coolant conduit or heat sink having attached to its surface a first radial array of spaced-apart parallel plate fins or needles and a second radial array of spaced-apart parallel plate fins or needles thermally coupled to a body to be cooled and meshed with, but not contacting the first radial array of spaced-apart parallel plate fins or needles.
Experimental Study on the Velocity and Efficiency Characteristics of a Serial Staged Needle Array-Mesh Type EHD Gas Pump

NASA Astrophysics Data System (ADS)

Qiu, Wei; Xia, Lingzhi; Yang, Lanjun; Zhang, Qiaogen; Xiao, Lei; Chen, Li

2011-12-01

The ionic wind has good application prospects in the fields of air flow control and heat transfer enhancement. The key for successful applications is how to improve the velocity and how to increase the active area of the ionic wind. This paper designed a needle array-mesh type electrohydrodynamic (EHD) gas pump. The use of needle array electrode where corona discharge started simultaneously could enlarge the active area. The velocity of the ionic wind could increase by placing several single-stage ionic wind generators in series appropriately, called as serial staged generator. The maximum average flow velocity of 16.1 m/s and volumetric flow of 303.5 L/min were achieved at the outlet of a 25-stage gas pump and the conversion efficiency was approximately 2.2%.
Combining 3d Volume and Mesh Models for Representing Complicated Heritage Buildings

NASA Astrophysics Data System (ADS)

Tsai, F.; Chang, H.; Lin, Y.-W.

2017-08-01

This study developed a simple but effective strategy to combine 3D volume and mesh models for representing complicated heritage buildings and structures. The idea is to seamlessly integrate 3D parametric or polyhedral models and mesh-based digital surfaces to generate a hybrid 3D model that can take advantages of both modeling methods. The proposed hybrid model generation framework is separated into three phases. Firstly, after acquiring or generating 3D point clouds of the target, these 3D points are partitioned into different groups. Secondly, a parametric or polyhedral model of each group is generated based on plane and surface fitting algorithms to represent the basic structure of that region. A "bare-bones" model of the target can subsequently be constructed by connecting all 3D volume element models. In the third phase, the constructed bare-bones model is used as a mask to remove points enclosed by the bare-bones model from the original point clouds. The remaining points are then connected to form 3D surface mesh patches. The boundary points of each surface patch are identified and these boundary points are projected onto the surfaces of the bare-bones model. Finally, new meshes are created to connect the projected points and original mesh boundaries to integrate the mesh surfaces with the 3D volume model. The proposed method was applied to an open-source point cloud data set and point clouds of a local historical structure. Preliminary results indicated that the reconstructed hybrid models using the proposed method can retain both fundamental 3D volume characteristics and accurate geometric appearance with fine details. The reconstructed hybrid models can also be used to represent targets in different levels of detail according to user and system requirements in different applications.
Block Copolymers as Templates for Arrays of Carbon Nanotubes

NASA Technical Reports Server (NTRS)

Bronikowski, Michael; Hunt, Brian

2003-01-01

A method of manufacturing regular arrays of precisely sized, shaped, positioned, and oriented carbon nanotubes has been proposed. Arrays of carbon nanotubes could prove useful in such diverse applications as communications (especially for filtering of signals), biotechnology (for sequencing of DNA and separation of chemicals), and micro- and nanoelectronics (as field emitters and as signal transducers and processors). The method is expected to be suitable for implementation in standard semiconductor-device fabrication facilities.
High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

PubMed

Ly, Daniel Le; Chow, Paul

2010-11-01

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
Parallel architectures for iterative methods on adaptive, block structured grids

NASA Technical Reports Server (NTRS)

Gannon, D.; Vanrosendale, J.

1983-01-01

A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Integration of Mesh Optimization with 3D All-Hex Mesh Generation, LDRD Subcase 3504340000, Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

KNUPP,PATRICK; MITCHELL,SCOTT A.

1999-11-01

In an attempt to automatically produce high-quality all-hex meshes, we investigated a mesh improvement strategy: given an initial poor-quality all-hex mesh, we iteratively changed the element connectivity, adding and deleting elements and nodes, and optimized the node positions. We found a set of hex reconnection primitives. We improved the optimization algorithms so they can untangle a negative-Jacobian mesh, even considering Jacobians on the boundary, and subsequently optimize the condition number of elements in an untangled mesh. However, even after applying both the primitives and optimization we were unable to produce high-quality meshes in certain regions. Our experiences suggest that manymore » boundary configurations of quadrilaterals admit no hexahedral mesh with positive Jacobians, although we have no proof of this.« less
Rain rate instrument for deployment at sea, phase 2

NASA Technical Reports Server (NTRS)

Steele, Jimmy W.

1992-01-01

This report describes, in detail, the SBIR Phase 2 contracting effort provided for by NASA Contract Number NAS8-38481 in which a prototype Rain Rate Sensor was developed. FWG Model RP101A is a fully functional rain rate and droplet size analyzing instrument. The RP101A is a fully functional rain rate and droplet size analyzing instrument. The RP101A consists of a fiber optic probe containing a 32-fiber array connected to an electronic signal processor. When interfaced to an IBM compatible personal computer and configured with appropriate software, the RP101A is capable of measuring rain rates and particles ranging in size from around 300 microns up to 6 to 7 millimeters. FWG Associates, Inc. intends to develop a production model from the prototype and continue the effort under NASA's SBIR Phase 3 program.
Moving Object Detection Using Scanning Camera on a High-Precision Intelligent Holder.

PubMed

Chen, Shuoyang; Xu, Tingfa; Li, Daqun; Zhang, Jizhou; Jiang, Shenwang

2016-10-21

During the process of moving object detection in an intelligent visual surveillance system, a scenario with complex background is sure to appear. The traditional methods, such as "frame difference" and "optical flow", may not able to deal with the problem very well. In such scenarios, we use a modified algorithm to do the background modeling work. In this paper, we use edge detection to get an edge difference image just to enhance the ability of resistance illumination variation. Then we use a "multi-block temporal-analyzing LBP (Local Binary Pattern)" algorithm to do the segmentation. In the end, a connected component is used to locate the object. We also produce a hardware platform, the core of which consists of the DSP (Digital Signal Processor) and FPGA (Field Programmable Gate Array) platforms and the high-precision intelligent holder.
Parallel deterministic neutronics with AMR in 3D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clouse, C.; Ferguson, J.; Hendrickson, C.

1997-12-31

AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straight forward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.

A novel compensation method of insertion losses for wavelet inverse-transform processors using surface acoustic wave devices.

PubMed

Lu, Wenke; Zhu, Changchun

2011-11-01

The objective of this research was to investigate the possibility of compensating for the insertion losses of the wavelet inverse-transform processors using SAW devices. The motivation for this work was prompted by the processors which are of large insertion losses. In this paper, the insertion losses are the key problem of the wavelet inverse-transform processors using SAW devices. A novel compensation method of the insertion losses is achieved in this study. When the output ends of the wavelet inverse-transform processors are respectively connected to the amplifiers, their insertion losses can be compensated for. The bandwidths of the amplifiers and their adjustment method are also given in this paper. © 2011 American Institute of Physics
Topological numbering of features on a mesh

NASA Technical Reports Server (NTRS)

Atallah, Mikhail J.; Hambrusch, Susanne E.; Tewinkel, Lynn E.

1988-01-01

Assume a nxn binary image is given containing horizontally convex features; i.e., for each feature, each of its row's pixels form an interval on that row. The problem of assigning topological numbers to such features is considered; i.e., assign a number to every feature f so that all features to the left of f have a smaller number assigned to them. This problem arises in solutions to the stereo matching problem. A parallel algorithm to solve the topological numbering problem in O(n) time on an nxn mesh of processors is presented. The key idea of the solution is to create a tree from which the topological numbers can be obtained even though the tree does not uniquely represent the to the left of relationship of the features.
Implementation of digital equality comparator circuit on memristive memory crossbar array using material implication logic

NASA Astrophysics Data System (ADS)

Haron, Adib; Mahdzair, Fazren; Luqman, Anas; Osman, Nazmie; Junid, Syed Abdul Mutalib Al

2018-03-01

One of the most significant constraints of Von Neumann architecture is the limited bandwidth between memory and processor. The cost to move data back and forth between memory and processor is considerably higher than the computation in the processor itself. This architecture significantly impacts the Big Data and data-intensive application such as DNA analysis comparison which spend most of the processing time to move data. Recently, the in-memory processing concept was proposed, which is based on the capability to perform the logic operation on the physical memory structure using a crossbar topology and non-volatile resistive-switching memristor technology. This paper proposes a scheme to map digital equality comparator circuit on memristive memory crossbar array. The 2-bit, 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit of equality comparator circuit are mapped on memristive memory crossbar array by using material implication logic in a sequential and parallel method. The simulation results show that, for the 64-bit word size, the parallel mapping exhibits 2.8× better performance in total execution time than sequential mapping but has a trade-off in terms of energy consumption and area utilization. Meanwhile, the total crossbar area can be reduced by 1.2× for sequential mapping and 1.5× for parallel mapping both by using the overlapping technique.
Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

NASA Astrophysics Data System (ADS)

Newman, Gregory A.; Commer, Michael

2009-07-01

Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
Feasibility study, software design, layout and simulation of a two-dimensional Fast Fourier Transform machine for use in optical array interferometry

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The goal of this project was the feasibility study of a particular architecture of a digital signal processing machine operating in real time which could do in a pipeline fashion the computation of the fast Fourier transform (FFT) of a time-domain sampled complex digital data stream. The particular architecture makes use of simple identical processors (called inner product processors) in a linear organization called a systolic array. Through computer simulation the new architecture to compute the FFT with systolic arrays was proved to be viable, and computed the FFT correctly and with the predicted particulars of operation. Integrated circuits to compute the operations expected of the vital node of the systolic architecture were proven feasible, and even with a 2 micron VLSI technology can execute the required operations in the required time. Actual construction of the integrated circuits was successful in one variant (fixed point) and unsuccessful in the other (floating point).
Massively parallel processor computer

NASA Technical Reports Server (NTRS)

Fung, L. W. (Inventor)

1983-01-01

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Asynchronous parallel status comparator

DOEpatents

Arnold, Jeffrey W.; Hart, Mark M.

1992-01-01

Apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals corresponds to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition.
Asynchronous parallel status comparator

DOEpatents

Arnold, J.W.; Hart, M.M.

1992-12-15

Disclosed is an apparatus for matching asynchronously received signals and determining whether two or more out of a total number of possible signals match. The apparatus comprises, in one embodiment, an array of sensors positioned in discrete locations and in communication with one or more processors. The processors will receive signals if the sensors detect a change in the variable sensed from a nominal to a special condition and will transmit location information in the form of a digital data set to two or more receivers. The receivers collect, read, latch and acknowledge the data sets and forward them to decoders that produce an output signal for each data set received. The receivers also periodically reset the system following each scan of the sensor array. A comparator then determines if any two or more, as specified by the user, of the output signals corresponds to the same location. A sufficient number of matches produces a system output signal that activates a system to restore the array to its nominal condition. 4 figs.
FPGA-based Upgrade to RITS-6 Control System, Designed with EMP Considerations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harold D. Anderson, John T. Williams

2009-07-01

The existing control system for the RITS-6, a 20-MA 3-MV pulsed-power accelerator located at Sandia National Laboratories, was built as a system of analog switches because the operators needed to be close enough to the machine to hear pulsed-power breakdowns, yet the electromagnetic pulse (EMP) emitted would disable any processor-based solutions. The resulting system requires operators to activate and deactivate a series of 110-V relays manually in a complex order. The machine is sensitive to both the order of operation and the time taken between steps. A mistake in either case would cause a misfire and possible machine damage. Basedmore » on these constraints, a field-programmable gate array (FPGA) was chosen as the core of a proposed upgrade to the control system. An FPGA is a series of logic elements connected during programming. Based on their connections, the elements can mimic primitive logic elements, a process called synthesis. The circuit is static; all paths exist simultaneously and do not depend on a processor. This should make it less sensitive to EMP. By shielding it and using good electromagnetic interference-reduction practices, it should continue to operate well in the electrically noisy environment. The FPGA has two advantages over the existing system. In manual operation mode, the synthesized logic gates keep the operators in sequence. In addition, a clock signal and synthesized countdown circuit provides an automated sequence, with adjustable delays, for quickly executing the time-critical portions of charging and firing. The FPGA is modeled as a set of states, each state being a unique set of values for the output signals. The state is determined by the input signals, and in the automated segment by the value of the synthesized countdown timer, with the default mode placing the system in a safe configuration. Unlike a processor-based system, any system stimulus that results in an abort situation immediately executes a shutdown, with only a tens-of-nanoseconds delay to propagate across the FPGA. This paper discusses the design, installation, and testing of the proposed system upgrade, including failure statistics and modifications to the original design.« less
Next-generation biomedical implants using additive manufacturing of complex, cellular and functional mesh arrays.

PubMed

Murr, L E; Gaytan, S M; Medina, F; Lopez, H; Martinez, E; Machado, B I; Hernandez, D H; Martinez, L; Lopez, M I; Wicker, R B; Bracke, J

2010-04-28

In this paper, we examine prospects for the manufacture of patient-specific biomedical implants replacing hard tissues (bone), particularly knee and hip stems and large bone (femoral) intramedullary rods, using additive manufacturing (AM) by electron beam melting (EBM). Of particular interest is the fabrication of complex functional (biocompatible) mesh arrays. Mesh elements or unit cells can be divided into different regions in order to use different cell designs in different areas of the component to produce various or continually varying (functionally graded) mesh densities. Numerous design elements have been used to fabricate prototypes by AM using EBM of Ti-6Al-4V powders, where the densities have been compared with the elastic (Young) moduli determined by resonant frequency and damping analysis. Density optimization at the bone-implant interface can allow for bone ingrowth and cementless implant components. Computerized tomography (CT) scans of metal (aluminium alloy) foam have also allowed for the building of Ti-6Al-4V foams by embedding the digital-layered scans in computer-aided design or software models for EBM. Variations in mesh complexity and especially strut (or truss) dimensions alter the cooling and solidification rate, which alters the alpha-phase (hexagonal close-packed) microstructure by creating mixtures of alpha/alpha' (martensite) observed by optical and electron metallography. Microindentation hardness measurements are characteristic of these microstructures and microstructure mixtures (alpha/alpha') and sizes.
Soft-core processor study for node-based architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James

2008-09-01

Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hardcore processor built into the FPGA or as a soft-core processor builtmore » out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.« less
An architecture for real-time vision processing

NASA Technical Reports Server (NTRS)

Chien, Chiun-Hong

1994-01-01

To study the feasibility of developing an architecture for real time vision processing, a task queue server and parallel algorithms for two vision operations were designed and implemented on an i860-based Mercury Computing System 860VS array processor. The proposed architecture treats each vision function as a task or set of tasks which may be recursively divided into subtasks and processed by multiple processors coordinated by a task queue server accessible by all processors. Each idle processor subsequently fetches a task and associated data from the task queue server for processing and posts the result to shared memory for later use. Load balancing can be carried out within the processing system without the requirement for a centralized controller. The author concludes that real time vision processing cannot be achieved without both sequential and parallel vision algorithms and a good parallel vision architecture.
A microcomputer based frequency-domain processor for laser Doppler anemometry

NASA Technical Reports Server (NTRS)

Horne, W. Clifton; Adair, Desmond

1988-01-01

A prototype multi-channel laser Doppler anemometry (LDA) processor was assembled using a wideband transient recorder and a microcomputer with an array processor for fast Fourier transform (FFT) computations. The prototype instrument was used to acquire, process, and record signals from a three-component wind tunnel LDA system subject to various conditions of noise and flow turbulence. The recorded data was used to evaluate the effectiveness of burst acceptance criteria, processing algorithms, and selection of processing parameters such as record length. The recorded signals were also used to obtain comparative estimates of signal-to-noise ratio between time-domain and frequency-domain signal detection schemes. These comparisons show that the FFT processing scheme allows accurate processing of signals for which the signal-to-noise ratio is 10 to 15 dB less than is practical using counter processors.
Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers

DTIC Science & Technology

1991-07-01

DTIC ~ ELECTE OCT 2 41991 AD-A242 045 Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers’ Edoardo S. Biagioni Jan F. Prins...Department of Computer Science University of North Carolina Chapel Hill, N.C. 27599-3175 USA biagioni @cs.unc.edu prinsOcs.unc.edu Abstract Scan Directed...MasPar Computer Corpora- tion. Bibliography [1] Edoardo S. Biagioni . Scan Directed Load Balancing. PhD thesis., University of North Carolina, Chapel Hill
Parallel eigenanalysis of finite element models in a completely connected architecture

NASA Technical Reports Server (NTRS)

Akl, F. A.; Morel, M. R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
Lithium niobate guided-wave beam former for steering phased-array antennas.

PubMed

Armenise, M N; Passaro, V M; Noviello, G

1994-09-10

We present the theoretical investigation, design, and simulation of a novel guided-wave optical processor for L-band-transmission beam forming in a linear array of phased active antennas. The proposed configuration includes two contradirectional surface acoustic-wave transducers, and it is based on a Y-cut, X-propagating Ti:LiNbO(3) planar waveguide supporting the lowest-order modes of both polarizations (TE(0) and TM(0)) at the free-space wavelength λ = 0.85 µm. A detailed comparison between the processor we propose and other optical and electronic architectures reported in the literature is carried out, exhibiting a number of significant advantages in terms of weight, total chip size, and power consumption, when the number of antenna elements is greater than 50.
VLSI 'smart' I/O module development

NASA Astrophysics Data System (ADS)

Kirk, Dan

The developmental history, design, and operation of the MIL-STD-1553A/B discrete and serial module (DSM) for the U.S. Navy AN/AYK-14(V) avionics computer are described and illustrated with diagrams. The ongoing preplanned product improvement for the AN/AYK-14(V) includes five dual-redundant MIL-STD-1553 channels based on DSMs. The DSM is a front-end processor for transferring data to and from a common memory, sharing memory with a host processor to provide improved 'smart' input/output performance. Each DSM comprises three hardware sections: three VLSI-6000 semicustomized CMOS arrays, memory units to support the arrays, and buffers and resynchronization circuits. The DSM hardware module design, VLSI-6000 design tools, controlware and test software, and checkout procedures (using a hardware simulator) are characterized in detail.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, D.B.

1996-12-31

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Double-layer interlaced nested multi-ring array metallic mesh for high-performance transparent electromagnetic interference shielding.

PubMed

Wang, Heyan; Lu, Zhengang; Liu, Yeshu; Tan, Jiubin; Ma, Limin; Lin, Shen

2017-04-15

We report a nested multi-ring array metallic mesh (NMA-MM) that shows a highly uniform diffraction pattern theoretically and experimentally. Then a high-performance transparent electromagnetic interference (EMI) shielding structure is constituted by the double-layer interlaced NMA-MMs separated by transparent quartz-glass substrate. Experimental results show that double-layer interlaced NMA-MM structure exhibits a shielding effectiveness (SE) of over 27 dB in the Ku-band, with a maximal SE of 37 dB at 12 GHz, normalized optical transmittance of 90%, and minimal image quality degradation due to the interlaced arrangement. It thus shows great potential for practical applications in transparent EMI shielding devices.
Reconfigurable pipelined processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saccardi, R.J.

1989-09-19

This patent describes a reconfigurable pipelined processor for processing data. It comprises: a plurality of memory devices for storing bits of data; a plurality of arithmetic units for performing arithmetic functions with the data; cross bar means for connecting the memory devices with the arithmetic units for transferring data therebetween; at least one counter connected with the cross bar means for providing a source of addresses to the memory devices; at least one variable tick delay device connected with each of the memory devices and arithmetic units; and means for providing control bits to the variable tick delay device formore » variably controlling the input and output operations thereof to selectively delay the memory devices and arithmetic units to align the data for processing in a selected sequence.« less

78 FR 67099 - Submission for OMB Review; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2013-11-08

... professionals, who provide meals in institutional settings, can locate processors who manufacture foods... Service Title: USDA Food Connect Web site. OMB Control Number: 0581-0224. Summary of Collection: The USDA Food Connect Web site (previously known as the USDA Food and Commodity Connection Web site) operates...
Tensorial Basis Spline Collocation Method for Poisson's Equation

NASA Astrophysics Data System (ADS)

Plagne, Laurent; Berthou, Jean-Yves

2000-01-01

This paper aims to describe the tensorial basis spline collocation method applied to Poisson's equation. In the case of a localized 3D charge distribution in vacuum, this direct method based on a tensorial decomposition of the differential operator is shown to be competitive with both iterative BSCM and FFT-based methods. We emphasize the O(h4) and O(h6) convergence of TBSCM for cubic and quintic splines, respectively. We describe the implementation of this method on a distributed memory parallel machine. Performance measurements on a Cray T3E are reported. Our code exhibits high performance and good scalability: As an example, a 27 Gflops performance is obtained when solving Poisson's equation on a 2563 non-uniform 3D Cartesian mesh by using 128 T3E-750 processors. This represents 215 Mflops per processors.
Special purpose parallel computer architecture for real-time control and simulation in robotic applications

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1993-01-01

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Transfer and alignment of random single-walled carbon nanotube films by contact printing.

PubMed

Liu, Huaping; Takagi, Daisuke; Chiashi, Shohei; Homma, Yoshikazu

2010-02-23

We present a simple method to transfer large-area random single-walled carbon nanotube (SWCNT) films grown on SiO(2) substrates onto another surface through a simple contact printing process. The transferred random SWCNT films can be assembled into highly ordered, dense regular arrays with high uniformity and reproducibility by sliding the growth substrate during the transfer process. The position of the transferred SWCNT film can be controlled by predefined patterns on the receiver substrates. The process is compatible with a variety of substrates, and even metal meshes for transmission electron microscopy (TEM) can be used as receiver substrates. Thus, suspended web-like SWCNT networks and aligned SWCNT arrays can be formed over the grids of TEM meshes, so that the structures of the transferred SWCNTs can be directly observed by TEM. This simple technique can be used to controllably transfer SWCNTs for property studies, for the fabrication of devices, or even as support films for TEM meshes.
Computer aided stress analysis of long bones utilizing computer tomography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marom, S.A.

1986-01-01

A computer aided analysis method, utilizing computed tomography (CT) has been developed, which together with a finite element program determines the stress-displacement pattern in a long bone section. The CT data file provides the geometry, the density and the material properties for the generated finite element model. A three-dimensional finite element model of a tibial shaft is automatically generated from the CT file by a pre-processing procedure for a finite element program. The developed pre-processor includes an edge detection algorithm which determines the boundaries of the reconstructed cross-sectional images of the scanned bone. A mesh generation procedure than automatically generatesmore » a three-dimensional mesh of a user-selected refinement. The elastic properties needed for the stress analysis are individually determined for each model element using the radiographic density (CT number) of each pixel with the elemental borders. The elastic modulus is determined from the CT radiographic density by using an empirical relationship from the literature. The generated finite element model, together with applied loads, determined from existing gait analysis and initial displacements, comprise a formatted input for the SAP IV finite element program. The output of this program, stresses and displacements at the model elements and nodes, are sorted and displayed by a developed post-processor to provide maximum and minimum values at selected locations in the model.« less
C-MOS array design techniques: SUMC multiprocessor system study

NASA Technical Reports Server (NTRS)

Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

1972-01-01

The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
Methods for operating parallel computing systems employing sequenced communications

DOEpatents

Benner, R.E.; Gustafson, J.L.; Montry, G.R.

1999-08-10

A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.
Methods for operating parallel computing systems employing sequenced communications

DOEpatents

Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

1999-01-01

A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Purifier-integrated methanol reformer for fuel cell vehicles

NASA Astrophysics Data System (ADS)

Han, Jaesung; Kim, Il-soo; Choi, Keun-Sup

We developed a compact, 3-kW, purifier-integrated modular reformer which becomes the building block of full-scale 30-kW or 50-kW methanol fuel processors for fuel cell vehicles. Our proprietary technologies regarding hydrogen purification by composite metal membrane and catalytic combustion by washcoated wire-mesh catalyst were combined with the conventional methanol steam-reforming technology, resulting in higher conversion, excellent quality of product hydrogen, and better thermal efficiency than any other systems using preferential oxidation. In this system, steam reforming, hydrogen purification, and catalytic combustion all take place in a single reactor so that the whole system is compact and easy to operate. Hydrogen from the module is ultrahigh pure (99.9999% or better), hence there is no power degradation of PEMFC stack due to contamination by CO. Also, since only pure hydrogen is supplied to the anode of the PEMFC stack, 100% hydrogen utilization is possible in the stack. The module produces 2.3 Nm 3/h of hydrogen, which is equivalent to 3 kW when PEMFC has 43% efficiency. Thermal efficiency (HHV of product H 2/HHV of MeOH in) of the module is 89% and the power density of the module is 0.77 kW/l. This work was conducted in cooperation with Hyundai Motor Company in the form of a Korean national project. Currently the module is under test with an actual fuel cell stack in order to verify its performance. Sooner or later a full-scale 30-kW system will be constructed by connecting these modules in series and parallel and will serve as the fuel processor for the Korean first fuel cell hybrid vehicle.
Multinode reconfigurable pipeline computer

NASA Technical Reports Server (NTRS)

Nosenchuck, Daniel M. (Inventor); Littman, Michael G. (Inventor)

1989-01-01

A multinode parallel-processing computer is made up of a plurality of innerconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. the nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.
Deposition and post-processing techniques for transparent conductive films

DOE Office of Scientific and Technical Information (OSTI.GOV)

Christoforo, Mark Greyson; Mehra, Saahil; Salleo, Alberto

2017-07-04

In one embodiment, a method is provided for fabrication of a semitransparent conductive mesh. A first solution having conductive nanowires suspended therein and a second solution having nanoparticles suspended therein are sprayed toward a substrate, the spraying forming a mist. The mist is processed, while on the substrate, to provide a semitransparent conductive material in the form of a mesh having the conductive nanowires and nanoparticles. The nanoparticles are configured and arranged to direct light passing through the mesh. Connections between the nanowires provide conductivity through the mesh.
Graph Partitioning for Parallel Applications in Heterogeneous Grid Environments

NASA Technical Reports Server (NTRS)

Bisws, Rupak; Kumar, Shailendra; Das, Sajal K.; Biegel, Bryan (Technical Monitor)

2002-01-01

The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied. However, these partitioning schemes fail when the target system architecture exhibits heterogeneity in resource characteristics. With the emergence of technologies such as the Grid, it is imperative to study the partitioning problem taking into consideration the differing capabilities of such distributed heterogeneous systems. In our model, the heterogeneous system consists of processors with varying processing power and an underlying non-uniform communication network. We present in this paper a novel multilevel partitioning scheme for irregular graphs and meshes, that takes into account issues pertinent to Grid computing environments. Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application. For experimental performance study, we have considered both a realistic mesh problem from NASA as well as synthetic workloads. Simulation results demonstrate that MiniMax generates high quality partitions for various classes of applications targeted for parallel execution in a distributed heterogeneous environment.
A spring system method for a mesh generation problem

NASA Astrophysics Data System (ADS)

Romanov, A.

2018-04-01

A new direct method for the 2d-mesh generation for a simply-connected domain using a spring system is observed. The method can be used with other methods to modify a mesh for growing solid problems. Advantages and disadvantages of the method are shown. Different types of boundary conditions are explored. The results of modelling for different target domains are given. Some applications for composite materials are studied.
Cosmology on a Mesh

NASA Astrophysics Data System (ADS)

Gill, Stuart P. D.; Knebe, Alexander; Gibson, Brad K.; Flynn, Chris; Ibata, Rodrigo A.; Lewis, Geraint F.

2003-04-01

An adaptive multi grid approach to simulating the formation of structure from collisionless dark matter is described. MLAPM (Multi-Level Adaptive Particle Mesh) is one of the most efficient serial codes available on the cosmological "market" today. As part of Swinburne University's role in the development of the Square Kilometer Array, we are implementing hydrodynamics, feedback, and radiative transfer within the MLAPM adaptive mesh, in order to simulate baryonic processes relevant to the interstellar and intergalactic media at high redshift. We will outline our progress to date in applying the existing MLAPM to a study of the decay of satellite galaxies within massive host potentials.
Upset Characterization of the PowerPC405 Hard-core Processor Embedded in Virtex-II Pro Field Programmable Gate Arrays

NASA Technical Reports Server (NTRS)

Swift, Gary M.; Allen, Gregory S.; Farmanesh, Farhad; George, Jeffrey; Petrick, David J.; Chayab, Fayez

2006-01-01

Shown in this presentation are recent results for the upset susceptibility of the various types of memory elements in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively through appropriate design triplication and configuration scrubbing, these upsets of processor elements can dominate the system error rate. Data from irradiations with both protons and heavy ions are given and compared using available models.
Integrated High-Speed Torque Control System for a Robotic Joint

NASA Technical Reports Server (NTRS)

Davis, Donald R. (Inventor); Radford, Nicolaus A. (Inventor); Permenter, Frank Noble (Inventor); Valvo, Michael C. (Inventor); Askew, R. Scott (Inventor)

2013-01-01

A control system for achieving high-speed torque for a joint of a robot includes a printed circuit board assembly (PCBA) having a collocated joint processor and high-speed communication bus. The PCBA may also include a power inverter module (PIM) and local sensor conditioning electronics (SCE) for processing sensor data from one or more motor position sensors. Torque control of a motor of the joint is provided via the PCBA as a high-speed torque loop. Each joint processor may be embedded within or collocated with the robotic joint being controlled. Collocation of the joint processor, PIM, and high-speed bus may increase noise immunity of the control system, and the localized processing of sensor data from the joint motor at the joint level may minimize bus cabling to and from each control node. The joint processor may include a field programmable gate array (FPGA).
CPU architecture for a fast and energy-saving calculation of convolution neural networks

NASA Astrophysics Data System (ADS)

Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

2017-06-01

One of the most difficult problem in the use of artificial neural networks is the computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, for the conventional user only remains the state of the art method, which is the use of a graphic processing unit (GPU) as a computational basis. Although these processors are well suited for large matrix computations, they need massive energy. Therefore a new processor on the basis of a field programmable gate array (FPGA) has been developed and is optimized for the application of deep learning. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper to an organic farming application). The power consumption is only a fraction of a GPU application and should therefore be well suited for energy-saving applications.
Computations on the massively parallel processor at the Goddard Space Flight Center

NASA Technical Reports Server (NTRS)

Strong, James P.

1991-01-01

Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
Design of a ``Digital Atlas Vme Electronics'' (DAVE) module

NASA Astrophysics Data System (ADS)

Goodrick, M.; Robinson, D.; Shaw, R.; Postranecky, M.; Warren, M.

2012-01-01

ATLAS-SCT has developed a new ATLAS trigger card, 'Digital Atlas Vme Electronics' (``DAVE''). The unit is designed to provide a versatile array of interface and logic resources, including a large FPGA. It interfaces to both VME bus and USB hosts. DAVE aims to provide exact ATLAS CTP (ATLAS Central Trigger Processor) functionality, with random trigger, simple and complex deadtime, ECR (Event Counter Reset), BCR (Bunch Counter Reset) etc. being generated to give exactly the same conditions in standalone running as experienced in combined runs. DAVE provides additional hardware and a large amount of free firmware resource to allow users to add or change functionality. The combination of the large number of individually programmable inputs and outputs in various formats, with very large external RAM and other components all connected to the FPGA, also makes DAVE a powerful and versatile FPGA utility card.
Moving Object Detection Using Scanning Camera on a High-Precision Intelligent Holder

PubMed Central

Chen, Shuoyang; Xu, Tingfa; Li, Daqun; Zhang, Jizhou; Jiang, Shenwang

2016-01-01

During the process of moving object detection in an intelligent visual surveillance system, a scenario with complex background is sure to appear. The traditional methods, such as “frame difference” and “optical flow”, may not able to deal with the problem very well. In such scenarios, we use a modified algorithm to do the background modeling work. In this paper, we use edge detection to get an edge difference image just to enhance the ability of resistance illumination variation. Then we use a “multi-block temporal-analyzing LBP (Local Binary Pattern)” algorithm to do the segmentation. In the end, a connected component is used to locate the object. We also produce a hardware platform, the core of which consists of the DSP (Digital Signal Processor) and FPGA (Field Programmable Gate Array) platforms and the high-precision intelligent holder. PMID:27775671

A digital video tracking system

NASA Astrophysics Data System (ADS)

Giles, M. K.

1980-01-01

The Real-Time Videotheodolite (RTV) was developed in connection with the requirement to replace film as a recording medium to obtain the real-time location of an object in the field-of-view (FOV) of a long focal length theodolite. Design philosophy called for a system capable of discriminatory judgment in identifying the object to be tracked with 60 independent observations per second, capable of locating the center of mass of the object projection on the image plane within about 2% of the FOV in rapidly changing background/foreground situations, and able to generate a predicted observation angle for the next observation. A description is given of a number of subsystems of the RTV, taking into account the processor configuration, the video processor, the projection processor, the tracker processor, the control processor, and the optics interface and imaging subsystem.
Syringe-Injectable Electronics with a Plug-and-Play Input/Output Interface.

PubMed

Schuhmann, Thomas G; Yao, Jun; Hong, Guosong; Fu, Tian-Ming; Lieber, Charles M

2017-09-13

Syringe-injectable mesh electronics represent a new paradigm for brain science and neural prosthetics by virtue of the stable seamless integration of the electronics with neural tissues, a consequence of the macroporous mesh electronics structure with all size features similar to or less than individual neurons and tissue-like flexibility. These same properties, however, make input/output (I/O) connection to measurement electronics challenging, and work to-date has required methods that could be difficult to implement by the life sciences community. Here we present a new syringe-injectable mesh electronics design with plug-and-play I/O interfacing that is rapid, scalable, and user-friendly to nonexperts. The basic design tapers the ultraflexible mesh electronics to a narrow stem that routes all of the device/electrode interconnects to I/O pads that are inserted into a standard zero insertion force (ZIF) connector. Studies show that the entire plug-and-play mesh electronics can be delivered through capillary needles with precise targeting using microliter-scale injection volumes similar to the standard mesh electronics design. Electrical characterization of mesh electronics containing platinum (Pt) electrodes and silicon (Si) nanowire field-effect transistors (NW-FETs) demonstrates the ability to interface arbitrary devices with a contact resistance of only 3 Ω. Finally, in vivo injection into mice required only minutes for I/O connection and yielded expected local field potential (LFP) recordings from a compact head-stage compatible with chronic studies. Our results substantially lower barriers for use by new investigators and open the door for increasingly sophisticated and multifunctional mesh electronics designs for both basic and translational studies.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

PubMed

Cheung, Kit; Schultz, Simon R; Luk, Wayne

2015-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

PubMed Central

Cheung, Kit; Schultz, Simon R.; Luk, Wayne

2016-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
Parallel processor for real-time structural control

NASA Astrophysics Data System (ADS)

Tise, Bert L.

1993-07-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
Optical microwave filter based on spectral slicing by use of arrayed waveguide gratings.

PubMed

Pastor, Daniel; Ortega, Beatriz; Capmany, José; Sales, Salvador; Martinez, Alfonso; Muñoz, Pascual

2003-10-01

We have experimentally demonstrated a new optical signal processor based on the use of arrayed waveguide gratings. The structure exploits the concept of spectral slicing combined with the use of an optical dispersive medium. The approach presents increased flexibility from previous slicing-based structures in terms of tunability, reconfiguration, and apodization of the samples or coefficients of the transversal optical filter.
Radiation-Hardened Wafer Scale Integration

DTIC Science & Technology

1989-10-25

unlimited. LEXINGTON MASSACHUSETTS EXECUTIVE SUMMARY A focal plane processor (FPP) for a large array of LWIR photodetectors on a space platform must...It seems certain that large. scanning LWIR arrays will once again be of interest in the future, though their specifications will differ from those... nonuniformity and defects in the ZMR material, but films of good quality produced by this technique are now available commercially from Kopin Corporation. Such
Sandia’s Current Energy Conversion module for the Flexible-Mesh Delft3D flow solver v. 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chartand, Chris; Jagers, Bert

The DOE has funded Sandia National Labs (SNL) to develop an open-source modeling tool to guide the design and layout of marine hydrokinetic (MHK) arrays to maximize power production while minimizing environmental effects. This modeling framework simulates flows through and around a MHK arrays while quantifying environmental responses. As an augmented version of the Dutch company, Deltares’s, environmental hydrodynamics code, Delft3D, SNL-Delft3D-CEC-FM includes a new module that simulates energy conversion (momentum withdrawal) by MHK current energy conversion devices with commensurate changes in the turbulent kinetic energy and its dissipation rate. SNL-Delft3D-CEC-FM modified the Delft3D flexible mesh flow solver, DFlowFM.
Parallel Unsteady Overset Mesh Methodology for a Multi-Solver Paradigm with Adaptive Cartesian Grids

DTIC Science & Technology

2008-08-21

Engineer, U.S. Army Research Laboratory ., Matthew.W.Floros@nasa.gov, AIAA Member ‡Senior Research Scientist, Scaled Numerical Physics LLC., awissink...IV.E and IV.D). Good linear scalability was observed for all three cases up to 12 processors. Beyond that the scalability drops off depending on grid...Research Laboratory for the usage of SUGGAR module and Yikloon Lee at NAVAIR for the usage of the NAVAIR-IHC code. 13 of 22 American Institute of
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

NASA Astrophysics Data System (ADS)

Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

2004-10-01

Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
Landsat real-time processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davis, E.L.

A novel method for performing real-time acquisition and processing Landsat/EROS data covers all aspects including radiometric and geometric corrections of multispectral scanner or return-beam vidicon inputs, image enhancement, statistical analysis, feature extraction, and classification. Radiometric transformations include bias/gain adjustment, noise suppression, calibration, scan angle compensation, and illumination compensation, including topography and atmospheric effects. Correction or compensation for geometric distortion includes sensor-related distortions, such as centering, skew, size, scan nonlinearity, radial symmetry, and tangential symmetry. Also included are object image-related distortions such as aspect angle (altitude), scale distortion (altitude), terrain relief, and earth curvature. Ephemeral corrections are also applied to compensatemore » for satellite forward movement, earth rotation, altitude variations, satellite vibration, and mirror scan velocity. Image enhancement includes high-pass, low-pass, and Laplacian mask filtering and data restoration for intermittent losses. Resource classification is provided by statistical analysis including histograms, correlational analysis, matrix manipulations, and determination of spectral responses. Feature extraction includes spatial frequency analysis, which is used in parallel discriminant functions in each array processor for rapid determination. The technique uses integrated parallel array processors that decimate the tasks concurrently under supervision of a control processor. The operator-machine interface is optimized for programming ease and graphics image windowing.« less
CoNNeCT Baseband Processor Module Boot Code SoftWare (BCSW)

NASA Technical Reports Server (NTRS)

Yamamoto, Clifford K.; Orozco, David S.; Byrne, D. J.; Allen, Steven J.; Sahasrabudhe, Adit; Lang, Minh

2012-01-01

This software provides essential startup and initialization routines for the CoNNeCT baseband processor module (BPM) hardware upon power-up. A command and data handling (C&DH) interface is provided via 1553 and diagnostic serial interfaces to invoke operational, reconfiguration, and test commands within the code. The BCSW has features unique to the hardware it is responsible for managing. In this case, the CoNNeCT BPM is configured with an updated CPU (Atmel AT697 SPARC processor) and a unique set of memory and I/O peripherals that require customized software to operate. These features include configuration of new AT697 registers, interfacing to a new HouseKeeper with a flash controller interface, a new dual Xilinx configuration/scrub interface, and an updated 1553 remote terminal (RT) core. The BCSW is intended to provide a "safe" mode for the BPM when initially powered on or when an unexpected trap occurs, causing the processor to reset. The BCSW allows the 1553 bus controller in the spacecraft or payload controller to operate the BPM over 1553 to upload code; upload Xilinx bit files; perform rudimentary tests; read, write, and copy the non-volatile flash memory; and configure the Xilinx interface. Commands also exist over 1553 to cause the CPU to jump or call a specified address to begin execution of user-supplied code. This may be in the form of a real-time operating system, test routine, or specific application code to run on the BPM.
Copper Mesh Templated by Breath-Figure Polymer Films as Flexible Transparent Electrodes for Organic Photovoltaic Devices.

PubMed

Zhou, Weixin; Chen, Jun; Li, Yi; Wang, Danbei; Chen, Jianyu; Feng, Xiaomiao; Huang, Zhendong; Liu, Ruiqing; Lin, Xiujing; Zhang, Hongmei; Mi, Baoxiu; Ma, Yanwen

2016-05-04

Metal mesh is a significant candidate of flexible transparent electrodes to substitute the current state-of-the-art material indium tin oxide (ITO) for future flexible electronics. However, there remains a challenge to fabricate metal mesh with order patterns by a bottom-up approach. In this work, high-quality Cu mesh transparent electrodes with ordered pore arrays are prepared by using breath-figure polymer films as template. The optimal Cu mesh films present a sheet resistance of 28.7 Ω·sq(-1) at a transparency of 83.5%. The work function of Cu mesh electrode is tuned from 4.6 to 5.1 eV by Ag deposition and the following short-time UV-ozone treatment, matching well with the PSS (5.2 eV) hole extraction layer. The modified Cu mesh electrodes show remarkable potential as a substitute of ITO/PET in the flexible OPV and OLED devices. The OPV cells constructed on our Cu mesh electrodes present a similar power conversion efficiency of 2.04% as those on ITO/PET electrodes. The flexible OLED prototype devices can achieve a brightness of 10 000 cd at an operation voltage of 8 V.
Self-Avoiding Walks over Adaptive Triangular Grids

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Gao, Guang R.; Saini, Subhash (Technical Monitor)

1998-01-01

In this paper, we present a new approach to constructing a "self-avoiding" walk through a triangular mesh. Unlike the popular approach of visiting mesh elements using space-filling curves which is based on a geometric embedding, our approach is combinatorial in the sense that it uses the mesh connectivity only. We present an algorithm for constructing a self-avoiding walk which can be applied to any unstructured triangular mesh. The complexity of the algorithm is O(n x log(n)), where n is the number of triangles in the mesh. We show that for hierarchical adaptive meshes, the algorithm can be easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful in the run-time partitioning and load balancing of adaptive unstructured grids.
Description and Simulation of a Fast Packet Switch Architecture for Communication Satellites

NASA Technical Reports Server (NTRS)

Quintana, Jorge A.; Lizanich, Paul J.

1995-01-01

The NASA Lewis Research Center has been developing the architecture for a multichannel communications signal processing satellite (MCSPS) as part of a flexible, low-cost meshed-VSAT (very small aperture terminal) network. The MCSPS architecture is based on a multifrequency, time-division-multiple-access (MF-TDMA) uplink and a time-division multiplex (TDM) downlink. There are eight uplink MF-TDMA beams, and eight downlink TDM beams, with eight downlink dwells per beam. The information-switching processor, which decodes, stores, and transmits each packet of user data to the appropriate downlink dwell onboard the satellite, has been fully described by using VHSIC (Very High Speed Integrated-Circuit) Hardware Description Language (VHDL). This VHDL code, which was developed in-house to simulate the information switching processor, showed that the architecture is both feasible and viable. This paper describes a shared-memory-per-beam architecture, its VHDL implementation, and the simulation efforts.
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines

NASA Technical Reports Server (NTRS)

1999-01-01

Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues on the dynamic load balancing problem under NASA JOVE and JAG rants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64- processor Cray T3E distributed-memory multiprocessor while maintaining the scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take over a second or two while giving low scalability of a few fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
A Streaming Language Implementation of the Discontinuous Galerkin Method

NASA Technical Reports Server (NTRS)

Barth, Timothy; Knight, Timothy

2005-01-01

We present a Brook streaming language implementation of the 3-D discontinuous Galerkin method for compressible fluid flow on tetrahedral meshes. Efficient implementation of the discontinuous Galerkin method using the streaming model of computation introduces several algorithmic design challenges. Using a cycle-accurate simulator, performance characteristics have been obtained for the Stanford Merrimac stream processor. The current Merrimac design achieves 128 Gflops per chip and the desktop board is populated with 16 chips yielding a peak performance of 2 Teraflops. Total parts cost for the desktop board is less than $20K. Current cycle-accurate simulations for discretizations of the 3-D compressible flow equations yield approximately 40-50% of the peak performance of the Merrimac streaming processor chip. Ongoing work includes the assessment of the performance of the same algorithm on the 2 Teraflop desktop board with a target goal of achieving 1 Teraflop performance.
Compact VLSI neural computer integrated with active pixel sensor for real-time ATR applications

NASA Astrophysics Data System (ADS)

Fang, Wai-Chi; Udomkesmalee, Gabriel; Alkalai, Leon

1997-04-01

A compact VLSI neural computer integrated with an active pixel sensor has been under development to mimic what is inherent in biological vision systems. This electronic eye- brain computer is targeted for real-time machine vision applications which require both high-bandwidth communication and high-performance computing for data sensing, synergy of multiple types of sensory information, feature extraction, target detection, target recognition, and control functions. The neural computer is based on a composite structure which combines Annealing Cellular Neural Network (ACNN) and Hierarchical Self-Organization Neural Network (HSONN). The ACNN architecture is a programmable and scalable multi- dimensional array of annealing neurons which are locally connected with their local neurons. Meanwhile, the HSONN adopts a hierarchical structure with nonlinear basis functions. The ACNN+HSONN neural computer is effectively designed to perform programmable functions for machine vision processing in all levels with its embedded host processor. It provides a two order-of-magnitude increase in computation power over the state-of-the-art microcomputer and DSP microelectronics. A compact current-mode VLSI design feasibility of the ACNN+HSONN neural computer is demonstrated by a 3D 16X8X9-cube neural processor chip design in a 2-micrometers CMOS technology. Integration of this neural computer as one slice of a 4'X4' multichip module into the 3D MCM based avionics architecture for NASA's New Millennium Program is also described.
Fault tolerant, radiation hard, high performance digital signal processor

NASA Technical Reports Server (NTRS)

Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke

1990-01-01

An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.
Set processing in a network environment. [data bases and magnetic disks and tapes

NASA Technical Reports Server (NTRS)

Hardgrave, W. T.

1975-01-01

A combination of a local network, a mass storage system, and an autonomous set processor serving as a data/storage management machine is described. Its characteristics include: content-accessible data bases usable from all connected devices; efficient storage/access of large data bases; simple and direct programming with data manipulation and storage management handled by the set processor; simple data base design and entry from source representation to set processor representation with no predefinition necessary; capability available for user sort/order specification; significant reduction in tape/disk pack storage and mounts; flexible environment that allows upgrading hardware/software configuration without causing major interruptions in service; minimal traffic on data communications network; and improved central memory usage on large processors.

Damping and support in high-temperature superconducting levitation systems

DOEpatents

Hull, John R [Sammamish, WA; McIver, Carl R [Everett, WA; Mittleider, John A [Kent, WA

2009-12-15

Methods and apparatuses to provide improved auxiliary damping for superconducting bearings in superconducting levitation systems are disclosed. In a superconducting bearing, a cryostat housing the superconductors is connected to a ground state with a combination of a damping strip of material, a set of linkage arms to provide vertical support, and spring washers to provide stiffness. Alternately, the superconducting bearing may be supported by a cryostat connected to a ground state by posts constructed from a mesh of fibers, with the damping and stiffness controlled by the fiber composition, size, and mesh geometry.
Corrosion Prediction with Parallel Finite Element Modeling for Coupled Hygro-Chemo Transport into Concrete under Chloride-Rich Environment

PubMed Central

Na, Okpin; Cai, Xiao-Chuan; Xi, Yunping

2017-01-01

The prediction of the chloride-induced corrosion is very important because of the durable life of concrete structure. To simulate more realistic durability performance of concrete structures, complex scientific methods and more accurate material models are needed. In order to predict the robust results of corrosion initiation time and to describe the thin layer from concrete surface to reinforcement, a large number of fine meshes are also used. The purpose of this study is to suggest more realistic physical model regarding coupled hygro-chemo transport and to implement the model with parallel finite element algorithm. Furthermore, microclimate model with environmental humidity and seasonal temperature is adopted. As a result, the prediction model of chloride diffusion under unsaturated condition was developed with parallel algorithms and was applied to the existing bridge to validate the model with multi-boundary condition. As the number of processors increased, the computational time decreased until the number of processors became optimized. Then, the computational time increased because the communication time between the processors increased. The framework of present model can be extended to simulate the multi-species de-icing salts ingress into non-saturated concrete structures in future work. PMID:28772714
Web-based DAQ systems: connecting the user and electronics front-ends

NASA Astrophysics Data System (ADS)

Lenzi, Thomas

2016-12-01

Web technologies are quickly evolving and are gaining in computational power and flexibility, allowing for a paradigm shift in the field of Data Acquisition (DAQ) systems design. Modern web browsers offer the possibility to create intricate user interfaces and are able to process and render complex data. Furthermore, new web standards such as WebSockets allow for fast real-time communication between the server and the user with minimal overhead. Those improvements make it possible to move the control and monitoring operations from the back-end servers directly to the user and to the front-end electronics, thus reducing the complexity of the data acquisition chain. Moreover, web-based DAQ systems offer greater flexibility, accessibility, and maintainability on the user side than traditional applications which often lack portability and ease of use. As proof of concept, we implemented a simplified DAQ system on a mid-range Spartan6 Field Programmable Gate Array (FPGA) development board coupled to a digital front-end readout chip. The system is connected to the Internet and can be accessed from any web browser. It is composed of custom code to control the front-end readout and of a dual soft-core Microblaze processor to communicate with the client.
Fault-Tolerant, Radiation-Hard DSP

NASA Technical Reports Server (NTRS)

Czajkowski, David

2011-01-01

Commercial digital signal processors (DSPs) for use in high-speed satellite computers are challenged by the damaging effects of space radiation, mainly single event upsets (SEUs) and single event functional interrupts (SEFIs). Innovations have been developed for mitigating the effects of SEUs and SEFIs, enabling the use of very-highspeed commercial DSPs with improved SEU tolerances. Time-triple modular redundancy (TTMR) is a method of applying traditional triple modular redundancy on a single processor, exploiting the VLIW (very long instruction word) class of parallel processors. TTMR improves SEU rates substantially. SEFIs are solved by a SEFI-hardened core circuit, external to the microprocessor. It monitors the health of the processor, and if a SEFI occurs, forces the processor to return to performance through a series of escalating events. TTMR and hardened-core solutions were developed for both DSPs and reconfigurable field-programmable gate arrays (FPGAs). This includes advancement of TTMR algorithms for DSPs and reconfigurable FPGAs, plus a rad-hard, hardened-core integrated circuit that services both the DSP and FPGA. Additionally, a combined DSP and FPGA board architecture was fully developed into a rad-hard engineering product. This technology enables use of commercial off-the-shelf (COTS) DSPs in computers for satellite and other space applications, allowing rapid deployment at a much lower cost. Traditional rad-hard space computers are very expensive and typically have long lead times. These computers are either based on traditional rad-hard processors, which have extremely low computational performance, or triple modular redundant (TMR) FPGA arrays, which suffer from power and complexity issues. Even more frustrating is that the TMR arrays of FPGAs require a fixed, external rad-hard voting element, thereby causing them to lose much of their reconfiguration capability and in some cases significant speed reduction. The benefits of COTS high-performance signal processing include significant increase in onboard science data processing, enabling orders of magnitude reduction in required communication bandwidth for science data return, orders of magnitude improvement in onboard mission planning and critical decision making, and the ability to rapidly respond to changing mission environments, thus enabling opportunistic science and orders of magnitude reduction in the cost of mission operations through reduction of required staff. Additional benefits of COTS-based, high-performance signal processing include the ability to leverage considerable commercial and academic investments in advanced computing tools, techniques, and infra structure, and the familiarity of the science and IT community with these computing environments.
Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.

PubMed

Hines, Michael; Kumar, Sameer; Schürmann, Felix

2011-01-01

For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4·10(10) connections (K is 1024, M is 1024(2), and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend due to the high overhead of initiating a spike communication. The two best performing methods-the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast; and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores-had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in number of cells that fire on each processor in the interval between synchronization. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will be ultimately limited by imbalance between incoming processor spikes between synchronization intervals. Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect neural net architecture but be randomly distributed so that sets of cells which are burst firing together should be on different processors with their targets on as large a set of processors as possible.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, Dario B.

1996-01-01

The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Finite element meshing approached as a global minimization process

DOE Office of Scientific and Technical Information (OSTI.GOV)

WITKOWSKI,WALTER R.; JUNG,JOSEPH; DOHRMANN,CLARK R.

2000-03-01

The ability to generate a suitable finite element mesh in an automatic fashion is becoming the key to being able to automate the entire engineering analysis process. However, placing an all-hexahedron mesh in a general three-dimensional body continues to be an elusive goal. The approach investigated in this research is fundamentally different from any other that is known of by the authors. A physical analogy viewpoint is used to formulate the actual meshing problem which constructs a global mathematical description of the problem. The analogy used was that of minimizing the electrical potential of a system charged particles within amore » charged domain. The particles in the presented analogy represent duals to mesh elements (i.e., quads or hexes). Particle movement is governed by a mathematical functional which accounts for inter-particles repulsive, attractive and alignment forces. This functional is minimized to find the optimal location and orientation of each particle. After the particles are connected a mesh can be easily resolved. The mathematical description for this problem is as easy to formulate in three-dimensions as it is in two- or one-dimensions. The meshing algorithm was developed within CoMeT. It can solve the two-dimensional meshing problem for convex and concave geometries in a purely automated fashion. Investigation of the robustness of the technique has shown a success rate of approximately 99% for the two-dimensional geometries tested. Run times to mesh a 100 element complex geometry were typically in the 10 minute range. Efficiency of the technique is still an issue that needs to be addressed. Performance is an issue that is critical for most engineers generating meshes. It was not for this project. The primary focus of this work was to investigate and evaluate a meshing algorithm/philosophy with efficiency issues being secondary. The algorithm was also extended to mesh three-dimensional geometries. Unfortunately, only simple geometries were tested before this project ended. The primary complexity in the extension was in the connectivity problem formulation. Defining all of the interparticle interactions that occur in three-dimensions and expressing them in mathematical relationships is very difficult.« less
MOAB : a mesh-oriented database.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tautges, Timothy James; Ernst, Corey; Stimpson, Clint

A finite element mesh is used to decompose a continuous domain into a discretized representation. The finite element method solves PDEs on this mesh by modeling complex functions as a set of simple basis functions with coefficients at mesh vertices and prescribed continuity between elements. The mesh is one of the fundamental types of data linking the various tools in the FEA process (mesh generation, analysis, visualization, etc.). Thus, the representation of mesh data and operations on those data play a very important role in FEA-based simulations. MOAB is a component for representing and evaluating mesh data. MOAB can storemore » structured and unstructured mesh, consisting of elements in the finite element 'zoo'. The functional interface to MOAB is simple yet powerful, allowing the representation of many types of metadata commonly found on the mesh. MOAB is optimized for efficiency in space and time, based on access to mesh in chunks rather than through individual entities, while also versatile enough to support individual entity access. The MOAB data model consists of a mesh interface instance, mesh entities (vertices and elements), sets, and tags. Entities are addressed through handles rather than pointers, to allow the underlying representation of an entity to change without changing the handle to that entity. Sets are arbitrary groupings of mesh entities and other sets. Sets also support parent/child relationships as a relation distinct from sets containing other sets. The directed-graph provided by set parent/child relationships is useful for modeling topological relations from a geometric model or other metadata. Tags are named data which can be assigned to the mesh as a whole, individual entities, or sets. Tags are a mechanism for attaching data to individual entities and sets are a mechanism for describing relations between entities; the combination of these two mechanisms is a powerful yet simple interface for representing metadata or application-specific data. For example, sets and tags can be used together to describe geometric topology, boundary condition, and inter-processor interface groupings in a mesh. MOAB is used in several ways in various applications. MOAB serves as the underlying mesh data representation in the VERDE mesh verification code. MOAB can also be used as a mesh input mechanism, using mesh readers included with MOAB, or as a translator between mesh formats, using readers and writers included with MOAB. The remainder of this report is organized as follows. Section 2, 'Getting Started', provides a few simple examples of using MOAB to perform simple tasks on a mesh. Section 3 discusses the MOAB data model in more detail, including some aspects of the implementation. Section 4 summarizes the MOAB function API. Section 5 describes some of the tools included with MOAB, and the implementation of mesh readers/writers for MOAB. Section 6 contains a brief description of MOAB's relation to the TSTT mesh interface. Section 7 gives a conclusion and future plans for MOAB development. Section 8 gives references cited in this report. A reference description of the full MOAB API is contained in Section 9.« less
Numerical aerodynamic simulation facility preliminary study, volume 2 and appendices

NASA Technical Reports Server (NTRS)

1977-01-01

Data to support results obtained in technology assessment studies are presented. Objectives, starting points, and future study tasks are outlined. Key design issues discussed in appendices include: data allocation, transposition network design, fault tolerance and trustworthiness, logic design, processing element of existing components, number of processors, the host system, alternate data base memory designs, number representation, fast div 521 instruction, architectures, and lockstep array versus synchronizable array machine comparison.
Efficient Feature Extraction and Likelihood Fusion for Vehicle Tracking in Low Frame Rate Airborne Video

DTIC Science & Technology

2010-07-01

imagery, persistent sensor array I. Introduction New device fabrication technologies and heterogeneous embedded processors have led to the emergence of a...geometric occlusions between target and sensor , motion blur, urban scene complexity, and high data volumes. In practical terms the targets are small...distributed airborne narrow-field-of-view video sensor networks. Airborne camera arrays combined with com- putational photography techniques enable the
TRIGA: Telecommunications Protocol Processing Subsystem Using Reconfigurable Interoperable Gate Arrays

NASA Technical Reports Server (NTRS)

Pang, Jackson; Pingree, Paula J.; Torgerson, J. Leigh

2006-01-01

We present the Telecommunications protocol processing subsystem using Reconfigurable Interoperable Gate Arrays (TRIGA), a novel approach that unifies fault tolerance, error correction coding and interplanetary communication protocol off-loading to implement CCSDS File Delivery Protocol and Datalink layers. The new reconfigurable architecture offers more than one order of magnitude throughput increase while reducing footprint requirements in memory, command and data handling processor utilization, communication system interconnects and power consumption.
Challenges for Wireless Mesh Networks to provide reliable carrier-grade services

NASA Astrophysics Data System (ADS)

von Hugo, D.; Bayer, N.

2011-08-01

Provision of mobile and wireless services today within a competitive environment and driven by a huge amount of steadily emerging new services and applications is both challenge and chance for radio network operators. Deployment and operation of an infrastructure for mobile and wireless broadband connectivity generally requires planning effort and large investments. A promising approach to reduce expenses for radio access networking is offered by Wireless Mesh Networks (WMNs). Here traditional dedicated backhaul connections to each access point are replaced by wireless multi-hop links between neighbouring access nodes and few gateways to the backbone employing standard radio technology. Such a solution provides at the same time high flexibility in both deployment and the amount of offered capacity and shall reduce overall expenses. On the other hand currently available mesh solutions do not provide carrier grade service quality and reliability and often fail to cope with high traffic load. EU project CARMEN (CARrier grade MEsh Networks) was initiated to incorporate different heterogeneous technologies and new protocols to allow for reliable transmission over "best effort" radio channels, to support a reliable mobility and network management, self-configuration and dynamic resource usage, and thus to offer a permanent or temporary broadband access at high cost efficiency. The contribution provides an overview on preliminary project results with focus on main technical challenges from a research and implementation point of view. Especially impact of mesh topology on the overall system performance in terms of throughput and connection reliability and aspects of a dedicated hybrid mobility management solution will be discussed.
CAPRI (Computational Analysis PRogramming Interface): A Solid Modeling Based Infra-Structure for Engineering Analysis and Design Simulations

NASA Technical Reports Server (NTRS)

Haimes, Robert; Follen, Gregory J.

1998-01-01

CAPRI is a CAD-vendor neutral application programming interface designed for the construction of analysis and design systems. By allowing access to the geometry from within all modules (grid generators, solvers and post-processors) such tasks as meshing on the actual surfaces, node enrichment by solvers and defining which mesh faces are boundaries (for the solver and visualization system) become simpler. The overall reliance on file 'standards' is minimized. This 'Geometry Centric' approach makes multi-physics (multi-disciplinary) analysis codes much easier to build. By using the shared (coupled) surface as the foundation, CAPRI provides a single call to interpolate grid-node based data from the surface discretization in one volume to another. Finally, design systems are possible where the results can be brought back into the CAD system (and therefore manufactured) because all geometry construction and modification are performed using the CAD system's geometry kernel.
Preprocessor and postprocessor computer programs for a radial-flow finite-element model

USGS Publications Warehouse

Pucci, A.A.; Pope, D.A.

1987-01-01

Preprocessing and postprocessing computer programs that enhance the utility of the U.S. Geological Survey radial-flow model have been developed. The preprocessor program: (1) generates a triangular finite element mesh from minimal data input, (2) produces graphical displays and tabulations of data for the mesh , and (3) prepares an input data file to use with the radial-flow model. The postprocessor program is a version of the radial-flow model, which was modified to (1) produce graphical output for simulation and field results, (2) generate a statistic for comparing the simulation results with observed data, and (3) allow hydrologic properties to vary in the simulated region. Examples of the use of the processor programs for a hypothetical aquifer test are presented. Instructions for the data files, format instructions, and a listing of the preprocessor and postprocessor source codes are given in the appendixes. (Author 's abstract)
Parallel Processing of Adaptive Meshes with Load Balancing

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than that under PLUM by overlapping processing and data migration.
FPGA wavelet processor design using language for instruction-set architectures (LISA)

NASA Astrophysics Data System (ADS)

Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

2007-04-01

The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
Parallel processor for real-time structural control

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tise, B.L.

1992-01-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less
Face classification using electronic synapses

NASA Astrophysics Data System (ADS)

Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H.-S. Philip; Qian, He

2017-05-01

Conventional hardware platforms consume huge amount of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow to complete cognitive tasks more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results consolidate the feasibility of analogue synaptic array and pave the way toward building an energy efficient and large-scale neuromorphic system.
TOGA - A GNSS Reflections Instrument for Remote Sensing Using Beamforming

NASA Technical Reports Server (NTRS)

Esterhuizen, S.; Meehan, T. K.; Robison, D.

2009-01-01

Remotely sensing the Earth's surface using GNSS signals as bi-static radar sources is one of the most challenging applications for radiometric instrument design. As part of NASA's Instrument Incubator Program, our group at JPL has built a prototype instrument, TOGA (Time-shifted, Orthometric, GNSS Array), to address a variety of GNSS science needs. Observing GNSS reflections is major focus of the design/development effort. The TOGA design features a steerable beam antenna array which can form a high-gain antenna pattern in multiple directions simultaneously. Multiple FPGAs provide flexible digital signal processing logic to process both GPS and Galileo reflections. A Linux OS based science processor serves as experiment scheduler and data post-processor. This paper outlines the TOGA design approach as well as preliminary results of reflection data collected from test flights over the Pacific ocean. This reflections data demonstrates observation of the GPS L1/L2C/L5 signals.
Development of a General-Purpose Analysis System Based on a Programmable Fluid Processor Final Report CRADA No. TC-2027-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

McConaghy, C. F.; Gascoyne, P. R.

The purpose ofthis project was to develop a general-purpose analysis system based on a programmable fluid processor (PFP). The PFP is an array of electrodes surrounded by fluid reservoirs and injectors. Injected droplets of various reagents are manjpulated and combined on the array by Dielectrophoretic (DEP) forces. The goal was to create a small handheld device that could accomplish the tasks currently undertaken by much larger, time consuming, manual manipulation in the lab. The entire effo1t was funded by DARPA under the Bio-Flips program. MD Anderson Cancer Center was the PI for the DARPA effort. The Bio-Flips program was amore » 3- year program that ran from September 2000 to September 2003. The CRADA was somewhat behind the Bi-Flips program running from June 2001 to June 2004 with a no cost extension to September 2004.« less

Face classification using electronic synapses.

PubMed

Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H-S Philip; Qian, He

2017-05-12

Conventional hardware platforms consume huge amount of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow to complete cognitive tasks more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results consolidate the feasibility of analogue synaptic array and pave the way toward building an energy efficient and large-scale neuromorphic system.
Methodology for fast detection of false sharing in threaded scientific codes

DOEpatents

Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

2014-11-25

A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Microcalorimeters with Germanium Thermistors for High Resolution Soft and Hard X-ray Astronomy

NASA Technical Reports Server (NTRS)

Silver, E.

2003-01-01

This is a progress report for the first year of a three year Space Research and Technology (SR&T) grant to continue the advancement of neutron transmutation doped (NTD-based) microcalorimeters. We have re-prioritized certain aspects of the statement of work and chose to emphasize issues of array development in the first year rather than wait until year two. Consequently, some of the projects scheduled for the first year were delayed to the second year. Here we report on our progress to: a) Build and test a 1 x 4 element array and to investigate electrical and thermal cross-talk; b) Build a multiplexed 4 channel analog pulse processor; c) Build a digital pulse processor that can accommodate 4 channels with independent triggers; d) Develop a proportional thermal baseline restoration system compatible with the constant voltage mode of microcalorimeter operation.
High-speed, automatic controller design considerations for integrating array processor, multi-microprocessor, and host computer system architectures

NASA Technical Reports Server (NTRS)

Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.

1985-01-01

Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.
Resiliency in Future Cyber Combat

DTIC Science & Technology

2016-04-04

including the Internet , telecommunications networks, computer systems, and embed- ded processors and controllers.”6 One important point emerging from the...definition is that while the Internet is part of cyberspace, it is not all of cyberspace. Any computer processor capable of communicating with a...central proces- sor on a modern car are all part of cyberspace, although only some of them are routinely connected to the Internet . Most modern
Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

De Supinski, B.; Caliga, D.

2017-09-28

The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.
The Role of Chronic Mesh Infection in Delayed-Onset Vaginal Mesh Complications or Recurrent Urinary Tract Infections: Results From Explanted Mesh Cultures.

PubMed

Mellano, Erin M; Nakamura, Leah Y; Choi, Judy M; Kang, Diana C; Grisales, Tamara; Raz, Shlomo; Rodriguez, Larissa V

2016-01-01

Vaginal mesh complications necessitating excision are increasingly prevalent. We aim to study whether subclinical chronically infected mesh contributes to the development of delayed-onset mesh complications or recurrent urinary tract infections (UTIs). Women undergoing mesh removal from August 2013 through May 2014 were identified by surgical code for vaginal mesh removal. Only women undergoing removal of anti-incontinence mesh were included. Exclusion criteria included any women undergoing simultaneous prolapse mesh removal. We abstracted preoperative and postoperative information from the medical record and compared mesh culture results from patients with and without mesh extrusion, de novo recurrent UTIs, and delayed-onset pain. One hundred seven women with only anti-incontinence mesh removed were included in the analysis. Onset of complications after mesh placement was within the first 6 months in 70 (65%) of 107 and delayed (≥6 months) in 37 (35%) of 107. A positive culture from the explanted mesh was obtained from 82 (77%) of 107 patients, and 40 (37%) of 107 were positive with potential pathogens. There were no significant differences in culture results when comparing patients with delayed-onset versus immediate pain, extrusion with no extrusion, and de novo recurrent UTIs with no infections. In this large cohort of patients with mesh removed for a diverse array of complications, cultures of the explanted vaginal mesh demonstrate frequent low-density bacterial colonization. We found no differences in culture results from women with delayed-onset pain versus acute pain, vaginal mesh extrusions versus no extrusions, or recurrent UTIs using standard culture methods. Chronic prosthetic infections in other areas of medicine are associated with bacterial biofilms, which are resistant to typical culture techniques. Further studies using culture-independent methods are needed to investigate the potential role of chronic bacterial infections in delayed vaginal mesh complications.
Performance evaluation of multi-channel wireless mesh networks with embedded systems.

PubMed

Lam, Jun Huy; Lee, Sang-Gon; Tan, Whye Kit

2012-01-01

Many commercial wireless mesh network (WMN) products are available in the marketplace with their own proprietary standards, but interoperability among the different vendors is not possible. Open source communities have their own WMN implementation in accordance with the IEEE 802.11s draft standard, Linux open80211s project and FreeBSD WMN implementation. While some studies have focused on the test bed of WMNs based on the open80211s project, none are based on the FreeBSD. In this paper, we built an embedded system using the FreeBSD WMN implementation that utilizes two channels and evaluated its performance. This implementation allows the legacy system to connect to the WMN independent of the type of platform and distributes the load between the two non-overlapping channels. One channel is used for the backhaul connection and the other one is used to connect to the stations to wireless mesh network. By using the power efficient 802.11 technology, this device can also be used as a gateway for the wireless sensor network (WSN).
Cooled particle accelerator target

DOEpatents

Degtiarenko, Pavel V.

2005-06-14

A novel particle beam target comprising: a rotating target disc mounted on a retainer and thermally coupled to a first array of spaced-apart parallel plate fins that extend radially inwardly from the retainer and mesh without physical contact with a second array of spaced-apart parallel plate fins that extend radially outwardly from and are thermally coupled to a cooling mechanism capable of removing heat from said second array of spaced-apart fins and located within the first array of spaced-apart parallel fins. Radiant thermal exchange between the two arrays of parallel plate fins provides removal of heat from the rotating disc. A method of cooling the rotating target is also described.
Refreshing Music: Fog Harvesting with Harps

NASA Astrophysics Data System (ADS)

Shi, Weiwei; Anderson, Mark; Kennedy, Brook; Boreyko, Jonathan

2017-11-01

Fog harvesting is a useful technique for obtaining fresh water in arid climates. The wire meshes currently utilized for fog harvesting suffer from dual constraints: coarse meshes cannot efficiently capture fog, while fine meshes suffer from clogging issues. Here, we design a new type of fog harvester comprised of an array of vertical wires, which we call ``fog harps.'' To investigate the water collection efficiency, three fog harps were designed with different diameters (254 μm, 508 μm and 1.30 mm) but the same pitch-to-diameter ratio of 2. For comparison, three different size meshes were purchased with equivalent dimensions. As expected for the mesh structures, the mid-sized wires performed the best, with a drop-off in performance for the fine or coarse meshes. In contrast, the fog harvesting rate continually increased with decreasing wire diameter for the fog harps, due to its low hysteresis that prevented droplet clogging. This resulted in a 3-fold enhancement in the fog harvesting rate for the harp form factor compared to the mesh. The lack of a performance ceiling for the harps suggest that even greater enhancements could be achieved by scaling down to yet smaller sizes.
NCC Simulation Model: Simulating the operations of the network control center, phase 2

NASA Technical Reports Server (NTRS)

Benjamin, Norman M.; Paul, Arthur S.; Gill, Tepper L.

1992-01-01

The simulation of the network control center (NCC) is in the second phase of development. This phase seeks to further develop the work performed in phase one. Phase one concentrated on the computer systems and interconnecting network. The focus of phase two will be the implementation of the network message dialogues and the resources controlled by the NCC. These resources are requested, initiated, monitored and analyzed via network messages. In the NCC network messages are presented in the form of packets that are routed across the network. These packets are generated, encoded, decoded and processed by the network host processors that generate and service the message traffic on the network that connects these hosts. As a result, the message traffic is used to characterize the work done by the NCC and the connected network. Phase one of the model development represented the NCC as a network of bi-directional single server queues and message generating sources. The generators represented the external segment processors. The served based queues represented the host processors. The NCC model consists of the internal and external processors which generate message traffic on the network that links these hosts. To fully realize the objective of phase two it is necessary to identify and model the processes in each internal processor. These processes live in the operating system of the internal host computers and handle tasks such as high speed message exchanging, ISN and NFE interface, event monitoring, network monitoring, and message logging. Inter process communication is achieved through the operating system facilities. The overall performance of the host is determined by its ability to service messages generated by both internal and external processors.
A High-Throughput Processor for Flight Control Research Using Small UAVs

NASA Technical Reports Server (NTRS)

Klenke, Robert H.; Sleeman, W. C., IV; Motter, Mark A.

2006-01-01

There are numerous autopilot systems that are commercially available for small (<100 lbs) UAVs. However, they all share several key disadvantages for conducting aerodynamic research, chief amongst which is the fact that most utilize older, slower, 8- or 16-bit microcontroller technologies. This paper describes the development and testing of a flight control system (FCS) for small UAV s based on a modern, high throughput, embedded processor. In addition, this FCS platform contains user-configurable hardware resources in the form of a Field Programmable Gate Array (FPGA) that can be used to implement custom, application-specific hardware. This hardware can be used to off-load routine tasks such as sensor data collection, from the FCS processor thereby further increasing the computational throughput of the system.
Optical systolic solutions of linear algebraic equations

NASA Technical Reports Server (NTRS)

Neuman, C. P.; Casasent, D.

1984-01-01

The philosophy and data encoding possible in systolic array optical processor (SAOP) were reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as: matrix-decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
Watermarking on 3D mesh based on spherical wavelet transform.

PubMed

Jin, Jian-Qiu; Dai, Min-Ya; Bao, Hu-Jun; Peng, Qun-Sheng

2004-03-01

In this paper we propose a robust watermarking algorithm for 3D mesh. The algorithm is based on spherical wavelet transform. Our basic idea is to decompose the original mesh into a series of details at different scales by using spherical wavelet transform; the watermark is then embedded into the different levels of details. The embedding process includes: global sphere parameterization, spherical uniform sampling, spherical wavelet forward transform, embedding watermark, spherical wavelet inverse transform, and at last resampling the mesh watermarked to recover the topological connectivity of the original model. Experiments showed that our algorithm can improve the capacity of the watermark and the robustness of watermarking against attacks.
Design and implementation of projects with Xilinx Zynq FPGA: a practical case

NASA Astrophysics Data System (ADS)

Travaglini, R.; D'Antone, I.; Meneghini, S.; Rignanese, L.; Zuffa, M.

The main advantage when using FPGAs with embedded processors is the availability of additional several high-performance resources in the same physical device. Moreover, the FPGA programmability allows for connect custom peripherals. Xilinx have designed a programmable device named Zynq-7000 (simply called Zynq in the following), which integrates programmable logic (identical to the other Xilinx "serie 7" devices) with a System on Chip (SOC) based on two embedded ARM processors. Since both parts are deeply connected, the designers benefit from performance of hardware SOC and flexibility of programmability as well. In this paper a design developed by the Electronic Design Department at the Bologna Division of INFN will be presented as a practical case of project based on Zynq device. It is developed by using a commercial board called ZedBoard hosting a FMC mezzanine with a 12-bit 500 MS/s ADC. The Zynq FPGA on the ZedBoard receives digital outputs from the ADC and send them to the acquisition PC, after proper formatting, through a Gigabit Ethernet link. The major focus of the paper will be about the methodology to develop a Zynq-based design with the Xilinx Vivado software, enlightening how to configure the SOC and connect it with the programmable logic. Firmware design techniques will be presented: in particular both VHDL and IP core based strategies will be discussed. Further, the procedure to develop software for the embedded processor will be presented. Finally, some debugging tools, like the embedded Logic Analyzer, will be shown. Advantages and disadvantages with respect to adopting FPGA without embedded processors will be discussed.
Parallel processing approach to transform-based image coding

NASA Astrophysics Data System (ADS)

Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

1991-06-01

This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links, each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, results ar presented with image statistics and data rates. Finally techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.
CORDIC-based digital signal processing (DSP) element for adaptive signal processing

NASA Astrophysics Data System (ADS)

Bolstad, Gregory D.; Neeld, Kenneth B.

1995-04-01

The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC based application specific DSP element that, when connected in a linear array, can perform extremely high throughput (100s of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach, or more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient, but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible, but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art, and more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to physically implement. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
An overlapped grid method for multigrid, finite volume/difference flow solvers: MaGGiE

NASA Technical Reports Server (NTRS)

Baysal, Oktay; Lessard, Victor R.

1990-01-01

The objective is to develop a domain decomposition method via overlapping/embedding the component grids, which is to be used by upwind, multi-grid, finite volume solution algorithms. A computer code, given the name MaGGiE (Multi-Geometry Grid Embedder) is developed to meet this objective. MaGGiE takes independently generated component grids as input, and automatically constructs the composite mesh and interpolation data, which can be used by the finite volume solution methods with or without multigrid convergence acceleration. Six demonstrative examples showing various aspects of the overlap technique are presented and discussed. These cases are used for developing the procedure for overlapping grids of different topologies, and to evaluate the grid connection and interpolation data for finite volume calculations on a composite mesh. Time fluxes are transferred between mesh interfaces using a trilinear interpolation procedure. Conservation losses are minimal at the interfaces using this method. The multi-grid solution algorithm, using the coaser grid connections, improves the convergence time history as compared to the solution on composite mesh without multi-gridding.
The whole mesh deformation model: a fast image segmentation method suitable for effective parallelization

NASA Astrophysics Data System (ADS)

Lenkiewicz, Przemyslaw; Pereira, Manuela; Freire, Mário M.; Fernandes, José

2013-12-01

In this article, we propose a novel image segmentation method called the whole mesh deformation (WMD) model, which aims at addressing the problems of modern medical imaging. Such problems have raised from the combination of several factors: (1) significant growth of medical image volumes sizes due to increasing capabilities of medical acquisition devices; (2) the will to increase the complexity of image processing algorithms in order to explore new functionality; (3) change in processor development and turn towards multi processing units instead of growing bus speeds and the number of operations per second of a single processing unit. Our solution is based on the concept of deformable models and is characterized by a very effective and precise segmentation capability. The proposed WMD model uses a volumetric mesh instead of a contour or a surface to represent the segmented shapes of interest, which allows exploiting more information in the image and obtaining results in shorter times, independently of image contents. The model also offers a good ability for topology changes and allows effective parallelization of workflow, which makes it a very good choice for large datasets. We present a precise model description, followed by experiments on artificial images and real medical data.
The aerodynamic characteristics of vortex ingestion for the F/A-18 inlet duct

NASA Technical Reports Server (NTRS)

Anderson, Bernhard H.

1991-01-01

A Reduced Navier-Stokes (RNS) solution technique was successfully combined with the concept of partitioned geometry and mesh generation to form a very efficient 3D RNS code aimed at the analysis-design engineering environment. Partitioned geometry and mesh generation is a pre-processor to augment existing geometry and grid generation programs which allows the solver to (1) recluster an existing gridlife mesh lattice, and (2) perturb an existing gridfile definition to alter the cross-sectional shape and inlet duct centerline distribution without returning to the external geometry and grid generator. The present results provide a quantitative validation of the initial value space marching 3D RNS procedure and demonstrates accurate predictions of the engine face flow field, with a separation present in the inlet duct as well as when vortex generators are installed to supress flow separation. The present results also demonstrate the ability of the 3D RNS procedure to analyze the flow physics associated with vortex ingestion in general geometry ducts such as the F/A-18 inlet. At the conditions investigated, these interactions are basically inviscid like, i.e., the dominant aerodynamic characteristics have their origin in inviscid flow theory.

Progress in Computational Simulation of Earthquakes

NASA Technical Reports Server (NTRS)

Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert

2006-01-01

GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST( P) has involved coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency on over many hundreds of parallel processors (see figure).
Redundant disk arrays: Reliable, parallel secondary storage. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Gibson, Garth Alan

1990-01-01

During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
Multimode power processor

DOEpatents

O'Sullivan, G.A.; O'Sullivan, J.A.

1999-07-27

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.
Multimode power processor

DOEpatents

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
Environmentally adaptive processing for shallow ocean applications: A sequential Bayesian approach.

PubMed

Candy, J V

2015-09-01

The shallow ocean is a changing environment primarily due to temperature variations in its upper layers directly affecting sound propagation throughout. The need to develop processors capable of tracking these changes implies a stochastic as well as an environmentally adaptive design. Bayesian techniques have evolved to enable a class of processors capable of performing in such an uncertain, nonstationary (varying statistics), non-Gaussian, variable shallow ocean environment. A solution to this problem is addressed by developing a sequential Bayesian processor capable of providing a joint solution to the modal function tracking and environmental adaptivity problem. Here, the focus is on the development of both a particle filter and an unscented Kalman filter capable of providing reasonable performance for this problem. These processors are applied to hydrophone measurements obtained from a vertical array. The adaptivity problem is attacked by allowing the modal coefficients and/or wavenumbers to be jointly estimated from the noisy measurement data along with tracking of the modal functions while simultaneously enhancing the noisy pressure-field measurements.
Software-Reconfigurable Processors for Spacecraft

NASA Technical Reports Server (NTRS)

Farrington, Allen; Gray, Andrew; Bell, Bryan; Stanton, Valerie; Chong, Yong; Peters, Kenneth; Lee, Clement; Srinivasan, Jeffrey

2005-01-01

A report presents an overview of an architecture for a software-reconfigurable network data processor for a spacecraft engaged in scientific exploration. When executed on suitable electronic hardware, the software performs the functions of a physical layer (in effect, acts as a software radio in that it performs modulation, demodulation, pulse-shaping, error correction, coding, and decoding), a data-link layer, a network layer, a transport layer, and application-layer processing of scientific data. The software-reconfigurable network processor is undergoing development to enable rapid prototyping and rapid implementation of communication, navigation, and scientific signal-processing functions; to provide a long-lived communication infrastructure; and to provide greatly improved scientific-instrumentation and scientific-data-processing functions by enabling science-driven in-flight reconfiguration of computing resources devoted to these functions. This development is an extension of terrestrial radio and network developments (e.g., in the cellular-telephone industry) implemented in software running on such hardware as field-programmable gate arrays, digital signal processors, traditional digital circuits, and mixed-signal application-specific integrated circuits (ASICs).
Scalable Engineering of Quantum Optical Information Processing Architectures (SEQUOIA)

DTIC Science & Technology

2016-12-13

arrays. Figure 4: An 8-channel fiber-coupled SNSPD array. 1.4 Post -fabrication-tunable linear optic fabrication We have analyzed the...performance of the programmable nanophotonic processor (PNP) that is dynamically tunable via post -fabrication active phase tuning to predict the scaling of...various device losses. PACS numbers: 42.50. Ex , 03.67.Dd, 03.67.Lx, 42.50.Dv I. INTRODUCTION Quantum key distribution (QKD) enables two distant authenticated
A new measuring machine in Paris

NASA Technical Reports Server (NTRS)

Guibert, J.; Charvin, P.

1984-01-01

A new photographic measuring machine is under construction at the Paris Observatory. The amount of transmitted light is measured by a linear array of 1024 photodiodes. Carriage control, data acquisition and on line processing are performed by microprocessors, a S.E.L. 32/27 computer, and an AP 120-B Array Processor. It is expected that a Schmidt telescope plate of size 360 mm square will be scanned in one hour with pixel size of ten microns.
Testability Design Rating System: Testability Handbook. Volume 1

DTIC Science & Technology

1992-02-01

4-10 4.7.5 Summary of False BIT Alarms (FBA) ............................. 4-10 4.7.6 Smart BIT Technique...Circuit Board PGA Pin Grid Array PLA Programmable Logic Array PLD Programmable Logic Device PN Pseudo-Random Number PREDICT Probabilistic Estimation of...11 4.7.6 Smart BIT ( reference: RADC-TR-85-198). " Smart " BIT is a term given to BIT circuitry in a system LRU which includes dedicated processor/memory
Hardware multiplier processor

DOEpatents

Pierce, Paul E.

1986-01-01

A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Hardware multiplier processor

DOEpatents

Pierce, P.E.

A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Modular chemiresistive sensor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alam, Maksudul M.; Sampathkumaran, Uma

The present invention relates to a modular chemiresistive sensor. In particular, a modular chemiresistive sensor for hypergolic fuel and oxidizer leak detection, carbon dioxide monitoring and detection of disease biomarkers. The sensor preferably has two gold or platinum electrodes mounted on a silicon substrate where the electrodes are connected to a power source and are separated by a gap of 0.5 to 4.0 .mu.M. A polymer nanowire or carbon nanotube spans the gap between the electrodes and connects the electrodes electrically. The electrodes are further connected to a circuit board having a processor and data storage, where the processor canmore » measure current and voltage values between the electrodes and compare the current and voltage values with current and voltage values stored in the data storage and assigned to particular concentrations of a pre-determined substance such as those listed above or a variety of other substances.« less
PARAS program: Phased array radio astronomy from space

NASA Astrophysics Data System (ADS)

Jakubowski, Antoni K.; Haynes, David A.; Nuss, Ken; Hoffmann, Chris; Madden, Michael; Dungan, Michael

1992-06-01

An orbiting radio telescope is proposed which, when operated in a Very Long Baseline Interferometry (VLBLI) scheme, would allow higher (than currently available) angular resolution and dynamic range in the maps, and the ability of observing rapidly changing astronomical sources. Using a passive phases array technology, the proposed design consists of 656 hexagonal modules forming a 150 meter diameter dish. Each observatory module is largely autonomous, having its own photovoltaic power supply and low-noise receiver and processor for phase shifting. The signals received by the modules are channeled via fiber optics to the central control computer in the central bus module. After processing and multiplexing, the data is transmitted to telemetry stations on the ground. The truss frame supporting each observatory pane is a hybrid structure consisting of a bottom graphite/epoxy tubular triangle and rigidized inflatable Kevlar tubes connecting the top observatory panel and bottom triangle. Attitude control and stationkeeping functions are performed by a system of momentum wheels in the bus and four propulsion modules located at the compass points on the periphery of the observatory dish. Each propulsion module has four monopropellant thrusters and six hydrazine arcjets, the latter supported by a nuclear reactor. The total mass of the spacecraft is 22,060 kg.
PARAS program: Phased array radio astronomy from space

NASA Technical Reports Server (NTRS)

Jakubowski, Antoni K.; Haynes, David A.; Nuss, Ken; Hoffmann, Chris; Madden, Michael; Dungan, Michael

1992-01-01

An orbiting radio telescope is proposed which, when operated in a Very Long Baseline Interferometry (VLBLI) scheme, would allow higher (than currently available) angular resolution and dynamic range in the maps, and the ability of observing rapidly changing astronomical sources. Using a passive phases array technology, the proposed design consists of 656 hexagonal modules forming a 150 meter diameter dish. Each observatory module is largely autonomous, having its own photovoltaic power supply and low-noise receiver and processor for phase shifting. The signals received by the modules are channeled via fiber optics to the central control computer in the central bus module. After processing and multiplexing, the data is transmitted to telemetry stations on the ground. The truss frame supporting each observatory pane is a hybrid structure consisting of a bottom graphite/epoxy tubular triangle and rigidized inflatable Kevlar tubes connecting the top observatory panel and bottom triangle. Attitude control and stationkeeping functions are performed by a system of momentum wheels in the bus and four propulsion modules located at the compass points on the periphery of the observatory dish. Each propulsion module has four monopropellant thrusters and six hydrazine arcjets, the latter supported by a nuclear reactor. The total mass of the spacecraft is 22,060 kg.
Nearly Interactive Parabolized Navier-Stokes Solver for High Speed Forebody and Inlet Flows

NASA Technical Reports Server (NTRS)

Benson, Thomas J.; Liou, May-Fun; Jones, William H.; Trefny, Charles J.

2009-01-01

A system of computer programs is being developed for the preliminary design of high speed inlets and forebodies. The system comprises four functions: geometry definition, flow grid generation, flow solver, and graphics post-processor. The system runs on a dedicated personal computer using the Windows operating system and is controlled by graphical user interfaces written in MATLAB (The Mathworks, Inc.). The flow solver uses the Parabolized Navier-Stokes equations to compute millions of mesh points in several minutes. Sample two-dimensional and three-dimensional calculations are demonstrated in the paper.
Link failure detection in a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.

2010-11-09

Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
Wake Vortex Avoidance System and Method

NASA Technical Reports Server (NTRS)

Shams, Qamar A. (Inventor); Zuckerwar, Allan J. (Inventor); Knight, Howard K. (Inventor)

2017-01-01

A wake vortex avoidance system includes a microphone array configured to detect low frequency sounds. A signal processor determines a geometric mean coherence based on the detected low frequency sounds. A display displays wake vortices based on the determined geometric mean coherence.
Dedicated hardware processor and corresponding system-on-chip design for real-time laser speckle imaging.

PubMed

Jiang, Chao; Zhang, Hongyan; Wang, Jia; Wang, Yaru; He, Heng; Liu, Rui; Zhou, Fangyuan; Deng, Jialiang; Li, Pengcheng; Luo, Qingming

2011-11-01

Laser speckle imaging (LSI) is a noninvasive and full-field optical imaging technique which produces two-dimensional blood flow maps of tissues from the raw laser speckle images captured by a CCD camera without scanning. We present a hardware-friendly algorithm for the real-time processing of laser speckle imaging. The algorithm is developed and optimized specifically for LSI processing in the field programmable gate array (FPGA). Based on this algorithm, we designed a dedicated hardware processor for real-time LSI in FPGA. The pipeline processing scheme and parallel computing architecture are introduced into the design of this LSI hardware processor. When the LSI hardware processor is implemented in the FPGA running at the maximum frequency of 130 MHz, up to 85 raw images with the resolution of 640×480 pixels can be processed per second. Meanwhile, we also present a system on chip (SOC) solution for LSI processing by integrating the CCD controller, memory controller, LSI hardware processor, and LCD display controller into a single FPGA chip. This SOC solution also can be used to produce an application specific integrated circuit for LSI processing.
Replication of Space-Shuttle Computers in FPGAs and ASICs

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.

2008-01-01

A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Dynamically programmable cache

NASA Astrophysics Data System (ADS)

Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas

1998-10-01

Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).

Multiple-access phased array antenna simulator for a digital beam-forming system investigation

NASA Technical Reports Server (NTRS)

Kerczewski, Robert J.; Yu, John; Walton, Joanne C.; Perl, Thomas D.; Andro, Monty; Alexovich, Robert E.

1992-01-01

Future versions of data relay satellite systems are currently being planned by NASA. Being given consideration for implementation are on-board digital beamforming techniques which will allow multiple users to simultaneously access a single S-band phased array antenna system. To investigate the potential performance of such a system, a laboratory simulator has been developed at NASA's Lewis Research Center. This paper describes the system simulator, and in particular, the requirements, design and performance of a key subsystem, the phased array antenna simulator, which provides realistic inputs to the digital processor including multiple signals, noise, and nonlinearities.
Multiple-access phased array antenna simulator for a digital beam forming system investigation

NASA Technical Reports Server (NTRS)

Kerczewski, Robert J.; Yu, John; Walton, Joanne C.; Perl, Thomas D.; Andro, Monty; Alexovich, Robert E.

1992-01-01

Future versions of data relay satellite systems are currently being planned by NASA. Being given consideration for implementation are on-board digital beamforming techniques which will allow multiple users to simultaneously access a single S-band phased array antenna system. To investigate the potential performance of such a system, a laboratory simulator has been developed at NASA's Lewis Research Center. This paper describes the system simulator, and in particular, the requirements, design, and performance of a key subsystem, the phased array antenna simulator, which provides realistic inputs to the digital processor including multiple signals, noise, and nonlinearities.
Digital Beamforming Scatterometer

NASA Technical Reports Server (NTRS)

Rincon, Rafael F.; Vega, Manuel; Kman, Luko; Buenfil, Manuel; Geist, Alessandro; Hillard, Larry; Racette, Paul

2009-01-01

This paper discusses scatterometer measurements collected with multi-mode Digital Beamforming Synthetic Aperture Radar (DBSAR) during the SMAP-VEX 2008 campaign. The 2008 SMAP Validation Experiment was conducted to address a number of specific questions related to the soil moisture retrieval algorithms. SMAP-VEX 2008 consisted on a series of aircraft-based.flights conducted on the Eastern Shore of Maryland and Delaware in the fall of 2008. Several other instruments participated in the campaign including the Passive Active L-Band System (PALS), the Marshall Airborne Polarimetric Imaging Radiometer (MAPIR), and the Global Positioning System Reflectometer (GPSR). This campaign was the first SMAP Validation Experiment. DBSAR is a multimode radar system developed at NASA/Goddard Space Flight Center that combines state-of-the-art radar technologies, on-board processing, and advances in signal processing techniques in order to enable new remote sensing capabilities applicable to Earth science and planetary applications [l]. The instrument can be configured to operate in scatterometer, Synthetic Aperture Radar (SAR), or altimeter mode. The system builds upon the L-band Imaging Scatterometer (LIS) developed as part of the RadSTAR program. The radar is a phased array system designed to fly on the NASA P3 aircraft. The instrument consists of a programmable waveform generator, eight transmit/receive (T/R) channels, a microstrip antenna, and a reconfigurable data acquisition and processor system. Each transmit channel incorporates a digital attenuator, and digital phase shifter that enables amplitude and phase modulation on transmit. The attenuators, phase shifters, and calibration switches are digitally controlled by the radar control card (RCC) on a pulse by pulse basis. The antenna is a corporate fed microstrip patch-array centered at 1.26 GHz with a 20 MHz bandwidth. Although only one feed is used with the present configuration, a provision was made for separate corporate feeds for vertical and horizontal polarization. System upgrades to dual polarization are currently under way. The DBSAR processor is a reconfigurable data acquisition and processor system capable of real-time, high-speed data processing. DBSAR uses an FPGA-based architecture to implement digitally down-conversion, in-phase and quadrature (I/Q) demodulation, and subsequent radar specific algorithms. The core of the processor board consists of an analog-to-digital (AID) section, three Altera Stratix field programmable gate arrays (FPGAs), an ARM microcontroller, several memory devices, and an Ethernet interface. The processor also interfaces with a navigation board consisting of a GPS and a MEMS gyro. The processor has been configured to operate in scatterometer, Synthetic Aperture Radar (SAR), and altimeter modes. All the modes are based on digital beamforming which is a digital process that generates the far-field beam patterns at various scan angles from voltages sampled in the antenna array. This technique allows steering the received beam and controlling its beam-width and side-lobe. Several beamforming techniques can be implemented each characterized by unique strengths and weaknesses, and each applicable to different measurement scenarios. In Scatterometer mode, the radar is capable to.generate a wide beam or scan a narrow beam on transmit, and to steer the received beam on processing while controlling its beamwidth and side-lobe level. Table I lists some important radar characteristics
Syringe Injectable Electronics: Precise Targeted Delivery with Quantitative Input/Output Connectivity.

PubMed

Hong, Guosong; Fu, Tian-Ming; Zhou, Tao; Schuhmann, Thomas G; Huang, Jinlin; Lieber, Charles M

2015-10-14

Syringe-injectable mesh electronics with tissue-like mechanical properties and open macroporous structures is an emerging powerful paradigm for mapping and modulating brain activity. Indeed, the ultraflexible macroporous structure has exhibited unprecedented minimal/noninvasiveness and the promotion of attractive interactions with neurons in chronic studies. These same structural features also pose new challenges and opportunities for precise targeted delivery in specific brain regions and quantitative input/output (I/O) connectivity needed for reliable electrical measurements. Here, we describe new results that address in a flexible manner both of these points. First, we have developed a controlled injection approach that maintains the extended mesh structure during the "blind" injection process, while also achieving targeted delivery with ca. 20 μm spatial precision. Optical and microcomputed tomography results from injections into tissue-like hydrogel, ex vivo brain tissue, and in vivo brains validate our basic approach and demonstrate its generality. Second, we present a general strategy to achieve up to 100% multichannel I/O connectivity using an automated conductive ink printing methodology to connect the mesh electronics and a flexible flat cable, which serves as the standard "plug-in" interface to measurement electronics. Studies of resistance versus printed line width were used to identify optimal conditions, and moreover, frequency-dependent noise measurements show that the flexible printing process yields values comparable to commercial flip-chip bonding technology. Our results address two key challenges faced by syringe-injectable electronics and thereby pave the way for facile in vivo applications of injectable mesh electronics as a general and powerful tool for long-term mapping and modulation of brain activity in fundamental neuroscience through therapeutic biomedical studies.
Real-time phase correlation based integrated system for seizure detection

NASA Astrophysics Data System (ADS)

Romaine, James B.; Delgado-Restituto, Manuel; Leñero-Bardallo, Juan A.; Rodríguez-Vázquez, Ángel

2017-05-01

This paper reports a low area, low power, integer-based digital processor for the calculation of phase synchronization between two neural signals. The processor calculates the phase-frequency content of a signal by identifying the specific time periods associated with two consecutive minima. The simplicity of this phase-frequency content identifier allows for the digital processor to utilize only basic digital blocks, such as registers, counters, adders and subtractors, without incorporating any complex multiplication and or division algorithms. In fact, the processor, fabricated in a 0.18μm CMOS process, only occupies an area of 0.0625μm2 and consumes 12.5nW from a 1.2V supply voltage when operated at 128kHz. These low-area, low-power features make the proposed processor a valuable computing element in closed loop neural prosthesis for the treatment of neural diseases, such as epilepsy, or for extracting functional connectivity maps between different recording sites in the brain.
Analysis of Modeling Parameters on Threaded Screws.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vigil, Miquela S.; Brake, Matthew Robert; Vangoethem, Douglas

2015-06-01

Assembled mechanical systems often contain a large number of bolted connections. These bolted connections (joints) are integral aspects of the load path for structural dynamics, and, consequently, are paramount for calculating a structure's stiffness and energy dissipation prop- erties. However, analysts have not found the optimal method to model appropriately these bolted joints. The complexity of the screw geometry cause issues when generating a mesh of the model. This paper will explore different approaches to model a screw-substrate connec- tion. Model parameters such as mesh continuity, node alignment, wedge angles, and thread to body element size ratios are examined. Themore » results of this study will give analysts a better understanding of the influences of these parameters and will aide in finding the optimal method to model bolted connections.« less
Parallel asynchronous systems and image processing algorithms

NASA Technical Reports Server (NTRS)

Coon, D. D.; Perera, A. G. U.

1989-01-01

A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to evey pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.
Embedded System Implementation on FPGA System With μCLinux OS

NASA Astrophysics Data System (ADS)

Fairuz Muhd Amin, Ahmad; Aris, Ishak; Syamsul Azmir Raja Abdullah, Raja; Kalos Zakiah Sahbudin, Ratna

2011-02-01

Embedded systems are taking on more complicated tasks as the processors involved become more powerful. The embedded systems have been widely used in many areas such as in industries, automotives, medical imaging, communications, speech recognition and computer vision. The complexity requirements in hardware and software nowadays need a flexibility system for further enhancement in any design without adding new hardware. Therefore, any changes in the design system will affect the processor that need to be changed. To overcome this problem, a System On Programmable Chip (SOPC) has been designed and developed using Field Programmable Gate Array (FPGA). A softcore processor, NIOS II 32-bit RISC, which is the microprocessor core was utilized in FPGA system together with the embedded operating system(OS), μClinux. In this paper, an example of web server is explained and demonstrated
Managing seafood processing wastewater on the Oregon coast: A time of transition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, M.D.; Miner, J.R.

1997-12-01

Seafood processors along the Oregon coast practice a wastewater management plan that is unique within the state. Most of these operations discharge wastewater under a General Permit issued by the Oregon Department of Environmental Quality (DEQ) that requires only that they screen the wastewater to remove particles that will not pass through a 40 mesh screen. The General Permit was issued in February of 1992 and was scheduled to expire at the end of December, 1996. It has been extended until a replacement is adopted. Alternatives are currently under consideration by the DEQ. A second issue is the increasing competitionmore » for water within the coastal communities that are experiencing a growing tourist industry and a static water supply. Tourism and seafood processing both have their peak water demands during the summer months when fresh water supplies are most limited. Disposal of solid wastes has been simplified for many of the processors along the Lower Columbia River by a Fisheries Enhancement Program which allows processors to grind the solid waste then to discharge it into the stream under appropriate tidal conditions. There is no data which indicates water quality damage from this practice nor is there clear evidence of enhanced fishery productivity.« less
An object-oriented approach for parallel self adaptive mesh refinement on block structured grids

NASA Technical Reports Server (NTRS)

Lemke, Max; Witsch, Kristian; Quinlan, Daniel

1993-01-01

Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.
Numerical evaluation of moiré pattern in touch sensor module with electrode mesh structure in oblique view

NASA Astrophysics Data System (ADS)

Pournoury, M.; Zamiri, A.; Kim, T. Y.; Yurlov, V.; Oh, K.

2016-03-01

Capacitive touch sensor screen with the metal materials has recently become qualified for substitution of ITO; however several obstacles still have to be solved. One of the most important issues is moiré phenomenon. The visibility problem of the metal-mesh, in touch sensor module (TSM) is numerically considered in this paper. Based on human eye contract sensitivity function (CSF), moiré pattern of TSM electrode mesh structure is simulated with MATLAB software for 8 inch screen display in oblique view. Standard deviation of the generated moiré by the superposition of electrode mesh and screen image is calculated to find the optimal parameters which provide the minimum moiré visibility. To create the screen pixel array and mesh electrode, rectangular function is used. The filtered image, in frequency domain, is obtained by multiplication of Fourier transform of the finite mesh pattern (product of screen pixel and mesh electrode) with the calculated CSF function for three different observer distances (L=200, 300 and 400 mm). It is observed that the discrepancy between analytical and numerical results is less than 0.6% for 400 mm viewer distance. Moreover, in the case of oblique view due to considering the thickness of the finite film between mesh electrodes and screen, different points of minimum standard deviation of moiré pattern are predicted compared to normal view.
WATERLOPP V2/64: A highly parallel machine for numerical computation

NASA Astrophysics Data System (ADS)

Ostlund, Neil S.

1985-07-01

Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor comunication, global synchronization, and contention for shared resources if the performance of a large number ( n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attemps at a solution to the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, that is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.
A general gridding, discretization, and coarsening methodology for modeling flow in porous formations with discrete geological features

NASA Astrophysics Data System (ADS)

Karimi-Fard, M.; Durlofsky, L. J.

2016-10-01

A comprehensive framework for modeling flow in porous media containing thin, discrete features, which could be high-permeability fractures or low-permeability deformation bands, is presented. The key steps of the methodology are mesh generation, fine-grid discretization, upscaling, and coarse-grid discretization. Our specialized gridding technique combines a set of intersecting triangulated surfaces by constructing approximate intersections using existing edges. This procedure creates a conforming mesh of all surfaces, which defines the internal boundaries for the volumetric mesh. The flow equations are discretized on this conforming fine mesh using an optimized two-point flux finite-volume approximation. The resulting discrete model is represented by a list of control-volumes with associated positions and pore-volumes, and a list of cell-to-cell connections with associated transmissibilities. Coarse models are then constructed by the aggregation of fine-grid cells, and the transmissibilities between adjacent coarse cells are obtained using flow-based upscaling procedures. Through appropriate computation of fracture-matrix transmissibilities, a dual-continuum representation is obtained on the coarse scale in regions with connected fracture networks. The fine and coarse discrete models generated within the framework are compatible with any connectivity-based simulator. The applicability of the methodology is illustrated for several two- and three-dimensional examples. In particular, we consider gas production from naturally fractured low-permeability formations, and transport through complex fracture networks. In all cases, highly accurate solutions are obtained with significant model reduction.
High-density percutaneous chronic connector for neural prosthetics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shah, Kedar G.; Bennett, William J.; Pannu, Satinderpall S.

2015-09-22

A high density percutaneous chronic connector, having first and second connector structures each having an array of magnets surrounding a mounting cavity. A first electrical feedthrough array is seated in the mounting cavity of the first connector structure and a second electrical feedthrough array is seated in the mounting cavity of the second connector structure, with a feedthrough interconnect matrix positioned between a top side of the first electrical feedthrough array and a bottom side of the second electrical feedthrough array to electrically connect the first electrical feedthrough array to the second electrical feedthrough array. The two arrays of magnetsmore » are arranged to attract in a first angular position which connects the first and second connector structures together and electrically connects the percutaneously connected device to the external electronics, and to repel in a second angular position to facilitate removal of the second connector structure from the first connector structure.« less
Augmenting computer networks

NASA Technical Reports Server (NTRS)

Bokhari, S. H.; Raza, A. D.

1984-01-01

Three methods of augmenting computer networks by adding at most one link per processor are discussed: (1) A tree of N nodes may be augmented such that the resulting graph has diameter no greater than 4log sub 2((N+2)/3)-2. Thi O(N(3)) algorithm can be applied to any spanning tree of a connected graph to reduce the diameter of that graph to O(log N); (2) Given a binary tree T and a chain C of N nodes each, C may be augmented to produce C so that T is a subgraph of C. This algorithm is O(N) and may be used to produce augmented chains or rings that have diameter no greater than 2log sub 2((N+2)/3) and are planar; (3) Any rectangular two-dimensional 4 (8) nearest neighbor array of size N = 2(k) may be augmented so that it can emulate a single step shuffle-exchange network of size N/2 in 3(t) time steps.
Error estimation and adaptive mesh refinement for parallel analysis of shell structures

NASA Technical Reports Server (NTRS)

Keating, Scott C.; Felippa, Carlos A.; Park, K. C.

1994-01-01

The formulation and application of element-level, element-independent error indicators is investigated. This research culminates in the development of an error indicator formulation which is derived based on the projection of element deformation onto the intrinsic element displacement modes. The qualifier 'element-level' means that no information from adjacent elements is used for error estimation. This property is ideally suited for obtaining error values and driving adaptive mesh refinements on parallel computers where access to neighboring elements residing on different processors may incur significant overhead. In addition such estimators are insensitive to the presence of physical interfaces and junctures. An error indicator qualifies as 'element-independent' when only visible quantities such as element stiffness and nodal displacements are used to quantify error. Error evaluation at the element level and element independence for the error indicator are highly desired properties for computing error in production-level finite element codes. Four element-level error indicators have been constructed. Two of the indicators are based on variational formulation of the element stiffness and are element-dependent. Their derivations are retained for developmental purposes. The second two indicators mimic and exceed the first two in performance but require no special formulation of the element stiffness mesh refinement which we demonstrate for two dimensional plane stress problems. The parallelizing of substructures and adaptive mesh refinement is discussed and the final error indicator using two-dimensional plane-stress and three-dimensional shell problems is demonstrated.
Status report of the end-to-end ASKAP software system: towards early science operations

NASA Astrophysics Data System (ADS)

Guzman, Juan Carlos; Chapman, Jessica; Marquarding, Malte; Whiting, Matthew

2016-08-01

The Australian SKA Pathfinder (ASKAP) is a novel centimetre radio synthesis telescope currently in the commissioning phase and located in the midwest region of Western Australia. It comprises of 36 x 12 m diameter reflector antennas each equipped with state-of-the-art and award winning Phased Array Feeds (PAF) technology. The PAFs provide a wide, 30 square degree field-of-view by forming up to 36 separate dual-polarisation beams at once. This results in a high data rate: 70 TB of correlated visibilities in an 8-hour observation, requiring custom-written, high-performance software running in dedicated High Performance Computing (HPC) facilities. The first six antennas equipped with first-generation PAF technology (Mark I), named the Boolardy Engineering Test Array (BETA) have been in use since 2014 as a platform to test PAF calibration and imaging techniques, and along the way it has been producing some great science results. Commissioning of the ASKAP Array Release 1, that is the first six antennas with second-generation PAFs (Mark II) is currently under way. An integral part of the instrument is the Central Processor platform hosted at the Pawsey Supercomputing Centre in Perth, which executes custom-written software pipelines, designed specifically to meet the ASKAP imaging requirements of wide field of view and high dynamic range. There are three key hardware components of the Central Processor: The ingest nodes (16 x node cluster), the fast temporary storage (1 PB Lustre file system) and the processing supercomputer (200 TFlop system). This High-Performance Computing (HPC) platform is managed and supported by the Pawsey support team. Due to the limited amount of data generated by BETA and the first ASKAP Array Release, the Central Processor platform has been running in a more "traditional" or user-interactive mode. But this is about to change: integration and verification of the online ingest pipeline starts in early 2016, which is required to support the full 300 MHz bandwidth for Array Release 1; followed by the deployment of the real-time data processing components. In addition to the Central Processor, the first production release of the CSIRO ASKAP Science Data Archive (CASDA) has also been deployed in one of the Pawsey Supercomputing Centre facilities and it is integrated to the end-to-end ASKAP data flow system. This paper describes the current status of the "end-to-end" data flow software system from preparing observations to data acquisition, processing and archiving; and the challenges of integrating an HPC facility as a key part of the instrument. It also shares some lessons learned since the start of integration activities and the challenges ahead in preparation for the start of the Early Science program.
High-Performance, Radiation-Hardened Electronics for Space Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Watson, Michael D.; Frazier, Donald O.; Adams, James H.; Johnson, Michael A.; Kolawa, Elizabeth A.

2007-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project endeavors to advance the current state-of-the-art in high-performance, radiation-hardened electronics and processors, ensuring successful performance of space systems required to operate within extreme radiation and temperature environments. Because RHESE is a project within the Exploration Technology Development Program (ETDP), RHESE's primary customers will be the human and robotic missions being developed by NASA's Exploration Systems Mission Directorate (ESMD) in partial fulfillment of the Vision for Space Exploration. Benefits are also anticipated for NASA's science missions to planetary and deep-space destinations. As a technology development effort, RHESE provides a broad-scoped, full spectrum of approaches to environmentally harden space electronics, including new materials, advanced design processes, reconfigurable hardware techniques, and software modeling of the radiation environment. The RHESE sub-project tasks are: SelfReconfigurable Electronics for Extreme Environments, Radiation Effects Predictive Modeling, Radiation Hardened Memory, Single Event Effects (SEE) Immune Reconfigurable Field Programmable Gate Array (FPGA) (SIRF), Radiation Hardening by Software, Radiation Hardened High Performance Processors (HPP), Reconfigurable Computing, Low Temperature Tolerant MEMS by Design, and Silicon-Germanium (SiGe) Integrated Electronics for Extreme Environments. These nine sub-project tasks are managed by technical leads as located across five different NASA field centers, including Ames Research Center, Goddard Space Flight Center, the Jet Propulsion Laboratory, Langley Research Center, and Marshall Space Flight Center. The overall RHESE integrated project management responsibility resides with NASA's Marshall Space Flight Center (MSFC). Initial technology development emphasis within RHESE focuses on the hardening of Field Programmable Gate Arrays (FPGA)s and Field Programmable Analog Arrays (FPAA)s for use in reconfigurable architectures. As these component/chip level technologies mature, the RHESE project emphasis shifts to focus on efforts encompassing total processor hardening techniques and board-level electronic reconfiguration techniques featuring spare and interface modularity. This phased approach to distributing emphasis between technology developments provides hardened FPGA/FPAAs for early mission infusion, then migrates to hardened, board-level, high speed processors with associated memory elements and high density storage for the longer duration missions encountered for Lunar Outpost and Mars Exploration occurring later in the Constellation schedule.
DC switching regulated power supply for driving an inductive load

DOEpatents

Dyer, G.R.

1983-11-29

A dc switching regulated power supply for driving an inductive load is provided. The regulator basic circuit is a bridge arrangement of diodes and transistors. First and second opposite legs of the bridge are formed by first and second parallel-connected transistor arrays, respectively, while the third and fourth legs of the bridge are formed by appropriately connected first and second parallel connected diode arrays, respectively. A dc power supply is connected to the input of the bridge and the output is connected to the load. A servo controller is provided to control the switching rate of the transistors to maintain a desired current to the load. The regulator may be operated in three stages or modes: (1) for current runup in the load, both first and second transistor switch arrays are turned on and current is supplied to the load through both transistor arrays. (2) When load current reaches the desired level, the first switch is turned off, and load current flywheels through the second switch array and the fourth leg diode array connecting the second switch array in series with the load. Current is maintained by alternating between modes 1 and 2 at a suitable duty cycle and switching rate set by the controller. (3) Rapid current rundown is accomplished by turning both switch arrays off, allowing load current to be dumped back into the source through the third and fourth diode arrays connecting the source in series opposition with the load to recover energy from the inductive load.
Edge delamination of composite laminates subject to combined tension and torsional loading

NASA Technical Reports Server (NTRS)

Hooper, Steven J.

1990-01-01

Delamination is a common failure mode of laminated composite materials. Edge delamination is important since it results in reduced stiffness and strength of the laminate. The tension/torsion load condition is of particular significance to the structural integrity of composite helicopter rotor systems. Material coupons can easily be tested under this type of loading in servo-hydraulic tension/torsion test stands using techniques very similar to those used for the Edge Delamination Tensile Test (EDT) delamination specimen. Edge delamination of specimens loaded in tension was successfully analyzed by several investigators using both classical laminate theory and quasi-three dimensional (Q3D) finite element techniques. The former analysis technique can be used to predict the total strain energy release rate, while the latter technique enables the calculation of the mixed-mode strain energy release rates. The Q3D analysis is very efficient since it produces a three-dimensional solution to a two-dimensional domain. A computer program was developed which generates PATRAN commands to generate the finite element model. PATRAN is a pre- and post-processor which is commonly used with a variety of finite element programs such as MCS/NASTRAN. The program creates a sufficiently dense mesh at the delamination crack tips to support a mixed-mode fracture mechanics analysis. The program creates a coarse mesh in those regions where the gradients in the stress field are low (away from the delamination regions). A transition mesh is defined between these regions. This program is capable of generating a mesh for an arbitrarily oriented matrix crack. This program significantly reduces the modeling time required to generate these finite element meshes, thus providing a realistic tool with which to investigate the tension torsion problem.

FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

NASA Astrophysics Data System (ADS)

Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

2009-12-01

FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summaryProgram title: FILMPAR Catalogue identifier: AEEL_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 530 421 No. of bytes in distributed program, including test data, etc.: 1 960 313 Distribution format: tar.gz Programming language: C++ and MPI Computer: Desktop, server Operating system: Unix/Linux Mac OS X Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors RAM: 512 MBytes Classification: 12 External routines: GNU C/C++, MPI Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering, industrial and physical applications. However, despite recent modelling advances, the accurate numerical solution of the equations governing such problems is still at a relatively early stage. Indeed, recent studies employing a simplifying long-wave approximation have shown that highly efficient numerical methods are necessary to solve the resulting lubrication equations in order to achieve the level of grid resolution required to accurately capture the effects of micro- and nano-scale topographical features. Solution method: A portable parallel multigrid algorithm has been developed for the above purpose, for the particular case of flow over submerged topographical features. Within the multigrid framework adopted, a W-cycle is used to accelerate convergence in respect of the time dependent nature of the problem, with relaxation sweeps performed using a fixed number of pre- and post-Red-Black Gauss-Seidel Newton iterations. In addition, the algorithm incorporates automatic adaptive time-stepping to avoid the computational expense associated with repeated time-step failure. Running time: 1.31 minutes using 128 processors on BlueGene/P with a problem size of over 16.7 million mesh points.
All-solid-state Z-scheme system arrays of Fe2V4O13/RGO/CdS for visible light-driving photocatalytic CO2 reduction into renewable hydrocarbon fuel.

PubMed

Li, Ping; Zhou, Yong; Li, Haijin; Xu, Qinfeng; Meng, Xianguang; Meng, Xiangguang; Wang, Xiaoyong; Xiao, Min; Zou, Zhigang

2015-01-14

An all-solid-state Z-scheme system array consisting of an Fe2V4O13 nanoribbon (NR)/reduced graphene oxide (RGO)/CdS nanoparticle grown on the stainless-steel mesh was rationally designed for photoconversion of gaseous CO2 into renewable hydrocarbon fuels (methane: CH4).
Multi-anode microchannel arrays - New detectors for imaging and spectroscopy in space

NASA Technical Reports Server (NTRS)

Timothy, J. G.; Bybee, R. L.

1983-01-01

Consideration is given to the construction and operation of multi-anode microchannel array detector systems having formats as large as 256 x 1024 pixels. Such arrays are being developed for imaging and spectroscopy at soft X-ray, ultraviolet and visible wavelengths from balloons, sounding rockets and space probes. Both discrete-anode and coincidence-anode arrays are described. Two types of photocathode structures are evaluated: an opaque photocathode deposited directly on the curved-channel MCP and an activated cathode deposited on a proximity-focused mesh. Future work will include sensitivity optimization in the different wavelength regions and the development of detector tubes with semitransparent proximity-focused photocathodes.
Calculation of the force acting on a micro-sized particle with optical vortex array laser beam tweezers

NASA Astrophysics Data System (ADS)

Kuo, Chun-Fu; Chu, Shu-Chun

2013-03-01

Optical vortices possess several special properties, including carrying optical angular momentum (OAM) and exhibiting zero intensity. Vortex array laser beams have attracts many interests due to its special mesh field distributions, which show great potential in the application of multiple optical traps and dark optical traps. Previously study developed an Ince-Gaussian Mode (IGM)-based vortex array laser beam1. This study develops a simulation model based on the discrete dipole approximation (DDA) method for calculating the resultant force acting on a micro-sized spherical dielectric particle that situated at the beam waist of the IGM-based vortex array laser beams1.
Intercluster Connection in Cognitive Wireless Mesh Networks Based on Intelligent Network Coding

NASA Astrophysics Data System (ADS)

Chen, Xianfu; Zhao, Zhifeng; Jiang, Tao; Grace, David; Zhang, Honggang

2009-12-01

Cognitive wireless mesh networks have great flexibility to improve spectrum resource utilization, within which secondary users (SUs) can opportunistically access the authorized frequency bands while being complying with the interference constraint as well as the QoS (Quality-of-Service) requirement of primary users (PUs). In this paper, we consider intercluster connection between the neighboring clusters under the framework of cognitive wireless mesh networks. Corresponding to the collocated clusters, data flow which includes the exchanging of control channel messages usually needs four time slots in traditional relaying schemes since all involved nodes operate in half-duplex mode, resulting in significant bandwidth efficiency loss. The situation is even worse at the gateway node connecting the two colocated clusters. A novel scheme based on network coding is proposed in this paper, which needs only two time slots to exchange the same amount of information mentioned above. Our simulation shows that the network coding-based intercluster connection has the advantage of higher bandwidth efficiency compared with the traditional strategy. Furthermore, how to choose an optimal relaying transmission power level at the gateway node in an environment of coexisting primary and secondary users is discussed. We present intelligent approaches based on reinforcement learning to solve the problem. Theoretical analysis and simulation results both show that the intelligent approaches can achieve optimal throughput for the intercluster relaying in the long run.
Simulation of Fault Tolerance in a Hypercube Arrangement of Discrete Processors.

DTIC Science & Technology

1987-12-01

Geometric Properties .................... 22 Binary Properties ....................... 26 Intel Hypercube Hardware Arrangement ... 28 IV. Cube-Connected... Properties of the CCC..............35 CCC Redundancy............................... 38 iii 6L V. Re-Configurable Cube-Connected Cycles ....... 40 Global...o........ 74 iv List of Figures Page Figure 1: Hypercubes of Different Dimensions ......... 21 Figure 2: Hypercube Properties
Atmospheric Modeling And Sensor Simulation (AMASS) study

NASA Technical Reports Server (NTRS)

Parker, K. G.

1985-01-01

A 4800 band synchronous communications link was established between the Perkin-Elmer (P-E) 3250 Atmospheric Modeling and Sensor Simulation (AMASS) system and the Cyber 205 located at the Goddard Space Flight Center. An extension study of off-the-shelf array processors offering standard interface to the Perkin-Elmer was conducted to determine which would meet computational requirements of the division. A Floating Point Systems AP-120B was borrowed from another Marshall Space Flight Center laboratory for evaluation. It was determined that available array processors did not offer significantly more capabilities than the borrowed unit, although at least three other vendors indicated that standard Perkin-Elmer interfaces would be marketed in the future. Therefore, the recommendation was made to continue to utilize the 120B ad to keep monitoring the AP market. Hardware necessary to support requirements of the ASD as well as to enhance system performance was specified and procured. Filters were implemented on the Harris/McIDAS system including two-dimensional lowpass, gradient, Laplacian, and bicubic interpolation routines.
A new multifunction acousto-optic signal processor

NASA Technical Reports Server (NTRS)

Berg, N. J.; Casseday, M. W.; Filipov, A. N.; Pellegrino, J. M.

1984-01-01

An acousto-optic architecture for simultaneously obtaining time integration correlation and high-speed power spectrum analysis was constructed using commercially available TeO2 modulators and photodiode detector-arrays. The correlator section of the processor uses coherent interferometry to attain maximum bandwidth and dynamic range while achieving a time-bandwidth product of 1 million. Two correllator outputs are achieved in this system configuration. One is optically filtered and magnified 2 : 1 to decrease the spatial frequency to a level where a 25-MHz bandwidth may be sampled by a 62-mm array with elements on 25-micro centers. The other output is magnified by a factor of 10 such that the center 4 microseconds of information is available for estimation of time-difference-of-arrival to within 10 ns. The Bragg cell spectrum-analyzer section, which also has two outputs, resolves a 25-MHz instantaneous bandwidth to 25 kHz and can determine discrete-frequency reception time to within 15 microseconds. A microprocessor combines spectrum analysis information with that obtained from the correlator.
A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

NASA Astrophysics Data System (ADS)

Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

2010-07-01

This paper proposes a kind of pipelined electric circuit architecture implemented in FPGA, a very large scale integrated circuit (VLSI), which efficiently deals with the real time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this image system. Each processor undertakes own systematic task, coordinating its work with each other's. The system on programmable chip (SOPC) in FPGA works steadily under the global clock frequency of 96Mhz. Adequate time allowance makes FPGA perform NUC image pre-processing algorithm with ease, which has offered favorable guarantee for the work of post image processing in DSP. And at the meantime, this paper presents a hardware (HW) and software (SW) co-design in FPGA. Thus, this systematic architecture yields an image processing system with multiprocessor, and a smart solution to the satisfaction with the performance of the system.
Compact electro-optical module with polymer waveguides on a flexible substrate for high-density board-level communication

NASA Astrophysics Data System (ADS)

Weiss, J. R. M.; Lamprecht, T.; Meier, N.; Dangel, R.; Horst, F.; Jubin, D.; Beyeler, R.; Offrein, B. J.

2010-02-01

We report on the co-packaging of electrical CMOS transceiver and VCSEL chip arrays on a flexible electrical substrate with optical polymer waveguides. The electro-optical components are attached to the substrate edge and butt-coupled to the waveguides. Electrically conductive silver-ink connects them to the substrate at an angle of 90°. The final assembly contacts the surface of a package laminate with an integrated compressible connector. The module can be folded to save space, requires only a small footprint on the package laminate and provides short electrical high-speed signal paths. With our approach, the electro-optical package becomes a compact electro-optical module with integrated polymer waveguides terminated with either optical connectors (e.g., at the card edge) or with an identical assembly for a second processor on the board. Consequently, no costly subassemblies and connectors are needed, and a very high integration density and scalability to virtually arbitrary channel counts and towards very high data rates (20+ Gbps) become possible. Future cost targets of much less than US$1 per Gbps will be reached by employing standard PCB materials and technologies that are well established in the industry. Moreover, our technology platform has both electrical and optical connectivity and functionality.
Microlaser-based compact optical neuro-processors (Invited Paper)

NASA Astrophysics Data System (ADS)

Paek, Eung Gi; Chan, Winston K.; Zah, Chung-En; Cheung, Kwok-wai; Curtis, L.; Chang-Hasnain, Constance J.

1992-10-01

This paper reviews the recent progress in the development of holographic neural networks using surface-emitting laser diode arrays (SELDAs). Since the previous work on ultrafast holographic memory readout system and a robust incoherent correlator, progress has been made in several areas: the use of an array of monolithic `neurons' to reconstruct holographic memories; two-dimensional (2-D) wavelength-division multiplexing (WDM) for image transmission through a single-mode fiber; and finally, an associative memory using time- division multiplexing (TDM). Experimental demonstrations on these are presented.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

PubMed

Zierke, Stephanie; Bakos, Jason D

2010-04-12

Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Development Of A Three-Dimensional Circuit Integration Technology And Computer Architecture

NASA Astrophysics Data System (ADS)

Etchells, R. D.; Grinberg, J.; Nudd, G. R.

1981-12-01

This paper is the first of a series 1,2,3 describing a range of efforts at Hughes Research Laboratories, which are collectively referred to as "Three-Dimensional Microelectronics." The technology being developed is a combination of a unique circuit fabrication/packaging technology and a novel processing architecture. The packaging technology greatly reduces the parasitic impedances associated with signal-routing in complex VLSI structures, while simultaneously allowing circuit densities orders of magnitude higher than the current state-of-the-art. When combined with the 3-D processor architecture, the resulting machine exhibits a one- to two-order of magnitude simultaneous improvement over current state-of-the-art machines in the three areas of processing speed, power consumption, and physical volume. The 3-D architecture is essentially that commonly referred to as a "cellular array", with the ultimate implementation having as many as 512 x 512 processors working in parallel. The three-dimensional nature of the assembled machine arises from the fact that the chips containing the active circuitry of the processor are stacked on top of each other. In this structure, electrical signals are passed vertically through the chips via thermomigrated aluminum feedthroughs. Signals are passed between adjacent chips by micro-interconnects. This discussion presents a broad view of the total effort, as well as a more detailed treatment of the fabrication and packaging technologies themselves. The results of performance simulations of the completed 3-D processor executing a variety of algorithms are also presented. Of particular pertinence to the interests of the focal-plane array community is the simulation of the UNICORNS nonuniformity correction algorithms as executed by the 3-D architecture.
A GaAs vector processor based on parallel RISC microprocessors

NASA Astrophysics Data System (ADS)

Misko, Tim A.; Rasset, Terry L.

A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture

DOEpatents

Klarer, Paul R.; Hayward, David R.; Amai, Wendy A.

2006-10-03

This invention relates to system architectures, specifically failure-tolerant and self-reconfiguring embedded system architectures. The invention provides both a method and architecture for redundancy. There can be redundancy in both software and hardware for multiple levels of redundancy. The invention provides a self-reconfiguring architecture for activating redundant modules whenever other modules fail. The architecture comprises: a communication backbone connected to two or more processors and software modules running on each of the processors. Each software module runs on one processor and resides on one or more of the other processors to be available as a backup module in the event of failure. Each module and backup module reports its status over the communication backbone. If a primary module does not report, its backup module takes over its function. If the primary module becomes available again, the backup module returns to its backup status.
Distributed micro-radar system for detection and tracking of low-profile, low-altitude targets

NASA Astrophysics Data System (ADS)

Gorwara, Ashok; Molchanov, Pavlo

2016-05-01

Proposed airborne surveillance radar system can detect, locate, track, and classify low-profile, low-altitude targets: from traditional fixed and rotary wing aircraft to non-traditional targets like unmanned aircraft systems (drones) and even small projectiles. Distributed micro-radar system is the next step in the development of passive monopulse direction finder proposed by Stephen E. Lipsky in the 80s. To extend high frequency limit and provide high sensitivity over the broadband of frequencies, multiple angularly spaced directional antennas are coupled with front end circuits and separately connected to a direction finder processor by a digital interface. Integration of antennas with front end circuits allows to exclude waveguide lines which limits system bandwidth and creates frequency dependent phase errors. Digitizing of received signals proximate to antennas allows loose distribution of antennas and dramatically decrease phase errors connected with waveguides. Accuracy of direction finding in proposed micro-radar in this case will be determined by time accuracy of digital processor and sampling frequency. Multi-band, multi-functional antennas can be distributed around the perimeter of a Unmanned Aircraft System (UAS) and connected to the processor by digital interface or can be distributed between swarm/formation of mini/micro UAS and connected wirelessly. Expendable micro-radars can be distributed by perimeter of defense object and create multi-static radar network. Low-profile, lowaltitude, high speed targets, like small projectiles, create a Doppler shift in a narrow frequency band. This signal can be effectively filtrated and detected with high probability. Proposed micro-radar can work in passive, monostatic or bistatic regime.
A Reduced Order Model of Force Displacement Curves for the Failure of Mechanical Bolts in Tension.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moore, Keegan J.; Sandia National Lab.; Brake, Matthew Robert

2015-12-01

Assembled mechanical systems often contain a large number of bolted connections. These bolted connections (joints) are integral aspects of the load path for structural dynamics, and, consequently, are paramount for calculating a structure's stiffness and energy dissipation prop- erties. However, analysts have not found the optimal method to model appropriately these bolted joints. The complexity of the screw geometry causes issues when generating a mesh of the model. This report will explore different approaches to model a screw-substrate connec- tion. Model parameters such as mesh continuity, node alignment, wedge angles, and thread to body element size ratios are examined. Themore » results of this study will give analysts a better understanding of the influences of these parameters and will aide in finding the optimal method to model bolted connections.« less
An FPGA computing demo core for space charge simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Jinyuan; Huang, Yifei; /Fermilab

2009-01-01

In accelerator physics, space charge simulation requires large amount of computing power. In a particle system, each calculation requires time/resource consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computedmore » using a memory look-up table addressed with nine to ten most significant non-zero bits. At 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.« less
Discrete differential geometry: The nonplanar quadrilateral mesh

NASA Astrophysics Data System (ADS)

Twining, Carole J.; Marsland, Stephen

2012-06-01

We consider the problem of constructing a discrete differential geometry defined on nonplanar quadrilateral meshes. Physical models on discrete nonflat spaces are of inherent interest, as well as being used in applications such as computation for electromagnetism, fluid mechanics, and image analysis. However, the majority of analysis has focused on triangulated meshes. We consider two approaches: discretizing the tensor calculus, and a discrete mesh version of differential forms. While these two approaches are equivalent in the continuum, we show that this is not true in the discrete case. Nevertheless, we show that it is possible to construct mesh versions of the Levi-Civita connection (and hence the tensorial covariant derivative and the associated covariant exterior derivative), the torsion, and the curvature. We show how discrete analogs of the usual vector integral theorems are constructed in such a way that the appropriate conservation laws hold exactly on the mesh, rather than only as approximations to the continuum limit. We demonstrate the success of our method by constructing a mesh version of classical electromagnetism and discuss how our formalism could be used to deal with other physical models, such as fluids.
Semi-regular remeshing based trust region spherical geometry image for 3D deformed mesh used MLWNN

NASA Astrophysics Data System (ADS)

Dhibi, Naziha; Elkefi, Akram; Bellil, Wajdi; Ben Amar, Chokri

2017-03-01

Triangular surface are now widely used for modeling three-dimensional object, since these models are very high resolution and the geometry of the mesh is often very dense, it is then necessary to remesh this object to reduce their complexity, the mesh quality (connectivity regularity) must be ameliorated. In this paper, we review the main methods of semi-regular remeshing of the state of the art, given the semi-regular remeshing is mainly relevant for wavelet-based compression, then we present our method for re-meshing based trust region spherical geometry image to have good scheme of 3d mesh compression used to deform 3D meh based on Multi library Wavelet Neural Network structure (MLWNN). Experimental results show that the progressive re-meshing algorithm capable of obtaining more compact representations and semi-regular objects and yield an efficient compression capabilities with minimal set of features used to have good 3D deformation scheme.

Sound-field reproduction systems using fixed-directivity loudspeakers.

PubMed

Poletti, M; Fazi, F M; Nelson, P A

2010-06-01

Sound reproduction systems using open arrays of loudspeakers in rooms suffer from degradations due to room reflections. These reflections can be reduced using pre-compensation of the loudspeaker signals, but this requires calibration of the array in the room, and is processor-intensive. This paper examines 3D sound reproduction systems using spherical arrays of fixed-directivity loudspeakers which reduce the sound field radiated outside the array. A generalized form of the simple source formulation and a mode-matching solution are derived for the required loudspeaker weights. The exterior field is derived and expressions for the exterior power and direct to reverberant ratio are derived. The theoretical results and simulations confirm that minimum interference occurs for loudspeakers which have hyper-cardioid polar responses.
Video rate morphological processor based on a redundant number representation

NASA Astrophysics Data System (ADS)

Kuczborski, Wojciech; Attikiouzel, Yianni; Crebbin, Gregory A.

1992-03-01

This paper presents a video rate morphological processor for automated visual inspection of printed circuit boards, integrated circuit masks, and other complex objects. Inspection algorithms are based on gray-scale mathematical morphology. Hardware complexity of the known methods of real-time implementation of gray-scale morphology--the umbra transform and the threshold decomposition--has prompted us to propose a novel technique which applied an arithmetic system without carrying propagation. After considering several arithmetic systems, a redundant number representation has been selected for implementation. Two options are analyzed here. The first is a pure signed digit number representation (SDNR) with the base of 4. The second option is a combination of the base-2 SDNR (to represent gray levels of images) and the conventional twos complement code (to represent gray levels of structuring elements). Operation principle of the morphological processor is based on the concept of the digit level systolic array. Individual processing units and small memory elements create a pipeline. The memory elements store current image windows (kernels). All operation primitives of processing units apply a unified direction of digit processing: most significant digit first (MSDF). The implementation technology is based on the field programmable gate arrays by Xilinx. This paper justified the rationality of a new approach to logic design, which is the decomposition of Boolean functions instead of Boolean minimization.
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

NASA Astrophysics Data System (ADS)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown being sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domain of several millions squared kilometers like the Arctic region.
Processing of thermionic power on an electrically propelled spacecraft

NASA Technical Reports Server (NTRS)

Macie, T. W.

1973-01-01

A study to define the power processing equipment required between a thermionic reactor and an array of mercury-ion thrusters for a nuclear electric propulsion system is reported. Observations and recommendations that resulted from this study were: (1) the preferred thermionic-fuel-element source voltages are 23 V or higher; (2) transistor characteristics exert a strong effect on power processor mass; (3) the power processor mass could be considerably reduced should the magnetic materials that exhibit low losses at high frequencies, that have a high Curie point, and that can operate at 15 to 20 kG become avaliable; (4) electrical component packaging on the radiator could reduce the area that is sensitive to meteoroid penetration, thereby reducing the meteoroid shielding mass requirement; (5) an experimental model of the power processor design should be built and tested to verify the efficiencies, masses, and all the automatic operational aspects of the design.
Fuzzy logic particle tracking velocimetry

NASA Technical Reports Server (NTRS)

Wernet, Mark P.

1993-01-01

Fuzzy logic has proven to be a simple and robust method for process control. Instead of requiring a complex model of the system, a user defined rule base is used to control the process. In this paper the principles of fuzzy logic control are applied to Particle Tracking Velocimetry (PTV). Two frames of digitally recorded, single exposure particle imagery are used as input. The fuzzy processor uses the local particle displacement information to determine the correct particle tracks. Fuzzy PTV is an improvement over traditional PTV techniques which typically require a sequence (greater than 2) of image frames for accurately tracking particles. The fuzzy processor executes in software on a PC without the use of specialized array or fuzzy logic processors. A pair of sample input images with roughly 300 particle images each, results in more than 200 velocity vectors in under 8 seconds of processing time.
Reconfigurable L-Band Radar

NASA Technical Reports Server (NTRS)

Rincon, Rafael F.

2008-01-01

The reconfigurable L-Band radar is an ongoing development at NASA/GSFC that exploits the capability inherently in phased array radar systems with a state-of-the-art data acquisition and real-time processor in order to enable multi-mode measurement techniques in a single radar architecture. The development leverages on the L-Band Imaging Scatterometer, a radar system designed for the development and testing of new radar techniques; and the custom-built DBSAR processor, a highly reconfigurable, high speed data acquisition and processing system. The radar modes currently implemented include scatterometer, synthetic aperture radar, and altimetry; and plans to add new modes such as radiometry and bi-static GNSS signals are being formulated. This development is aimed at enhancing the radar remote sensing capabilities for airborne and spaceborne applications in support of Earth Science and planetary exploration This paper describes the design of the radar and processor systems, explains the operational modes, and discusses preliminary measurements and future plans.
Comparing an FPGA to a Cell for an Image Processing Application

NASA Astrophysics Data System (ADS)

Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.

2010-12-01

Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
An optical processor for object recognition and tracking

NASA Technical Reports Server (NTRS)

Sloan, J.; Udomkesmalee, S.

1987-01-01

The design and development of a miniaturized optical processor that performs real time image correlation are described. The optical correlator utilizes the Vander Lugt matched spatial filter technique. The correlation output, a focused beam of light, is imaged onto a CMOS photodetector array. In addition to performing target recognition, the device also tracks the target. The hardware, composed of optical and electro-optical components, occupies only 590 cu cm of volume. A complete correlator system would also include an input imaging lens. This optical processing system is compact, rugged, requires only 3.5 watts of operating power, and weighs less than 3 kg. It represents a major achievement in miniaturizing optical processors. When considered as a special-purpose processing unit, it is an attractive alternative to conventional digital image recognition processing. It is conceivable that the combined technology of both optical and ditital processing could result in a very advanced robot vision system.
System data communication structures for active-control transport aircraft, volume 1

NASA Technical Reports Server (NTRS)

Hopkins, A. L.; Martin, J. H.; Brock, L. D.; Jansson, D. G.; Serben, S.; Smith, T. B.; Hanley, L. D.

1981-01-01

Candidate data communication techniques are identified, including dedicated links, local buses, broadcast buses, multiplex buses, and mesh networks. The design methodology for mesh networks is then discussed, including network topology and node architecture. Several concepts of power distribution are reviewed, including current limiting and mesh networks for power. The technology issues of packaging, transmission media, and lightning are addressed, and, finally, the analysis tools developed to aid in the communication design process are described. There are special tools to analyze the reliability and connectivity of networks and more general reliability analysis tools for all types of systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Learn, Mark Walter

Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not availablemore » to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA other than TMR. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.« less
Surface modification of polypropylene mesh devices with cyclodextrin via cold plasma for hernia repair: Characterization and antibacterial properties

NASA Astrophysics Data System (ADS)

Sanbhal, Noor; Mao, Ying; Sun, Gang; Xu, Rui Fang; Zhang, Qian; Wang, Lu

2018-05-01

Light weight polypropylene (PP) mesh is the most widely used implant among all other synthetic meshes for hernia repair. However, infection is the complication associated to all synthetic meshes after hernia repair. Thus, to manage mesh related infection; antibacterial drug is generally loaded to surgical implants to supply drug locally in mesh implanted site. Nevertheless, PP mesh restricts the loading of antibacterial drug at operated area due to its low wettability. The aim of this study was to introduce a novel antimicrobial PP mesh modified with β-cyclodextrine (CD) and loaded with antimicrobial agent for infection prevention. A cold oxygen plasma treatment was able to activate the surfaces of polypropylene fibers, and then CD was incorporated onto the surfaces of PP fibers. Afterward, triclosan, as a model antibacterial agent, was loaded into CD cavity to provide desired antibacterial functions. The modified polypropylene mesh samples CD-Tric-1, CD-Tric-3 exhibited excellent inhibition zone and continuous antibacterial efficacy against E. coli and S. aureus up to 6 and 7 days respectively. Results of AFM, SEM, FTIR and antibacterial tests evidenced that oxygen plasma process is necessary to increase chemical connection between CD molecules and PP fibers. The samples were also characterized by using EDX, XRD, TGA, DSC and water contact angle.
AHF: Array-Based Half-Facet Data Structure for Mixed-Dimensional and Non-manifold Meshes

DTIC Science & Technology

2013-10-13

19a. NAME OF RESPONSIBLE PERSON 19b. TELEPHONE NUMBER James Glimm V. Dyedov, N. Ray, D. Einstein , X. Jiao, T.J. Tautges 611102 c. THIS PAGE The...Ray, D. Einstein , X. Jiao, and T. Tautges mesh data structures. Examples of such new demanding applications include coupled multiphysics simulations and...be composed of a union of topologically 1-D, 2-D, 4 V. Dyedov, N. Ray, D. Einstein , X. Jiao, and T. Tautges and 3-D objects, such as a mixture of
Recommend to Your Librarian

< Previous Issue | Next Issue >

Volume 29, Issue1 (January 2005)

Articles in the Current Issue:

Research Article

Homogenization framework for three-dimensional elastoplastic finite element analysis of a grouted pipe-roofing reinforcement method for tunnelling

NASA Astrophysics Data System (ADS)

Bae, G. J.; Shin, H. S.; Sicilia, C.; Choi, Y. G.; Lim, J. J.

2005-01-01

This paper deals with the grouted pipe-roofing reinforcement method that is used in the construction of tunnels through weak grounds. This system consists on installing, prior to the excavation of a length of tunnel, an array of pipes forming a kind of umbrella above the area to be excavated. In some cases, these pipes are later used to inject grout to strengthen the ground and connect the pipes.This system has proven to be very efficient in reducing tunnel convergence and water inflow when tunnelling through weak grounds. However, due to the geometrical and mechanical complexity of the problem, existing finite element frameworks are inappropriate to simulate tunnelling using this method.In this paper, a mathematical framework based on a homogenization technique to simulate grouted pipe-roofing reinforced ground and its implementation into a 3-D finite element programme that can consider stage construction situations are presented. The constitutive model developed allows considering the main design parameters of the problem and only requires geometrical and mechanical properties of the constituents. Additionally, the use of a homogenization approach implies that the generation of the finite element mesh can be easily produced and that re-meshing is not required as basic geometrical parameters such as the orientation of the pipes are changed.The model developed is used to simulate tunnelling with the grouted pipe-roofing reinforcement method. From the analyses, the effects of the main design parameters on the elastic and the elastoplastic analyses are considered. Copyright

Scalable air cathode microbial fuel cells using glass fiber separators, plastic mesh supporters, and graphite fiber brush anodes.

PubMed

Zhang, Xiaoyuan; Cheng, Shaoan; Liang, Peng; Huang, Xia; Logan, Bruce E

2011-01-01

The combined use of brush anodes and glass fiber (GF1) separators, and plastic mesh supporters were used here for the first time to create a scalable microbial fuel cell architecture. Separators prevented short circuiting of closely-spaced electrodes, and cathode supporters were used to avoid water gaps between the separator and cathode that can reduce power production. The maximum power density with a separator and supporter and a single cathode was 75 ± 1 W/m(3). Removing the separator decreased power by 8%. Adding a second cathode increased power to 154 ± 1 W/m(3). Current was increased by connecting two MFCs connected in parallel. These results show that brush anodes, combined with a glass fiber separator and a plastic mesh supporter, produce a useful MFC architecture that is inherently scalable due to good insulation between the electrodes and a compact architecture. Copyright © 2010 Elsevier Ltd. All rights reserved.

Vibration energy harvesting using a piezoelectric circular diaphragm array.

PubMed

Wang, Wei; Yang, Tongqing; Chen, Xurui; Yao, Xi

2012-09-01

This paper presents a method for harvesting electric energy from mechanical vibration using a mechanically excited piezoelectric circular membrane array. The piezoelectric circular diaphragm array consists of four plates with series and parallel connection, and the electrical characteristics of the array are examined under dynamic conditions. With an optimal load resistor of 160 kΩ, an output power of 28 mW was generated from the array in series connection at 150 Hz under a prestress of 0.8 N and a vibration acceleration of 9.8 m/s(2), whereas a maximal output power of 27 mW can be obtained from the array in parallel connection through a resistive load of 11 kΩ under the same frequency, prestress, and acceleration conditions. The results show that using a piezoelectric circular diaphragm array can significantly increase the output of energy compared with the use of a single plate. By choosing an appropriate connection pattern (series or parallel connections) among the plates, the equivalent impedance of the energy harvesting devices can be tailored to meet the matched load of different applications for maximal power output.

Microstructural architecture developed in the fabrication of solid and open-cellular copper components by additive manufacturing using electron beam melting

NASA Astrophysics Data System (ADS)

Ramirez, Diana Alejandra

The fabrication of Cu components were first built by additive manufacturing using electron beam melting (EBM) from low-purity, atomized Cu powder containing a high density of Cu2O precipitates leading to a novel example of precipitate-dislocation architecture. These microstructures exhibit cell-like arrays (1-3microm) in the horizontal reference plane perpendicular to the build direction with columnar-like arrays extending from ~12 to >60 microm in length and corresponding spatial dimensions of 1-3 microm. These observations were observed by the use of optical metallography, and scanning and transmission electron microscopy. The hardness measurements were taken both on the atomized powder and the Cu components. The hardness for these architectures ranged from ~HV 83 to 88, in contrast to the original Cu powder microindentation hardness of HV 72 and the commercial Cu base plate hardness of HV 57. These observations were utilized for the fabrication of open-cellular copper structures by additive manufacturing using EBM and illustrated the ability to fabricate some form of controlled microstructural architecture by EBM parameter alteration or optimizing. The fabrication of these structures ranged in densities from 0.73g/cm3 to 6.67g/cm3. These structures correspond to four different articulated mesh arrays. While these components contained some porosity as a consequence of some unmelted regions, the Cu2O precipitates also contributed to a reduced density. Using X-ray Diffraction showed the approximate volume fraction estimated to be ~2%. The addition of precipitates created in the EBM melt scan formed microstructural arrays which contributed to hardening contributing to the strength of mesh struts and foam ligaments. The measurements of relative stiffness versus relative density plots for Cu compared very closely with Ti-6Al-4V open cellular structures - both mesh and foams. The Cu reticulated mesh structures exhibit a slope of n = 2 in contrast to a slope of n = 2.4 for the stochastic Cu foams, consistent with the Gibson-Ashby foam model where n = 2. These open cellular structure components exhibit considerable potential for novel, complex, multi-functional electrical and thermal management systems, especially complex, monolithic heat exchange device.

Bus-Programmable Slave Card

NASA Technical Reports Server (NTRS)

Hall, William A.

1990-01-01

Slave microprocessors in multimicroprocessor computing system contains modified circuit cards programmed via bus connecting master processor with slave microprocessors. Enables interactive, microprocessor-based, single-loop control. Confers ability to load and run program from master/slave bus, without need for microprocessor development station. Tristate buffers latch all data and information on status. Slave central processing unit never connected directly to bus.

Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS

NASA Astrophysics Data System (ADS)

Lachowicz, Stefan W.; Rassau, Alexander; Lee, Seung-Minh; Eshraghian, Kamran; Lee, Mike M.

2003-04-01

Mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher quality multimedia content places heavy demands even on current ULSI (ultra large scale integration) systems, particularly for mobile applications where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using Indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters, and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents a ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, an array of analogue-to-digital converters is distributed, so that a one-bit cell is associated with one sensor. The analogue-to-digital converters are algorithmic current-mode converters. Eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.

Temperature-Adaptive Circuits on Reconfigurable Analog Arrays

NASA Technical Reports Server (NTRS)

Stoica, Adrian; Zebulum, Ricardo S.; Keymeulen, Didier; Ramesham, Rajeshuni; Neff, Joseph; Katkoori, Srinivas

2006-01-01

Demonstration of a self-reconfigurable Integrated Circuit (IC) that would operate under extreme temperature (-180 C and 120 C) and radiation (300krad), without the protection of thermal controls and radiation shields. Self-Reconfigurable Electronics platform: a) Evolutionary Processor (EP) to run reconfiguration mechanism; b) Reconfigurable chip (FPGA, FPAA, etc).

High-Order Moving Overlapping Grid Methodology in a Spectral Element Method

NASA Astrophysics Data System (ADS)

Merrill, Brandon E.

A moving overlapping mesh methodology that achieves spectral accuracy in space and up to second-order accuracy in time is developed for solution of unsteady incompressible flow equations in three-dimensional domains. The targeted applications are in aerospace and mechanical engineering domains and involve problems in turbomachinery, rotary aircrafts, wind turbines and others. The methodology is built within the dual-session communication framework initially developed for stationary overlapping meshes. The methodology employs semi-implicit spectral element discretization of equations in each subdomain and explicit treatment of subdomain interfaces with spectrally-accurate spatial interpolation and high-order accurate temporal extrapolation, and requires few, if any, iterations, yet maintains the global accuracy and stability of the underlying flow solver. Mesh movement is enabled through the Arbitrary Lagrangian-Eulerian formulation of the governing equations, which allows for prescription of arbitrary velocity values at discrete mesh points. The stationary and moving overlapping mesh methodologies are thoroughly validated using two- and three-dimensional benchmark problems in laminar and turbulent flows. The spatial and temporal global convergence, for both methods, is documented and is in agreement with the nominal order of accuracy of the underlying solver. Stationary overlapping mesh methodology was validated to assess the influence of long integration times and inflow-outflow global boundary conditions on the performance. In a turbulent benchmark of fully-developed turbulent pipe flow, the turbulent statistics are validated against the available data. Moving overlapping mesh simulations are validated on the problems of two-dimensional oscillating cylinder and a three-dimensional rotating sphere. The aerodynamic forces acting on these moving rigid bodies are determined, and all results are compared with published data. Scaling tests, with both methodologies, show near linear strong scaling, even for moderately large processor counts. The moving overlapping mesh methodology is utilized to investigate the effect of an upstream turbulent wake on a three-dimensional oscillating NACA0012 extruded airfoil. A direct numerical simulation (DNS) at Reynolds Number 44,000 is performed for steady inflow incident upon the airfoil oscillating between angle of attack 5.6° and 25° with reduced frequency k=0.16. Results are contrasted with subsequent DNS of the same oscillating airfoil in a turbulent wake generated by a stationary upstream cylinder.

«

21

22

23

24

25

»

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Top

Site Map

Website Policies

Vulnerability Disclosure Program

Contact Us