high speed parallel: Topics by Science.gov

Sample records for high speed parallel

A parallel architecture of interpolated timing recovery for high-speed data transfer rate and wide capture-range

NASA Astrophysics Data System (ADS)

Higashino, Satoru; Kobayashi, Shoei; Yamagami, Tamotsu

2007-06-01

High data transfer rates have been demanded for data storage devices along with their increasing storage capacity. To increase the transfer rate, high-speed data processing techniques are required in read-channel devices. Parallel architectures are generally used for high-speed digital processing. We have developed a new Interpolated Timing Recovery (ITR) architecture that achieves a high data transfer rate and a wide capture range in read-channel devices for information storage channels. It facilitates parallel implementation on large-scale-integration (LSI) devices.
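The abstract does not describe the interpolation filter or the timing loop, so the following is only a minimal sketch of the general idea behind a parallel interpolator. It assumes simple linear interpolation and a constant, already-estimated ratio of symbol period to sampling period; a real ITR loop would keep updating this phase. The function `interpolate_block` and its parameters are hypothetical. Each loop iteration stands in for one of the parallel hardware lanes that together produce several timing-corrected samples per clock.

```python
import math


def interpolate_block(samples, start_time, ratio, lanes):
    """Produce `lanes` interpolated samples in one (conceptually parallel) step.

    samples:    ADC samples taken at the fixed sampling period (index = time).
    start_time: desired timing instant of the first output, in sample periods.
    ratio:      symbol period / sampling period (assumed constant here).
    lanes:      number of outputs per step, i.e. the degree of parallelism.
    """
    outputs = []
    for k in range(lanes):  # in hardware, each k is a separate lane
        t = start_time + k * ratio
        m = math.floor(t)   # base sample index
        mu = t - m          # fractional interval in [0, 1)
        # Linear interpolation between the two neighbouring ADC samples.
        outputs.append((1.0 - mu) * samples[m] + mu * samples[m + 1])
    # Timing instant at which the next block of outputs should start.
    next_time = start_time + lanes * ratio
    return outputs, next_time
```

For a linear ramp input, linear interpolation is exact, which makes the behaviour easy to check. A practical design would more likely use a higher-order (for example cubic) interpolator.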
Design and Performance of a 1 ms High-Speed Vision Chip with 3D-Stacked 140 GOPS Column-Parallel PEs.

PubMed

Nose, Atsushi; Yamazaki, Tomohiro; Katayama, Hironobu; Uehara, Shuji; Kobayashi, Masatsugu; Shida, Sayaka; Odahara, Masaki; Takamiya, Kenichi; Matsumoto, Shizunori; Miyashita, Leo; Watanabe, Yoshihiro; Izawa, Takashi; Muramatsu, Yoshinori; Nitta, Yoshikazu; Ishikawa, Masatoshi

2018-04-24

We have developed a high-speed vision chip using 3D stacking technology to address the increasing demand for high-speed vision chips in diverse applications. The chip comprises a 1/3.2-inch, 1.27 Mpixel, 500 fps (0.31 Mpixel, 1000 fps, 2 × 2 binning) vision chip with 3D-stacked column-parallel Analog-to-Digital Converters (ADCs) and 140 Giga Operations per Second (GOPS) programmable Single Instruction Multiple Data (SIMD) column-parallel PEs for new sensing applications. The 3D-stacked structure and column-parallel processing architecture achieve high sensitivity, high resolution, and high-accuracy object positioning.
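The abstract does not give the PE instruction set. The sketch below is therefore only a hypothetical illustration of column-parallel SIMD processing: every column PE runs the same instruction on its own pixel as the rows are read out one after another. The workload (threshold plus centroid, a simple form of object positioning) and the name `column_parallel_centroid` are assumptions.

```python
def column_parallel_centroid(frame, threshold):
    """Return the (x, y) centroid of pixels at or above `threshold`, or None.

    frame: list of rows, each a list of pixel values (all rows equal length).
    """
    cols = len(frame[0])
    count = [0] * cols    # per-column pixel count (one accumulator per PE)
    row_sum = [0] * cols  # per-column sum of row indices
    for r, row in enumerate(frame):  # rows are streamed one at a time
        for c in range(cols):        # in hardware, all column PEs do this at once
            if row[c] >= threshold:
                count[c] += 1
                row_sum[c] += r
    total = sum(count)
    if total == 0:
        return None
    # Final reduction across columns (done once per frame).
    x = sum(c * n for c, n in enumerate(count)) / total
    y = sum(row_sum) / total
    return x, y
```

Only the per-column accumulators need to be combined at the end of a frame. That reduction is cheap compared with the per-pixel work, which the column PEs share.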
Parallel Guessing: A Strategy for High-Speed Computation

DTIC Science & Technology

1984-09-19

for using additional hardware to obtain higher processing speed). In this paper we argue that parallel guessing for image analysis is a useful...from a true solution, or the correctness of a guess, can be readily checked. We review image-analysis algorithms having a parallel guessing or
Design of a dataway processor for a parallel image signal processing system

NASA Astrophysics Data System (ADS)

Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

1995-04-01

Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called the 'dataway processor,' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link is 8 bits wide and operates in full-duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200K gates.
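The link figures above imply a raw bandwidth that is easy to work out. The helper below is a small sketch that assumes one 8-bit transfer per clock cycle in each direction and ignores packet headers and routing overhead, so it gives upper bounds rather than measured throughput.

```python
def dataway_bandwidth(width_bits=8, clock_hz=50e6, links=6, full_duplex=True):
    """Raw byte rates for the Dataway links (one transfer per clock assumed)."""
    per_direction = width_bits * clock_hz / 8            # bytes/s, one direction
    per_link = per_direction * (2 if full_duplex else 1)  # both directions
    aggregate = per_link * links                          # all six links together
    return per_direction, per_link, aggregate
```

Under these assumptions, each link carries 50 MB/s per direction (100 MB/s full duplex), and the six links together carry 600 MB/s.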
Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

PubMed Central

Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

2014-01-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real time to the examiner using parallelized GPU processing. PMID:24695868
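The abstract only summarizes the control mechanism. As a rough, hypothetical sketch of the task-assignment idea, the function below splits the 1000×1000 = 1,000,000 A-scans into fixed-size batches and deals them round-robin to the available GPUs. The batch size is an assumed parameter that stands in for the per-GPU optimum the authors selected, and identical GPUs are assumed. The reported timings (about 13 s with two GPUs and 6.5 s with four) correspond to roughly linear scaling, which is what an even split like this aims for.

```python
def assign_batches(total_ascans, batch_size, num_gpus):
    """Split A-scan indices into batches and assign them round-robin to GPUs.

    Returns one list per GPU of (start, stop) half-open index ranges.
    """
    queues = [[] for _ in range(num_gpus)]
    for i, start in enumerate(range(0, total_ascans, batch_size)):
        stop = min(start + batch_size, total_ascans)  # last batch may be short
        queues[i % num_gpus].append((start, stop))
    return queues
```

With heterogeneous GPUs, a dynamic work queue (each GPU pulls the next batch when it finishes) would balance the load better than a fixed round-robin split.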
Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

PubMed

Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

2014-07-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real time to the examiner using parallelized GPU processing.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, D.B.

1996-12-31

The present device provides a dynamically configurable communication network for a multiprocessor parallel processing system that has a serial communication network and a high-speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high-density data among nodes, and to monitor each slave processor's status. The high-speed parallel processing network is used to effect the transmission of high-density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high-density data arrives at or leaves the node. 6 figs.
Parallel processing data network of master and slave transputers controlled by a serial control network

DOEpatents

Crosetto, Dario B.

1996-01-01

The present device provides a dynamically configurable communication network for a multiprocessor parallel processing system that has a serial communication network and a high-speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high-density data among nodes, and to monitor each slave processor's status. The high-speed parallel processing network is used to effect the transmission of high-density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high-density data arrives at or leaves the node.
Providing a parallel and distributed capability for JMASS using SPEEDES

NASA Astrophysics Data System (ADS)

Valinski, Maria; Driscoll, Jonathan; McGraw, Robert M.; Meyer, Bob

2002-07-01

The Joint Modeling And Simulation System (JMASS) is a Tri-Service simulation environment that supports engineering and engagement-level simulations. As JMASS is expanded to support other Tri-Service domains, the current set of modeling services must be expanded for High Performance Computing (HPC) applications by adding support for advanced time-management algorithms, parallel and distributed topologies, and high-speed communications. By providing support for these services, JMASS can better address modeling domains requiring parallel, computationally intense calculations such as clutter, vulnerability and lethality calculations, and underwater-based scenarios. A risk reduction effort implementing some HPC services for JMASS using the SPEEDES (Synchronous Parallel Environment for Emulation and Discrete Event Simulation) Simulation Framework has recently concluded. As an artifact of the JMASS-SPEEDES integration, not only can HPC functionality be brought to the JMASS program through SPEEDES, but an additional HLA-based capability can be demonstrated that further addresses interoperability issues. The JMASS-SPEEDES integration provided a means of adding HLA capability to preexisting JMASS scenarios through an implementation of the standard JMASS port communication mechanism that allows players to communicate.
Development of gallium arsenide high-speed, low-power serial parallel interface modules: Executive summary

NASA Technical Reports Server (NTRS)

1988-01-01

Final report to NASA LeRC on the development of gallium arsenide (GaAs) high-speed, low-power serial/parallel interface modules. The report discusses the development and testing of a family of 16-, 32-, and 64-bit parallel-to-serial and serial-to-parallel integrated circuits using a self-aligned-gate MESFET technology developed at the Honeywell Sensors and Signal Processing Laboratory. Lab testing demonstrated 1.3 GHz clock rates at a power of 300 mW. This work was accomplished under contract number NAS3-24676.
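The report does not specify bit ordering or framing. The sketch below therefore models the basic parallel-to-serial and serial-to-parallel functions as a shift register under two stated assumptions: MSB-first transmission, and a 1.3 GHz clock taken to be the serial bit clock. The function names are hypothetical.

```python
def serialize(word, width):
    """Parallel-to-serial: return the bits of `word`, MSB first (assumed order)."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    return [(word >> (width - 1 - i)) & 1 for i in range(width)]


def deserialize(bits):
    """Serial-to-parallel: shift each incoming bit into a register, MSB first."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word


def word_rate(bit_clock_hz, width):
    """Parallel word rate when one bit is shifted per serial clock cycle."""
    return bit_clock_hz / width
```

Under the bit-clock assumption, a 1.3 GHz serial stream corresponds to parallel word rates of 81.25 MHz, 40.625 MHz, and 20.3125 MHz for the 16-, 32-, and 64-bit parts. This is why the wide parallel side can run at a far lower clock.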
Dynamic performance of high speed solenoid valve with parallel coils

NASA Astrophysics Data System (ADS)

Kong, Xiaowu; Li, Shizhen

2014-07-01

Methods of improving the dynamic performance of a high-speed on/off solenoid valve include increasing the magnetic force on the armature and the slew rate of the coil current, and decreasing the mass and stroke of the moving parts. Increasing the magnetic force usually decreases the current slew rate, which can increase the delay time of the valve's dynamic response. Driving the coil with a high voltage can resolve this conflict, but a high driving voltage also increases cost and reduces safety and reliability. In this paper, a new parallel-coil scheme is investigated, in which the single coil of the solenoid is replaced by parallel coils with the same ampere-turns. Based on a mathematical model of the high-speed solenoid valve, a theoretical formula for the delay time of the valve is derived. Both the theoretical analysis and the dynamic simulation show that, as far as the delay time is concerned, dividing a single coil into N parallel sub-coils has an effect close to that of driving the single coil with N times the original voltage. A dedicated test bench is designed to measure the dynamic performance of high-speed on/off solenoid valves. The experimental results also confirm that both the delay time and the switching time of the solenoid valves can be decreased greatly by adopting the parallel-coil scheme. This research presents a simple and practical method to improve the dynamic performance of high-speed on/off solenoid valves.
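The paper's own delay-time formula for parallel sub-coils is not given in the abstract. The sketch below uses only the generic first-order RL model to show the voltage side of the stated equivalence. During the delay time the armature has not yet moved, so the inductance can be treated as constant. The current then rises as i(t) = (V/R)(1 - e^(-tR/L)), and the delay until the pull-in current I_th is reached is t_d = (L/R) ln(V / (V - R·I_th)). The parameter values in the usage check are hypothetical.

```python
import math


def delay_time(L, R, V, I_th):
    """Time for coil current to rise from 0 to I_th after a voltage step V.

    Generic RL model with constant inductance (armature still stationary);
    eddy currents and magnetic saturation are ignored.
    """
    i_steady = V / R
    if I_th >= i_steady:
        raise ValueError("threshold current is never reached at this voltage")
    return (L / R) * math.log(i_steady / (i_steady - I_th))
```

With L = 10 mH, R = 2 Ω and I_th = 3 A, doubling the drive from 24 V to 48 V cuts the computed delay by more than half. This matches the direction of the effect that the abstract attributes to splitting the coil.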
Digital intermediate frequency QAM modulator using parallel processing

DOEpatents

Pao, Hsueh-Yuan [Livermore, CA]; Tran, Binh-Nien [San Ramon, CA]

2008-05-27

The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple, low-cost method of implementing a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing pre-computed modulated cosine and sine carriers in ROM look-up tables (LUTs). The high-speed input data stream is processed in parallel using the corresponding LUTs, which reduces the required processing clock rate and allows the use of low-cost FPGAs.
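The patent text above does not give the LUT layout. The following is a minimal sketch of the idea under several assumptions: 16-QAM with levels {-3, -1, 1, 3}, an IF carrier at one eighth of the sample rate, and four output lanes per clock. A Python dictionary stands in for the ROM, and fixed-point quantization is omitted. All names are hypothetical. At run time each output sample is a single lookup of I·cos θ - Q·sin θ, so no multiplier is needed, and the lanes are independent of one another.

```python
import math

SAMPLES_PER_CYCLE = 8      # hypothetical: IF carrier at fs / 8
LANES = 4                  # hypothetical: output samples produced per clock
LEVELS = (-3, -1, 1, 3)    # 16-QAM amplitude levels (unnormalized)


def build_lut():
    """Pre-compute I*cos(theta) - Q*sin(theta) for every symbol and carrier phase."""
    lut = {}
    for i_lvl in LEVELS:
        for q_lvl in LEVELS:
            for p in range(SAMPLES_PER_CYCLE):
                theta = 2 * math.pi * p / SAMPLES_PER_CYCLE
                lut[(i_lvl, q_lvl, p)] = i_lvl * math.cos(theta) - q_lvl * math.sin(theta)
    return lut


def modulate_parallel(symbols, samples_per_symbol, lut):
    """Generate the IF waveform LANES samples at a time using only table lookups."""
    n_total = len(symbols) * samples_per_symbol
    out = []
    for base in range(0, n_total, LANES):
        # Each lane k handles sample base + k; in an FPGA these lookups run concurrently.
        for k in range(min(LANES, n_total - base)):
            n = base + k
            i_lvl, q_lvl = symbols[n // samples_per_symbol]
            out.append(lut[(i_lvl, q_lvl, n % SAMPLES_PER_CYCLE)])
    return out
```

Because the table already includes the carrier phase, the per-sample work reduces to address generation. This is what lets a slower, cheaper FPGA keep up by widening the number of lanes.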
High speed parallel spectral-domain OCT using spectrally encoded line-field illumination

NASA Astrophysics Data System (ADS)

Lee, Kye-Sung; Hur, Hwan; Bae, Ji Yong; Kim, I. Jong; Kim, Dong Uk; Nam, Ki-Hwan; Kim, Geon-Hee; Chang, Ki Soo

2018-01-01

We report parallel spectral-domain optical coherence tomography (OCT) at 500 000 A-scan/s. This is the highest-speed spectral-domain (SD) OCT system using a single line camera. Spectrally encoded line-field scanning is proposed to increase the imaging speed in SD-OCT effectively, and the tradeoff between speed, depth range, and sensitivity is demonstrated. We show that three imaging modes of 125k, 250k, and 500k A-scan/s can be simply switched according to the sample to be imaged, considering the depth range and sensitivity. To demonstrate the biological imaging performance of the high-speed imaging modes of the spectrally encoded line-field OCT system, human skin and a whole leaf were imaged at 250k and 500k A-scan/s, respectively. In addition, there is no sensitivity dependence in the B-scan direction, a dependence that is inherent in line-field parallel OCT using line focusing of a Gaussian beam with a cylindrical lens.
An FPGA-based High Speed Parallel Signal Processing System for Adaptive Optics Testbed

NASA Astrophysics Data System (ADS)

Kim, H.; Choi, Y.; Yang, Y.

In this paper, a state-of-the-art FPGA (Field Programmable Gate Array) based high-speed parallel signal processing system (SPS) for an adaptive optics (AO) testbed with a 1 kHz wavefront error (WFE) correction frequency is reported. The AO system consists of a Shack-Hartmann sensor (SHS), a deformable mirror (DM), a tip-tilt sensor (TTS), a tip-tilt mirror (TTM), and an FPGA-based high-performance SPS to correct wavefront aberrations. The SHS is composed of 400 subapertures and the DM has 277 actuators in a Fried geometry, requiring an SPS with high-speed parallel computing capability. In this study, the target WFE correction speed is 1 kHz; therefore, it requires massive parallel computing capability as well as strict hard real-time constraints on sensor measurements, on the matrix computation latency of the correction algorithms, and on the output of actuator control signals. To meet these requirements, an FPGA-based real-time SPS with parallel computing capabilities is proposed. In particular, the SPS is made up of a National Instruments (NI) real-time computer and five FPGA boards based on state-of-the-art Xilinx Kintex-7 FPGAs. Programming is done in NI's LabVIEW environment, which provides flexibility when applying different algorithms for WFE correction. It also provides a faster programming and debugging environment than conventional ones. One of the five FPGAs is assigned to measure the TTS and calculate control signals for the TTM, while the remaining four are used to receive the SHS signal and to calculate the slopes for each subaperture and the correction signal for the DM. With this parallel processing capability of the SPS, an overall closed-loop WFE correction speed of 1 kHz has been achieved. System requirements, architecture, and implementation issues are described, and experimental results are also given.
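The abstract does not state the reconstruction algorithm. A common choice in AO, assumed here, is a linear reconstructor: a dense matrix that maps the measured slopes (an x and a y slope per subaperture, so 800 values) to the 277 actuator commands. The sketch shows how that product can be split by column blocks, one block per worker, with the partial results summed afterwards. This loosely mirrors the four FPGAs that share the SHS work, but the partitioning scheme and the function names are assumptions.

```python
def reconstruct_partitioned(R, slopes, num_parts):
    """Compute actuator commands c = R s with s split into `num_parts` blocks.

    R: list of rows (one per actuator), each with len(slopes) coefficients.
    """
    n_act, n_slopes = len(R), len(slopes)
    bounds = [p * n_slopes // num_parts for p in range(num_parts + 1)]
    partials = []
    for p in range(num_parts):  # each block could be handled by its own FPGA
        lo, hi = bounds[p], bounds[p + 1]
        partials.append([sum(R[a][j] * slopes[j] for j in range(lo, hi))
                         for a in range(n_act)])
    # Summing the partial vectors equals one full matrix-vector product.
    return [sum(part[a] for part in partials) for a in range(n_act)]


def mac_rate(n_actuators, n_subapertures, loop_hz):
    """Multiply-accumulates per second for a dense reconstructor (x and y slopes)."""
    return n_actuators * 2 * n_subapertures * loop_hz
```

Under these assumptions, each frame needs 277 × 800 = 221,600 multiply-accumulates, or about 2.2 × 10^8 per second at 1 kHz. That load, combined with the latency budget of a 1 ms loop, motivates spreading the work across several FPGAs.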
Evaluation of the power consumption of a high-speed parallel robot

NASA Astrophysics Data System (ADS)

Han, Gang; Xie, Fugui; Liu, Xin-Jun

2018-06-01

An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. In this method, the power vector is extended and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogeneous and possess clear physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Furthermore, a low-power-consumption workspace is selected for the robot.
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives

NASA Astrophysics Data System (ADS)

Usha, S.; Subramani, C.

2018-04-01

Induction motors are generally highly nonlinear and have complex, time-varying dynamics. This makes speed control of an induction motor a challenging problem in industry. However, recent advances in power electronic devices and intelligent controllers allow the speed of an induction motor to be controlled while accounting for its nonlinear characteristics. Conventionally, a single inverter drives one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the motors. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions, so speed deviations should be reduced with suitable controllers. Speed control of the parallel-connected system is performed with a PID controller and a fuzzy logic controller. In this paper, the speed responses of induction motors rated at 1 HP, 1440 rpm, and 50 Hz under these controllers are compared in terms of time-domain specifications. A stability analysis of the system is also performed at low speed on the MATLAB platform. A hardware model was developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.
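To make the PID baseline concrete, the sketch below closes a discrete PID speed loop around a deliberately simple, hypothetical first-order motor model: tau·dω/dt = K·u - ω. It is not the paper's induction-motor model, and the fuzzy controller is not reproduced. The gains and plant constants are illustrative assumptions, and the 1440 rpm setpoint is taken from the stated rating.

```python
def simulate_pid(setpoint, kp, ki, kd=0.0, plant_gain=1.0, plant_tau=0.5,
                 dt=1e-3, steps=5000):
    """Simulate a discrete PID speed loop around a first-order plant.

    Plant (hypothetical): plant_tau * d(omega)/dt = plant_gain * u - omega.
    Returns the list of speed samples, one per time step.
    """
    omega = 0.0
    prev_omega = 0.0
    integral = 0.0
    history = []
    for _ in range(steps):
        error = setpoint - omega
        integral += error * dt
        # Derivative on measurement avoids a large kick when the setpoint steps.
        derivative = -(omega - prev_omega) / dt
        u = kp * error + ki * integral + kd * derivative
        prev_omega = omega
        # Forward-Euler update of the first-order plant.
        omega += dt * (plant_gain * u - omega) / plant_tau
        history.append(omega)
    return history


speeds = simulate_pid(1440.0, kp=2.0, ki=5.0, kd=0.02)
```

The integral term removes steady-state error. A fuzzy controller would replace the fixed gains with rule-based, error-dependent actions, which is the comparison the paper draws.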
PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

PubMed

Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

2014-01-01

Principal component analysis (PCA) has traditionally been used as a feature extraction technique in face recognition systems, yielding high accuracy while requiring only a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach that uses an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, thereby reducing the complexity of these stages. To improve computational time, a novel parallel architecture was employed to exploit parallel matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, in a so-called Parallel Expectation-Maximization PCA architecture. Compared with traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems, with speed-ups of over nine and three times relative to PCA and Parallel PCA, respectively.
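The abstract does not give the exact EM formulation or the parallel partitioning. The sketch below uses the standard EM algorithm for PCA, restricted to a single component, and is not presented as the authors' code. Each iteration alternates an E-step, x_n = cᵀy_n / (cᵀc), with an M-step, c = Σ y_n x_n / Σ x_n². No covariance matrix is formed and no eigendecomposition is performed. The per-sample terms are independent, which is what makes the method easy to parallelize. The function name is hypothetical.

```python
def em_pca_first_component(data, iterations=50):
    """Estimate the leading principal direction with EM-PCA (one component).

    data: list of samples, each a list of d floats. Returns a unit vector.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    y = [[row[j] - mean[j] for j in range(d)] for row in data]  # centered data
    c = [1.0 / (j + 1) for j in range(d)]  # arbitrary nonzero starting loading
    for _ in range(iterations):
        cc = sum(v * v for v in c)
        # E-step: latent coordinate of every sample (independent per sample).
        x = [sum(cj * yj for cj, yj in zip(c, yn)) / cc for yn in y]
        xx = sum(xn * xn for xn in x)
        if xx == 0.0:
            raise ValueError("starting loading is orthogonal to the data")
        # M-step: new loading vector from sums that can be computed in parallel.
        c = [sum(yn[j] * xn for yn, xn in zip(y, x)) / xx for j in range(d)]
    norm = sum(v * v for v in c) ** 0.5
    return [v / norm for v in c]
```

With one component, each EM iteration is proportional to multiplying c by the scatter matrix, so the method behaves like power iteration and converges to the top eigenvector.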
Parallel pulse processing and data acquisition for high speed, low error flow cytometry

DOEpatents

van den Engh, Gerrit J.; Stokdijk, Willem

1992-01-01

A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels, each with independent pulse digitization and a FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and a low error rate.
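The data path described above can be modelled in a few lines. The following is a toy sketch, not the patented circuit. It assumes one FIFO per channel, a trigger that tags each channel's sample with a shared event ID, and a bus controller that pops the oldest entry from every FIFO and rejects the event if the IDs disagree. The class and method names are hypothetical.

```python
from collections import deque


class ParallelPulseAcquisition:
    """Toy model: one FIFO per input channel, a shared trigger, and a bus controller."""

    def __init__(self, num_channels):
        self.fifos = [deque() for _ in range(num_channels)]
        self.next_id = 0

    def trigger(self, pulse_values):
        """Digitize one pulse on every channel and tag all entries with one event ID."""
        if len(pulse_values) != len(self.fifos):
            raise ValueError("need exactly one value per channel")
        event_id = self.next_id
        self.next_id += 1
        for fifo, value in zip(self.fifos, pulse_values):
            fifo.append((event_id, value))
        return event_id

    def read_event(self):
        """Move the oldest entry of each FIFO onto the 'bus' and check that the IDs agree."""
        entries = [fifo.popleft() for fifo in self.fifos]
        ids = {event_id for event_id, _ in entries}
        if len(ids) != 1:  # error detection: channels have fallen out of step
            raise RuntimeError(f"event ID mismatch across channels: {sorted(ids)}")
        return entries[0][0], [value for _, value in entries]
```

If any channel drops or duplicates an entry, the IDs at the heads of the FIFOs diverge. The mismatch is then caught at read-out instead of silently mixing data from different cells.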
Seeing the forest for the trees: Networked workstations as a parallel processing computer

NASA Technical Reports Server (NTRS)

Breen, J. O.; Meleedy, D. M.

1992-01-01

Unlike traditional 'serial' processing computers, in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units and can thereby perform several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms for selecting nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
Parallel pulse processing and data acquisition for high speed, low error flow cytometry

DOEpatents

Engh, G.J. van den; Stokdijk, W.

1992-09-22

A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels, each with independent pulse digitization and a FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and a low error rate. 17 figs.

A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy these requirements while also providing the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited, even though there are numerous computationally demanding programs that would benefit significantly from parallel processing. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
The science of computing - Parallel computation

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic component technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged because of hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
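The abstract does not spell out the "5, 6 or 7 points" figure. Reading it as the hypercube dimension (32, 64, or 128 nodes) is an interpretation. Under that reading, each of the 128 processors in a 7-dimensional hypercube links directly to 7 others, found by flipping one bit of its binary address. The helper names below are hypothetical.

```python
def hypercube_neighbors(node, dimension):
    """Return the nodes directly linked to `node` in a binary hypercube."""
    if not 0 <= node < (1 << dimension):
        raise ValueError("node address out of range")
    # Flipping one address bit at a time gives exactly `dimension` neighbors.
    return [node ^ (1 << bit) for bit in range(dimension)]


def link_count(dimension):
    """Each of the 2**d nodes has d links, and each link joins two nodes."""
    return (1 << dimension) * dimension // 2
```

For example, node 0 of a 128-node machine connects to nodes 1, 2, 4, 8, 16, 32, and 64, and the whole machine has 448 links. Any two nodes are at most 7 hops apart, which is the property that makes the topology attractive for parallel algorithms.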
Wide-field high-speed space-division multiplexing optical coherence tomography using an integrated photonic device

PubMed Central

Huang, Yongyang; Badar, Mudabbir; Nitkowski, Arthur; Weinroth, Aaron; Tansu, Nelson; Zhou, Chao

2017-01-01

Space-division multiplexing optical coherence tomography (SDM-OCT) is a recently developed parallel OCT imaging method that achieves a multifold speed improvement. However, the assembly of fiber-optic components used in the first prototype system was labor-intensive and susceptible to errors. Here, we demonstrate a high-speed SDM-OCT system using an integrated photonic chip that can be reliably manufactured with high precision and low per-unit cost. A three-layer cascade of 1 × 2 splitters was integrated in the photonic chip to split the incident light into 8 parallel imaging channels with ~3.7 mm optical delay in air between each channel. High-speed imaging (~1 s/volume) of porcine eyes ex vivo and wide-field imaging (~18.0 × 14.3 mm²) of human fingers in vivo were demonstrated with the chip-based SDM-OCT system. PMID:28856055
Parallel implementation of all-digital timing recovery for high-speed and real-time optical coherent receivers.

PubMed

Zhou, Xian; Chen, Xue

2011-05-09

Digital coherent receivers combine coherent detection with digital signal processing (DSP) to compensate for transmission impairments and are therefore promising candidates for future high-speed optical transmission systems. However, the maximum symbol rate supported by such real-time receivers is limited by the processing rate of the hardware. To cope with this difficulty, parallel processing algorithms are imperative. In this paper, we propose a novel parallel digital timing recovery loop (PDTRL) based on our previous work. Furthermore, to increase the dynamic dispersion tolerance range of the receivers, we embed a parallel adaptive equalizer in the PDTRL. This parallel joint scheme (PJS) can be used to complete synchronization, equalization, and polarization demultiplexing simultaneously. Finally, we demonstrate that PDTRL and PJS allow the hardware to process a 112 Gbit/s POLMUX-DQPSK signal at clock rates in the hundreds of MHz. © 2011 Optical Society of America
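A short calculation shows why parallelism is unavoidable here. POLMUX-DQPSK carries 2 bits per symbol on each of two polarizations, so a 112 Gbit/s signal runs at 28 GBaud. The oversampling factor (2 samples per symbol) and the DSP clock (350 MHz, within the "hundreds of MHz" the abstract mentions) are assumptions chosen for illustration.

```python
import math


def required_lanes(bit_rate, bits_per_symbol, samples_per_symbol, clock_hz):
    """Samples the DSP must handle per clock cycle to keep up with the line rate."""
    symbol_rate = bit_rate / bits_per_symbol
    sample_rate = symbol_rate * samples_per_symbol
    return math.ceil(sample_rate / clock_hz)
```

Under these assumptions, the 56 GS/s sample stream requires 160 samples per clock at 350 MHz, or 187 at 300 MHz. Every block in the receiver, including the timing-recovery loop, must therefore operate on that many samples in parallel.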
High-speed real-time animated displays on the ADAGE (trademark) RDS 3000 raster graphics system

NASA Technical Reports Server (NTRS)

Kahlbaum, William M., Jr.; Ownbey, Katrina L.

1989-01-01

Techniques that may be used to increase the animation update rate of real-time computer raster graphic displays are discussed. They were developed on the ADAGE RDS 3000 graphic system in support of the Advanced Concepts Simulator at the NASA Langley Research Center. These techniques involve the use of a special-purpose parallel processor for high-speed character generation. The description of the parallel processor includes the Barrel Shifter, which is part of the hardware and is the key to the high-speed character rendition. The final result of this total effort was a fourfold increase in the update rate of an existing primary flight display, from 4 to 16 frames per second.
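The abstract names the barrel shifter without giving its width or whether it shifts or rotates. The sketch below assumes a 16-bit logical left shift and models the usual logarithmic structure: one stage per bit of the shift amount, where stage k shifts by 2^k when that bit is set. Any shift therefore completes in a fixed number of stages instead of one position per cycle. In character rendering, this lets a glyph row be aligned to an arbitrary pixel position in a single pass. The function name is hypothetical.

```python
def barrel_shift_left(word, amount, width=16):
    """Logical left shift built from log2(width) conditional shift stages."""
    if width & (width - 1):
        raise ValueError("width must be a power of two")
    if not 0 <= amount < width:
        raise ValueError("shift amount out of range")
    mask = (1 << width) - 1
    stages = width.bit_length() - 1  # e.g. 4 stages for a 16-bit word
    for k in range(stages):
        if (amount >> k) & 1:        # stage k is enabled by bit k of the amount
            word = (word << (1 << k)) & mask
    return word
```

Because each stage is a row of two-input multiplexers, the hardware cost grows as width × log2(width), and the delay stays constant regardless of the shift amount.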
Thread concept for automatic task parallelization in image analysis

NASA Astrophysics Data System (ADS)

Lueckenhaus, Maximilian; Eckstein, Wolfgang

1998-09-01

Parallel processing of image analysis tasks is an essential method for speeding up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when the hardware changes. Therefore, it is highly desirable to perform the parallelization automatically. For this purpose, we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may follow different threads of execution and work on different data in parallel. In this paper, we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable for speeding up image processing by automatic parallelization of image analysis tasks.
Study of Electromagnetic Repulsion Switch to High Speed Reclosing and Recover Time Characteristics of Superconductor

NASA Astrophysics Data System (ADS)

Koyama, Tomonori; Kaiho, Katsuyuki; Yamaguchi, Iwao; Yanabu, Satoru

Using a high-temperature superconductor, we constructed and tested a model superconducting fault current limiter (SFCL). The superconductor and a vacuum interrupter serving as the commutation switch were connected in parallel using a bypass coil. When a fault current flows in this equipment, the superconductor quenches and the current is then transferred to the parallel coil owing to the voltage drop across the superconductor. This large current in the parallel coil actuates the magnetic repulsion mechanism of the vacuum interrupter, and the current in the superconductor is interrupted. With this equipment, the time during which current flows in the superconductor can easily be minimized. In addition, the fault current is easily limited by the large reactance of the parallel coil. Because this system has many merits, we introduced an electromagnetic repulsion switch. Electrical power systems impose a high-speed reclosing duty after a fault current is interrupted, so the SFCL must recover to the superconducting state before high-speed reclosing. However, the superconductor generates heat when it quenches, and it takes time to recover the superconducting state, so the recovery time becomes a critical issue. In this paper, we studied the recovery time of the superconductor. We also proposed an electromagnetic repulsion switch with a reclosing system.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

NASA Technical Reports Server (NTRS)

Guruswamy, Guru; VanDalsem, William (Technical Monitor)

1994-01-01

Aeroelasticity, which involves strong coupling of fluids, structures, and controls, is an important element in designing an aircraft. Computational aeroelasticity using low-fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low-fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as the High Speed Civil Transport (HSCT) and the Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. The HSCT can experience vortex-induced aeroelastic oscillations, whereas the AST can experience structural oscillations associated with transonic buffet. Both aircraft may experience a dip in flutter speed in the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high-fidelity equations such as the Navier-Stokes equations for fluids and finite elements for structures are needed. Computations using these high-fidelity equations require large computational resources in both memory and speed. Current conventional supercomputers have reached their limits in both memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper addresses the transition taking place in computational aeroelasticity from conventional computers to parallel computers, as well as the special techniques needed to take advantage of the architecture of the new parallel computers. Results are illustrated from computations made on the iPSC/860 and IBM SP2 computers using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high-resolution finite-element structural equations.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter

NASA Astrophysics Data System (ADS)

Watanabe, Eriko; Kodate, Kashiko

2005-11-01

We designed and fabricated a fully automatic fast face recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. A previously unattained ultra-high-speed system was made possible by reconfiguring the system for easier parallel processing and by composing a higher-accuracy correlation filter and a high-speed ferroelectric liquid crystal spatial light modulator (FLC-SLM). In trial experiments with this system (dubbed FARCO), we obtained remarkably low error rates of 1.3% for the false match rate (FMR) and 2.6% for the false non-match rate (FNMR). Building on these results, this paper examines methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, focusing on the calculation technique, quantization bit rate, pixel size and shift from the optical axis. The correlation filter has shown excellent performance and higher precision than classical correlation and the joint transform correlator (JTC). Moreover, arranging multi-object reference images yields 10-channel correlation signals that are as sharply marked as those of a single channel. This experimental result demonstrates great potential for achieving a processing speed of 10,000 faces/s.
A programmable computational image sensor for high-speed vision

NASA Astrophysics Data System (ADS)

Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian

2013-08-01

In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array and a RISC core. The pixel-parallel PEs transfer, store and process raw image data in a SIMD fashion and have their own programming language. The RPs form a one-dimensional array of simplified RISC cores that can carry out complex arithmetic and logic operations. The PE array and RP array can complete a large amount of computation in few instruction cycles and therefore satisfy the requirements of low- and middle-level high-speed image processing. The RISC core controls the whole system and executes some high-level image processing algorithms. A simplified AHB bus serves as the system bus connecting the major components. A programming language and corresponding tool chain for this computational image sensor have also been developed.
High-speed technique based on a parallel projection correlation procedure for digital image correlation

NASA Astrophysics Data System (ADS)

Zaripov, D. I.; Renfu, Li

2018-05-01

The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements with a high-resolution digital camera involves processing large volumes of data and is often time-consuming. To speed up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique uses projections of the interrogation window instead of its two-dimensional field of luminous intensity. This simplification accelerates ZNCC computation by up to 28.8 times compared with directly calculated ZNCC, depending on the size of the interrogation window and the region of interest. The results of three synthetic test cases, namely a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the last case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of these results, the proposed technique is recommended for the initial velocity field calculation, with further correction using more accurate techniques.
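The idea of replacing the 2D interrogation window by its projections can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it estimates only an integer displacement along one axis from column projections, and the helper names (`zncc`, `projections`, `best_shift`) are hypothetical. Each candidate displacement then costs a correlation of two length-N vectors rather than of all N × N pixels.

```python
import math
from typing import List, Sequence, Tuple

Window = List[List[float]]  # 2D interrogation window of luminous intensities


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-normalized cross-correlation of two equal-length 1D signals, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # flat signals carry no correlation information


def projections(w: Window) -> Tuple[List[float], List[float]]:
    """Project a 2D window onto its two axes: row sums and column sums."""
    row_proj = [sum(row) for row in w]
    col_proj = [sum(col) for col in zip(*w)]
    return row_proj, col_proj


def best_shift(ref: Sequence[float], search: Sequence[float]) -> int:
    """Integer offset of `ref` within the longer `search` profile that maximizes ZNCC."""
    n = len(ref)
    if len(search) < n:
        raise ValueError("search profile must be at least as long as ref")
    scores = [zncc(ref, search[k:k + n]) for k in range(len(search) - n + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: a bright feature shifted two pixels to the right between frames.
window = [[0, 1, 5, 2], [0, 2, 9, 3]]
search_region = [[0, 0, 0, 1, 5, 2], [0, 0, 0, 2, 9, 3]]
_, ref_cols = projections(window)
_, search_cols = projections(search_region)
print(best_shift(ref_cols, search_cols))  # prints 2
```

A fuller version would also correlate the row projections for the vertical displacement, fit a subpixel peak, and compute the projections for every candidate position efficiently; the abstract does not say how the authors handle these steps.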
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high-performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed-of-light barrier, these simulations will have to be run in parallel to achieve significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel, with only adjacent components directly affecting each other. These processes do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Developing parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared-memory machines, distributed-memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelizing nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple-component applications and special parallelization considerations are also discussed.
Cedar Project---Original goals and progress to date

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cybenko, G.; Kuck, D.; Padua, D.

1990-11-28

This work encompasses a broad attack on high-speed parallel processing. Hardware, software, applications development, performance evaluation and visualization, and research topics are proposed. Our goal is to develop practical parallel processing for the 1990s.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

NASA Technical Reports Server (NTRS)

Steinman, Jeff S.

1992-01-01

Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Flexible All-Digital Receiver for Bandwidth Efficient Modulations

NASA Technical Reports Server (NTRS)

Gray, Andrew; Srinivasan, Meera; Simon, Marvin; Yan, Tsun-Yee

2000-01-01

An all-digital high data rate parallel receiver architecture developed jointly by Goddard Space Flight Center and the Jet Propulsion Laboratory is presented. This receiver utilizes only a small number of high speed components along with a majority of lower speed components operating in a parallel frequency domain structure implementable in CMOS, and can currently process up to 600 Mbps with standard QPSK modulation. Performance results for this receiver for bandwidth efficient QPSK modulation schemes such as square-root raised cosine pulse shaped QPSK and Feher's patented QPSK are presented, demonstrating the flexibility of the receiver architecture.
Self-calibrated correlation imaging with k-space variant correlation functions.

PubMed

Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J

2018-03-01

Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimation of correlation functions. The present work aims to demonstrate that this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in human anatomy, the correlation of neighboring k-space data varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging, which relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging, in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Double lead spiral platen parallel jaw end effector

NASA Technical Reports Server (NTRS)

Beals, David C.

1989-01-01

The double lead spiral platen parallel jaw end effector is an extremely powerful, compact, and highly controllable end effector that represents a significant improvement in gripping force and efficiency over the LaRC Puma (LP) end effector. The spiral end effector is very simple in its design and has relatively few parts. The jaw openings are highly predictable and linear, making it an ideal candidate for remote control. The finger speed is within acceptable working limits and can be modified to meet the user needs; for instance, greater finger speed could be obtained by increasing the pitch of the spiral. The force relaxation is comparable to the other tested units. Optimization of the end effector design would involve a compromise of force and speed for a given application.
Integration experiences and performance studies of a COTS parallel archive system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hsing-bung; Scott, Cody; Grider, Gary

2010-01-01

Current and future archive storage systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting the changing needs of very large data sets, (e) support a standard interface, and (f) utilize commercial off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same, but with performance one or more orders of magnitude faster. Archive systems continue to move closer to file systems in their design because of the need for speed and bandwidth, especially metadata searching speeds, for example through more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially solutions that move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than creating and maintaining a complete, unique end-to-end parallel archive software solution. In this paper, we relate our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux, such as copy, move, ls, tar, etc.
We have successfully applied our working COTS Parallel Archive System to the world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address the requirements of future archival storage systems.
Integration experiments and performance studies of a COTS parallel archive system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hsing-bung; Scott, Cody; Grider, Gary

2010-06-16

Current and future archive storage systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting the changing needs of very large data sets, (e) support a standard interface, and (f) utilize commercial off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same, but with performance one or more orders of magnitude faster. Archive systems continue to move closer to file systems in their design because of the need for speed and bandwidth, especially metadata searching speeds, for example through more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially solutions that move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than creating and maintaining a complete, unique end-to-end parallel archive software solution. In this paper, we relate our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products, including (a) parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high-volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux, such as copy, move, ls, tar, etc.
We have successfully applied our working COTS Parallel Archive System to the world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address the requirements of future archival storage systems.

High-speed prediction of crystal structures for organic molecules

NASA Astrophysics Data System (ADS)

Obata, Shigeaki; Goto, Hitoshi

2015-02-01

We developed a master-worker parallel algorithm that allocates crystal structure optimization tasks to distributed compute nodes in order to improve the performance of crystal structure prediction simulations. The performance experiments were carried out on the TUT-ADSIM supercomputer system (HITACHI HA8000-tc/HT210). The experimental results show that our parallel algorithm achieved speed-ups of 214 and 179 times using 256 processor cores for the crystal structure optimizations in the predictions of crystal structures for 3-aza-bicyclo(3.3.1)nonane-2,4-dione and 2-diazo-3,5-cyclohexadiene-1-one, respectively. We expect that this parallel algorithm can reduce the computational cost of any crystal structure prediction.
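The reported speed-ups can be read as parallel efficiencies, E = S/p, the fraction of ideal linear speed-up achieved on p cores. The short calculation below uses only the figures stated in the abstract (214 and 179 on 256 cores); the efficiency definition is the standard one, and `parallel_efficiency` is a hypothetical helper rather than anything from the authors' code.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency E = S / p: the fraction of ideal linear speed-up achieved."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Speed-ups reported in the abstract, both measured on 256 processor cores.
reported = {
    "3-aza-bicyclo(3.3.1)nonane-2,4-dione": 214,
    "2-diazo-3,5-cyclohexadiene-1-one": 179,
}

for molecule, speedup in reported.items():
    print(f"{molecule}: E = {parallel_efficiency(speedup, 256):.1%}")
# prints 83.6% and 69.9% respectively
```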
Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis

DOE PAGES

Schulz, Martin; Galarowicz, Jim; Maghrak, Don; ...

2008-01-01

Over the last decades, a large number of performance tools have been developed to analyze and optimize high-performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes in the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about 3 years ago to overcome these limitations and provide efficient, easy-to-apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two different faces: it provides an interoperable tool set covering the most common analysis steps as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large-scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple fully equivalent interfaces, including an easy-to-use GUI as well as an interactive command-line interface, lowering the barrier to using these tools.
The development speed paradox: can increasing development speed reduce R&D productivity?

PubMed

Lendrem, Dennis W; Lendrem, B Clare

2014-03-01

In the 1990s the pharmaceutical industry sought to increase R&D productivity by shifting development tasks into parallel to reduce development cycle times and increase development speed. This paper presents a simple model demonstrating that, when attrition rates are high as in pharmaceutical development, such development speed initiatives can increase the expected time for the first successful molecule to complete development. Increasing the development speed of successful molecules could actually reduce R&D productivity - the development speed paradox. Copyright © 2013 Elsevier Ltd. All rights reserved.
A simulation-based study of HighSpeed TCP and its deployment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Souza, Evandro de

2003-05-01

The current congestion control mechanism used in TCP has difficulty reaching full utilization on high-speed links, particularly on wide-area connections. For example, the packet drop rate needed to fill a gigabit pipe using the present TCP protocol is below the currently achievable fiber optic error rates. HighSpeed TCP was recently proposed as a modification of TCP's congestion control mechanism to allow it to achieve reasonable performance on high-speed wide-area links. In this research, simulation results showing the performance of HighSpeed TCP and the impact of its use on the present implementation of TCP are presented. Network conditions including different degrees of congestion, different levels of loss rate, different degrees of bursty traffic and two distinct router queue management policies were simulated. The performance and fairness of HighSpeed TCP were compared with those of the existing TCP and of solutions for bulk-data transfer using parallel streams.
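The gigabit claim can be checked with the standard TCP response function, w ≈ 1.2/√p segments at packet drop rate p, and contrasted with the HighSpeed TCP response function defined in RFC 3649. In the sketch below, the constants come from RFC 3649 rather than from this study, the function names are hypothetical, and the 1 Gbit/s link, 100 ms round-trip time and 1500-byte packets are assumptions chosen for illustration.

```python
import math

# HighSpeed TCP response-function constants from RFC 3649 (not from this study).
LOW_WINDOW = 38       # segments; at or below this, standard TCP behaviour applies
HIGH_WINDOW = 83000   # segments
LOW_P = 1e-3          # packet drop rate at which the window is LOW_WINDOW
HIGH_P = 1e-7         # packet drop rate at which the window is HIGH_WINDOW


def standard_tcp_window(p: float) -> float:
    """Average congestion window (segments) of standard TCP at drop rate p: w = 1.2 / sqrt(p)."""
    return 1.2 / math.sqrt(p)


def required_drop_rate(bandwidth_bps: float, rtt_s: float, packet_bytes: int = 1500) -> float:
    """Largest drop rate at which standard TCP can still keep the link full."""
    window = bandwidth_bps * rtt_s / (packet_bytes * 8)  # segments that must be in flight
    return (1.2 / window) ** 2  # invert w = 1.2 / sqrt(p)


def highspeed_tcp_window(p: float) -> float:
    """HighSpeed TCP response function: a straight line on log-log axes
    through (LOW_P, LOW_WINDOW) and (HIGH_P, HIGH_WINDOW)."""
    if p >= LOW_P:
        return standard_tcp_window(p)
    slope = (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW)) / (math.log(HIGH_P) - math.log(LOW_P))
    return LOW_WINDOW * (p / LOW_P) ** slope


# Assumed path: 1 Gbit/s, 100 ms round-trip time, 1500-byte packets.
p = required_drop_rate(1e9, 0.1)
print(f"standard TCP needs p <= {p:.3g} (about 1 drop per {1 / p:,.0f} packets)")
print(f"at p = 1e-7: standard {standard_tcp_window(HIGH_P):,.0f} "
      f"vs HighSpeed {highspeed_tcp_window(HIGH_P):,.0f} segments")
```

Under these assumptions, standard TCP may see at most about one drop per 48 million packets. At a drop rate of 10^-7, standard TCP averages roughly 3,800 segments, while the HighSpeed TCP response function allows 83,000.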
SEAL FOR HIGH SPEED CENTRIFUGE

DOEpatents

Skarstrom, C.W.

1957-12-17

A seal is described for a high-speed centrifuge wherein the centrifugal force of rotation acts on the gasket to form a tight seal. The cylindrical rotating bowl of the centrifuge contains a closure member resting on a shoulder in the bowl wall; the closure member's lower surface carries bands of gasket material parallel and adjacent to the cylinder wall. As the centrifuge speed increases, centrifugal force acts on the bands of gasket material, forcing them into sealing contact against the cylinder wall. This arrangement forms a simple and effective seal for high-speed centrifuges, replacing more costly methods such as welding a closure in place.
High speed infrared imaging system and method

DOEpatents

Zehnder, Alan T.; Rosakis, Ares J.; Ravichandran, G.

2001-01-01

A system and method for radiation detection with an increased frame rate. A semi-parallel processing configuration is used to process a row or column of pixels in a focal-plane array in parallel to achieve a processing rate up to and greater than 1 million frames per second.
Conceptual design and kinematic analysis of a novel parallel robot for high-speed pick-and-place operations

NASA Astrophysics Data System (ADS)

Meng, Qizhi; Xie, Fugui; Liu, Xin-Jun

2018-06-01

This paper deals with the conceptual design, kinematic analysis and workspace identification of a novel four-degree-of-freedom (DOF) high-speed spatial parallel robot for pick-and-place operations. The proposed spatial parallel robot consists of a base, four arms and a 1½ mobile platform. The mobile platform is a major innovation that avoids output singularity and offers the advantages of both single and double platforms. To investigate the characteristics of the robot's DOFs, a line graph method based on Grassmann line geometry is adopted in the mobility analysis. In addition, the inverse kinematics is derived, and the constraint conditions for identifying the correct solution are provided. On the basis of the proposed concept, the workspace of the robot is identified for a set of presupposed parameters, taking the input and output transmission indices as the performance evaluation criteria.
Parallel fuzzy connected image segmentation on GPU

PubMed Central

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

2011-01-01

Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implemented on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on the GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments on three data sets of small, medium, and large size demonstrate the efficiency of the parallel algorithm, which achieves speed-up factors of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the CPU implementation of the algorithm, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037
Parallel fuzzy connected image segmentation on GPU.

PubMed

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W

2011-07-01

Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implemented on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on the GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments on three data sets of small, medium, and large size demonstrate the efficiency of the parallel algorithm, which achieves speed-up factors of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the CPU implementation of the algorithm, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.
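The two FC tasks named in these records, computing fuzzy affinities and computing fuzzy connectedness, can be illustrated with a small sequential sketch. It is not the authors' CUDA implementation: it computes each pixel's connectedness as the max-min (bottleneck) path strength from a seed using a Dijkstra-style priority queue, a common sequential formulation, and the Gaussian intensity affinity, `sigma = 10` and the 0.5 threshold are hypothetical choices. The abstract reports running both tasks as CUDA kernels but does not describe how the propagation is parallelized.

```python
import heapq
import math
from typing import List, Tuple

Image = List[List[float]]


def affinity(a: float, b: float, sigma: float) -> float:
    """Hypothetical intensity-homogeneity affinity between 4-adjacent pixels, in (0, 1]."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def fuzzy_connectedness(img: Image, seed: Tuple[int, int], sigma: float = 10.0) -> List[List[float]]:
    """For every pixel, the strength of its strongest path to `seed`, where a path is
    only as strong as its weakest affinity link (max-min path strength)."""
    h, w = len(img), len(img[0])
    conn = [[0.0] * w for _ in range(h)]
    r0, c0 = seed
    conn[r0][c0] = 1.0
    heap = [(-1.0, r0, c0)]  # negate strengths so heapq pops the strongest first
    while heap:
        neg_strength, r, c = heapq.heappop(heap)
        strength = -neg_strength
        if strength < conn[r][c]:
            continue  # stale entry: a stronger path to (r, c) was found later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                candidate = min(strength, affinity(img[r][c], img[nr][nc], sigma))
                if candidate > conn[nr][nc]:
                    conn[nr][nc] = candidate
                    heapq.heappush(heap, (-candidate, nr, nc))
    return conn


def segment(img: Image, seed: Tuple[int, int], threshold: float = 0.5,
            sigma: float = 10.0) -> List[List[int]]:
    """Binary object mask: pixels whose connectedness to the seed reaches `threshold`."""
    conn = fuzzy_connectedness(img, seed, sigma)
    return [[1 if v >= threshold else 0 for v in row] for row in conn]


# Usage: a dark object (left) next to a bright background column (right).
image = [[10, 10, 100], [10, 12, 100], [10, 10, 100]]
print(segment(image, seed=(0, 0)))  # [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
```

The affinity pass is embarrassingly parallel over pixel pairs, which is why it maps naturally to a GPU kernel; the connectedness pass is the harder part to parallelize because path strengths propagate across the image.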
Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains an 8x16 array of processing elements with an efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on an "X" interconnection grid and with external memory via a high-capacity input/output bus. This approach to conditional operation nearly doubles the speed of various arithmetic operations.
Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology

NASA Astrophysics Data System (ADS)

Macioł, Piotr; Michalik, Kazimierz

2016-10-01

Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle to its wide application is its high computational demand. Among possible remedies, parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelizing multiscale models based on the Agile Multiscale Methodology framework is discussed. A sequential, FEM-based macroscopic model has been combined with concurrently computed fine-scale models, employing the MatCalc thermodynamic simulator. The main issues investigated in this work are (i) the speed-up of multiscale models, with special focus on fine-scale computations, and (ii) the decrease in the quality of computations enforced by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law. The problem of 'delay error', arising from the parallel execution of fine-scale sub-models controlled by the sequential macroscopic sub-model, is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and an MPI library are also discussed.
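Amdahl's law, which the authors use to evaluate speed-up, bounds the gain from running part of a computation in parallel: S(N) = 1 / ((1 - f) + f/N). In a minimal sketch of how it might apply here, f stands for the fraction of run time spent in the concurrently computed fine-scale sub-models, and the sequential macroscopic FEM model forms the serial remainder. The 90% fraction in the usage lines is purely illustrative and not a figure from the paper, and the function names are hypothetical.

```python
import math


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: S(N) = 1 / ((1 - f) + f / N) for parallel fraction f and N workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Asymptotic speed-up bound as N grows without limit: 1 / (1 - f)."""
    serial = 1.0 - parallel_fraction
    return math.inf if serial == 0.0 else 1.0 / serial


# Illustrative only: suppose 90% of run time is spent in fine-scale sub-models.
for n in (1, 4, 16, 64):
    print(n, round(amdahl_speedup(0.9, n), 2))
print("limit:", round(amdahl_limit(0.9), 2))
```

Amdahl's law ignores communication costs and the 'delay error' discussed above, so measured speed-ups in a coupled multiscale run can be expected to fall below this bound.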
Analysis and identification of subsynchronous vibration for a high pressure parallel flow centrifugal compressor

NASA Technical Reports Server (NTRS)

Kirk, R. G.; Nicholas, J. C.; Donald, G. H.; Murphy, R. C.

1980-01-01

The summary of a complete analytical design evaluation of an existing parallel flow compressor is presented and a field vibration problem that manifested itself as a subsynchronous vibration that tracked at approximately 2/3 of compressor speed is reviewed. The comparison of predicted and observed peak response speeds, frequency spectrum content, and the performance of the bearing-seal systems are presented as the events of the field problem are reviewed. Conclusions and recommendations are made as to the degree of accuracy of the analytical techniques used to evaluate the compressor design.
Parallelism in Manipulator Dynamics. Revision.

DTIC Science & Technology

Lathrop, Richard Harold

1983-12-01

This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the ... computations, and presents two "mathematically exact" formulations especially suited to high-speed, highly parallel implementations using special-purpose ...
Parallel processing and expert systems

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Lau, Sonie

1991-01-01

Whether it be monitoring the thermal subsystem of Space Station Freedom or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot achieve an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not guarantee that real-time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. State-of-the-art research in progress related to the parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism in expert systems. Results to date indicate that the parallelism achieved for these systems is small. To obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Cascaded VLSI neural network architecture for on-line learning

NASA Technical Reports Server (NTRS)

Thakoor, Anilkumar P. (Inventor); Duong, Tuan A. (Inventor); Daud, Taher (Inventor)

1992-01-01

High-speed, analog, fully-parallel, and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A computation-intensive feature classification application was demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as an application-specific coprocessor for solving real-world problems at extremely high data rates.
Cascaded VLSI neural network architecture for on-line learning

NASA Technical Reports Server (NTRS)

Duong, Tuan A. (Inventor); Daud, Taher (Inventor); Thakoor, Anilkumar P. (Inventor)

1995-01-01

High-speed, analog, fully-parallel and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A comparison-intensive feature classification application has been demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as application-specific coprocessors for solving real-world problems at extremely high data rates.
RAMA: A file system for massively parallel computers

NASA Technical Reports Server (NTRS)

Miller, Ethan L.; Katz, Randy H.

1993-01-01

This paper describes a file system design for massively parallel computers that makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
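One way to obtain the low inter-node synchronization described above is to place file blocks on disks with a deterministic hash, so every node can compute a block's location locally instead of consulting a shared directory. The sketch below assumes a SHA-256 hash of a hypothetical `(file_id, block_index)` pair; it illustrates the idea only and is not RAMA's actual placement function or on-disk layout.

```python
import hashlib
from collections import Counter

def block_location(file_id: int, block_index: int, num_disks: int) -> int:
    """Map a (file, block) pair to a disk with a deterministic hash.

    Every node computes the same answer on its own, so locating a block
    needs no directory lookup and no inter-node synchronization.
    """
    key = f"{file_id}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_disks

# Blocks of one file spread roughly evenly over all disks.
counts = Counter(block_location(7, b, 16) for b in range(10_000))
print(sorted(counts.items()))
```

Because the mapping is a pure function, a reader on any node can find a block without coordination; the trade-off is that consecutive blocks of a file are scattered across disks rather than kept contiguous.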
Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

NASA Astrophysics Data System (ADS)

Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

2017-10-01

In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and the parallelization approaches used in them. Our approach includes the analysis of the modules' functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using airborne laser scanning data representing the land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speed on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from the original modules. The presented parallelization approach showed the simplicity and efficiency of parallelizing open-source GRASS GIS modules using OpenMP, leading to increased performance of this geospatial software on standard multi-core computers.
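In C code of this kind, the usual pattern is a `#pragma omp parallel for` over independent grid rows. The sketch below mirrors that row-wise decomposition in Python with `concurrent.futures`; `process_row` is a hypothetical stand-in for a module's per-cell computation. Under CPython's global interpreter lock, threads show the decomposition but do not speed up pure-Python arithmetic; a `ProcessPoolExecutor` would be needed for real CPU parallelism.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def process_row(row):
    # Hypothetical per-cell work; each row depends only on its own input,
    # which is what makes the row loop safe to parallelize.
    return [math.sqrt(abs(v)) * 2.0 for v in row]

def run_serial(grid):
    return [process_row(row) for row in grid]

def run_parallel(grid, workers=4):
    # Same idea as "#pragma omp parallel for" over rows.
    # pool.map returns results in input order, so the output grid
    # matches the serial result row for row.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_row, grid))

grid = [[float(i * j - 100) for j in range(64)] for i in range(48)]
result = run_parallel(grid)
```

Comparing the parallel output against the serial one is a cheap way to check the accuracy requirement mentioned in the abstract.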
Research on parallel combinatory spread spectrum communication system with double information matching

NASA Astrophysics Data System (ADS)

Xue, Wei; Wang, Qi; Wang, Tianyu

2018-04-01

This paper presents an improved parallel combinatory spread spectrum (PC/SS) communication system using the method of double information matching (DIM). Compared with the conventional PC/SS system, the new model inherits the advantages of high transmission speed, large information capacity and high security. However, the conventional system suffers from a high bit error rate (BER) because of its data-sequence mapping algorithm. By optimizing the mapping algorithm, the new model achieves a lower BER and higher efficiency.
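In a PC/SS system each symbol selects r of M spreading sequences, so the choice of sequences alone carries ⌊log2 C(M, r)⌋ bits (commonly plus r more bits when the polarity of each selected sequence is also modulated). A data-to-combination mapping of the kind the paper optimizes can be illustrated with the combinatorial number system. This is a generic mapping chosen for illustration, not the DIM algorithm, and the function names are hypothetical.

```python
from math import comb

def bits_per_symbol(m: int, r: int, polarity: bool = True) -> int:
    """floor(log2 C(m, r)) bits from the sequence choice, plus r
    polarity bits if each chosen sequence is also sign-modulated."""
    selection_bits = comb(m, r).bit_length() - 1
    return selection_bits + (r if polarity else 0)

def index_to_combination(k: int, m: int, r: int) -> list:
    """Map k in [0, C(m, r)) to a sorted r-subset of range(m)
    using the combinatorial number system."""
    if not 0 <= k < comb(m, r):
        raise ValueError("k out of range")
    chosen = []
    for i in range(r, 0, -1):
        # Greedily pick the largest c with comb(c, i) <= k.
        c = i - 1
        while comb(c + 1, i) <= k:
            c += 1
        chosen.append(c)
        k -= comb(c, i)
    return sorted(chosen)

def combination_to_index(subset) -> int:
    """Inverse mapping: sorted c_1 < ... < c_r -> sum of comb(c_i, i)."""
    return sum(comb(c, i) for i, c in enumerate(sorted(subset), start=1))

# Example: M = 16 sequences, r = 3 selected per symbol.
data_bits = "101100111"          # 9 bits carried by the sequence choice
k = int(data_bits, 2)
sequences = index_to_combination(k, 16, 3)
```

The receiver detects which r sequences are present and applies `combination_to_index` to recover the data bits; how errors in that detection propagate to bit errors is exactly what a better mapping tries to improve.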
Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

PubMed Central

Zhu, Xiang; Zhang, Dianwen

2013-01-01

We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on a graphics processing unit (GPU) for high-performance, scalable parallel model fitting. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit in applications to superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
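GPU-LMFit runs many independent fits in parallel, one per pixel. The iteration being parallelized is the Levenberg-Marquardt update, which solves the damped normal equations (JᵀJ + λ·diag(JᵀJ)) δ = Jᵀr and adapts λ according to whether the step reduced the residual. Below is a minimal serial NumPy sketch for a single hypothetical mono-exponential decay y = a·exp(−t/τ), of the kind used in lifetime imaging; it is not the GPU-LMFit code or API.

```python
import numpy as np

def model(p, t):
    a, tau = p
    # Trial steps can produce tau <= 0; silence the resulting overflow
    # warnings, since such steps are rejected by the cost check anyway.
    with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
        return a * np.exp(-t / tau)

def jacobian(p, t):
    a, tau = p
    e = np.exp(-t / tau)
    # Columns: d/da and d/dtau of a * exp(-t / tau).
    return np.column_stack([e, a * t / tau**2 * e])

def levenberg_marquardt(t, y, p0, lam=1e-3, iters=200):
    p = np.asarray(p0, dtype=float)
    r = y - model(p, t)
    cost = r @ r
    for _ in range(iters):
        J = jacobian(p, t)
        A = J.T @ J
        g = J.T @ r
        # Damped normal equations with Marquardt's diagonal scaling.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), g)
        r_new = y - model(p + step, t)
        cost_new = r_new @ r_new
        if cost_new < cost:
            # Accept: residual decreased, lean toward Gauss-Newton.
            p, r, cost = p + step, r_new, cost_new
            lam /= 10.0
        else:
            # Reject: move toward small gradient-descent-like steps.
            lam *= 10.0
    return p

t = np.linspace(0.0, 10.0, 50)
y = 2.0 * np.exp(-t / 3.0)          # synthetic, noise-free decay
a_fit, tau_fit = levenberg_marquardt(t, y, p0=[1.0, 1.0])
```

Because each pixel's fit is independent, the same loop can be run for every pixel at once, which is what makes the problem a good fit for a GPU.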

JPARSS: A Java Parallel Network Package for Grid Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Jie; Akers, Walter; Chen, Ying

2002-03-01

The emergence of high-speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, because doing so requires tuning the TCP window size to improve bandwidth and reduce latency on a high-speed wide area network. This paper presents a Java package called JPARSS (Java Parallel Secure Stream (Socket)) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. This package enables single sign-on, certificate delegation and secure or plain-text data transfer using several security components based on X.509 certificates and SSL. Several experiments will be presented to show that using Java parallel streams is more effective than tuning the TCP window size. In addition, a simple architecture using Web services ...
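The core idea of splitting data into partitions, sending them on several streams at once, and reassembling them at the receiver can be sketched in a few lines. This Python sketch assumes threads and an in-memory dictionary in place of real sockets; the function names are hypothetical and it is not the JPARSS Java API.

```python
import threading

def split(data: bytes, n_streams: int):
    """Divide data into n_streams contiguous partitions tagged with offsets."""
    size = -(-len(data) // n_streams)          # ceiling division
    return [(i * size, data[i * size:(i + 1) * size]) for i in range(n_streams)]

def send_parallel(data: bytes, n_streams: int) -> bytes:
    """Send each partition on its own thread and reassemble by offset."""
    received = {}
    lock = threading.Lock()

    def stream_worker(offset: int, chunk: bytes):
        # Stand-in for writing `chunk` to one of several parallel sockets;
        # the "receiver" here is just a shared dictionary.
        with lock:
            received[offset] = chunk

    workers = [threading.Thread(target=stream_worker, args=part)
               for part in split(data, n_streams)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Partitions may arrive in any order; offsets restore the original layout.
    return b"".join(received[off] for off in sorted(received))
```

Over a real network, each stream keeps its own TCP window, so several moderately sized windows can together fill a high bandwidth-delay path that a single untuned connection would not.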
High-resolution, high-throughput imaging with a multibeam scanning electron microscope.

PubMed

Eberle, A L; Mikula, S; Schalek, R; Lichtman, J; Knothe Tate, M L; Zeidler, D

2015-08-01

Electron-electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes. We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers. © 2015 The Authors Journal of Microscopy © 2015 Royal Microscopical Society.
Acceleration of low-energy ions at parallel shocks with a focused transport model

DOE PAGES

Zuo, Pingbing; Zhang, Ming; Rassoul, Hamid K.

2013-04-10

Here, we present a test particle simulation of the injection and acceleration of low-energy suprathermal particles by parallel shocks with a focused transport model. The focused transport equation contains all necessary physics of shock acceleration, but avoids the limitation of diffusive shock acceleration (DSA) that requires a small pitch angle anisotropy. This simulation verifies that particles with speeds ranging from a fraction of to a few times the shock speed can indeed be directly injected and accelerated into the DSA regime by parallel shocks. At higher energies starting from a few times the shock speed, the energy spectrum of accelerated particles is a power law with the same spectral index as the solution of standard DSA theory, although the particles are highly anisotropic in the upstream region. The intensity, however, is different from that predicted by DSA theory, indicating a different level of injection efficiency. It is found that the shock strength, the injection speed, and the intensity of an electric cross-shock potential (CSP) jump can affect the injection efficiency of the low-energy particles. A stronger shock has a higher injection efficiency. In addition, if the speed of injected particles is above a few times the shock speed, the produced power-law spectrum is consistent with the prediction of standard DSA theory in both its intensity and spectral index, with an injection efficiency of 1. CSP can increase the injection efficiency through direct particle reflection back upstream, but it has little effect on the energetic particle acceleration once the speed of injected particles is beyond a few times the shock speed. This test particle simulation proves that the focused transport theory is an extension of DSA theory with the capability of predicting the efficiency of particle injection.
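For reference, the steady-state test-particle solution of standard DSA theory (a textbook result, not derived in this paper) gives a power-law momentum distribution whose index depends only on the shock compression ratio r:

```latex
f(p) \propto p^{-q}, \qquad q = \frac{3r}{r-1}, \qquad r = \frac{u_1}{u_2}
```

Here u_1 and u_2 are the upstream and downstream flow speeds in the shock frame; a strong shock (r = 4) gives q = 4. This is the standard DSA spectral index against which high-energy spectra of this kind are usually compared.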
Effects of changes in size, speed and distance on the perception of curved 3D trajectories

PubMed Central

Zhang, Junjun; Braunstein, Myron L.; Andersen, George J.

2012-01-01

Previous research on the perception of 3D object motion has considered time to collision, time to passage, collision detection and judgments of speed and direction of motion, but has not directly studied the perception of the overall shape of the motion path. We examined the perception of the magnitude of curvature and sign of curvature of the motion path for objects moving at eye level in a horizontal plane parallel to the line of sight. We considered two sources of information for the perception of motion trajectories: changes in angular size and changes in angular speed. Three experiments examined judgments of relative curvature for objects moving at different distances. At the closest distance studied, accuracy was high with size information alone but near chance with speed information alone. At the greatest distance, accuracy with size information alone decreased sharply but accuracy for displays with both size and speed information remained high. We found similar results in two experiments with judgments of sign of curvature. Accuracy was higher for displays with both size and speed information than with size information alone, even when the speed information was based on parallel projections and was not informative about sign of curvature. For both magnitude of curvature and sign of curvature judgments, information indicating that the trajectory was curved increased accuracy, even when this information was not directly relevant to the required judgment. PMID:23007204
Three-dimensional Finite Element Formulation and Scalable Domain Decomposition for High Fidelity Rotor Dynamic Analysis

NASA Technical Reports Server (NTRS)

Datta, Anubhav; Johnson, Wayne R.

2009-01-01

This paper has two objectives. The first is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second is to implement and analyze a dual-primal iterative substructuring based Krylov solver, which is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems, one for ideal hover (symmetric) and one for transient forward flight (non-symmetric), both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size, although this conclusion is premature given the small prototype grids considered in this study.
An accurate, fast, and scalable solver for high-frequency wave propagation

NASA Astrophysics Data System (ADS)

Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.

2017-12-01

In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low-order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods that have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods, and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straightforward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages.
We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand sides.
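The benefit of pipelining right-hand sides can be seen by simple counting. Assuming a sweep that passes each right-hand side through L layers in order, one layer per time step, solving K right-hand sides one after another takes L·K steps, whereas a pipelined schedule, in which different layers work on different right-hand sides at the same time, finishes in L + K − 1 steps. This idealized count ignores the up-and-down structure of the actual sweeps and all communication costs; the function names are hypothetical.

```python
def pipelined_schedule(n_layers: int, n_rhs: int):
    """List (step, layer, rhs) tasks: at each step, layer k works on
    right-hand side (step - k) if that right-hand side exists."""
    tasks = []
    for step in range(n_layers + n_rhs - 1):
        for layer in range(n_layers):
            rhs = step - layer
            if 0 <= rhs < n_rhs:
                tasks.append((step, layer, rhs))
    return tasks

def step_counts(n_layers: int, n_rhs: int):
    """(sequential steps, pipelined steps) for the idealized sweep."""
    sequential = n_layers * n_rhs
    tasks = pipelined_schedule(n_layers, n_rhs)
    pipelined = max(step for step, _, _ in tasks) + 1
    return sequential, pipelined

print(step_counts(8, 4))
```

Each right-hand side still visits the layers in order, so the dependency structure of a single sweep is preserved; only idle layers are put to work on other right-hand sides.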
Resonant tunnelling diode based high speed optoelectronic transmitters

NASA Astrophysics Data System (ADS)

Wang, Jue; Rodrigues, G. C.; Al-Khalidi, Abdullah; Figueiredo, José M. L.; Wasige, Edward

2017-08-01

Resonant tunneling diode (RTD) integration with a photodetector (PD) at the epi-layer design level shows great potential for combining a terahertz (THz) RTD electronic source with high-speed optical modulation. With an optimized layer structure, the RTD-PD presented in the paper shows a high stationary responsivity of 5 A/W at 1310 nm wavelength. High-power microwave/mm-wave RTD-PD optoelectronic oscillators are proposed. The circuitry employs two RTD-PD devices in parallel. The oscillation frequencies range from 20 to 44 GHz, with a maximum attainable power of about 1 mW at 34, 37 and 44 GHz.
Massively parallel information processing systems for space applications

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1979-01-01

NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
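The genetic-programming variant relies on the fact that one bitwise operator applied to a w-bit word evaluates w fitness cases at once. A minimal sketch, assuming binary inputs and a hypothetical evolved expression `candidate`; Python integers stand in for machine words, and the code is not the authors' implementation.

```python
def pack(bits):
    """Pack a sequence of 0/1 values into one integer: bit j = case j."""
    word = 0
    for j, bit in enumerate(bits):
        if bit:
            word |= 1 << j
    return word

def candidate(x0, x1, x2):
    # Hypothetical evolved Boolean expression; each operator acts on
    # every fitness case packed into the words simultaneously.
    return (x0 & x1) | ~x2

def fitness(cases, targets):
    """Count fitness cases on which candidate(case) equals the target."""
    n = len(cases)
    mask = (1 << n) - 1                            # keep only the n meaningful bits
    inputs = [pack(col) for col in zip(*cases)]    # one word per input variable
    out = candidate(*inputs) & mask
    want = pack(targets)
    agree = ~(out ^ want) & mask                   # 1 where prediction matches target
    return bin(agree).count("1")
```

With 64 cases per 64-bit word, a single pass over the expression replaces 64 separate evaluations, which is the "intrinsic parallelism" on a sequential machine that the abstract refers to.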
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of the modern and widely used techniques for achieving high-performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere, which was fully vectorized and fully parallelized in VPP Fortran, from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer. The overall performance and capability of the HPF MHD code were shown to be almost comparable to those of the VPP Fortran version. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops, with an efficiency of 76.5% on the VPP5000/56 in vector and parallel computation, which permitted comparison with catalog values. We conclude that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and that a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Scalable parallel communications

NASA Technical Reports Server (NTRS)

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

1992-01-01

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on practical issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high-bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near-linear speed-ups.
Limits to high-speed simulations of spiking neural networks using general-purpose computers.

PubMed

Zenke, Friedemann; Gerstner, Wulfram

2014-01-01

To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular, spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand, a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand, network simulations have to evolve over hours up to days to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousand neurons and off-the-shelf hardware, we compare the simulation speed of the simulators Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited, and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code, we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.
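The millisecond-scale STDP windows that drive the temporal-resolution requirement are often modeled with a pair-based exponential rule. The sketch below uses that common textbook form with illustrative amplitudes and 20 ms time constants; these values are assumptions, not parameters taken from the compared simulators or from Auryn.

```python
import math

def stdp_dw(delta_t_ms: float,
            a_plus: float = 0.01, a_minus: float = 0.012,
            tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms)."""
    if delta_t_ms > 0:
        # Pre-synaptic spike before post-synaptic spike: potentiation.
        return a_plus * math.exp(-delta_t_ms / tau_plus)
    if delta_t_ms < 0:
        # Post-synaptic spike before pre-synaptic spike: depression.
        return -a_minus * math.exp(delta_t_ms / tau_minus)
    return 0.0

for dt in (-20.0, -5.0, 5.0, 20.0):
    print(dt, round(stdp_dw(dt), 5))
```

Because the weight change varies appreciably over a few milliseconds, the integration time step has to be a fraction of a millisecond to resolve it, while the same network must be simulated for hours of model time.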
A 12-bit high-speed column-parallel two-step single-slope analog-to-digital converter (ADC) for CMOS image sensors.

PubMed

Lyu, Tao; Yao, Suying; Nie, Kaiming; Xu, Jiangtao

2014-11-17

A 12-bit high-speed column-parallel two-step single-slope (SS) analog-to-digital converter (ADC) for CMOS image sensors is proposed. The proposed ADC employs a single ramp voltage and multiple reference voltages, and the conversion is divided into a coarse phase and a fine phase to improve the conversion rate. An error calibration scheme is proposed to correct errors caused by offsets among the reference voltages. The digital-to-analog converter (DAC) used for the ramp generator is based on a split-capacitor array with an attenuation capacitor. An analysis of the DAC's linearity performance versus capacitor mismatch and parasitic capacitance is presented. A prototype 1024 × 32 Time Delay Integration (TDI) CMOS image sensor with the proposed ADC architecture has been fabricated in a standard 0.18 μm CMOS process. The proposed ADC has an average power consumption of 128 μW and a conversion rate 6 times higher than that of the conventional SS ADC. A high-quality image, captured at a line rate of 15.5 k lines/s, shows that the proposed ADC is suitable for high-speed CMOS image sensors.
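Why splitting the conversion helps can be seen from an idealized behavioral model: a conventional N-bit SS ADC ramps through 2^N clock cycles, while a two-step converter with M coarse bits needs roughly 2^M + 2^(N−M). The sketch below assumes an input in [0, vref], a coarse search over 2^M reference levels, and a fine ramp spanning one coarse interval; it is not the authors' circuit, and the function names are hypothetical.

```python
def single_slope(vin: float, vref: float, bits: int):
    """Conventional SS conversion (assumes 0 <= vin <= vref): count ramp
    steps until the ramp passes vin; the full ramp spans 2**bits cycles."""
    lsb = vref / (1 << bits)
    code = 0
    while code < (1 << bits) - 1 and (code + 1) * lsb <= vin:
        code += 1
    return code, 1 << bits

def two_step(vin: float, vref: float, bits: int, coarse_bits: int):
    """Idealized two-step conversion: coarse search over 2**coarse_bits
    reference levels, then a fine ramp spanning one coarse interval."""
    fine_bits = bits - coarse_bits
    coarse_step = vref / (1 << coarse_bits)
    # Coarse phase: which reference interval contains vin (clamped at full scale).
    coarse = min(int(vin // coarse_step), (1 << coarse_bits) - 1)
    # Fine phase: a short single-slope conversion of the residue.
    residue = vin - coarse * coarse_step
    fine, _ = single_slope(residue, coarse_step, fine_bits)
    code = (coarse << fine_bits) | fine
    cycles = (1 << coarse_bits) + (1 << fine_bits)
    return code, cycles

print(single_slope(0.3, 1.0, 12), two_step(0.3, 1.0, 12, 6))
```

With 12 bits split as 6 + 6, this idealized count drops from 4096 to 128 cycles. It should not be compared directly with the sixfold improvement reported for the fabricated prototype, since the model ignores settling, calibration and other circuit overheads, and the authors' actual coarse/fine split is not given in the abstract.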
A High-Speed Design of Montgomery Multiplier

NASA Astrophysics Data System (ADS)

Fan, Yibo; Ikenaga, Takeshi; Goto, Satoshi

With the increase in key length used in public-key cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high-speed design of a Montgomery multiplier. First, a modified scalable high-radix Montgomery algorithm is proposed to reduce the critical path. Second, a high-radix clock-saving dataflow is proposed to support high-radix operation and a one-clock-cycle delay in the dataflow. Finally, a hardware-reuse architecture is proposed to reduce hardware cost, and a parallel radix-16 datapath design is proposed to increase speed. Using the HHNEC 0.25 μm standard cell library, the implementation results show that the total cost of the Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz, and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable for high-speed RSA or ECC encryption/decryption. As a scalable design, it supports encryption/decryption with any key length up to the size of the on-chip memory.
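For reference, the operation the hardware accelerates is Montgomery reduction, which replaces division by the modulus n with masking and shifting by a power of two R = 2^k > n. The sketch below is the basic single-word form in Python, not the paper's scalable radix-16 datapath; the function names are hypothetical.

```python
def montgomery_params(n: int, r_bits: int) -> int:
    """Return n' with n * n' == -1 (mod R), where R = 2**r_bits > n and n is odd."""
    R = 1 << r_bits
    if n % 2 == 0 or n >= R:
        raise ValueError("need odd n < R")
    return (-pow(n, -1, R)) % R

def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: t * R^-1 mod n, for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask    # chosen so t + m*n is divisible by R
    u = (t + m * n) >> r_bits            # exact division by R; u < 2n
    return u - n if u >= n else u

def mont_mul(a_bar: int, b_bar: int, n: int, r_bits: int, n_prime: int) -> int:
    """Multiply two values already in Montgomery form (x * R mod n)."""
    return redc(a_bar * b_bar, n, r_bits, n_prime)

def modmul(a: int, b: int, n: int, r_bits: int) -> int:
    """a * b mod n via conversion into and out of Montgomery form."""
    n_prime = montgomery_params(n, r_bits)
    a_bar = (a << r_bits) % n            # into Montgomery form
    b_bar = (b << r_bits) % n
    c_bar = mont_mul(a_bar, b_bar, n, r_bits, n_prime)
    return redc(c_bar, n, r_bits, n_prime)   # back to ordinary form
```

In RSA exponentiation, operands stay in Montgomery form across many multiplications, so the conversion cost is paid only once per exponentiation. `pow(n, -1, R)` requires Python 3.8 or later.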
Heat Transfer in the Turbulent Boundary Layer of a Compressible Gas at High Speeds

NASA Technical Reports Server (NTRS)

Frankl, F.

1942-01-01

The Reynolds law of heat transfer from a wall to a turbulent stream is extended to the case of flow of a compressible gas at high speeds. The analysis is based on the modern theory of the turbulent boundary layer with a laminar sublayer. The investigation is carried out for the case of a plate situated in a parallel stream. The results are obtained independently of the velocity distribution in the turbulent boundary layer.
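For orientation, the classical incompressible Reynolds analogy that the paper extends relates wall heat transfer to skin friction. In its simplest flat-plate form, valid for a Prandtl number near 1, it reads:

```latex
\mathrm{St} \equiv \frac{q_w}{\rho_\infty c_p U_\infty \,(T_w - T_\infty)} = \frac{c_f}{2},
\qquad
c_f \equiv \frac{\tau_w}{\tfrac{1}{2}\rho_\infty U_\infty^{2}}
```

Here q_w is the wall heat flux, τ_w the wall shear stress and c_f the skin-friction coefficient. This is the textbook starting point, not Frankl's compressible result; at high speeds the driving temperature difference is usually referred to the adiabatic-wall (recovery) temperature rather than to T_∞.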
Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

NASA Astrophysics Data System (ADS)

Yan, Hui; Wang, K. G.; Jones, Jim E.

2016-06-01

A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computing power available from high-performance architectures. The parallelized code enables the three-dimensional simulation system size to be increased up to a 512³ grid cube. With the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability that improves with increasing problem size. In addition, a model for predicting runtime is developed, which shows good agreement with actual runtimes from numerical tests.
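The abstract does not say how the 512³ grid is distributed. One common approach for stencil-based phase-field codes, assumed here purely for illustration, is a slab decomposition along one axis with ghost planes exchanged between neighboring processes. The sketch computes only the partition and per-process array shapes; the message passing itself (e.g., with MPI) is omitted, and the function names are hypothetical.

```python
def slab_decomposition(n_planes: int, n_ranks: int):
    """Split n_planes z-planes into contiguous, near-equal slabs [start, stop)."""
    base, extra = divmod(n_planes, n_ranks)
    slabs, start = [], 0
    for rank in range(n_ranks):
        count = base + (1 if rank < extra else 0)   # first `extra` ranks get one more
        slabs.append((start, start + count))
        start += count
    return slabs

def local_shape(slab, ny: int, nx: int, ghost: int = 1):
    """Array shape a rank allocates: its planes plus ghost planes on both
    sides, which hold neighbors' data for the finite-difference stencil."""
    z0, z1 = slab
    return (z1 - z0 + 2 * ghost, ny, nx)

n = 512
slabs = slab_decomposition(n, 6)
bytes_per_field = n**3 * 8          # one double-precision field on the full grid
```

One double-precision field on the full 512³ grid occupies 512³ × 8 bytes = 1 GiB, so a multi-field simulation state quickly outgrows a single node's memory, which is one motivation for distributing it.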
Fast Whole-Engine Stirling Analysis

NASA Technical Reports Server (NTRS)

Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako

2006-01-01

This presentation discusses the whole-engine simulation approach, covering physical consistency, REV regenerator modeling, grid layering for smoothness and quality, adjustment of the conjugate heat transfer method, a high-speed, low-cost parallel cluster, and debugging.
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
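For reference, the classical form of Amdahl's Law that the note amends bounds the speed-up on n processors of a program whose parallelizable fraction is f by S(n) = 1 / ((1 − f) + f/n), which tends to 1/(1 − f) as n grows. The sketch computes only that classical bound and the corresponding efficiency; it does not implement the Activity Set Model or the parallelism index introduced in the note.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Classical Amdahl bound for parallel fraction f on n processors."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return 1.0 / ((1.0 - f) + f / n)

def efficiency(f: float, n: int) -> float:
    """Speed-up per processor."""
    return amdahl_speedup(f, n) / n

for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2), round(efficiency(0.95, n), 3))
```

Even with 95% of the work parallelizable, the bound never exceeds 20; this pessimism is what the note's workload-imbalance and parallelism-index arguments refine.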
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

NASA Astrophysics Data System (ADS)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs of the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit; it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced.
This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
Real-time SHVC software decoding with multi-threaded parallel processing

NASA Astrophysics Data System (ADS)

Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

2014-09-01

This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of the SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed to run in parallel and are scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate layer decoding. Entropy decoding, reconstruction, and in-loop processing are pipelined across multiple threads based on groups of coding tree units (CTUs). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel Core i7-2600 processor running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for bitstreams generated with the SHVC common test conditions of the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.
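The stage-level pipelining over CTU groups can be sketched with one thread per stage connected by FIFO queues, so that while one group is being in-loop filtered the next is being reconstructed and a third entropy-decoded. This Python sketch uses hypothetical stand-in stage functions, not the SHM-2.0 code; it omits the dependencies between neighboring CTU groups that a real decoder must synchronize on, and under CPython's global interpreter lock it shows the structure rather than a speed-up.

```python
import queue
import threading

_STOP = object()   # sentinel that shuts the pipeline down stage by stage

def _stage(func, inbox, outbox):
    """One pipeline stage running in its own thread."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        outbox.put(func(item))

def run_pipeline(ctu_groups, stages):
    """Push CTU groups through the stages; different stages work on
    different groups concurrently, and FIFO queues preserve the order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(stages)]
    for t in threads:
        t.start()
    for group in ctu_groups:
        queues[0].put(group)
    queues[0].put(_STOP)
    results = []
    while (item := queues[-1].get()) is not _STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results

# Hypothetical stand-ins for the three stages named in the abstract.
def entropy_decode(group):
    return group + ["entropy"]

def reconstruct(group):
    return group + ["recon"]

def in_loop_filter(group):
    return group + ["filter"]

decoded = run_pipeline([[f"ctu_group_{i}"] for i in range(6)],
                       [entropy_decode, reconstruct, in_loop_filter])
```

Larger CTU groups mean fewer queue hand-offs but less overlap between stages, which is the parallelism versus synchronization trade-off the abstract mentions.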

Implementation of a high-speed face recognition system that uses an optical parallel correlator.

PubMed

Watanabe, Eriko; Kodate, Kashiko

2005-02-10

We implement a fully automatic, fast face recognition system using a 1000 frame/s optical parallel correlator that we designed and assembled. For the 1:N identification experiment (matching one image against N images in the database) with 4000 face images, the processing time is less than 1.5 s, including preprocessing and postprocessing. A binary real-only matched filter was devised for face recognition. The system was optimized with respect to the false-rejection rate (FRR) and the false-acceptance rate (FAR) using 300 samples selected according to the biometrics guideline. In trial 1:N identification experiments with the optical parallel correlator, we obtained low error rates of 2.6% FRR and 1.3% FAR. Facial images of people wearing thin glasses or heavy makeup, which made identification difficult, were also identified with this system.
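The FRR and FAR used to tune the system can be computed from two score lists at a decision threshold. The first list holds genuine (same-person) comparison scores. The second holds impostor (different-person) comparison scores. This is a generic sketch. It assumes a higher correlation score means a better match and that a comparison is accepted when its score is at or above the threshold. The score values in the test are invented for illustration.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FRR, FAR) at `threshold`, accepting when score >= threshold.

    FRR: fraction of genuine comparisons that are wrongly rejected.
    FAR: fraction of impostor comparisons that are wrongly accepted.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def sweep(genuine, impostor, thresholds):
    """Tabulate (threshold, FRR, FAR) so an operating point can be chosen."""
    return [(t, *error_rates(genuine, impostor, t)) for t in thresholds]
```

Raising the threshold lowers FAR at the cost of FRR. Optimizing a matcher "by FRR and FAR" amounts to picking a point on this trade-off curve.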
Repeatability of high-speed migration of tremor along the Nankai subduction zone, Japan

NASA Astrophysics Data System (ADS)

Kato, A.; Tsuruoka, H.; Nakagawa, S.; Hirata, N.

2015-12-01

Tectonic tremors have been considered to be swarms or superimposed pulses of low-frequency earthquakes (LFEs). To systematically analyze the high-speed migration of tremor [e.g., Shelly et al., 2007], we focus here on an intensive cluster hosting many LFEs in the western part of Shikoku Island. We relocated ~770 LFE hypocenters identified by the JMA between Jan. 2008 and Dec. 2013. For this we applied the double-difference relocation algorithm [e.g., Waldhauser and Ellsworth, 2000] to arrival times picked by the JMA and to those obtained from waveform cross-correlation measurements. The epicentral distribution shows a clear alignment parallel to the subduction direction of the Philippine Sea plate, resembling slip-parallel streaking. We then applied a matched-filter technique to continuous seismograms recorded near the source region over 6 years (Jan. 2008 to Dec. 2013), using the relocated LFEs as templates. We newly detected about 60 times as many events as there are templates, considerably more than the conventional envelope cross-correlation method yields. Interestingly, we identified many repeated sequences of tremor migration along the slip-parallel streaking (~350 sequences). The front of each individual or stacked migration can be modeled by a parabolic envelope, indicating a diffusion process. The diffusivity of the parabolic envelope is estimated to be around 10⁵ m²/s, which falls into the category of high-speed migration (~100 km/hour). Most of the rapid migrations took place during short-term slow slip events (SSEs) and seem to be triggered by ocean and solid Earth tides. The most plausible explanation of the high-speed propagation is a diffusion process of a stress pulse concentrated within a cluster of strong brittle patches on the ductile shear zone [Ando et al., 2012]. The viscosity of the ductile shear zone within the streaking is at least one order of magnitude smaller than that associated with the slow-speed migration.
This difference in viscosity indicates that the streaking has a different rheology from the background main tremor/SSE belt. In addition, the diffusivity did not change significantly between before and after the Tohoku-Oki M9.0 Earthquake. This suggests that the high-speed propagation of tremors is stable against external stress perturbations.
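A diffusivity can be estimated from a parabolic migration front by fitting distance against time. The sketch below assumes the front follows x(t) = sqrt(4 D t). That form is an assumption: other studies use sqrt(4πDt), which changes D by a constant factor. With this form, x² is linear in t through the origin, and the least-squares estimate is D = Σ x² t / (4 Σ t²). The synthetic front in the example is invented for illustration; it is not the Shikoku data.

```python
import math


def fit_diffusivity(times_s, distances_m):
    """Least-squares D for the assumed front x^2 = 4 D t (line through the origin)."""
    num = sum(t * x * x for t, x in zip(times_s, distances_m))
    den = 4.0 * sum(t * t for t in times_s)
    return num / den


def front_speed_kmh(diffusivity, t_s):
    """Front speed dx/dt = sqrt(D / t) for x = sqrt(4 D t), in km/h."""
    return math.sqrt(diffusivity / t_s) * 3.6


# Synthetic front generated with D = 1e5 m^2/s, sampled every minute for 30 minutes.
D_true = 1e5
ts = [60.0 * k for k in range(1, 31)]
xs = [math.sqrt(4.0 * D_true * t) for t in ts]
D_est = fit_diffusivity(ts, xs)
```

Under a diffusive front the migration speed is not constant; it decays as 1/sqrt(t). A single "km/hour" figure therefore describes the early part of a migration.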
Novel wavelength diversity technique for high-speed atmospheric turbulence compensation

NASA Astrophysics Data System (ADS)

Arrasmith, William W.; Sullivan, Sean F.

2010-04-01

The defense, intelligence, and homeland security communities are driving a need for software-dominant, real-time or near-real-time imagery compensated for atmospheric turbulence. Parallel processing capabilities are finding application in diverse areas, including image processing, target tracking, pattern recognition, and image fusion. This paper addresses a novel approach to the computationally intensive case of software-dominant optical and near-infrared imaging through atmospheric turbulence. The somewhat conventional wavelength diversity method has previously been used to compensate for atmospheric turbulence with great success. We apply a new correlation-based approach to the wavelength diversity methodology using a parallel processing architecture, enabling high-speed atmospheric turbulence compensation. Methods for optical imaging through distributed turbulence are discussed, simulation results are presented, and computational and performance assessments are provided.
Blade row dynamic digital compressor program. Volume 1: J85 clean inlet flow and parallel compressor models

NASA Technical Reports Server (NTRS)

Tesch, W. A.; Steenken, W. G.

1976-01-01

This report presents the results of a one-dimensional dynamic digital blade row compressor model study of a J85-13 engine operating with uniform and with circumferentially distorted inlet flow. Details of the geometry and the derived blade row characteristics used to simulate the clean inlet performance are given. A stability criterion based on the self-developing unsteady internal flows near surge provided an accurate determination of the clean inlet surge line. The basic model was modified to include an arbitrary-extent multi-sector parallel compressor configuration for investigating 180 deg 1/rev total pressure, total temperature, and combined total pressure and total temperature distortions. The combined distortions included opposed, coincident, and 90 deg overlapped patterns. The predicted losses in surge pressure ratio matched the measured data trends at all speeds. They gave accurate predictions at high corrected speeds, where the slope of the speed lines approached the vertical.
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, Wallace B.; DuBois, David H.

1996-01-01

A system of sending and receiving gateways interconnects high-speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto the parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format suitable for input to a second interface at the receiving gateway. Preferably, an error-correcting syndrome is constructed at the sending gateway and sent with each data frame, so that transmission errors can be detected and corrected on a real-time basis. The high-speed data interface operates faster than any single fiber optic link. The transmission rate must therefore be adapted to the number of available fiber optic links, so the sending and receiving gateways monitor link availability and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient free buffer capacity to accept an incoming data frame. A credit-based flow control system continuously updates the sending gateway on the available buffer capacity at the receiving gateway.
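Byte striping across parallel links and credit-based flow control can be sketched in a few lines. This is a minimal model under assumptions, not the patented gateway. Bytes are dealt round-robin across however many links are currently available, and the collector re-interleaves them. The sender spends one credit per frame and gets a credit back whenever the receiver frees a buffer slot. The error-correcting syndrome and the SONET framing are omitted. All class and function names are hypothetical.

```python
from collections import deque


def stripe(frame, num_links):
    """Stripe distributor: deal the frame's bytes round-robin onto num_links lanes."""
    return [frame[i::num_links] for i in range(num_links)]


def collect(lanes):
    """Stripe collector: re-interleave per-link byte streams into the original frame."""
    total = sum(len(lane) for lane in lanes)
    out = bytearray(total)
    for i, lane in enumerate(lanes):
        out[i::len(lanes)] = lane  # lane i carried bytes i, i+n, i+2n, ...
    return bytes(out)


class Receiver:
    """Receiving gateway with a fixed number of frame buffers."""

    def __init__(self, buffer_frames):
        self.buffer = deque()
        self.capacity = buffer_frames

    def accept(self, frame):
        assert len(self.buffer) < self.capacity, "overrun: sender ignored credits"
        self.buffer.append(frame)

    def drain(self):
        """Hand one frame to the output interface and return one credit."""
        self.buffer.popleft()
        return 1


class Sender:
    """Sending gateway that transmits only while it holds credits."""

    def __init__(self, initial_credits):
        self.credits = initial_credits

    def try_send(self, frame, receiver):
        if self.credits == 0:
            return False  # wait until the receiver reports free buffer space
        self.credits -= 1
        receiver.accept(frame)
        return True
```

Because the sender starts with exactly as many credits as the receiver has buffers, it can never overrun the receiver. Striping over fewer links simply yields longer lanes, which is how throughput scales with link availability.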
Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

NASA Technical Reports Server (NTRS)

Krasteva, Denitza T.

1998-01-01

Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.). This work focuses on applying distributed schemes for massively parallel architectures to MDO problems as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high-speed civil transport (HSCT), together with efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques were implemented: random polling, and global round robin with message combining. Two necessary termination detection schemes were also implemented: global task count, and token passing. All four were evaluated for effectiveness and for scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also examined. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
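Random-polling load balancing with global-task-count termination can be sketched with threads and per-worker deques. This is a shared-memory illustration under assumptions; the work itself targeted distributed-memory machines with message passing. An idle worker picks a random victim and takes half of its queued tasks. A shared counter tracks created-but-unfinished tasks, and all workers stop when it reaches zero. The workload is hypothetical: a task of depth d spawns two tasks of depth d-1, and depth-0 tasks are leaves.

```python
import random
import threading
from collections import deque


class Worker(threading.Thread):
    def __init__(self, wid, pool):
        super().__init__()
        self.wid, self.pool = wid, pool
        self.tasks = deque()
        self.lock = threading.Lock()
        self.leaves = 0

    def steal_from(self, victim):
        """Random polling: take half of the victim's tasks from its oldest end."""
        with victim.lock:
            stolen = [victim.tasks.popleft() for _ in range(len(victim.tasks) // 2)]
        with self.lock:
            self.tasks.extend(stolen)

    def run(self):
        rng = random.Random(self.wid)
        while self.pool.outstanding() > 0:  # global task count termination test
            with self.lock:
                task = self.tasks.pop() if self.tasks else None
            if task is None:
                victim = rng.choice(self.pool.workers)
                if victim is not self:
                    self.steal_from(victim)
                continue
            if task > 0:
                # Count children before retiring the parent so the total never hits 0 early.
                self.pool.add(2)
                with self.lock:
                    self.tasks.extend([task - 1, task - 1])
            else:
                self.leaves += 1
            self.pool.add(-1)  # this task is finished


class Pool:
    def __init__(self, num_workers, root_depth):
        self._count = 1  # the root task
        self._lock = threading.Lock()
        self.workers = [Worker(i, self) for i in range(num_workers)]
        self.workers[0].tasks.append(root_depth)  # all work starts on one worker

    def add(self, delta):
        with self._lock:
            self._count += delta

    def outstanding(self):
        with self._lock:
            return self._count

    def run(self):
        for w in self.workers:
            w.start()
        for w in self.workers:
            w.join()
        return sum(w.leaves for w in self.workers)
```

Stealing from the oldest end of a deque tends to grab large subtrees, so one steal moves a lot of work. The global round robin variant would replace the random victim choice with a shared, incrementing victim index.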
Parallel computing in experimental mechanics and optical measurement: A review (II)

NASA Astrophysics Data System (ADS)

Wang, Tianyi; Kemao, Qian

2018-05-01

With advantages such as non-destructiveness, high sensitivity, and high accuracy, optical techniques have been successfully applied to measure various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolution for higher accuracy, the computational burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques, because of their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures and then introducing their various parallel patterns for high-speed computation. Next, we broadly review the effects of CPU and GPU parallel computing in EM and OM applications. These include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we found that high parallelism can always be exploited in such applications for the development of high-performance systems.
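A common parallel pattern in digital image correlation is a "map" over independent subsets. Each reference/deformed subset pair is scored with the zero-normalized cross-correlation (ZNCC), and no pair depends on any other. The sketch below assumes subsets are already flattened into equal-length lists of pixel intensities. A thread pool stands in for the CPU/GPU back ends the review covers. In CPython the GIL limits the actual speedup; the point is the decomposition.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length intensity subsets."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def correlate_all(ref_subsets, def_subsets, workers=4):
    """Map pattern: every subset pair is independent, so score them concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zncc, ref_subsets, def_subsets))
```

ZNCC is invariant to affine intensity changes (gain and offset). That invariance is why it is a standard matching criterion in this setting. On a GPU, the same map would typically assign one subset per thread block.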
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Owen, Jeffrey E.

1988-01-01

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented that overcomes the traditional disadvantages of simulations executed on a digital computer. Incorporating parallel processing allows simulations to be mapped onto a digital computer in the same inherently parallel manner in which they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code because it eliminates the need for a high-level language compiler. Resolution is greatly increased over that available with an analog computer, without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is illustrated by comparing the derived execution times of two ACSL programs with the execution times of the same programs on a similar sequential architecture.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

Reconstructing gene networks by experimentally testing the possible interactions between genes is tedious, so automated reverse-engineering procedures are increasingly adopted instead. Several evolutionary algorithms have been suggested for deriving network parameters. However, inferring large networks with an evolutionary algorithm requires addressing two important issues: premature convergence and high computational cost. To tackle the former and enhance the performance of traditional evolutionary algorithms, parallel-model evolutionary algorithms are advisable. To overcome the latter and speed up the computation, cloud computing is advocated as a promising solution. The most popular option is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms to infer large gene networks. This work presents a practical framework for inferring large gene networks by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results analyzed. They show that our parallel approach can successfully infer networks with the desired behaviors and can substantially reduce computation time. Parallel population-based algorithms can effectively determine network parameters and perform better than the widely used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation.
By coupling the parallel-model population-based optimization method with the parallel computational framework, high-quality solutions can be obtained in a relatively short time. This integrated approach is a promising way to infer large networks.
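The "parallel model" idea can be sketched as an island-model genetic algorithm with a map/reduce structure. In the map step, each island evolves independently; this is the part a MapReduce job could distribute. In the reduce step, the global best individual is found and migrated into every island. This is a simplified stand-in, not the authors' method. It is a plain GA (the PSO half of their hybrid GA-PSO is omitted), and the objective `sphere` is a hypothetical placeholder for the error between simulated and observed gene profiles.

```python
import random


def sphere(x):
    # Hypothetical objective; in gene network inference this would be the
    # mismatch between simulated and target expression profiles.
    return sum(v * v for v in x)


def evolve_island(args):
    """Map step: evolve one island for a few generations, independently of the others."""
    pop, generations, seed = args
    rng = random.Random(seed)
    for _ in range(generations):
        new_pop = []
        for _ in range(len(pop)):
            p1 = min(rng.sample(pop, 3), key=sphere)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=sphere)
            w = rng.random()  # blend crossover plus Gaussian mutation
            new_pop.append([w * a + (1 - w) * b + rng.gauss(0, 0.1) for a, b in zip(p1, p2)])
        worst = max(range(len(new_pop)), key=lambda i: sphere(new_pop[i]))
        new_pop[worst] = min(pop, key=sphere)  # elitism: keep the previous best
        pop = new_pop
    return pop


def island_ga(num_islands=4, pop_size=20, dim=5, epochs=10, gens_per_epoch=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
               for _ in range(num_islands)]
    for epoch in range(epochs):
        jobs = [(isl, gens_per_epoch, seed * 1000 + epoch * num_islands + i)
                for i, isl in enumerate(islands)]
        islands = list(map(evolve_island, jobs))  # a distributed map in a cloud setting
        # Reduce step: global best, migrated into each island in place of its worst member.
        best = min((min(isl, key=sphere) for isl in islands), key=sphere)
        for isl in islands:
            worst = max(range(len(isl)), key=lambda i: sphere(isl[i]))
            isl[worst] = list(best)
    return best
```

Isolating islands between migrations preserves diversity, which is the usual argument for parallel models against premature convergence. The map step carries almost all of the computational cost.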
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background Reconstructing gene networks by experimentally testing the possible interactions between genes is tedious, so automated reverse-engineering procedures are increasingly adopted instead. Several evolutionary algorithms have been suggested for deriving network parameters. However, inferring large networks with an evolutionary algorithm requires addressing two important issues: premature convergence and high computational cost. To tackle the former and enhance the performance of traditional evolutionary algorithms, parallel-model evolutionary algorithms are advisable. To overcome the latter and speed up the computation, cloud computing is advocated as a promising solution. The most popular option is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms to infer large gene networks. Results This work presents a practical framework for inferring large gene networks by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results analyzed. They show that our parallel approach can successfully infer networks with the desired behaviors and can substantially reduce computation time. Conclusions Parallel population-based algorithms can effectively determine network parameters and perform better than the widely used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation.
By coupling the parallel-model population-based optimization method with the parallel computational framework, high-quality solutions can be obtained in a relatively short time. This integrated approach is a promising way to infer large networks. PMID:24428926
Using Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

NASA Astrophysics Data System (ADS)

O'Connor, A. S.; Justice, B.; Harris, A. T.

2013-12-01

Graphics Processing Units (GPUs) are high-performance multi-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. They are general-purpose parallel processors that support a variety of programming interfaces, including industry-standard languages such as C. GPU implementations of algorithms well suited to parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant speed improvements have been achieved with GPU-based implementations for imagery orthorectification, atmospheric correction, target detection, and image transformations such as Independent Components Analysis (ICA). Combined with GPU processing capabilities, additional optimizations can provide a 50x to 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA-based GPU processing framework for accelerating the ENVI and IDL processes that can best take advantage of parallelization. Testing performed by Exelis VIS shows that orthorectifying a 35,000 x 35,000 pixel WorldView-1 image can take as long as two hours. With GPU orthorectification, the same process takes three minutes. Faster image processing lets first responders use imagery effectively and lets scientists make rapid discoveries with near-real-time data. It also provides an operational component for data centers that need to process and disseminate data quickly.
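The orthorectification figures quoted above can be turned into a speedup and a throughput with simple arithmetic. The inputs below are only the numbers stated in the abstract: a 35,000 x 35,000 pixel scene, two hours on the CPU path and three minutes on the GPU path.

```python
pixels = 35_000 * 35_000       # WorldView-1 scene size quoted in the abstract
cpu_seconds = 2 * 60 * 60      # "as long as two hours"
gpu_seconds = 3 * 60           # "three minutes"

speedup = cpu_seconds / gpu_seconds   # ratio of processing times
cpu_rate = pixels / cpu_seconds       # pixels per second, CPU path
gpu_rate = pixels / gpu_seconds       # pixels per second, GPU path
```

This particular example works out to a 40x speedup, or roughly 6.8 million pixels per second on the GPU path. That sits just below the 50x to 100x range the abstract gives for GPU processing combined with additional optimizations.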
Applications considerations in the system design of highly concurrent multiprocessors

NASA Technical Reports Server (NTRS)

Lundstrom, Stephen F.

1987-01-01

A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
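The emphasis on minimizing the effect of the inherently serial portions of applications is usually quantified with Amdahl's law. This law is a standard model added here for illustration; the report itself does not cite it. If a fraction p of the work parallelizes perfectly over n processors, the speedup is S = 1 / ((1 - p) + p / n). The serial fraction (1 - p) therefore caps the speedup at 1 / (1 - p), however many processors are used.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: S = 1 / ((1 - p) + p / n)."""
    p, n = parallel_fraction, processors
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


# With 5% serial work, speedup approaches but never exceeds 20x.
for n in (16, 256, 4096):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

This is why fast hardware synchronization matters in such designs: shrinking the effective serial fraction raises the ceiling more than adding processors does.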
Modular time division multiplexer: Efficient simultaneous characterization of fast and slow transients in multiple samples

NASA Astrophysics Data System (ADS)

Kim, Stephan D.; Luo, Jiajun; Buchholz, D. Bruce; Chang, R. P. H.; Grayson, M.

2016-09-01

A modular time division multiplexer (MTDM) device is introduced to enable parallel measurement of multiple samples whose fast and slow decay transients span time scales from milliseconds to months. This is achieved by dedicating a single high-speed measurement instrument to rapid data collection at the start of a transient. A second, low-speed measurement instrument is multiplexed across several samples in parallel to collect slow data for the later transients. The MTDM is a high-level design concept that can in principle measure an arbitrary number of samples. The low-cost implementation presented here allows up to 16 samples to be measured in parallel over several months. This reduces the total ensemble measurement duration and equipment usage by as much as an order of magnitude without sacrificing fidelity. The MTDM was successfully demonstrated by simultaneously measuring the photoconductivity of three amorphous indium-gallium-zinc-oxide thin films, with 20 ms data resolution for fast transients and an uninterrupted parallel run time of over 20 days. The MTDM has potential applications in many research areas whose response times span many orders of magnitude. Examples include photovoltaics, rechargeable batteries, and amorphous semiconductors such as silicon and amorphous indium-gallium-zinc-oxide.
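The slow-phase half of the scheme can be sketched as a round-robin time-division schedule for the shared low-speed instrument. In each fixed dwell slot the instrument reads one sample, so each sample is revisited every N x dwell seconds. This is a minimal sketch under assumptions: the fixed dwell time and the strict round-robin order are hypothetical, and the hand-off from the dedicated high-speed instrument is not modeled.

```python
def slow_scan_schedule(samples, dwell_s, duration_s):
    """Round-robin time-division schedule for the shared low-speed instrument.

    Returns (start_time_s, sample) slots. Each sample is revisited every
    len(samples) * dwell_s seconds.
    """
    n_slots = int(duration_s // dwell_s)
    # Compute times as k * dwell_s rather than accumulating, to avoid float drift.
    return [(k * dwell_s, samples[k % len(samples)]) for k in range(n_slots)]


def revisit_period_s(num_samples, dwell_s):
    """Effective sampling interval per sample during the slow phase."""
    return num_samples * dwell_s
```

The trade-off is explicit: adding samples lengthens each sample's revisit period. That is acceptable only for the late, slowly varying part of a transient, which is why the fast start is handled by the dedicated instrument.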
Modular time division multiplexer: Efficient simultaneous characterization of fast and slow transients in multiple samples.

PubMed

Kim, Stephan D; Luo, Jiajun; Buchholz, D Bruce; Chang, R P H; Grayson, M

2016-09-01

A modular time division multiplexer (MTDM) device is introduced to enable parallel measurement of multiple samples whose fast and slow decay transients span time scales from milliseconds to months. This is achieved by dedicating a single high-speed measurement instrument to rapid data collection at the start of a transient. A second, low-speed measurement instrument is multiplexed across several samples in parallel to collect slow data for the later transients. The MTDM is a high-level design concept that can in principle measure an arbitrary number of samples. The low-cost implementation presented here allows up to 16 samples to be measured in parallel over several months. This reduces the total ensemble measurement duration and equipment usage by as much as an order of magnitude without sacrificing fidelity. The MTDM was successfully demonstrated by simultaneously measuring the photoconductivity of three amorphous indium-gallium-zinc-oxide thin films, with 20 ms data resolution for fast transients and an uninterrupted parallel run time of over 20 days. The MTDM has potential applications in many research areas whose response times span many orders of magnitude. Examples include photovoltaics, rechargeable batteries, and amorphous semiconductors such as silicon and amorphous indium-gallium-zinc-oxide.
RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

NASA Astrophysics Data System (ADS)

Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

2017-03-01

Resistive switching memory (RRAM) is considered one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today's electronic systems. However, existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and the need for extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition that implements k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behavior is chosen as both the storage and the computing component. The proposed architecture is tested on the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and the proposed architecture proves highly tolerant of device variations. This work opens a new path toward high-performance RRAM-based parallel computing hardware.
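As a software reference for what the crossbar computes, k-nearest neighbor classification reduces to two steps. First, measure the distance from the query to every stored example; second, take a majority vote among the k closest. In the RRAM design the distance step happens inside the array. The squared Euclidean distance used below is an assumption, since the abstract does not state which metric the hardware realizes. The toy data in the test are invented.

```python
from collections import Counter


def knn_classify(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples."""
    dists = sorted(
        ((sum((a - b) ** 2 for a, b in zip(x, query)), y) for x, y in zip(train, labels)),
        key=lambda pair: pair[0],  # sort on distance only; labels need not be comparable
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

Every stored example's distance is independent of the others. That independence is what makes kNN a natural fit for in-memory parallel evaluation: all rows of the crossbar can be evaluated at once.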
Development of a novel parallel-spool pilot operated high-pressure solenoid valve with high flow rate and high speed

NASA Astrophysics Data System (ADS)

Dong, Dai; Li, Xiaoning

2015-03-01

A high-pressure solenoid valve with high flow rate and high speed is a key component of an underwater driving system. However, a traditional single-spool pilot-operated valve cannot meet the demands of high flow rate and high speed simultaneously, so a new high-pressure solenoid valve structure is needed for the underwater driving system. A novel parallel-spool pilot-operated high-pressure solenoid valve is proposed to overcome this drawback of the current single-spool design. Mathematical models of the valve's opening process and flow rate are established. The opening response time of the valve is subdivided into four parts to analyze the opening response, and formulas for each of the four parts are derived. Key factors that influence the opening response time are analyzed. Based on the mathematical model, the opening process is simulated in MATLAB. Parameters chosen from the theoretical analysis are used to design a test prototype of the new valve. The opening response time of the prototype is measured by monitoring the current in the coil and the displacement of the main valve spool. The experimental results agree with the simulated results, verifying the theoretical analysis. The experimental opening response time of the valve is 48.3 ms at a working pressure of 10 MPa. The flow capacity test shows that the largest effective area is 126 mm² and the largest air flow rate is 2320 L/s. The load driving test shows that the valve meets the demands of the driving system. The proposed parallel-spool valve provides a new method for designing high-pressure valves with fast response and large flow rate.
Low-Speed Investigation of Upper-Surface Leading-Edge Blowing on a High-Speed Civil Transport Configuration

NASA Technical Reports Server (NTRS)

Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.

1999-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that, when examined in terms of these attributes, the presently available environment can be shown to be inadequate. A radical improvement is needed. It may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new computing paradigms will have to be developed. Specifically, these are innovative algorithms that are intrinsically parallel, so that their performance scales up linearly with the number of processors. It may be speculated that simulating a complex behaviour through the interaction of a large number of very simple models could inspire such algorithms; cellular automata are one example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm based on Salembier's Max-trees and Min-trees that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings. The image or volume is first partitioned into multiple slices. The Max-tree of each slice is then computed using any sequential Max-tree algorithm. Subsequently, the Max-trees of the slices are merged to obtain the Max-tree of the whole image. A C implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Random number generators for large-scale parallel Monte Carlo simulations on FPGA

NASA Astrophysics Data System (ADS)

Lin, Y.; Wang, F.; Liu, B.

2018-05-01

Through parallelization, field-programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for implementing random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies. It also evaluates newly proposed FPGA implementations of two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best of the evaluated RNGs at fulfilling the needs of LPMC simulations on FPGA.
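An additive lagged Fibonacci generator produces x_n = (x_{n-j} + x_{n-k}) mod 2^m from a circular buffer of the last k outputs. It needs only one addition per number, which is part of its appeal for hardware. The sketch below is a software illustration, not the paper's FPGA design. It uses the classic lags (j, k) = (24, 55) and m = 32. The seeding scheme (a 64-bit LCG fills the initial table) is a hypothetical choice; forcing one seed word to be odd is required for the maximal period. Giving each parallel stream its own seed is a simplification that does not by itself guarantee statistically independent streams.

```python
class ALFG:
    """Additive lagged Fibonacci generator: x_n = (x_{n-j} + x_{n-k}) mod 2**m."""

    def __init__(self, seed, j=24, k=55, m=32):
        if not 0 < j < k:
            raise ValueError("need 0 < j < k")
        self.j, self.k, self.mask = j, k, (1 << m) - 1
        # Hypothetical seeding: fill the k-word table from a 64-bit LCG.
        x = seed & 0xFFFFFFFFFFFFFFFF
        state = []
        for _ in range(k):
            x = (6364136223846793005 * x + 1442695040888963407) & 0xFFFFFFFFFFFFFFFF
            state.append((x >> 32) & self.mask)
        state[0] |= 1  # at least one odd word is needed for the maximal period
        self.state = state
        self.i = 0  # circular-buffer index of the oldest value, x_{n-k}

    def next_u32(self):
        oldest = self.state[self.i]                            # x_{n-k}
        lag_j = self.state[(self.i + self.k - self.j) % self.k]  # x_{n-j}
        x = (oldest + lag_j) & self.mask
        self.state[self.i] = x  # x_n overwrites x_{n-k}
        self.i = (self.i + 1) % self.k
        return x


# Hypothetical parallel use: one generator per Monte Carlo worker, each with its own seed.
streams = [ALFG(seed=s) for s in range(4)]
```

In hardware, the circular buffer maps naturally onto block RAM with two read ports and one write port per cycle. This may be part of why ALFG variants suit FPGAs.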
Full range line-field parallel swept source imaging utilizing digital refocusing

NASA Astrophysics Data System (ADS)

Fechtig, Daniel J.; Kumar, Abhishek; Drexler, Wolfgang; Leitgeb, Rainer A.

2015-12-01

We present geometric optics-based refocusing applied to a novel off-axis line-field parallel swept source imaging (LPSI) system. LPSI is an imaging modality based on line-field swept source optical coherence tomography, which permits 3-D imaging at acquisition speeds of up to 1 MHz. The digital refocusing algorithm applies a defocus-correcting phase term to the Fourier representation of complex-valued interferometric image data, which is based on the geometrical optics information of the LPSI system. We introduce the off-axis LPSI system configuration, the digital refocusing algorithm and demonstrate the effectiveness of our method for refocusing volumetric images of technical and biological samples. An increase of effective in-focus depth range from 255 μm to 4.7 mm is achieved. The recovery of the full in-focus depth range might be especially valuable for future high-speed and high-resolution diagnostic applications of LPSI in ophthalmology.
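The key step of digital refocusing is to multiply the Fourier transform of the complex field by a defocus-dependent phase term and transform back. The sketch below substitutes the standard paraxial (Fresnel) propagator exp(-i Δz (kx² + ky²) / (2 k0)), with the global phase dropped. The paper derives its phase term from the geometrical optics of the LPSI system, so this is not the authors' exact formula. The sign convention, wavelength, pixel pitch and defocus values are assumptions. To refocus, propagate by the negative of the defocus distance.

```python
import numpy as np


def refocus(field, dz, wavelength, dx):
    """Apply a paraxial defocus phase to a complex 2-D field in the Fourier domain."""
    ny, nx = field.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    KX, KY = np.meshgrid(kx, ky)  # shapes (ny, nx), matching `field`
    k0 = 2 * np.pi / wavelength
    phase = np.exp(-1j * dz * (KX**2 + KY**2) / (2 * k0))  # unit-magnitude filter
    return np.fft.ifft2(np.fft.fft2(field) * phase)


# Hypothetical parameters: 840 nm light, 2 um pixels, 50 um defocus.
rng = np.random.default_rng(0)
field = rng.standard_normal((32, 48)) + 1j * rng.standard_normal((32, 48))
defocused = refocus(field, 50e-6, 840e-9, 2e-6)
restored = refocus(defocused, -50e-6, 840e-9, 2e-6)
```

Because the filter has unit magnitude, refocusing only redistributes phase and conserves the field's energy. This is also why it can be undone exactly.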

Integrated test system of infrared and laser data based on USB 3.0

NASA Astrophysics Data System (ADS)

Fu, Hui Quan; Tang, Lin Bo; Zhang, Chao; Zhao, Bao Jun; Li, Mao Wen

2017-07-01

This paper presents the design of a USB 3.0-based integrated test system for both infrared image data and laser signal data processing modules. The core of the design is FPGA logic control. The design uses dual-chip DDR3 SDRAM as a high-speed laser data cache and receives parallel LVDS image data through a serial-to-parallel conversion chip. It achieves high-speed data communication between the system and the host computer over the USB 3.0 bus. The experimental results show that the developed PC software displays two images in real time: the original 14-bit LVDS image after 14-to-8-bit conversion, and the JPEG2000-compressed image after software decompression. The software can also display the acquired laser signal data in real time. The correctness of the test system design is verified, indicating that the interface link operates normally.
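The abstract does not say how the 14-to-8-bit display conversion is done. Two common options are sketched below as assumptions. The first keeps the 8 most significant bits with a right shift by 6. The second linearly stretches the frame's actual min..max range onto 0..255, which preserves more visible contrast when the raw data occupy only part of the 14-bit range.

```python
def to_8bit(pixels_14bit, mode="shift"):
    """Map 14-bit samples (0..16383) to 8-bit display values (0..255)."""
    if mode == "shift":
        return [p >> 6 for p in pixels_14bit]  # keep the 8 most significant bits
    if mode == "stretch":
        lo, hi = min(pixels_14bit), max(pixels_14bit)
        if hi == lo:
            return [0] * len(pixels_14bit)  # flat frame: avoid division by zero
        return [round((p - lo) * 255 / (hi - lo)) for p in pixels_14bit]
    raise ValueError("mode must be 'shift' or 'stretch'")
```

The shift is cheap enough for FPGA logic. The stretch needs a pass over the frame to find its range, so it is more natural in host software.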
Performance of a 300 Mbps 1:16 serial/parallel optoelectronic receiver module

NASA Technical Reports Server (NTRS)

Richard, M. A.; Claspy, P. C.; Bhasin, K. B.; Bendett, M. B.

1990-01-01

Optical interconnects are being considered for high-speed distribution of multiplexed control signals in phased array antennas based on GaAs monolithic microwave integrated circuits (MMICs). The design, fabrication, and performance of a hybrid GaAs optoelectronic integrated circuit (OEIC) are described. The OEIC converts a 16-bit serial optical input to a 16-line parallel electrical output using an on-board 1:16 demultiplexer and operates at data rates as high as 300 Mbps. The performance characteristics and potential applications of the device are presented.
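A 1:16 serial-to-parallel demultiplexer routes successive bits of a serial stream onto 16 output lines. Each line therefore runs at 1/16 of the serial rate: 300 Mbps / 16 = 18.75 Mbps per line. The sketch below models the data movement only, not the circuit. The most-significant-bit-first word ordering is an assumption.

```python
def demux_1_to_n(bits, n=16):
    """Route a serial bit stream onto n parallel lines; line i gets bits i, i+n, i+2n, ..."""
    if len(bits) % n:
        raise ValueError("stream length must be a multiple of n")
    return [bits[i::n] for i in range(n)]


def parallel_words(bits, n=16):
    """Read the stream as n-bit parallel words, assuming MSB-first ordering."""
    return [int("".join(map(str, bits[k:k + n])), 2) for k in range(0, len(bits), n)]


SERIAL_RATE_MBPS = 300
line_rate_mbps = SERIAL_RATE_MBPS / 16  # rate seen on each parallel output line
```

Spreading the stream over 16 lines is what lets slower downstream control logic keep up with a fast optical link.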
Megavolt parallel potentials arising from double-layer streams in the Earth's outer radiation belt.

PubMed

Mozer, F S; Bale, S D; Bonnell, J W; Chaston, C C; Roth, I; Wygant, J

2013-12-06

Huge numbers of double layers carrying electric fields parallel to the local magnetic field line have been observed on the Van Allen Probes in connection with in situ relativistic electron acceleration in the Earth's outer radiation belt. For one case with adequately high time-resolution data, 7000 double layers were observed in an interval of 1 min, producing a 230,000 V net parallel potential drop crossing the spacecraft. Lower-resolution data show that this event lasted for 6 min and that more than 1,000,000 V of net parallel potential crossed the spacecraft during this time. A double layer traverses the length of a magnetic field line in about 15 s, and the orbital motion of the spacecraft perpendicular to the magnetic field was about 700 km during this 6 min interval. Thus, the instantaneous parallel potential along a single magnetic field line was on the order of tens of kilovolts. Electrons on the field line might experience many such potential steps in their lifetimes, accelerating them to energies at which they serve as the seed population for relativistic acceleration by coherent, large-amplitude whistler mode waves. Because the double-layer speed of 3100 km/s is on the order of the electron acoustic speed (and not the ion acoustic speed) of a 25 eV plasma, the double layers may result from a new electron acoustic mode. Acceleration mechanisms involving double layers may also be important in planetary radiation belts such as those of Jupiter, Saturn, Uranus, and Neptune, in the solar corona during flares, and in astrophysical objects.
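The order-of-magnitude reasoning can be checked with the numbers quoted in the abstract. The sketch below is one plausible reading. It assumes that the potential crossing the spacecraft during one 15 s field-line transit approximates the instantaneous potential along one field line. The variable names are illustrative.

```python
# Quantities quoted in the abstract.
n_layers = 7000
drop_1min_V = 230_000            # net parallel potential in the 1 min high-resolution interval
total_6min_V = 1_000_000         # "more than 1,000,000 V" over the 6 min event (a lower bound)
event_s = 6 * 60
transit_s = 15                   # time for one double layer to traverse a field line
speed_km_s = 3100                # double-layer speed

# Average potential step per double layer.
mean_step_V = drop_1min_V / n_layers                    # ≈ 32.9 V

# Potential crossing the spacecraft during one field-line transit (assumed to
# approximate the instantaneous potential along a single field line).
per_field_line_V = total_6min_V * transit_s / event_s   # ≈ 4.2e4 V, i.e. tens of kV

# Field-line length implied by the transit time and the double-layer speed.
field_line_km = speed_km_s * transit_s                  # 46,500 km
```

Since the 6 min total is a lower bound, the per-field-line figure is also a lower bound, consistent with "tens of kilovolts".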
Construction of a parallel processor for simulating manipulators and other mechanical systems

NASA Technical Reports Server (NTRS)

Hannauer, George

1991-01-01

This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
Design consideration in constructing high performance embedded Knowledge-Based Systems (KBS)

NASA Technical Reports Server (NTRS)

Dalton, Shelly D.; Daley, Philip C.

1988-01-01

As hardware trends in artificial intelligence (AI) involve more and more complexity, optimizing the computer system design for a particular problem will also become more complex. Space applications of knowledge-based systems (KBS) will often require the ability to perform both numerically intensive vector computations and real-time symbolic computations. Although parallel machines can theoretically achieve the speeds necessary for most of these problems, the machine's power cannot be utilized if the application itself is not highly parallel. A scheme is presented that provides the computer systems engineer with a tool for analyzing machines with various configurations of array, symbolic, scalar, and multiprocessors. High-speed networks and interconnections make customized, distributed, intelligent systems feasible for the application of AI in space. The method presented can be used to optimize such AI system configurations and to compare existing computer systems. It is an open question whether, for a given mission requirement, a suitable computer system design can be constructed for any amount of money.
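The point that a parallel machine's power goes unused when the application is not highly parallel can be quantified with Amdahl's law. This is a standard model, not the analysis scheme presented in this report. In the sketch below, `parallel_fraction` is the share of the run time that can proceed concurrently. The speedup saturates at 1 / (1 - parallel_fraction) no matter how many processors are added.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Ideal speedup when only `parallel_fraction` of the run time can be spread over n processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Speedup on 4, 64 and 1024 processors for increasingly parallel applications.
table = {p: [amdahl_speedup(p, n) for n in (4, 64, 1024)] for p in (0.5, 0.9, 0.99)}
```

Even on 1024 processors, an application that is only half parallel stays below a 2x speedup.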
PCLIPS: Parallel CLIPS

NASA Technical Reports Server (NTRS)

Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

1994-01-01

A parallel version of CLIPS 5.1 has been developed to run on Intel hypercubes. The user interface is the same as that of CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C³I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and the implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear speedups in some cases, are possible.
A parallel implementation of a multisensor feature-based range-estimation method

NASA Technical Reports Server (NTRS)

Suorsa, Raymond E.; Sridhar, Banavar

1993-01-01

There are many proposed vision-based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real-time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten to thirty or more frames per second, depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and a shared-memory parallel computer.
Parallel optoelectronic trinary signed-digit division

NASA Astrophysics Data System (ADS)

Alam, Mohammad S.

1999-03-01

The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of operands of arbitrary length in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator-based architecture is suggested for implementing the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of the space-bandwidth product of the spatial light modulators used in the optoelectronic implementation.
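Constant-time TSD addition works because each digit position can be resolved from its own operand digits plus a single transfer digit from its right-hand neighbour, so no carry ripples across the word. The Python sketch below assumes radix 3 with digits in {-2, ..., 2} and uses one generic two-step decomposition. It is not necessarily the truth table or optical encoding used in the paper, and it shows addition and subtraction only, not the division algorithm. All function names are hypothetical.

```python
from typing import List, Tuple

Digits = List[int]   # least-significant digit first, each digit in {-2, -1, 0, 1, 2}

def to_tsd(n: int) -> Digits:
    """Encode an integer in balanced ternary, a valid TSD form (digits in {-1, 0, 1})."""
    digits = []
    while n != 0:
        r = n % 3                      # Python gives 0, 1 or 2 even for negative n
        if r == 2:
            digits.append(-1)
            n = (n + 1) // 3
        else:
            digits.append(r)
            n = (n - r) // 3
    return digits or [0]

def from_tsd(d: Digits) -> int:
    return sum(digit * 3 ** i for i, digit in enumerate(d))

def _split(s: int) -> Tuple[int, int]:
    """Write a position sum s in [-4, 4] as 3*t + w with t, w in {-1, 0, 1}."""
    t = 1 if s >= 2 else (-1 if s <= -2 else 0)
    return t, s - 3 * t

def tsd_add(x: Digits, y: Digits) -> Digits:
    """Carry-free addition: each position needs only its own digits and one
    transfer from the right, so the depth does not grow with operand length."""
    n = max(len(x), len(y)) + 1
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    transfer = [0] * (n + 1)
    interim = [0] * n
    for i in range(n):                 # iterations are independent (parallel in hardware)
        transfer[i + 1], interim[i] = _split(x[i] + y[i])
    return [interim[i] + transfer[i] for i in range(n)] + [transfer[n]]

def tsd_sub(x: Digits, y: Digits) -> Digits:
    """Subtraction is addition of the digit-wise negation."""
    return tsd_add(x, [-d for d in y])
```

The loop in `tsd_add` has no dependence between iterations. This is the property an optical correlator can exploit by evaluating every position at once.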
ARC-2007-ACD07-0184-003

NASA Image and Video Library

2007-09-26

From left: Data Parallel Line Relaxation (DPLR) software team members Kerry Trumble, Deepak Bose and David Hash analyze and predict the extreme environments NASA's space shuttle experiences during its super high-speed reentry into Earth's atmosphere.
The crew activity planning system bus interface unit

NASA Technical Reports Server (NTRS)

Allen, M. A.

1979-01-01

The hardware and software designs used to implement a high speed parallel communications interface to the MITRE 307.2 kilobit/second serial bus communications system are described. The primary topic is the development of the bus interface unit.
Holographic memory for high-density data storage and high-speed pattern recognition

NASA Astrophysics Data System (ADS)

Gu, Claire

2002-09-01

As computers and the internet become faster and faster, more and more information is transmitted, received, and stored every day. The demand for high-density, fast-access data storage is pushing scientists and engineers to explore all possible approaches, including magnetic, mechanical, and optical ones. Optical data storage has already demonstrated its potential in the competition against other storage technologies, and CD and DVD are showing their advantages in the computer and entertainment markets. The motivation for using optical waves to store and access information is the same as the motivation for optical communication: light has an enormous capacity (or bandwidth) to carry information because of its short wavelength and parallel nature. In optical storage, there are two types of mechanism, namely localized and holographic memories. What gives holographic data storage an advantage over localized bit storage is its natural ability to read the stored information in parallel, thereby meeting the demand for fast access. Another unique feature that makes holographic data storage attractive is that it can perform associative recall at an incomparable speed. Therefore, volume holographic memory is particularly suitable for high-density data storage and high-speed pattern recognition. In this paper, we review previous work on volume holographic memories and discuss the challenges this technology must overcome to become a reality.
Highly accelerated cardiovascular MR imaging using many channel technology: concepts and clinical applications

PubMed Central

Sodickson, Daniel K.

2010-01-01

Cardiovascular magnetic resonance imaging (CVMRI) is of proven clinical value in the non-invasive imaging of cardiovascular diseases. CVMRI requires rapid image acquisition, but acquisition speed is fundamentally limited in conventional MRI. Parallel imaging provides a means for increasing acquisition speed and efficiency. However, signal-to-noise ratio (SNR) limitations and the limited number of receiver channels available on most MR systems have in the past imposed practical constraints, which dictated the use of moderate accelerations in CVMRI. High levels of acceleration, which were previously unattainable, have become possible with many-receiver MR systems and many-element, cardiac-optimized RF-coil arrays. The resulting improvements in imaging speed can be exploited in a number of ways, ranging from enhancement of spatial and temporal resolution to efficient whole-heart coverage to streamlining of the CVMRI workflow. In this review, examples of these strategies are provided, following an outline of the fundamentals of the highly accelerated imaging approaches employed in CVMRI. Topics discussed include basic principles of parallel imaging; key requirements for MR systems and RF-coil design; practical considerations of SNR management, supported by multi-dimensional accelerations, 3D noise averaging and high-field imaging; highly accelerated, state-of-the-art clinical cardiovascular imaging applications spanning the range from SNR-rich to SNR-limited; and current trends and future directions. PMID:17562047
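A minimal sketch of one parallel imaging reconstruction, Cartesian SENSE, helps show why the number of receiver channels limits acceleration. Each aliased pixel is a coil-weighted sum of R true pixels, and recovering them requires solving a small linear system with at least R independent coil measurements. The sketch below uses plain least squares, omits the noise-covariance weighting used in practice, and uses random synthetic sensitivities. `sense_unfold` is a hypothetical name, and this is not necessarily the method used in the reviewed work.

```python
import numpy as np

def sense_unfold(aliased, sens):
    """Recover R superimposed pixels from one aliased pixel seen by several coils.

    aliased: (n_coils,) complex values at one location of the undersampled image.
    sens:    (n_coils, R) complex coil sensitivities at the R true locations
             that fold onto that location (needs n_coils >= R).
    Plain least squares; practical SENSE also weights by the coil noise covariance.
    """
    rho, *_ = np.linalg.lstsq(sens, aliased, rcond=None)
    return rho

# Usage with synthetic data: 8 receiver coils, acceleration factor R = 4.
rng = np.random.default_rng(1)
n_coils, R = 8, 4
sens = rng.normal(size=(n_coils, R)) + 1j * rng.normal(size=(n_coils, R))
true_pixels = rng.normal(size=R) + 1j * rng.normal(size=R)
aliased = sens @ true_pixels          # each coil sees a weighted sum of the R pixels
recovered = sense_unfold(aliased, sens)
```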
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Rao, Hariprasad Nannapaneni

1989-01-01

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
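The Virtual Time and rollback idea can be sketched for a single logical process. Events are executed optimistically as they arrive. When a straggler with an earlier time stamp shows up, the process restores a saved state and re-executes. The Python sketch below is a simplified illustration that assumes one process, a state snapshot before every event, and no anti-messages or global virtual time. The class and field names are hypothetical, and it does not model RSIM's transistor-level behaviour.

```python
import bisect
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: int        # simulated time stamp
    node: str        # circuit node whose value changes
    value: int       # new logic value (0/1)

class LogicalProcess:
    """One circuit partition simulated optimistically with rollback (Time Warp style)."""

    def __init__(self):
        self.state = {}          # node -> logic value
        self.lvt = 0             # local virtual time
        self.processed = []      # executed events in time order
        self.snapshots = []      # (state, lvt) saved just before each processed event
        self.rollbacks = 0

    def _execute(self, ev):
        self.snapshots.append((dict(self.state), self.lvt))
        self.state[ev.node] = ev.value
        self.lvt = ev.time
        self.processed.append(ev)

    def receive(self, ev):
        if ev.time >= self.lvt:
            self._execute(ev)
            return
        # Straggler: restore the snapshot taken before the first later event,
        # then re-execute the straggler followed by the undone events, in order.
        self.rollbacks += 1
        k = bisect.bisect_right([e.time for e in self.processed], ev.time)
        undone = self.processed[k:]
        self.state, self.lvt = self.snapshots[k]
        del self.processed[k:]
        del self.snapshots[k:]
        for e in [ev] + undone:
            self._execute(e)

# Usage: deliver events out of order; the final state matches in-order execution.
events = [Event(t, f"n{t % 3}", t % 2) for t in range(1, 21)]
shuffled = events[:]
random.Random(42).shuffle(shuffled)
lp = LogicalProcess()
for ev in shuffled:
    lp.receive(ev)
```

Saving a snapshot before every event keeps rollback simple but costs memory. Practical Time Warp systems checkpoint less often and reclaim old snapshots once global virtual time has advanced past them.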
Pushbroom Stereo for High-Speed Navigation in Cluttered Environments

DTIC Science & Technology

2014-09-01

inertial measurement sensors, such as Achtelik et al.'s implementation of PTAM (parallel tracking and mapping) [15] with a barometric altimeter, stable flights...in indoor and outdoor environments are possible [1]. With a full vision-aided inertial navigation system (VINS), Li et al. have shown remarkable...avoidance on small UAVs. Stereo systems suffer from a similar speed issue, with most modern systems running at or below 30 Hz [8], [27]. Honegger et
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, W.B.; DuBois, D.H.

1996-12-03

Disclosed is a system of sending and receiving gateways that interconnects high-speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error-correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high-speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links. The sending and receiving gateways therefore monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system continuously updates the sending gateway on the available buffer capacity at the receiving gateway. 7 figs.
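The stripe distributor, stripe collector, and credit-based flow control can be sketched in a few lines. The Python below assumes simple round-robin byte striping over however many links are currently available, and one credit per free frame buffer at the receiver. It omits the error-correcting syndrome, and all names are hypothetical rather than taken from the patent.

```python
from typing import List

def stripe(frame: bytes, n_links: int) -> List[bytes]:
    """Distribute bytes round-robin over the links currently available."""
    return [frame[i::n_links] for i in range(n_links)]

def collect(lanes: List[bytes]) -> bytes:
    """Re-interleave the per-link byte lanes into the original order."""
    out = bytearray(sum(len(lane) for lane in lanes))
    for i, lane in enumerate(lanes):
        out[i::len(lanes)] = lane
    return bytes(out)

class CreditSender:
    """Sender side of credit-based flow control: one credit stands for one free
    frame buffer at the receiver, so the sender can never overrun it."""

    def __init__(self, receiver_buffers: int):
        self.credits = receiver_buffers

    def try_send(self, frame: bytes, n_links: int, wire: list) -> bool:
        if self.credits == 0:
            return False                  # wait until the receiver returns a credit
        self.credits -= 1
        wire.append(stripe(frame, n_links))
        return True

    def credit_returned(self, n: int = 1) -> None:
        self.credits += n

# Usage: the receiver advertises 2 buffers, so the third frame must wait.
wire = []
tx = CreditSender(receiver_buffers=2)
sent = [tx.try_send(b"HIPPI frame %d" % k, n_links=3, wire=wire) for k in range(3)]
tx.credit_returned()                      # receiver drained one buffer
```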
Parallel Solver for Diffuse Optical Tomography on Realistic Head Models With Scattering and Clear Regions.

PubMed

Placati, Silvio; Guermandi, Marco; Samore, Andrea; Scarselli, Eleonora Franchi; Guerrieri, Roberto

2016-09-01

Diffuse optical tomography is an imaging technique that evaluates how light propagates within the human head to obtain functional information about the brain. The precision with which such an optical properties map can be reconstructed depends strongly on the accuracy of the implemented light propagation model, which needs to take into account the presence of clear and scattering tissues. We present a numerical solver based on the radiosity-diffusion model, integrating the anatomical information provided by a structural MRI. The solver is designed to run on parallel heterogeneous platforms based on multiple GPUs and CPUs. We demonstrate that the solver provides a 7-fold speed-up over an isotropic-scattered parallel Monte Carlo engine based on a radiative transport equation for a domain composed of 2 million voxels, along with a significant improvement in accuracy. The speed-up increases greatly for larger domains, allowing us to compute the light distribution of a full human head (≈3 million voxels) in 116 s on the platform used.
Precision Parameter Estimation and Machine Learning

NASA Astrophysics Data System (ADS)

Wandelt, Benjamin D.

2008-12-01

I discuss the strategy of "Acceleration by Parallel Precomputation and Learning" (APPLe). APPLe can vastly accelerate parameter estimation in high-dimensional parameter spaces with costly likelihood functions by using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov chain Monte Carlo techniques to explore a likelihood function, posterior distribution or χ²-surface efficiently. It is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data using PICo, and the detailed calculation of cosmological helium and hydrogen recombination with RICO. Since the APPLe approach is designed to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the Cosmology@Home project.
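The precompute-then-learn pattern can be sketched in three steps. First, evaluate an expensive log-likelihood at many independent points in parallel. Second, fit a cheap surrogate. Third, run the inherently serial Markov chain on the surrogate. The Python below is a toy under stated assumptions. The Gaussian `expensive_loglike`, the quadratic surrogate, and the Metropolis sampler are hypothetical stand-ins, not PICo or RICO, and a thread pool stands in for a cluster or volunteer-computing grid.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def expensive_loglike(theta):
    """Hypothetical stand-in for a costly likelihood (e.g. a full Boltzmann-code run)."""
    x, y = theta
    return -0.5 * (((x - 1.0) / 0.5) ** 2 + ((y + 2.0) / 1.0) ** 2)

def quad_features(P):
    """Quadratic polynomial basis in two parameters."""
    x, y = P[:, 0], P[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def precompute_and_fit(lo, hi, n_train=200, workers=8, seed=0):
    rng = np.random.default_rng(seed)
    train = rng.uniform(lo, hi, size=(n_train, 2))
    # Trivially parallel: every training point is evaluated independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = np.array(list(pool.map(expensive_loglike, train)))
    coef, *_ = np.linalg.lstsq(quad_features(train), values, rcond=None)
    return lambda theta: float((quad_features(np.atleast_2d(theta)) @ coef)[0])

def metropolis(logp, start, n_steps=20000, step=0.5, seed=1):
    """Serial random-walk Metropolis sampler, run on the cheap surrogate."""
    rng = np.random.default_rng(seed)
    cur = np.asarray(start, dtype=float)
    cur_lp = logp(cur)
    chain = np.empty((n_steps, cur.size))
    for i in range(n_steps):
        prop = cur + step * rng.normal(size=cur.size)
        prop_lp = logp(prop)
        if np.log(rng.random()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        chain[i] = cur
    return chain

surrogate = precompute_and_fit(lo=[-3.0, -6.0], hi=[5.0, 2.0])
chain = metropolis(surrogate, start=[0.0, 0.0])
posterior_mean = chain[2000:].mean(axis=0)   # should lie near (1, -2)
```

Only the precomputation uses the parallel resources. The chain itself stays serial but becomes cheap, which is the trade the strategy exploits.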
Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

NASA Astrophysics Data System (ADS)

Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

2011-10-01

A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up, with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models is now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
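The graph-based decomposition can be illustrated with two small pieces. The first is a routing order in which every sub-basin is processed before the sub-basin it drains into. The second is a greedy largest-first assignment of sub-basins to processors by estimated cost. The Python below is a hedged sketch. The basin names, cost values, and the longest-processing-time heuristic are assumptions for illustration, not the partitioning methods evaluated for tRIBS. `graphlib` requires Python 3.9 or later.

```python
import heapq
from graphlib import TopologicalSorter

def routing_order(downstream):
    """downstream maps each sub-basin to the sub-basin it drains into (None at the outlet).
    Returns an order in which every sub-basin precedes the one it drains into."""
    ts = TopologicalSorter()
    for basin, receiver in downstream.items():
        ts.add(basin)
        if receiver is not None:
            ts.add(receiver, basin)        # receiver must wait for its upstream basin
    return list(ts.static_order())

def balance(work, n_procs):
    """Greedy longest-processing-time assignment of sub-basins to processors."""
    heap = [(0.0, p) for p in range(n_procs)]          # (current load, processor id)
    plan = {p: [] for p in range(n_procs)}
    for basin in sorted(work, key=work.get, reverse=True):
        load, p = heapq.heappop(heap)                   # least-loaded processor
        plan[p].append(basin)
        heapq.heappush(heap, (load + work[basin], p))
    return plan

# Hypothetical channel network and per-sub-basin cost (e.g. TIN node counts).
downstream = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
work = {"A": 5.0, "B": 3.0, "C": 4.0, "D": 6.0, "E": 2.0}
order = routing_order(downstream)
plan = balance(work, n_procs=2)
```

The sketch ignores the cost of hydrologic exchanges across partition boundaries, which a real partitioner must weigh against load balance.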
GSRP/David Marshall: Fully Automated Cartesian Grid CFD Application for MDO in High Speed Flows

NASA Technical Reports Server (NTRS)

2003-01-01

With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries, in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling viscous effects in order to use Cartesian grid solvers for accurate drag predictions, and addressing the issues related to the distributed-memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers: viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with using Cartesian solvers in distributed-memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.
The structure of the electron diffusion region during asymmetric anti-parallel magnetic reconnection

NASA Astrophysics Data System (ADS)

Swisdak, M.; Drake, J. F.; Price, L.; Burch, J. L.; Cassak, P.

2017-12-01

The structure of the electron diffusion region during asymmetric magnetic reconnection is explored with high-resolution particle-in-cell simulations that focus on a magnetopause event observed by the Magnetospheric Multiscale Mission (MMS). A major surprise is the development of a standing, oblique whistler-like structure with regions of intense positive and negative dissipation. This structure arises from high-speed electrons that flow along the magnetosheath magnetic separatrices, converge in the dissipation region and jet across the x-line into the magnetosphere. The jet produces a region of negative charge and generates intense parallel electric fields that eject the electrons downstream along the magnetospheric separatrices. The ejected electrons produce the parallel velocity-space crescents documented by MMS.

Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

PubMed

Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

2012-01-01

We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is computationally very demanding, with optimization on a high-performance Linux cluster typically lasting several days. To solve this computational bottleneck, we converted our optimization algorithm to run on a graphics processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on an NVIDIA Fermi graphics computing engine increased the speed ∼180-fold over an application running on an 80-node Linux cluster, considerably reducing simulation times. This application allows users to optimize models of ion channel kinetics on a single, inexpensive desktop "supercomputer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point at which the algorithm is parallelized is crucial to its performance. We substantially reduced computing time by solving the ODEs (ordinary differential equations) in a way that massively reduces memory transfers to and from the GPU. This approach may be applied to speed up other data-intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
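The speedup comes from evaluating many candidate models at once. The sketch below imitates that data parallelism with NumPy array operations on the CPU rather than CUDA. A real-valued genetic algorithm fits the two parameters of a Boltzmann activation curve to synthetic data, scoring the whole population in one vectorized call. The model, the operators (blend crossover, Gaussian mutation, elitism), and all parameter values are assumptions for illustration, not the authors' algorithm or their ion channel models.

```python
import numpy as np

def boltzmann(v, v_half, k):
    """Steady-state activation curve; broadcasting evaluates a whole population at once."""
    return 1.0 / (1.0 + np.exp((v_half - v) / k))

def fit_ga(v, target, bounds, pop=200, gens=150, elite_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    P = rng.uniform(lo, hi, size=(pop, len(lo)))
    n_elite = int(pop * elite_frac)
    for _ in range(gens):
        # Data-parallel fitness: one (pop, n_v) array op instead of a per-individual loop.
        pred = boltzmann(v[None, :], P[:, :1], P[:, 1:2])
        cost = np.mean((pred - target[None, :]) ** 2, axis=1)
        elite = P[np.argsort(cost)[:n_elite]]
        a = elite[rng.integers(n_elite, size=pop)]
        b = elite[rng.integers(n_elite, size=pop)]
        w = rng.random((pop, 1))
        children = w * a + (1 - w) * b                                      # blend crossover
        children += rng.normal(0.0, 0.02, size=children.shape) * (hi - lo)  # mutation
        P = np.clip(children, lo, hi)
        P[:n_elite] = elite                                                 # elitism
    pred = boltzmann(v[None, :], P[:, :1], P[:, 1:2])
    best = np.argmin(np.mean((pred - target[None, :]) ** 2, axis=1))
    return P[best]

v = np.arange(-100.0, 41.0, 5.0)             # hypothetical voltage steps (mV)
target = boltzmann(v, -30.0, 6.0)            # synthetic "measured" activation curve
v_half, k = fit_ga(v, target, bounds=[(-80.0, 0.0), (1.0, 20.0)])
```

On a GPU the same structure maps each individual (or each individual-voltage pair) to a thread. The abstract's further point, that keeping ODE solutions on the device avoids memory transfers, is not modelled here.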
Estimation of vibration frequency of loudspeaker diaphragm by parallel phase-shifting digital holography

NASA Astrophysics Data System (ADS)

Kakue, T.; Endo, Y.; Shimobaba, T.; Ito, T.

2014-11-01

We report frequency estimation of a loudspeaker diaphragm vibrating at high speed by parallel phase-shifting digital holography, a technique of single-shot phase-shifting interferometry. This technique records the multiple phase-shifted holograms required for phase-shifting interferometry by using space-division multiplexing. We constructed a parallel phase-shifting digital holography system consisting of a high-speed polarization-imaging camera. This camera has a micro-polarizer array that selects four linear polarization axes for each 2 × 2 pixel block. We set a loudspeaker as the object and recorded the vibration of its diaphragm with the constructed system, demonstrating observation of the vibration displacement of the loudspeaker diaphragm. In this paper, we aim to estimate the vibration frequency of the loudspeaker diaphragm by applying frequency analysis to the experimental results. Holograms consisting of 128 × 128 pixels were recorded by the camera at a frame rate of 262,500 frames per second. A sinusoidal wave was input to the loudspeaker via a phone connector. We observed the displacement of the vibrating loudspeaker diaphragm with the system, and we succeeded in estimating its vibration frequency by applying frequency analysis to the experimental results.
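The processing chain can be sketched in two steps. First, recover the wrapped phase from the four phase-shifted sub-images of each frame. Second, locate the dominant frequency of the phase time series. The Python below assumes intensities of the form a + b·cos(φ + δ) with δ = 0°, 90°, 180°, 270°, a hypothetical 2 × 2 micro-polarizer layout, and a hypothetical 2 kHz drive signal. Only the 262,500 frames per second rate comes from the abstract.

```python
import numpy as np

def phase_from_four_steps(i0, i90, i180, i270):
    """Wrapped phase from intensities I = a + b*cos(phi + delta) at delta = 0, 90, 180, 270 deg."""
    return np.arctan2(i270 - i90, i0 - i180)

def split_mosaic(frame):
    """Split a raw frame into four phase-shifted sub-images.
    The 2x2 layout [[0, 90], [270, 180]] degrees is a hypothetical assumption."""
    return frame[0::2, 0::2], frame[0::2, 1::2], frame[1::2, 1::2], frame[1::2, 0::2]

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest non-DC spectral peak."""
    x = np.asarray(signal, dtype=float) - np.mean(signal)
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(x.size, d=1.0 / fs)[np.argmax(spectrum)]

# Synthetic check: one 2x2 superpixel whose phase oscillates at a hypothetical 2 kHz.
fs = 262_500                                  # frames per second (from the abstract)
t = np.arange(2625) / fs                      # 10 ms of frames, 100 Hz resolution
f_true = 2000.0
phi = 0.5 * np.sin(2 * np.pi * f_true * t)    # small amplitude, so no phase wrapping
deltas = np.deg2rad([[0.0, 90.0], [270.0, 180.0]])
frames = 1.0 + 0.8 * np.cos(phi[:, None, None] + deltas[None, :, :])
recovered = np.array([phase_from_four_steps(*split_mosaic(fr))[0, 0] for fr in frames])
f_est = dominant_frequency(recovered, fs)
```

The frequency resolution is the frame rate divided by the record length, so longer recordings sharpen the estimate.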
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

To improve the computation speed of matching pursuit decomposition of seismic data, a parallel matching pursuit algorithm is designed in this paper. In every iteration, we pick a fixed number of envelope peaks from the current signal according to the number of compute nodes and distribute them evenly among the compute nodes, which search for the optimal Morlet wavelets in parallel. With the help of parallel computer systems and the Message Passing Interface (MPI), the parallel algorithm fully exploits parallel computing to significantly improve the computation speed of matching pursuit decomposition, and it also scales well. In addition, having every compute node search for only one optimal Morlet wavelet in every iteration is the most efficient implementation.
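The per-iteration work split can be sketched in Python, with a thread pool standing in for MPI ranks. The sketch picks one peak per compute node, lets each node search its own (frequency, width) grid of Morlet-type atoms, and then subtracts the winners. Assumptions: the largest well-separated |residual| samples stand in for envelope peaks, and the atom is a real Gaussian-windowed cosine. Each winner is also re-projected on the updated residual before subtraction so that the residual energy never increases. The function names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def morlet(n, center, freq, width, fs):
    """Unit-norm real Morlet-type atom: Gaussian envelope times a cosine."""
    t = (np.arange(n) - center) / fs
    atom = np.exp(-(t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

def pick_peaks(x, k, min_sep):
    """Up to k largest |x| samples at least min_sep apart (stand-in for envelope peaks)."""
    chosen = []
    for i in np.argsort(np.abs(x))[::-1]:
        if all(abs(int(i) - j) >= min_sep for j in chosen):
            chosen.append(int(i))
            if len(chosen) == k:
                break
    return chosen

def best_atom_at(residual, center, freqs, widths, fs):
    """One node's job: search the (freq, width) grid for the atom centred at `center`."""
    best_abs, best_atom = 0.0, None
    for f in freqs:
        for w in widths:
            atom = morlet(len(residual), center, f, w, fs)
            c = abs(float(residual @ atom))
            if c > best_abs:
                best_abs, best_atom = c, atom
    return best_atom

def parallel_mp(signal, n_nodes, n_iter, freqs, widths, fs, min_sep):
    residual = np.asarray(signal, dtype=float).copy()
    atoms = []
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        for _ in range(n_iter):
            snapshot = residual.copy()
            centers = pick_peaks(snapshot, n_nodes, min_sep)      # one peak per node
            found = list(pool.map(lambda c: best_atom_at(snapshot, c, freqs, widths, fs), centers))
            for atom in found:
                if atom is None:
                    continue
                coef = float(residual @ atom)    # re-project on the updated residual
                residual -= coef * atom
                atoms.append((coef, atom))
    return atoms, residual

# Usage: two well-separated synthetic atoms are recovered in one parallel iteration.
fs, n = 1000.0, 1000
sig = 3.0 * morlet(n, 200, 40.0, 0.02, fs) - 2.0 * morlet(n, 600, 20.0, 0.01, fs)
atoms, res = parallel_mp(sig, n_nodes=2, n_iter=2, freqs=[20.0, 40.0, 60.0],
                         widths=[0.01, 0.02, 0.04], fs=fs, min_sep=50)
```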
A Comparative Propulsion System Analysis for the High-Speed Civil Transport

NASA Technical Reports Server (NTRS)

Berton, Jeffrey J.; Haller, William J.; Senick, Paul F.; Jones, Scott M.; Seidel, Jonathan A.

2005-01-01

Six of the candidate propulsion systems for the High-Speed Civil Transport are the turbojet, turbine bypass engine, mixed flow turbofan, variable cycle engine, Flade engine, and the inverting flow valve engine. A comparison of these propulsion systems by NASA's Glenn Research Center, paralleling studies within the aircraft industry, is presented. This report describes the Glenn Aeropropulsion Analysis Office's contribution to the High-Speed Research Program's 1993 and 1994 propulsion system selections. A parametric investigation of each propulsion cycle's primary design variables is analytically performed. Performance, weight, and geometric data are calculated for each engine. The resulting engines are then evaluated on two airframer-derived supersonic commercial aircraft for a 5000 nautical mile, Mach 2.4 cruise design mission. The effects of takeoff noise, cruise emissions, and cycle design rules are examined.
(abstract) A High Throughput 3-D Inner Product Processor

NASA Technical Reports Server (NTRS)

Daud, Tuan

1996-01-01

A particularly challenging image processing application is real-time scene acquisition and object discrimination. It requires spatio-temporal recognition of point and resolved objects at high speeds with parallel processing algorithms. Neural network paradigms provide fine-grain parallelism and, when implemented in hardware, offer orders-of-magnitude speedup. However, neural networks implemented on a VLSI chip are planar architectures capable of efficiently processing linear vector signals rather than 2-D images. Therefore, processing images requires a 3-D stack of neural-net ICs that receives planar inputs and consumes minimal power. Details of the circuits and chip architectures will be described, along with the need to develop ultralow-power electronics. Further, use of the architecture in a system for high-speed processing will be illustrated.
On Multiple AER Handshaking Channels Over High-Speed Bit-Serial Bidirectional LVDS Links With Flow-Control and Clock-Correction on Commercial FPGAs for Scalable Neuromorphic Systems.

PubMed

Yousefzadeh, Amirreza; Jablonski, Miroslaw; Iakymchuk, Taras; Linares-Barranco, Alejandro; Rosado, Alfredo; Plana, Luis A; Temple, Steve; Serrano-Gotarredona, Teresa; Furber, Steve B; Linares-Barranco, Bernabe

2017-10-01

Address event representation (AER) is a widely employed asynchronous technique for interchanging "neural spikes" between different hardware elements in neuromorphic systems. Each neuron or cell in a chip or a system is assigned an address (or ID), which is typically communicated through a high-speed digital bus, thus time-multiplexing a high number of neural connections. Conventional AER links use parallel physical wires together with a pair of handshaking signals (request and acknowledge). In this paper, we present a fully serial implementation using bidirectional SATA connectors with a pair of low-voltage differential signaling (LVDS) wires for each direction. The proposed implementation can multiplex a number of conventional parallel AER links over each physical LVDS connection. It uses flow control, clock correction, and byte alignment techniques to transmit 32-bit address events reliably over multiplexed serial connections. The setup has been tested using commercial Spartan6 FPGAs, attaining a maximum event transmission speed of 75 Meps (mega events per second) for 32-bit events at a line rate of 3.0 Gbps. The full HDL code (VHDL/Verilog) and example demonstration code for the SpiNNaker platform will be made available.
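Multiplexing several AER links onto one serial connection requires tagging each event with its source link. The Python sketch below uses a hypothetical 5-byte frame (a 1-byte link ID plus a 32-bit address). It omits flow control, clock correction, and byte alignment, and it is not the paper's frame format. The helper also checks the quoted figures: 75 Meps of 32-bit events is 2.4 Gbps of address payload, 80% of the 3.0 Gbps line rate.

```python
import struct
from collections import defaultdict

# Hypothetical frame: 1-byte source-link ID + 32-bit address, big-endian, no padding.
FRAME = struct.Struct(">BI")

def serialize(events):
    """Merge (link_id, address) events from several parallel AER links into one byte stream."""
    return b"".join(FRAME.pack(link, addr) for link, addr in events)

def deserialize(stream):
    """Split an aligned byte stream back into per-link address lists."""
    per_link = defaultdict(list)
    for link, addr in FRAME.iter_unpack(stream):
        per_link[link].append(addr)
    return dict(per_link)

def payload_fraction(events_per_s, bits_per_event, line_rate_bps):
    """Share of the serial line rate taken by the event payload alone."""
    return events_per_s * bits_per_event / line_rate_bps

stream = serialize([(0, 0xDEADBEEF), (3, 42), (0, 7)])
links = deserialize(stream)
share = payload_fraction(75e6, 32, 3.0e9)    # 0.8 of the 3.0 Gbps line
```

This framing spends 40 bits per 32-bit event on the wire. A design aiming at the quoted event rate would need to keep such per-event overhead small.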
Dual Super-Systolic Core for Real-Time Reconstructive Algorithms of High-Resolution Radar/SAR Imaging Systems

PubMed Central

Atoche, Alejandro Castillo; Castillo, Javier Vázquez

2012-01-01

A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is itself conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging in radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed into their parallel representation and then mapped onto an efficient high performance embedded computing (HPEC) architecture on reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was integrated into a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such a dual SSA core drastically reduces the computational load of complex RS regularization techniques, achieving the required real-time operational mode. PMID:22736964
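The systolic principle behind the core can be illustrated with the simplest case, a linear array computing a matrix-vector product. Input values march one processing element (PE) per clock cycle, and each PE multiplies what arrives by its own matrix entry and accumulates locally. The Python sketch below simulates that lockstep timing. It is a generic word-level example, not the bit-level super-systolic design or the reconstruction algorithms of the paper, and `systolic_matvec` is a hypothetical name.

```python
def systolic_matvec(A, x):
    """Simulate y = A @ x on a linear systolic array with one PE per row of A."""
    n_rows, n_cols = len(A), len(x)
    acc = [0] * n_rows            # per-PE accumulator (stays in place)
    regs = [None] * n_rows        # x value currently held by each PE
    for t in range(n_cols + n_rows - 1):
        # All PEs shift in lockstep: PE 0 takes the next input, PE i takes PE i-1's value.
        regs = [x[t] if t < n_cols else None] + regs[:-1]
        for i in range(n_rows):
            if regs[i] is not None:
                acc[i] += A[i][t - i] * regs[i]   # PE i holds x[t - i] at cycle t
    return acc

A = [[1, 2, 3], [4, 5, 6]]
x = [7, 8, 9]
y = systolic_matvec(A, x)
```

The result emerges after n_cols + n_rows - 1 cycles. Once the pipeline is full, every PE does useful work each cycle, which is what makes the structure attractive on FPGAs.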
Parallel performance investigations of an unstructured mesh Navier-Stokes solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

2000-01-01

A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
A parallel input composite transimpedance amplifier.

PubMed

Kim, D J; Kim, C

2018-01-01

A new approach to high-performance current-to-voltage preamplifier design is presented. The design, which uses multiple operational amplifiers (op-amps), has a parasitic capacitance compensation network and a composite amplifier topology for fast, precise, and low-noise performance. In the composite amplifier topology, the input stage (consisting of parallel-linked JFET op-amps) and a high-speed bipolar junction transistor (BJT) gain stage driving the output cooperate with the capacitance compensation feedback network. Together they ensure wide-bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high-impedance transport and scanning tunneling microscopy measurements.
A parallel input composite transimpedance amplifier

NASA Astrophysics Data System (ADS)

Kim, D. J.; Kim, C.

2018-01-01

A new approach to high-performance current-to-voltage preamplifier design is presented. The design, which uses multiple operational amplifiers (op-amps), has a parasitic capacitance compensation network and a composite amplifier topology for fast, precise, and low-noise performance. In the composite amplifier topology, the input stage (consisting of parallel-linked JFET op-amps) and a high-speed bipolar junction transistor (BJT) gain stage driving the output cooperate with the capacitance compensation feedback network. Together they ensure wide-bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high-impedance transport and scanning tunneling microscopy measurements.
Experimental and Computational Sonic Boom Assessment of Lockheed-Martin N+2 Low Boom Models

NASA Technical Reports Server (NTRS)

Cliff, Susan E.; Durston, Donald A.; Elmiligui, Alaa A.; Walker, Eric L.; Carter, Melissa B.

2015-01-01

Flight at speeds greater than the speed of sound is not permitted over land, primarily because of the noise and structural damage caused by sonic boom pressure waves of supersonic aircraft. Mitigation of sonic boom is a key focus area of the High Speed Project under NASA's Fundamental Aeronautics Program. The project is focusing on technologies to enable future civilian aircraft to fly efficiently with reduced sonic boom, engine and aircraft noise, and emissions. A major objective of the project is to improve both computational and experimental capabilities for design of low-boom, high-efficiency aircraft. NASA and industry partners are developing improved wind tunnel testing techniques and new pressure instrumentation to measure the weak sonic boom pressure signatures of modern vehicle concepts. In parallel, computational methods are being developed to provide rapid design and analysis of supersonic aircraft with improved meshing techniques that provide efficient, robust, and accurate on- and off-body pressures at several body lengths from vehicles with very low sonic boom overpressures. The maturity of these critical parallel efforts is necessary before low-boom flight can be demonstrated and commercial supersonic flight can be realized.
Directly measuring of thermal pulse transfer in one-dimensional highly aligned carbon nanotubes.

PubMed

Zhang, Guang; Liu, Changhong; Fan, Shoushan

2013-01-01

Using a simple and precise instrument system, we directly measured the thermophysical properties of one-dimensional highly aligned carbon nanotubes (CNTs). The samples were a CNT-based macroscopic material called super aligned carbon nanotube (SACNT) buckypaper. We defined a new one-dimensional parameter, the "thermal transfer speed", to characterize the thermal damping mechanisms in SACNT buckypapers. Our results indicated that SACNT buckypapers with different densities have markedly different thermal transfer speeds. Furthermore, we found that the thermal transfer speed of high-density SACNT buckypapers may show a pronounced damping factor along the CNT alignment direction. The anisotropic thermal diffusivities of SACNT buckypapers can be calculated from the thermal transfer speeds, and they increase markedly with buckypaper density. For parallel SACNT buckypapers, the thermal diffusivity can be as high as 562.2 ± 55.4 mm²/s. The thermal conductivities of these SACNT buckypapers were also calculated with the equation k = Cp·α·ρ (specific heat capacity × thermal diffusivity × density).
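The conductivity relation k = Cp·α·ρ is simple, but the unit conversion from mm²/s is an easy place to slip. The sketch below applies it using the reported diffusivity of 562.2 mm²/s; the specific heat and density are placeholder values chosen only for illustration and are not taken from the paper.

```python
def thermal_conductivity(cp_j_per_kg_k: float, diffusivity_mm2_per_s: float,
                         density_kg_per_m3: float) -> float:
    """Thermal conductivity k = Cp * alpha * rho, returned in W/(m*K)."""
    alpha_m2_per_s = diffusivity_mm2_per_s * 1e-6  # 1 mm^2/s = 1e-6 m^2/s
    # J/(kg*K) * m^2/s * kg/m^3 = J/(s*m*K) = W/(m*K)
    return cp_j_per_kg_k * alpha_m2_per_s * density_kg_per_m3


# Placeholder Cp and density (NOT from the paper); diffusivity is the reported value.
k = thermal_conductivity(cp_j_per_kg_k=700.0, diffusivity_mm2_per_s=562.2,
                         density_kg_per_m3=500.0)
print(f"k = {k:.2f} W/(m*K)")
```

With these placeholder inputs, k comes out near 196.77 W/(m·K). The result scales linearly with each factor, so measured Cp and ρ can be substituted directly.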
Compact holographic optical neural network system for real-time pattern recognition

NASA Astrophysics Data System (ADS)

Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.

1996-08-01

One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited by the number of parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high-resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attaché case has been developed. Rotation-, shift-, and scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
Parallel processing of genomics data

NASA Astrophysics Data System (ADS)

Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

2016-10-01

The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so analyzing this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software is needed to preprocess and analyze the data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that handles high-dimensional data with good response times. The proposed system can find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
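Marker screening of this kind is embarrassingly parallel across features: each feature's test statistic depends only on its own row. The minimal sketch below assumes a per-feature Welch t statistic between two patient groups. The abstract does not say which statistical test the authors use, and the function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se  # undefined if both groups have zero variance


def parallel_marker_scores(features, group_a, group_b, workers=4):
    """Score every feature (row) independently; rows are spread across workers."""
    def score(row):
        return welch_t([row[i] for i in group_a], [row[i] for i in group_b])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, features))


# Two features measured on six samples: samples 0-2 vs. samples 3-5.
features = [[1, 2, 3, 10, 11, 12],        # strongly separates the groups
            [5, 5.5, 4.5, 5, 5.5, 4.5]]   # no difference between groups
scores = parallel_marker_scores(features, [0, 1, 2], [3, 4, 5])
ranking = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
```

Threads keep the sketch short. In CPython, CPU-bound pure-Python scoring would need `ProcessPoolExecutor` with a module-level scoring function to see a real speed-up.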
Performance evaluation of canny edge detection on a tiled multicore architecture

NASA Astrophysics Data System (ADS)

Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

2011-01-01

In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we assess the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. The strategies differ both in how Canny edge detection is parallelized and in how data are managed. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through OpenMP, which parallelizes across the range of values over which a loop iterates. Domain decomposition breaks an image into subimages, each of which is processed independently, in parallel. For the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than compiler-managed loop-level parallelism implemented with OpenMP.
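Domain decomposition can be sketched on the gradient stage of Canny, here a Sobel gradient magnitude. The image is cut into horizontal strips, and each worker reads one extra row above and below its strip (the halo) from the shared image. The sketch makes several assumptions: pure Python instead of Tile64 code, threads standing in for tiles, and names that are hypothetical. Non-maximum suppression and hysteresis are omitted; hysteresis would need communication between strips.

```python
from concurrent.futures import ThreadPoolExecutor


def sobel_magnitude(img, row_start, row_end):
    """Squared Sobel gradient magnitude for rows [row_start, row_end).

    Rows y-1 and y+1 may lie outside the strip: they are the halo, read from the
    shared (read-only) image. Border pixels of the full image are left at 0.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(row_start, row_end):
        row = [0] * w
        if 0 < y < h - 1:
            for x in range(1, w - 1):
                gx = (img[y-1][x+1] + 2 * img[y][x+1] + img[y+1][x+1]
                      - img[y-1][x-1] - 2 * img[y][x-1] - img[y+1][x-1])
                gy = (img[y+1][x-1] + 2 * img[y+1][x] + img[y+1][x+1]
                      - img[y-1][x-1] - 2 * img[y-1][x] - img[y-1][x+1])
                row[x] = gx * gx + gy * gy
        out.append(row)
    return out


def decomposed_gradient(img, n_strips=4):
    """Process horizontal strips independently, then stitch them back in order."""
    h = len(img)
    bounds = [(h * k // n_strips, h * (k + 1) // n_strips) for k in range(n_strips)]
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        strips = list(pool.map(lambda b: sobel_magnitude(img, *b), bounds))
    return [row for strip in strips for row in strip]


image = [[(x * y) % 7 for x in range(10)] for y in range(9)]
whole = sobel_magnitude(image, 0, len(image))
```

Because each strip reads its halo but writes only its own rows, the stitched result matches the single-pass result exactly. This independence is what lets domain decomposition avoid per-loop synchronization.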
Wavelet-space correlation imaging for high-speed MRI without motion monitoring or data segmentation.

PubMed

Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles

2015-12-01

This study aims to (i) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and (ii) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called "wavelet-space correlation imaging", is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI, and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. © 2014 Wiley Periodicals, Inc.
Distributed Large Data-Object Environments: End-to-End Performance Analysis of High Speed Distributed Storage Systems in Wide Area ATM Networks

NASA Technical Reports Server (NTRS)

Johnston, William; Tierney, Brian; Lee, Jason; Hoo, Gary; Thompson, Mary

1996-01-01

We have developed and deployed a distributed-parallel storage system (DPSS) in several high-speed asynchronous transfer mode (ATM) wide area network (WAN) testbeds to support several different types of data-intensive applications. Architecturally, the DPSS is a network-striped disk array, but it is unusual in that its implementation gives applications complete freedom to determine optimal data layout, replication and/or coding redundancy strategy, security policy, and dynamic reconfiguration. In conjunction with the DPSS, we have developed a 'top-to-bottom, end-to-end' performance monitoring and analysis methodology that has allowed us to characterize all aspects of the DPSS operating in high-speed ATM networks. In particular, we have run a variety of performance monitoring experiments involving the DPSS in the MAGIC testbed, a large-scale, high-speed ATM network, and we describe our experience using the monitoring methodology to identify and correct problems that limit the performance of high-speed distributed applications. Finally, the DPSS is part of an overall architecture for using high-speed WANs to enable the routine, location-independent use of large data-objects. Since this is part of the motivation for a distributed storage system, we describe this architecture.
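A network-striped disk array spreads fixed-size blocks of a data object over several servers so that a client can read from all of them at once. The minimal sketch below assumes plain round-robin placement. That is only one of the layouts an application could choose in the DPSS, and the function names are hypothetical.

```python
def stripe(data: bytes, n_servers: int, block_size: int):
    """Split data into blocks; block i goes to server i % n_servers (round-robin)."""
    servers = [[] for _ in range(n_servers)]
    for offset in range(0, len(data), block_size):
        block_no = offset // block_size
        servers[block_no % n_servers].append((block_no, data[offset:offset + block_size]))
    return servers


def reassemble(servers):
    """Merge blocks fetched (possibly in parallel) from all servers back into order."""
    blocks = sorted(b for server in servers for b in server)  # block numbers are unique
    return b"".join(chunk for _, chunk in blocks)


data = bytes(range(256)) * 10          # 2560 bytes -> 26 blocks of up to 100 bytes
servers = stripe(data, n_servers=4, block_size=100)
```

Round-robin placement balances load to within one block per server. A real system would also carry replication or coding redundancy, which this sketch leaves out.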
Wavelet-space Correlation Imaging for High-speed MRI without Motion Monitoring or Data Segmentation

PubMed Central

Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles

2014-01-01

Purpose: This study aims to 1) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and 2) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Methods: Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called “wavelet-space correlation imaging”, is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Results: Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Conclusion: Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. PMID:25470230
Parallel peak pruning for scalable SMP contour tree computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, both to mediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems require analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory (SMP) algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speed-up in OpenMP and up to 50x speed-up in NVIDIA Thrust.
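The remark that standard contour tree algorithms are "explicitly serial" is easiest to see in the classical sweep for the join tree, one of the two halves of a contour tree. This sweep is not the paper's parallel peak-pruning algorithm. Vertices are visited in a single global order from highest to lowest value, and union-find tracks which superlevel-set components merge. The sketch assumes distinct values on a graph given as adjacency lists; the names are hypothetical.

```python
def join_tree(values, neighbors):
    """Augmented join tree of a scalar field on a graph (values assumed distinct).

    Sweeps vertices from highest to lowest value, tracking the connected
    components of the swept region with union-find. Returns arcs (upper, lower).
    """
    parent = {}   # union-find forest over swept vertices
    lowest = {}   # component root -> lowest (most recently swept) vertex in it

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    arcs = []
    for v in sorted(range(len(values)), key=lambda i: -values[i]):
        parent[v] = v
        lowest[v] = v
        for u in neighbors[v]:
            if u not in parent:            # u not swept yet (lower value)
                continue
            ru = find(u)
            if ru != v:                    # v stays the root of its own component here
                arcs.append((lowest[ru], v))   # that component now continues down to v
                parent[ru] = v
    return arcs


# A 1-D field on a path graph 0-1-2-3-4 with two maxima (vertices 1 and 3).
values = [1, 3, 2, 4, 0]
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
arcs = join_tree(values, path)
```

Vertex 2 receives two incoming arcs, so it is the join saddle where the two peaks merge. The split tree comes from the same sweep on negated values, and merging the two trees yields the contour tree (not shown). The global sorted sweep is the serial bottleneck that a parallel method must remove.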
Image sensor with high dynamic range linear output

NASA Technical Reports Server (NTRS)

Yadid-Pecht, Orly (Inventor); Fossum, Eric R. (Inventor)

2007-01-01

Designs and operational methods are described that increase the dynamic range of image sensors, and APS devices in particular, by achieving more than one integration time for each pixel. An APS system with more than one column-parallel signal chain for readout is described that maintains a high frame rate in readout. Each active pixel is sampled multiple times during a single frame readout, resulting in multiple integration times. The operating methods can also be used to obtain multiple integration times for each pixel with an APS design having a single column-parallel signal chain for readout. Furthermore, high-speed, high-resolution analog-to-digital conversion can be implemented.
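One common way to turn two integration times into a single linear output is to use the long-integration sample unless it is near saturation, and otherwise rescale the short-integration sample by the ratio of integration times. The minimal sketch below is an assumption and is not the method claimed in the patent. The function name and the 95 % saturation threshold are hypothetical.

```python
def linear_hdr(long_sample, short_sample, t_long_us, t_short_us, full_scale,
               saturation_fraction=0.95):
    """Combine two samples of one pixel taken with different integration times.

    Returns a value on the long-integration scale, so the output stays linear
    in light intensity across the extended range.
    """
    if long_sample < saturation_fraction * full_scale:
        return float(long_sample)           # long integration: best SNR when unsaturated
    return short_sample * (t_long_us / t_short_us)  # rescale the short-integration sample


# 10-bit samples, 32 ms vs. 1 ms integration (illustrative numbers).
dim = linear_hdr(500, 31, 32000, 1000, 1023)      # long sample is usable as-is
bright = linear_hdr(1023, 100, 32000, 1000, 1023)  # long sample saturated
```

With a 32:1 ratio of integration times, the usable range grows by a factor of 32, about 5 extra bits. The short sample has coarser effective quantization at the top of the range.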

Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

PubMed

Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

2015-01-01

Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings have led to an ever-increasing amount of raw data. Arrays with hundreds to a few thousand electrodes are slowly seeing widespread use, and more sophisticated arrays are expected to become available in the near future. To process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that uses new programming paradigms and recent technology developments to exploit modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical preprocessing steps, such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
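Spike detection parallelizes naturally across electrodes because each channel is thresholded independently. The sketch below uses a widely used robust threshold, −k·σ with σ = median(|x|)/0.6745, plus a simple refractory window. This is an assumption: the abstract does not specify the tool's detection rule, and the names and default parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def detect_spikes(channel, k=5.0, refractory=30):
    """Sample indices of negative threshold crossings in one channel."""
    sigma = median(abs(x) for x in channel) / 0.6745  # robust noise estimate
    threshold = -k * sigma
    spikes, last = [], -refractory
    for i, x in enumerate(channel):
        if x < threshold and i - last >= refractory:  # skip samples in the refractory window
            spikes.append(i)
            last = i
    return spikes


def detect_all(channels, workers=8):
    """Run detection on every channel independently (one task per channel)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(detect_spikes, channels))


# Deterministic pseudo-noise in [-0.5, 0.5] with two injected spikes.
noise = [((i * 37) % 11 - 5) * 0.1 for i in range(1000)]
with_spikes = list(noise)
with_spikes[200] = with_spikes[600] = -20.0
result = detect_all([with_spikes, noise])
```

Filtering would normally precede this step and is equally channel-parallel. Per-channel independence is what lets work like this scale with the number of cores or GPU threads.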
Template based parallel checkpointing in a massively parallel computer system

DOEpatents

Archer, Charles Jens [Rochester, MN]; Inglett, Todd Alan [Rochester, MN]

2009-01-13

A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a previously produced template checkpoint file that resides in storage. Embodiments greatly decrease the amount of data that must be transmitted and stored, giving faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional lossless data compression algorithms to further reduce the overall checkpoint size.
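The central saving comes from comparing per-block checksums against a template and keeping only the blocks that differ. The minimal sketch below compares blocks at fixed offsets using SHA-256. It deliberately omits rsync's rolling checksum, which also finds shifted blocks, and the network broadcast step. All names are hypothetical.

```python
import hashlib


def block_digests(data: bytes, block_size: int):
    """Checksum of every fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def delta_against_template(checkpoint: bytes, template: bytes, block_size=4096):
    """Return (n_blocks, {index: block}) for checkpoint blocks that differ from the template."""
    tmpl = block_digests(template, block_size)
    changed = {}
    for idx, i in enumerate(range(0, len(checkpoint), block_size)):
        block = checkpoint[i:i + block_size]
        if idx >= len(tmpl) or hashlib.sha256(block).digest() != tmpl[idx]:
            changed[idx] = block
    n_blocks = (len(checkpoint) + block_size - 1) // block_size
    return n_blocks, changed


def restore(template: bytes, n_blocks, changed, block_size=4096):
    """Rebuild a node's checkpoint from the shared template plus its changed blocks."""
    return b"".join(changed.get(idx, template[idx * block_size:(idx + 1) * block_size])
                    for idx in range(n_blocks))


template = bytes(20000)                 # previously saved template checkpoint
node_state = bytearray(template)
node_state[5000] = 1                    # this node differs in one byte
node_state = bytes(node_state)
n_blocks, changed = delta_against_template(node_state, template)
```

Only one 4 KiB block of this node's 20 kB state must be stored. The changed blocks could additionally be passed through a lossless compressor such as `zlib.compress`.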
Running accuracy analysis of a 3-RRR parallel kinematic machine considering the deformations of the links

NASA Astrophysics Data System (ADS)

Wang, Liping; Jiang, Yao; Li, Tiemin

2014-09-01

Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used as advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that deform easily, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy while accounting for the deformations of its links during motion. Based on a dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived, taking the link deformations into account. Through further derivation, the accuracy of the machine is given as a simple explicit expression, which helps increase computation speed. The accuracy of this machine when following a selected circular path is simulated. The influences of the maximum acceleration magnitude and of external loads on the running accuracy of the machine are investigated. The results show that external loads degrade the accuracy of the machine severely when their direction coincides with the machine's direction of lowest stiffness. The proposed method provides a way to predict the running accuracy of parallel kinematic machines and can also be used in their design optimization and in the selection of suitable running parameters.
Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation.

DTIC Science & Technology

1986-03-01

...the proposed approaches [16, 40, 45]. The conclusion most often reached is that the best scheme to use in a particular design depends highly upon... 40. Siegel, H. J., McMillen, R. J., and Mueller, P. T., Jr., A survey of interconnection methods for reconfigurable parallel processing systems... addressing mechanism distributed in the network area... reach gigabit/second speeds...
[Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

PubMed

Yokohama, Noriya

2013-07-01

This report describes the design of the architecture of, and performance measurements for, a parallel computing environment for Monte Carlo simulation in particle therapy planning, using a high performance computing (HPC) instance on a public cloud-computing infrastructure. Performance measurements showed a speed approximately 28 times faster than a single-thread architecture, combined with improved stability. A study of methods for optimizing the system operations also indicated lower cost.
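Monte Carlo transport parallelizes by giving each worker its own batch of particle histories and its own random-number stream, then pooling the tallies. The toy sketch below estimates uncollided transmission through a slab, whose exact answer is exp(−μx). It is a stand-in for the idea only and not particle-therapy physics. Seeding each worker with a distinct seed is a simplification; production codes use dedicated parallel random-number streams. The names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def transmitted(n_particles, mu, thickness, seed):
    """Count particles whose sampled free path exceeds the slab thickness."""
    rng = random.Random(seed)               # independent generator per worker
    return sum(1 for _ in range(n_particles) if rng.expovariate(mu) > thickness)


def parallel_transmission(total, mu, thickness, workers=4, base_seed=12345):
    """Split histories across workers and pool the tallies."""
    per_worker = total // workers
    seeds = [base_seed + w for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(lambda s: transmitted(per_worker, mu, thickness, s), seeds))
    return hits / (per_worker * workers)


estimate = parallel_transmission(200_000, mu=1.0, thickness=1.0)  # exact: exp(-1) = 0.3679
```

Because each worker's seed is fixed, the pooled estimate is reproducible regardless of scheduling order. This property matters when comparing runs on shared cloud instances.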
Parallel computing using a Lagrangian formulation

NASA Technical Reports Server (NTRS)

Liou, May-Fun; Loh, Ching Yuen

1991-01-01

A new Lagrangian formulation of the Euler equation is adopted for the calculation of 2-D supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). Using this formulation, a better than six-fold speed-up was achieved on an 8192-processor CM-2 over a single processor of a CRAY-2.
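A first-order Godunov finite-volume update computes each interface flux from the exact solution of a local Riemann problem. Each cell then changes only through its two neighbouring fluxes, so every cell can be updated simultaneously on a data-parallel machine. The sketch uses the 1-D inviscid Burgers equation on a periodic grid as an assumed stand-in. The paper's solver treats the 2-D Euler equations in Lagrangian form, which this sketch does not reproduce.

```python
import math


def godunov_flux(ul, ur):
    """Exact Riemann-solver (Godunov) flux for Burgers' equation, f(u) = u^2 / 2."""
    if ul > ur:                          # shock: upwind by the sign of the shock speed
        s = 0.5 * (ul + ur)
        return 0.5 * ul * ul if s > 0 else 0.5 * ur * ur
    if ul >= 0:                          # rarefaction moving right
        return 0.5 * ul * ul
    if ur <= 0:                          # rarefaction moving left
        return 0.5 * ur * ur
    return 0.0                           # transonic rarefaction: sonic point u = 0


def step(u, dt, dx):
    """One first-order finite-volume update on a periodic grid."""
    n = len(u)
    flux = [godunov_flux(u[i], u[(i + 1) % n]) for i in range(n)]  # flux[i] sits at i+1/2
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


n = 100
dx = 1.0 / n
u0 = [math.sin(2 * math.pi * (i + 0.5) * dx) for i in range(n)]
u = u0
for _ in range(80):                      # t = 0.4, after the shock forms at t = 1/(2*pi)
    u = step(u, 0.5 * dx, dx)            # CFL number 0.5, since max|u| <= 1
```

The flux-difference form makes the scheme exactly conservative. Its first-order upwinding keeps the solution within the initial bounds through the shock, at the cost of some numerical smearing.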
Parallel computing using a Lagrangian formulation

NASA Technical Reports Server (NTRS)

Liou, May-Fun; Loh, Ching-Yuen

1992-01-01

This paper adopts a new Lagrangian formulation of the Euler equation for the calculation of two-dimensional supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). Using this formulation, we have achieved a better than six-fold speed-up on an 8192-processor CM-2 over a single processor of a CRAY-2.
A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

NASA Astrophysics Data System (ADS)

Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

2011-07-01

An on-chip reference generator is conceived using the technique of decimating a pseudo-random binary sequence (PRBS) signal into parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented in standard CMOS logic available in commercial libraries, which provides the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. Characterization of the on-chip generator shows promising performance specifications.
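Decimating a serial PRBS into N parallel lanes lets each lane run at 1/N of the output bit rate, with a fast multiplexer re-interleaving the lanes. The sketch below assumes a 7-bit Fibonacci LFSR with feedback from stages 7 and 6 (the common PRBS7 polynomial x^7 + x^6 + 1). It is not the paper's circuit, and the function names are hypothetical. For binary m-sequences, decimation by a power of two yields a cyclically shifted copy of the same sequence, so each lane is itself a PRBS.

```python
def prbs7(n, seed=0x7F):
    """n bits of a PRBS7 (period 127) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    out = []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1   # feedback from stages 7 and 6
        out.append(bit)
        state = ((state << 1) | bit) & 0x7F
    return out


def decimate(bits, lanes):
    """Split a serial stream into parallel lanes: lane k gets bits k, k+lanes, ..."""
    return [bits[k::lanes] for k in range(lanes)]


def interleave(streams):
    """Serialize parallel lanes back into one stream (what the output mux does)."""
    return [b for group in zip(*streams) for b in group]


s = prbs7(254)                 # two full periods
period = s[:127]
lane0 = decimate(s, 2)[0]      # 127 bits: every second bit over two periods
```

A 4-lane version of this generator would need only quarter-rate logic per lane. Interleaving restores the full-rate sequence bit for bit.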
Magnetospheric Multiscale Observations of the Electron Diffusion Region of Large Guide Field Magnetic Reconnection

NASA Technical Reports Server (NTRS)

Eriksson, S.; Wilder, F. D.; Ergun, R. E.; Schwartz, S. J.; Cassak, P. A.; Burch, J. L.; Chen, Li-Jen; Torbert, R. B.; Phan, T. D.; Lavraud, B.;

2016-01-01

We report observations from the Magnetospheric Multiscale (MMS) satellites of a large guide field magnetic reconnection event. The observations suggest that two of the four MMS spacecraft sampled the electron diffusion region, whereas the other two detected the exhaust jet from the event. The guide magnetic field amplitude is approximately 4 times that of the reconnecting field. The event is accompanied by a significant parallel electric field (E∥) that is larger than predicted by simulations. The high-speed (approximately 300 km/s) crossing of the electron diffusion region limited the data set to one complete electron distribution inside the electron diffusion region, which shows significant parallel heating. The data suggest that E∥ is balanced by a combination of electron inertia and a parallel gradient of the gyrotropic electron pressure.

Supercomputing on massively parallel bit-serial architectures

NASA Technical Reports Server (NTRS)

Iobst, Ken

1985-01-01

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
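Variable-precision arithmetic on a bit-serial array trades accuracy for speed because an operation takes one step per bit. Halving the precision halves the time, while every processing element works on its own operand simultaneously. The hypothetical sketch below emulates this with Python integers as "bit planes", where bit p of plane j is bit j of PE p's number. It illustrates the principle and does not model the MPP's actual instruction set.

```python
def to_bit_planes(values, bits):
    """Plane j is an int whose bit p holds bit j of the value in PE p."""
    return [sum(((v >> j) & 1) << p for p, v in enumerate(values)) for j in range(bits)]


def from_bit_planes(planes, n_pes):
    """Inverse of to_bit_planes: read each PE's number back out of the planes."""
    return [sum(((planes[j] >> p) & 1) << j for j in range(len(planes)))
            for p in range(n_pes)]


def bit_serial_add(a_planes, b_planes):
    """Add every PE's pair of numbers at once, one bit plane per step."""
    carry, result = 0, []
    for a, b in zip(a_planes, b_planes):
        result.append(a ^ b ^ carry)          # sum bit for all PEs in one step
        carry = (a & b) | (carry & (a ^ b))   # per-PE carry into the next step
    return result                             # final carry dropped: result mod 2**bits


a = [3, 250, 17, 0]
b = [5, 10, 200, 255]
full = from_bit_planes(bit_serial_add(to_bit_planes(a, 8), to_bit_planes(b, 8)), 4)
fast = from_bit_planes(bit_serial_add(to_bit_planes(a, 4), to_bit_planes(b, 4)), 4)
```

The 8-bit add takes eight steps and the 4-bit add takes four. The cheaper result is correct only modulo 16, which is the accuracy-for-speed trade the abstract describes.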
Multiplexed Oversampling Digitizer in 65 nm CMOS for Column-Parallel CCD Readout

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grace, Carl; Walder, Jean-Pierre; von der Lippe, Henrik

2012-04-10

A digitizer designed to read out column-parallel charge-coupled devices (CCDs) used for high-speed X-ray imaging is presented. The digitizer is included as part of the High-Speed Image Preprocessor with Oversampling (HIPPO) integrated circuit. The digitizer module comprises a multiplexed, oversampling, 12-bit, 80 MS/s pipelined Analog-to-Digital Converter (ADC) and a bank of four fast-settling sample-and-hold amplifiers to instrument four analog channels. The ADC multiplexes and oversamples to reduce its area to allow integration that is pitch-matched to the columns of the CCD. Novel design techniques are used to enable oversampling and multiplexing with a reduced power penalty. The ADC exhibits 188 µV-rms noise, which is less than 1 LSB at a 12-bit level. The prototype is implemented in a commercially available 65 nm CMOS process. The digitizer will lead to a proof-of-principle 2D 10 Gigapixel/s X-ray detector.
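The abstract does not state the ADC's full-scale input range V_FS. However, the claim that 188 µV-rms noise is below 1 LSB at 12 bits implies a lower bound on it:

```latex
\mathrm{LSB} = \frac{V_{\mathrm{FS}}}{2^{12}}, \qquad
188\ \mu\mathrm{V} < \frac{V_{\mathrm{FS}}}{4096}
\iff V_{\mathrm{FS}} > 4096 \times 188\ \mu\mathrm{V} \approx 0.770\ \mathrm{V}
```

Any full-scale range above roughly 0.77 V is therefore consistent with the stated noise figure.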
The numerical simulation of a high-speed axial flow compressor

NASA Technical Reports Server (NTRS)

Mulac, Richard A.; Adamczyk, John J.

1991-01-01

The advancement of high-speed axial-flow multistage compressors is impeded by a lack of detailed flow-field information. Recent developments in compressor flow modeling and numerical simulation have the potential to provide the needed information in a timely manner. The development of a computer program to solve the viscous form of the average-passage equation system for multistage turbomachinery is described. Programming issues such as in-core versus out-of-core data storage and CPU utilization (parallelization, vectorization, and chaining) are addressed. Code performance is evaluated through the simulation of the first four stages of a five-stage, high-speed, axial-flow compressor. The second part addresses the flow physics that can be obtained from the numerical simulation. In particular, the endwall flow structure is examined and its impact on blockage distribution assessed.
High-speed massively parallel scanning

DOEpatents

Decker, Derek E [Byron, CA]

2010-07-06

A new technique for recording a series of images of a high-speed event (such as, but not limited to, ballistics, explosives, or laser-induced changes in materials) is presented. The technique uses a lenslet array to take image picture elements (pixels) and concentrate the light from each pixel into a spot that is much smaller than the pixel. This array of spots illuminates a detector region (e.g., film, in one embodiment) that is scanned transverse to the light, creating tracks of exposed regions. Each track is a time history of the light intensity for a single pixel. By appropriately configuring the array of concentrated spots with respect to the scanning direction of the detection material, different tracks fit between pixels, and track lengths sufficient for several high-speed imaging applications are possible.
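One simple way to configure the spot array so that tracks fit between pixels is to tilt a square n × n spot grid of pitch p by θ = atan(1/n) relative to the scan direction. The n² tracks then land at evenly spaced transverse positions, p·cos θ / n apart. The tilt is an assumption made for illustration, since the patent abstract does not specify the geometry. The sketch computes these positions.

```python
import math


def track_offsets(n, pitch):
    """Transverse positions of the n*n spot tracks for a square spot grid tilted
    by theta = atan(1/n) relative to the scan direction (hypothetical layout)."""
    theta = math.atan2(1, n)
    # Scan direction d = (sin(theta), cos(theta)) in grid coordinates; each track's
    # transverse position is the spot's projection onto n = (cos(theta), -sin(theta)).
    return sorted(i * pitch * math.cos(theta) - j * pitch * math.sin(theta)
                  for i in range(n) for j in range(n))


offsets = track_offsets(4, pitch=1.0)
gaps = [b - a for a, b in zip(offsets, offsets[1:])]
expected_gap = math.cos(math.atan2(1, 4)) / 4
```

Under this layout, each concentrated spot must be narrower than p·cos θ / n for neighbouring tracks not to overlap. This is consistent with the requirement that the spot be much smaller than the pixel.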
Design of a high-speed digital processing element for parallel simulation

NASA Technical Reports Server (NTRS)

Milner, E. J.; Cwynar, D. S.

1983-01-01

A prototype of a custom-designed computer to be used as a processing element in a multiprocessor-based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real-time simulations are needed for closed-loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by prefetching the next instruction while the current one is executing, transporting data on high-speed data buses, and using state-of-the-art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom-designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
GPU Particle Tracking and MHD Simulations with Greatly Enhanced Computational Speed

NASA Astrophysics Data System (ADS)

Ziemba, T.; O'Donnell, D.; Carscadden, J.; Cash, M.; Winglee, R.; Harnett, E.

2008-12-01

GPUs are intrinsically highly parallel systems that provide more than an order of magnitude more computing speed than CPU-based systems, for less cost than a high-end workstation. Recent advancements in GPU technologies allow full IEEE floating-point specifications with performance up to several hundred GFLOPS per GPU, and new software architectures have recently become available to ease the transition from graphics-based to scientific applications. This provides a cheap alternative to standard supercomputing methods and should shorten the time to discovery. 3-D particle tracking and MHD codes have been developed using NVIDIA's CUDA and have demonstrated speed-ups of nearly a factor of 20 over equivalent CPU versions of the codes. Such a speed-up enables new applications, including real-time running of radiation belt simulations and real-time running of global magnetospheric simulations, both of which could provide important space weather prediction tools.
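Particle tracking maps well to GPUs because every particle is advanced independently through the same fields, so one thread can own one particle. The abstract does not name the integrator. The sketch below assumes the Boris scheme, a common choice for charged particles in electromagnetic fields. It is written in Python for readability rather than CUDA, and the names are hypothetical.

```python
def boris_push(x, v, q_over_m, E, B, dt):
    """Advance one particle by dt with the Boris scheme (3-vectors as tuples)."""
    def add(a, b):
        return tuple(ai + bi for ai, bi in zip(a, b))

    def scale(a, s):
        return tuple(ai * s for ai in a)

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

    h = 0.5 * q_over_m * dt
    v_minus = add(v, scale(E, h))                     # first half electric kick
    t = scale(B, h)                                   # rotation vector
    s = scale(t, 2.0 / (1.0 + sum(ti * ti for ti in t)))
    v_prime = add(v_minus, cross(v_minus, t))
    v_plus = add(v_minus, cross(v_prime, s))          # magnetic rotation
    v_new = add(v_plus, scale(E, h))                  # second half electric kick
    return add(x, scale(v_new, dt)), v_new


def push_all(particles, q_over_m, E, B, dt):
    """Advance every particle; updates are independent (one GPU thread per particle)."""
    return [boris_push(x, v, q_over_m, E, B, dt) for x, v in particles]


particles = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)), ((1.0, 0.0, 0.0), (0.0, 0.5, 0.2))]
for _ in range(1000):
    particles = push_all(particles, 1.0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.1)
```

In a pure magnetic field the Boris rotation preserves speed to rounding error, even over long runs. This is one reason it is popular for radiation-belt style tracking.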
A Comparison of Lifting-Line and CFD Methods with Flight Test Data from a Research Puma Helicopter

NASA Technical Reports Server (NTRS)

Bousman, William G.; Young, Colin; Toulmay, Francois; Gilbert, Neil E.; Strawn, Roger C.; Miller, Judith V.; Maier, Thomas H.; Costes, Michel; Beaumier, Philippe

1996-01-01

Four lifting-line methods were compared with flight test data from a research Puma helicopter and the accuracy assessed over a wide range of flight speeds. Hybrid Computational Fluid Dynamics (CFD) methods were also examined for two high-speed conditions. A parallel analytical effort was performed with the lifting-line methods to assess the effects of modeling assumptions and this provided insight into the adequacy of these methods for load predictions.
Adaptive efficient compression of genomes

PubMed Central

2012-01-01

Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, the computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, which store only the differences between a to-be-compressed input and a known reference sequence, have gained a lot of interest in this field. However, the memory requirements of current algorithms are high and their run times are often long. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method that allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is on par with the best previous algorithms for human genomes in terms of compression ratio (400:1) and compression speed. When provided with 9 GB of main memory, it compresses a complete human genome in just 11 seconds, which is almost three times faster than the best competitor while using less main memory. PMID:23146997
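Referential compression encodes a target sequence as a list of (reference position, match length, next character) entries, using a k-mer index of the reference to find long matches. The minimal greedy sketch below illustrates the general idea and is not the authors' adaptive algorithm. In this sketch, the k-mer length is the main knob trading index memory against match-finding speed. The names are hypothetical.

```python
def build_index(ref, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def compress(target, ref, k=4):
    """Greedy referential encoding as (ref_pos, length, next_char) entries."""
    index = build_index(ref, k)
    entries, i = [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for p in index.get(target[i:i + k], []):      # candidate match starts
            length = 0
            while (i + length < len(target) and p + length < len(ref)
                   and target[i + length] == ref[p + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = p, length
        nxt = target[i + best_len] if i + best_len < len(target) else ""
        entries.append((best_pos, best_len, nxt))     # literal only when best_len == 0
        i += best_len + 1
    return entries


def decompress(entries, ref):
    return "".join(ref[pos:pos + length] + ch for pos, length, ch in entries)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCAG"
encoded = compress(target, reference)
```

Here the 16-character target collapses to two entries because it differs from the reference by one substitution and one trailing insertion. Genome-scale gains come from the same effect across billions of bases.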
Different Relative Orientation of Static and Alternative Magnetic Fields and Cress Roots Direction of Growth Changes Their Gravitropic Reaction

NASA Astrophysics Data System (ADS)

Sheykina, Nadiia; Bogatina, Nina

The following arrangements of root orientation relative to the static and alternating components of the magnetic field were studied. In the first variant, the static magnetic field was directed parallel to the gravitation vector, the alternating magnetic field was perpendicular to the static one, and the roots were directed perpendicular to both field components and to the gravitation vector. In this variant, negative gravitropism of the cress roots was observed. In the second variant, the static magnetic field was directed parallel to the gravitation vector, the alternating magnetic field was perpendicular to the static one, and the roots were directed parallel to the alternating magnetic field. In the third variant, the alternating magnetic field was directed parallel to the gravitation vector, the static magnetic field was perpendicular to the gravitation vector, and the roots were directed perpendicular to both field components and to the gravitation vector. In the fourth variant, the alternating magnetic field was directed parallel to the gravitation vector, the static magnetic field was perpendicular to the gravitation vector, and the roots were directed parallel to the static magnetic field. In all cases studied, the frequency of the alternating magnetic field was equal to the cyclotron frequency of Ca ions. In variants 2, 3 and 4, gravitropism was positive, but the speeds of the gravitropic reaction differed. In the second and fourth variants, the gravitropic reaction speed coincided, within error limits, with that under normal Earth conditions. In the third variant, the gravitropic reaction was substantially slowed.
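The driving frequency follows from the ion cyclotron relation f = |q|B / (2πm). The abstract gives neither the static field strength nor the ion's charge state. The sketch below therefore assumes Ca²⁺ and a placeholder static field of 50 µT, roughly the magnitude of the geomagnetic field; both are assumptions rather than values from the study.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge in C (exact SI value)
AMU = 1.66053906660e-27        # atomic mass unit in kg
CA_MASS = 40.078 * AMU         # standard atomic weight of calcium; electron mass ignored


def cyclotron_frequency_hz(b_tesla, charge_number=2, mass_kg=CA_MASS):
    """Ion cyclotron frequency f = |q| B / (2 pi m), in Hz."""
    return charge_number * E_CHARGE * b_tesla / (2 * math.pi * mass_kg)


f_ca = cyclotron_frequency_hz(50e-6)   # placeholder 50 uT static field
```

Under these assumptions the result is about 38.3 Hz. Because f scales linearly with B, the alternating-field frequency must track any change in the static field to stay at resonance.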
Linear static structural and vibration analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

1993-01-01

Parallel computers offer the opportunity to significantly reduce the computation time needed to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively parallel computers, hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks in structural analysis: generation and assembly of system matrices, solution of systems of equations, and calculation of eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (e.g., models of the High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique that extends structural analysis to SHPC and makes large-scale structural analyses tractable.
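As a small illustration of parallel system-matrix assembly, the Python sketch below splits a list of elements across worker threads, lets each worker assemble a private partial stiffness matrix, and sums the partials at the end. The 1-D axial bar elements, the material values and the function names are hypothetical stand-ins chosen for brevity; the SHPC codes described above are not reproduced here and would distribute memory rather than share dense matrices.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def bar_element_stiffness(e_mod: float, area: float, length: float) -> np.ndarray:
    """2x2 stiffness matrix of an axial bar element."""
    k = e_mod * area / length
    return k * np.array([[1.0, -1.0], [-1.0, 1.0]])


def assemble_chunk(elements, n_nodes, e_mod, area, length) -> np.ndarray:
    """Assemble the contribution of a subset of elements into a full-size matrix."""
    k_global = np.zeros((n_nodes, n_nodes))
    ke = bar_element_stiffness(e_mod, area, length)
    for e in elements:                       # element e joins nodes e and e + 1
        dofs = [int(e), int(e) + 1]
        k_global[np.ix_(dofs, dofs)] += ke
    return k_global


def parallel_assemble(n_elements, e_mod=200e9, area=1e-4, length=0.5, workers=4):
    """Split the element list across workers and sum their partial matrices."""
    n_nodes = n_elements + 1
    chunks = np.array_split(np.arange(n_elements), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda chunk: assemble_chunk(chunk, n_nodes, e_mod, area, length), chunks))
    return sum(partials)


K = parallel_assemble(10)
```

Giving each worker its own partial matrix avoids write conflicts at nodes shared by elements in different chunks; the price is a final reduction step, which is cheap compared with element-level work in larger models.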
Comparison of cavity preparation quality using an electric motor handpiece and an air turbine dental handpiece.

PubMed

Kenyon, Brian J; Van Zyl, Ian; Louie, Kenneth G

2005-08-01

The high-speed high-torque (electric motor) handpiece is becoming more popular in dental offices and laboratories in the United States. It is reported to cut more precisely and to assist in the creation of finer margins that enhance cavity preparations. The authors conducted an in vitro study to compare the quality of cavity preparations fabricated with a high-speed high-torque (electric motor) handpiece and a high-speed low-torque (air turbine) handpiece. Eighty-six dental students each cut two Class I preparations, one with an air turbine handpiece and the other with an electric motor high-speed handpiece. The authors asked the students to cut each preparation accurately to a circular outline and to establish a flat pulpal floor with 1.5 millimeters' depth, 90-degree exit angles, parallel vertical walls and sharp internal line angles, as well as to refine the preparation to achieve flat, smooth walls with a well-defined cavosurface margin. A single faculty member scored the preparations for criteria and refinement using a nine-point scale (range, 1-9). The authors analyzed the data statistically using paired t tests. In preparation criteria, the electric motor high-speed handpiece had a higher average grade than did the air turbine handpiece (5.07 and 4.90, respectively). For refinement, the average grade for the air turbine high-speed handpiece was greater than that for the electric motor high-speed handpiece (5.72 and 5.52, respectively). The differences were not statistically significant. The electric motor high-speed handpiece performed as well as, but not better than, the air turbine handpiece in the fabrication of high-quality cavity preparations.
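The study compares the two handpieces with paired t tests, because each student cut one preparation with each instrument. The Python sketch below shows how the paired t statistic is formed from per-student differences; the five score pairs are invented for illustration and are not data from the study.

```python
import math
import statistics


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean_d / se, n - 1


# Hypothetical scores for five students (not data from the study).
electric = [5, 6, 4, 5, 7]
air = [4, 6, 5, 4, 6]
t, df = paired_t(electric, air)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The statistic is then compared with a t distribution with n - 1 degrees of freedom to obtain a p-value; `scipy.stats.ttest_rel` performs the same test if SciPy is available.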

GRAVIDY, a GPU modular, parallel direct-summation N-body integrator: dynamics with softening

NASA Astrophysics Data System (ADS)

Maureira-Fredes, Cristián; Amaro-Seoane, Pau

2018-01-01

A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and sources of gravitational radiation. The direct summation of N gravitational forces is a complex problem with no analytical solution and can only be tackled with approximations and numerical methods. To this end, the Hermite scheme is a widely used integration method. Combined with different numerical techniques and special-purpose hardware, it can be used to speed up the calculations, but these methods tend to be computationally slow and cumbersome to work with. We present a new graphics processing unit (GPU) direct-summation N-body integrator, written from scratch and based on this scheme, which includes relativistic corrections for sources of gravitational radiation. GRAVIDY is highly modular, allowing users to readily introduce new physics; it exploits available computational resources and will be maintained through regular updates. GRAVIDY can be used in parallel on multiple CPUs and GPUs, with a considerable speed-up benefit. The single-GPU version is between one and two orders of magnitude faster than the single-CPU version. A test run using four GPUs in parallel shows a speed-up factor of about 3 compared with the single-GPU version. This first release is designed for users with access to traditional parallel CPU clusters or computational nodes with one or a few GPU cards.
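The core of a direct-summation code is the O(N²) pairwise force loop, with a softening length ε that keeps close encounters finite. The vectorized NumPy sketch below computes Plummer-softened accelerations. It is an illustration under our own assumptions (G = 1 units, the function name `accelerations`), not GRAVIDY's kernel, and it omits the jerk term that a Hermite integrator also needs.

```python
import numpy as np


def accelerations(pos: np.ndarray, mass: np.ndarray, eps: float = 0.0, g: float = 1.0) -> np.ndarray:
    """Direct-summation accelerations with Plummer softening length eps.

    a_i = G * sum_{j != i} m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)
    """
    dr = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]   # dr[i, j] = r_j - r_i
    r2 = np.sum(dr**2, axis=-1) + eps**2
    np.fill_diagonal(r2, 1.0)                            # avoid 0**-1.5 on the diagonal
    inv_r3 = r2**-1.5
    np.fill_diagonal(inv_r3, 0.0)                        # no self-interaction
    return g * np.einsum("ij,ijk,j->ik", inv_r3, dr, mass)


# Two unit masses one length unit apart attract each other with unit acceleration.
pair = accelerations(np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]), np.array([1.0, 1.0]))
```

Every (i, j) term is independent, which is why this loop maps naturally onto one GPU thread per particle or per particle pair.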
User Interface Developed for Controls/CFD Interdisciplinary Research

NASA Technical Reports Server (NTRS)

1996-01-01

The NASA Lewis Research Center, in conjunction with the University of Akron, is developing analytical methods and software tools to create a cross-discipline "bridge" between controls and computational fluid dynamics (CFD) technologies. Traditionally, the controls analyst has used simulations based on large lumping techniques to generate low-order linear models convenient for designing propulsion system controls. For complex, high-speed vehicles such as the High Speed Civil Transport (HSCT), simulations based on CFD methods are required to capture the relevant flow physics. The use of CFD should also help reduce the development time and costs associated with experimentally tuning the control system. The initial application for this research is the High Speed Civil Transport inlet control problem. A major aspect of this research is the development of a controls/CFD interface for non-CFD experts, to facilitate the interactive operation of CFD simulations and the extraction of reduced-order, time-accurate models from CFD results. A distributed computing approach for implementing the interface is being explored. Software being developed as part of the Integrated CFD and Experiments (ICE) project provides the basis for the operating environment, including run-time displays and information (data base) management. Message-passing software is used to communicate between the ICE system and the CFD simulation, which can reside on distributed, parallel computing systems. Initially, the one-dimensional Large-Perturbation Inlet (LAPIN) code is being used to simulate a High Speed Civil Transport type inlet. LAPIN can model real supersonic inlet features, including bleeds, bypasses, and variable geometry, such as translating or variable-ramp-angle centerbodies. Work is in progress to use parallel versions of the multidimensional NPARC code.
Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

PubMed Central

Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

2010-01-01

Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792
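The speed-up from coarse-graining comes from replacing distant groups of charges with fewer representative charges. The Python sketch below shows a single-level version of that idea. It makes two simplifying assumptions that differ from the paper: a plain Coulomb sum stands in for the ALPB potential, and each far group is reduced to one net charge at its geometric centre rather than using HCP's multi-level hierarchy. `two_level_potential` and `cutoff` are hypothetical names.

```python
import numpy as np


def direct_potential(point: np.ndarray, pos: np.ndarray, q: np.ndarray) -> float:
    """Coulomb-like potential sum_i q_i / |r_i - point| (units and dielectric omitted)."""
    r = np.linalg.norm(pos - point, axis=1)
    return float(np.sum(q / r))


def two_level_potential(point, groups, cutoff):
    """Near groups: sum every charge. Far groups: one net charge at the group centre."""
    phi = 0.0
    for pos, q in groups:
        centre = pos.mean(axis=0)
        dist = np.linalg.norm(centre - point)
        if dist > cutoff:
            phi += q.sum() / dist                 # one term instead of len(q) terms
        else:
            phi += direct_potential(point, pos, q)
    return phi


rng = np.random.default_rng(0)
groups = [(rng.normal(loc=c, scale=0.5, size=(20, 3)), np.full(20, 0.1))
          for c in ([0, 0, 0], [5, 0, 0], [0, 5, 0])]
```

Because the per-point work is independent across surface points, this approximation and GPU parallelization can be stacked, which is the combination the abstract reports.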
Interferometric imaging of acoustical phenomena using high-speed polarization camera and 4-step parallel phase-shifting technique

NASA Astrophysics Data System (ADS)

Ishikawa, K.; Yatabe, K.; Ikeda, Y.; Oikawa, Y.; Onuma, T.; Niwa, H.; Yoshii, M.

2017-02-01

Imaging of sound aids the understanding of acoustical phenomena such as propagation, reflection and diffraction, and is needed in many acoustical applications. Sound is commonly imaged with a microphone array, but optical methods have recently attracted interest because they are contactless. The optical measurement of sound exploits the phase modulation of light caused by sound: light propagating through a sound field undergoes a phase change proportional to the sound pressure, so optical phase measurement techniques can be used to measure sound. Several methods, including laser Doppler vibrometry and the Schlieren method, have been proposed for that purpose. However, their sensitivity decreases as the sound frequency decreases. In contrast, the sensitivity of the phase-shifting technique does not depend on the sound frequency, so that technique is suitable for imaging sound in the low-frequency range. The principle of imaging sound using parallel phase-shifting interferometry was reported by the authors (K. Ishikawa et al., Optics Express, 2016). The measurement system consists of a high-speed polarization camera made by Photron Ltd. and a polarization interferometer. This paper briefly reviews the principle and demonstrates high-speed imaging of acoustical phenomena. The results suggest that the proposed system can be applied to various industrial problems in acoustical engineering.
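The 4-step phase-shifting technique recovers the optical phase from four interferograms whose reference phases differ by π/2. Assuming the standard model I_k = A + B cos(φ + kπ/2), the background A and modulation B cancel in two differences, giving φ = atan2(I3 − I1, I0 − I2). The NumPy sketch below checks this on synthetic data; in a parallel phase-shifting setup the four images would come from one exposure of the polarization camera, but the mapping of camera pixels to phase steps is an assumption not described in the abstract.

```python
import numpy as np


def four_step_phase(i0, i1, i2, i3):
    """Wrapped phase from four interferograms with reference shifts 0, pi/2, pi, 3pi/2.

    Model: I_k = A + B cos(phi + k*pi/2), so I0 - I2 = 2B cos(phi) and
    I3 - I1 = 2B sin(phi); the background A and modulation B cancel.
    """
    return np.arctan2(i3 - i1, i0 - i2)


phi_true = np.linspace(-3.0, 3.0, 7)
a, b = 2.0, 0.8
frames = [a + b * np.cos(phi_true + k * np.pi / 2) for k in range(4)]
phi = four_step_phase(*frames)
```

Since sound pressure is proportional to the phase change, subtracting a reference phase map taken without sound would give a quantity proportional to the sound field; the proportionality constant depends on the optical setup and is not given here.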
Big data driven cycle time parallel prediction for production planning in wafer manufacturing

NASA Astrophysics Data System (ADS)

Wang, Junliang; Yang, Jungang; Zhang, Jie; Wang, Xiaoxi; Zhang, Wenjun Chris

2018-07-01

Cycle time forecasting (CTF) is one of the most crucial issues in production planning for maintaining high delivery reliability in semiconductor wafer fabrication systems (SWFS). This paper proposes a novel data-intensive cycle time (CT) prediction system that uses parallel computing to rapidly forecast the CT of wafer lots from large datasets. First, a density peak based radial basis function network (DP-RBFN) is designed to forecast the CT from diverse and agglomerative CT data. Second, a network learning method based on a clustering technique is proposed to determine the density peaks. Third, a parallel computing approach to network training is proposed to speed up training on large-scale CT data. Finally, an experiment on an SWFS is presented, which demonstrates that the proposed CTF system not only speeds up model training but also outperforms CTF methods based on a radial basis function network, a back-propagation network and multivariate regression in terms of mean absolute deviation and standard deviation.
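One common way to pick RBF centres from density peaks is the ρ·δ criterion: ρ counts neighbours within a cutoff d_c, δ is the distance to the nearest denser point, and points scoring high on both become centres. The NumPy sketch below follows that recipe and then fits the output weights by least squares. It is an assumption that DP-RBFN works this way, the parallel training step is not shown, and the names `density_peak_centres`, `fit_rbf`, `d_c` and `sigma` are hypothetical.

```python
import numpy as np


def density_peak_centres(x: np.ndarray, d_c: float, n_centres: int) -> np.ndarray:
    """Indices of cluster centres chosen by the density-peak score rho * delta."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    rho = (d < d_c).sum(axis=1) - 1                  # neighbours within cutoff d_c
    order = np.argsort(-rho, kind="stable")          # densest first, ties by index
    delta = np.empty(len(x))
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, len(x)):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()          # distance to nearest denser point
    return np.argsort(-(rho * delta), kind="stable")[:n_centres]


def fit_rbf(x, y, centres, sigma):
    """Least-squares output weights of a Gaussian RBF network with fixed centres."""
    dist2 = np.sum((x[:, None, :] - centres[None, :, :]) ** 2, axis=-1)
    phi = np.exp(-dist2 / (2.0 * sigma**2))
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights


rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
idx = density_peak_centres(x, d_c=1.0, n_centres=2)
w = fit_rbf(x, x[:, 0] + x[:, 1], x[idx], sigma=1.0)
```

Ranking ties by index ensures that only one point per dense region gets a large δ, so two tied neighbours cannot both be chosen as centres.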
Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

NASA Astrophysics Data System (ADS)

Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

2018-01-01

The degree of wear of the wheelset tread is one of the main factors influencing the safety and stability of a running train. The geometrical parameters of interest mainly include flange thickness and flange height. Line-structured laser light was projected onto the wheel tread surface, and the geometrical parameters were deduced from the resulting profile image. An online image acquisition system was designed based on asynchronous reset of a CCD and a CUDA parallel processing unit; image acquisition was triggered in hardware-interrupt mode. A highly efficient parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares and then extracts the squares containing the target by fusing k-means and STING clustering image segmentation. Segmentation takes less than 0.97 ms, a considerable acceleration over serial CPU computation that greatly improves real-time image processing capacity. When the wheelset runs at a limited speed, the system, placed alongside the railway line, can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.
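The tile-then-cluster idea can be sketched on the CPU with NumPy: split the image into equal squares, summarize each square by its mean intensity, and separate stripe tiles from background tiles with a two-cluster (k = 2) k-means on those means. This replaces the paper's k-means/STING fusion with plain 2-means and runs serially; on a GPU each tile would typically map to its own thread block. The helper names and the synthetic stripe image are assumptions.

```python
import numpy as np


def tile_means(img: np.ndarray, tile: int) -> np.ndarray:
    """Mean intensity of each tile x tile square (edge remainders are dropped)."""
    h, w = img.shape
    h2, w2 = h - h % tile, w - w % tile
    blocks = img[:h2, :w2].reshape(h2 // tile, tile, w2 // tile, tile)
    return blocks.mean(axis=(1, 3))


def two_means_threshold(values: np.ndarray, iters: int = 20) -> float:
    """1-D k-means with k = 2; returns the midpoint between the two cluster means."""
    lo, hi = float(values.min()), float(values.max())
    if lo == hi:
        return lo
    for _ in range(iters):
        t = (lo + hi) / 2.0
        lo = float(values[values <= t].mean())
        hi = float(values[values > t].mean())
    return (lo + hi) / 2.0


img = np.zeros((64, 64))
img[30:34, :] = 200.0              # a bright horizontal laser stripe
means = tile_means(img, tile=8)
target_tiles = means > two_means_threshold(means)
```

Only the flagged tiles would then need full-resolution profile extraction, which is one plausible reason the tiling step saves time.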
Trajectory Tracking of a Planer Parallel Manipulator by Using Computed Force Control Method

NASA Astrophysics Data System (ADS)

Bayram, Atilla

2017-03-01

Despite their small workspace, parallel manipulators have several advantages over their serial counterparts in terms of speed, acceleration, rigidity, accuracy, manufacturing cost and payload. Accordingly, this type of manipulator can be used in many applications, such as high-speed machine tools, tuning machines for feeding, sensitive cutting, assembly and packaging. This paper presents a special type of planar parallel manipulator with three degrees of freedom. It is constructed as a variable geometry truss, generally known as a planar Stewart platform. The reachable and orientation workspaces of this manipulator are obtained. The inverse kinematics is solved for trajectory tracking, taking into account redundancy and joint-limit avoidance. The dynamic model of the manipulator is then established using the virtual work method. Simulations are performed to follow given planar trajectories using the dynamic equations of the variable geometry truss manipulator and the computed force control method. In the computed force control method, the feedback gain matrices for PD control are tuned either as fixed matrices by trial and error or as variable matrices by optimization with a genetic algorithm.
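For reference, one common textbook form of the computed-torque (here computed-force) law with PD feedback is shown below. It assumes the dynamics are written in actuated-joint coordinates as M(q) q'' + C(q, q') q' + G(q) = τ; the symbols are our notation, not necessarily the paper's. Substituting the control law into the dynamics cancels the nonlinear terms (for invertible M) and leaves a linear error equation governed by the gain matrices K_p and K_d, which the paper tunes by trial and error or with a genetic algorithm.

```latex
\begin{aligned}
e &= q_d - q, \\
\tau &= M(q)\left(\ddot{q}_d + K_d\,\dot{e} + K_p\,e\right) + C(q,\dot{q})\,\dot{q} + G(q), \\
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) &= \tau
\;\;\Longrightarrow\;\; \ddot{e} + K_d\,\dot{e} + K_p\,e = 0 .
\end{aligned}
```

With positive-definite K_p and K_d the tracking error decays; fixed gains give a single closed-loop response, while variable gains let an optimizer trade speed against overshoot along the trajectory.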
Evaporating Spray in Supersonic Streams Including Turbulence Effects

NASA Technical Reports Server (NTRS)

Balasubramanyam, M. S.; Chen, C. P.

2006-01-01

Evaporating spray plays an important role in spray combustion processes. This paper describes the development of a new finite-conductivity evaporation model, based on the two-temperature film theory, for two-phase numerical simulation using Eulerian-Lagrangian method. The model is a natural extension of the T-blob/T-TAB atomization/spray model which supplies the turbulence characteristics for estimating effective thermal diffusivity within the droplet phase. Both one-way and two-way coupled calculations were performed to investigate the performance of this model. Validation results indicate the superiority of the finite-conductivity model in low speed parallel flow evaporating sprays. High speed cross flow spray results indicate the effectiveness of the T-blob/T-TAB model and point to the needed improvements in high speed evaporating spray modeling.
Parallelization of the Flow Field Dependent Variation Scheme for Solving the Triple Shock/Boundary Layer Interaction Problem

NASA Technical Reports Server (NTRS)

Schunk, Richard Gregory; Chung, T. J.

2001-01-01

A parallelized version of the Flowfield Dependent Variation (FDV) method is developed to analyze a problem of current research interest: the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high-speed air-breathing vehicles, including the NASA Hyper-X research vehicle. Resolving the complex shock structure and providing adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet require models containing over 500,000 nodes. Efficient parallelization of the computation is therefore essential for obtaining results in a timely manner. Results from a parallelization scheme based on multi-threading, as implemented on multiple-processor supercomputers and workstations, are presented.
Parallel-vector unsymmetric Eigen-Solver on high performance computers

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Jiangning, Qin

1993-01-01

The popular QR algorithm for computing all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components of the QR algorithm, this study concluded that the reduction of an unsymmetric matrix to Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples from several test cases indicate that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to Hessenberg form offers computational advantages over the existing algorithm. The time savings obtained with the proposed method increase with problem size.
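The reduction step discussed above can be illustrated with the classical Householder method. Each step zeroes one column below the subdiagonal through a similarity transform, so eigenvalues are preserved. The work consists of matrix-vector products and rank-1 updates, the kinds of operations that vector units and multiple processors accelerate. The NumPy sketch below is the textbook serial algorithm, not the authors' parallel-vector variant; `scipy.linalg.hessenberg` provides a production version.

```python
import numpy as np


def hessenberg_householder(a) -> np.ndarray:
    """Reduce a square matrix to upper Hessenberg form by Householder similarity transforms."""
    h = np.array(a, dtype=float)
    n = h.shape[0]
    for k in range(n - 2):
        x = h[k + 1:, k].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[0])   # sign chosen to avoid cancellation
        v = x.copy()
        v[0] -= alpha
        v_norm = np.linalg.norm(v)
        if v_norm == 0.0:
            continue                                    # column already has the required zeros
        v /= v_norm
        # Apply P = I - 2 v v^T from the left (rows k+1..) and the right (columns k+1..).
        h[k + 1:, :] -= 2.0 * np.outer(v, v @ h[k + 1:, :])
        h[:, k + 1:] -= 2.0 * np.outer(h[:, k + 1:] @ v, v)
    return h


rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
h = hessenberg_householder(a)
```

Because the transforms are orthogonal similarities, trace, determinant and Frobenius norm of the result match those of the input, which is a cheap sanity check.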
Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.

PubMed Central

Schena, M; Shalon, D; Heller, R; Chai, A; Brown, P O; Davis, R W

1996-01-01

Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm2 DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery. PMID:8855227
An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

NASA Astrophysics Data System (ADS)

Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

2003-08-01

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on a large number of processors is non-trivial on the latest generation of parallel computers, which consist of nodes made up of shared-memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm is presented. By exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
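Parallel 3-dim FFTs typically rely on the separability of the transform: a 3-D FFT equals 1-D FFTs applied along each axis in turn. The NumPy sketch below emulates a slab decomposition in a single process. Each hypothetical "process" owns a slab of planes and performs local 2-D FFTs, and concatenating the slabs stands in for the all-to-all transpose that an MPI code would perform before the last pass. It illustrates the general pattern, not the specific MPI/OpenMP scheme of the paper.

```python
import numpy as np


def slab_fft3d(data: np.ndarray, n_procs: int) -> np.ndarray:
    """3-D FFT via a slab decomposition along axis 0 (emulated in one process)."""
    slabs = np.array_split(data, n_procs, axis=0)           # each "process" owns planes
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]    # local 2-D FFTs, no communication
    full = np.concatenate(slabs, axis=0)                    # stands in for the all-to-all transpose
    return np.fft.fft(full, axis=0)                         # remaining 1-D FFTs along axis 0


rng = np.random.default_rng(0)
data = rng.normal(size=(8, 6, 4))
result = slab_fft3d(data, n_procs=3)
```

In a real distributed code the transpose is the communication-bound step, which is where node-level shared memory (OpenMP within a node, MPI between nodes) can reduce the number of messages.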
Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

NASA Astrophysics Data System (ADS)

Ford, Eric B.; Dindar, Saleh; Peters, Jorg

2015-08-01

The realism of astrophysical simulations and statistical analyses of astronomical data is set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand which types of problems are likely to yield more than an order of magnitude speed-up and which problems are unlikely to parallelize efficiently enough to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. 
Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
Method and apparatus for data sampling

DOEpatents

Odell, Daniel M. C.

1994-01-01

A method and apparatus for sampling radiation detector outputs and determining event data from the collected samples. The method uses high-speed sampling of the detector output, conversion of the samples to digital values, and discrimination of the digital values so that the digital values representing detected events are determined. The high-speed sampling and digital conversion are performed by an A/D sampler that samples the detector output at a rate high enough to produce numerous digital samples for each detected event. The digital discrimination identifies those digital samples that are not representative of detected events. The sampling and discrimination also provide for temporary or permanent storage, either serially or in parallel, to a digital storage medium.
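Because the sampler produces many digital samples per event, discrimination can be pictured as grouping consecutive samples that pass a criterion into events. The NumPy sketch below uses a simple amplitude threshold as that criterion, which is our assumption; the patent abstract does not specify how samples are discriminated, and the function names are hypothetical.

```python
import numpy as np


def find_events(samples, threshold):
    """Return (start, stop) index pairs of runs where the digitized signal exceeds threshold."""
    above = np.asarray(samples) > threshold
    edges = np.diff(above.astype(int), prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)    # first sample of each run
    stops = np.flatnonzero(edges == -1)    # one past the last sample of each run
    return [(int(a), int(b)) for a, b in zip(starts, stops)]


def event_peaks(samples, events):
    """Peak digital value within each detected event."""
    samples = np.asarray(samples)
    return [float(samples[a:b].max()) for a, b in events]


samples = [0, 1, 5, 6, 2, 0, 7, 8, 0]
events = find_events(samples, threshold=3)
print(events, event_peaks(samples, events))  # [(2, 4), (6, 8)] [6.0, 8.0]
```

Samples outside every run are the ones identified as not representative of detected events and could be dropped before storage.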
High subsonic flow tests of a parallel pipe followed by a large area ratio diffuser

NASA Technical Reports Server (NTRS)

Barna, P. S.

1975-01-01

Experiments were performed on a pilot model duct system in order to explore its aerodynamic characteristics. The model was scaled from a design projected for the high speed operation mode of the Aircraft Noise Reduction Laboratory. The test results show that the model performed satisfactorily and therefore the projected design will most likely meet the specifications.
Merging parallel optics packaging and surface mount technologies

NASA Astrophysics Data System (ADS)

Kopp, Christophe; Volpert, Marion; Routin, Julien; Bernabé, Stéphane; Rossat, Cyrille; Tournaire, Myriam; Hamelin, Régis

2008-02-01

Optical links are well known to offer significant advantages over electrical links for very high data rates of 10 Gbps and above per channel. However, the transition to optical interconnect solutions for short- and very-short-reach applications requires innovative packaging solutions capable of very high-volume production at very low cost per unit. Moreover, the optoelectronic transceiver components must be able to move from the edge of the printed circuit board to anywhere on it, for instance close to integrated circuits with high-speed I/O. In this paper, we present an original packaging design for manufacturing parallel optic transceivers that are surface mount devices. The package combines a highly integrated Multi-Chip-Module on glass with conventional ceramic IC packaging. The use of ceramic and the development of sealing technologies meet hermetic requirements. Moreover, thanks to a chip-scale package approach, the final device has a much smaller footprint. One of the main advantages of the package is that, like any other electronic device, it can be soldered or plugged anywhere on the printed circuit board. As a demonstrator, we present a 2 by 4 10 Gbps transceiver operating at 850 nm.
A parallel method of atmospheric correction for multispectral high spatial resolution remote sensing images

NASA Astrophysics Data System (ADS)

Zhao, Shaoshuai; Ni, Chen; Cao, Jing; Li, Zhengqiang; Chen, Xingfeng; Ma, Yan; Yang, Leiku; Hou, Weizhen; Qie, Lili; Ge, Bangyu; Liu, Li; Xing, Jin

2018-03-01

Remote sensing images are usually contaminated by atmospheric components, especially aerosol particles. For quantitative remote sensing applications, atmospheric correction based on a radiative transfer model is used to retrieve reflectance by decoupling the atmosphere and the surface, at the cost of long computation times. Parallel computing is one way to reduce this time. A parallel strategy in which multiple CPUs work simultaneously is designed to perform atmospheric correction of a multispectral remote sensing image. The flow of the parallel framework and the main parallel body of the atmospheric correction are described. A multispectral image from the Chinese Gaofen-2 satellite is then used to test the acceleration efficiency. As the number of CPUs increases from 1 to 8, the computational speed also increases; the largest speed-up is 6.5. With 8 CPUs, atmospheric correction of the whole image takes 4 minutes.
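Per-pixel atmospheric correction is embarrassingly parallel, so a common strategy is to split the image into blocks and hand one block to each worker. The Python sketch below shows that pattern. Purely for illustration, it assumes a correction of the form used with 6S-style coefficients (y = xa·L − xb, ρ = y / (1 + xc·y)) with made-up coefficient values; the abstract does not name the radiative transfer model, and `parallel_correct` is a hypothetical name. It also computes the parallel efficiency implied by the reported figures.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def correct_block(radiance: np.ndarray, xa: float, xb: float, xc: float) -> np.ndarray:
    """Assumed 6S-style per-pixel correction: y = xa*L - xb, rho = y / (1 + xc*y)."""
    y = xa * radiance - xb
    return y / (1.0 + xc * y)


def parallel_correct(image: np.ndarray, coeffs, workers: int = 8) -> np.ndarray:
    """Split the image into row blocks, correct them concurrently, and reassemble."""
    blocks = np.array_split(image, workers, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: correct_block(b, *coeffs), blocks))
    return np.vstack(results)


def efficiency(speedup: float, n_cpus: int) -> float:
    """Parallel efficiency = speed-up / number of processors."""
    return speedup / n_cpus


rng = np.random.default_rng(0)
image = rng.uniform(50.0, 150.0, size=(100, 40))   # synthetic radiance values
coeffs = (0.002, 0.1, 0.15)                        # hypothetical xa, xb, xc
corrected = parallel_correct(image, coeffs)
print(efficiency(6.5, 8))  # 0.8125
```

The reported speed-up of 6.5 on 8 CPUs corresponds to an efficiency of about 81 %. The shortfall from 100 % would typically come from serial steps such as reading the image and writing results.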
High-speed volume measurement system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lane, Michael H.; Doyle, Jr., James L.; Brinkman, Michael J.

2018-01-30

Disclosed is a volume sensor having a first axis, a second axis, and a third axis, each axis including a laser source configured to emit a beam; a parallel beam generating assembly configured to receive the beam and split the beam into a first parallel beam and a second parallel beam, a beam-collimating assembly configured to receive the first parallel beam and the second parallel beam and output a first beam sheet and a second beam sheet, the first beam sheet and the second beam sheet being configured to traverse the object aperture; a first collecting lens and a second collecting lens; and a first photodetector and a second photodetector, the first photodetector and the second photodetector configured to output an electrical signal proportional to the object; wherein the first axis, the second axis, and the third axis are arranged at an angular offset with respect to each other.
A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

NASA Astrophysics Data System (ADS)

Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

2016-10-01

Highly resolved, state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to more computing power, coding practices need to be rethought. We take a dual approach, introducing specially adapted parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our work on PHOENIX/3D. New parallel numerical algorithms offer a big opportunity for improvement when iteratively solving the system of equations arising from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ*, which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm that takes advantage of its characteristic traits, the parallel code's efficiency is further increased and its computation time is reduced.
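The operator-splitting idea can be illustrated on a toy problem. Assume a two-level-atom-style source function S = (1 - ε)J + εB; this is our assumption, since the abstract only states J = ΛS. Accelerated Λ-iteration then solves (I - (1 - ε)Λ*) ΔJ = ΛS_old - J_old at each step, so only the narrow-banded Λ* has to be inverted. The 1-D matrix from `toy_lambda` and its tridiagonal part are hypothetical stand-ins for PHOENIX/3D's 3D operators. A dense solve is used for clarity where a banded solver would be used in practice.

```python
import numpy as np


def toy_lambda(n: int, decay: float = 1.0, row_sum: float = 0.95) -> np.ndarray:
    """Hypothetical Lambda operator: exponentially decaying coupling between points."""
    idx = np.arange(n)
    a = np.exp(-decay * np.abs(idx[:, None] - idx[None, :]))
    return row_sum * a / a.sum(axis=1).max()


def tridiagonal_part(m: np.ndarray) -> np.ndarray:
    """Narrow-banded approximate operator Lambda*: keep the main and first off-diagonals."""
    return np.triu(np.tril(m, 1), -1)


def ali_solve(lam, b, eps, n_iter=500, tol=1e-12):
    """Accelerated Lambda-iteration for J = Lambda S with S = (1 - eps) J + eps B."""
    c = 1.0 - eps
    lhs = np.eye(len(b)) - c * tridiagonal_part(lam)
    j = np.zeros_like(b)
    for _ in range(n_iter):
        j_fs = lam @ (c * j + eps * b)          # formal solution with the current source
        dj = np.linalg.solve(lhs, j_fs - j)     # correction through the banded operator
        j = j + dj
        if np.max(np.abs(dj)) < tol:
            break
    return j


n, eps = 50, 0.01
lam, b = toy_lambda(n), np.ones(n)
j_ali = ali_solve(lam, b, eps)
```

At convergence ΔJ vanishes, so J satisfies (I - (1 - ε)Λ)J = εΛB exactly; Λ* only changes how fast that fixed point is reached, which matters most when ε is small.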
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Shuangshuang; Chen, Yousu; Wu, Di

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching. It involves a large set of differential and algebraic equations, which are computationally intensive and challenging to solve with single-processor dynamic simulation. High-performance computing (HPC) based parallel computing is a very promising technology for speeding up the computation and facilitating the simulation process. This paper presents two parallel implementations of power grid dynamic simulation: one using Open Multi-Processing (OpenMP) on a shared-memory platform and one using the Message Passing Interface (MPI) on distributed-memory clusters. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulation is compared and demonstrated.

INSTABILITIES DRIVEN BY THE DRIFT AND TEMPERATURE ANISOTROPY OF ALPHA PARTICLES IN THE SOLAR WIND

DOE Office of Scientific and Technical Information (OSTI.GOV)

Verscharen, Daniel; Bourouaine, Sofiane; Chandran, Benjamin D. G., E-mail: daniel.verscharen@unh.edu, E-mail: s.bourouaine@unh.edu, E-mail: benjamin.chandran@unh.edu

2013-08-20

We investigate the conditions under which parallel-propagating Alfvén/ion-cyclotron (A/IC) waves and fast-magnetosonic/whistler (FM/W) waves are driven unstable by the differential flow and temperature anisotropy of alpha particles in the solar wind. We focus on the limit in which $w_{\parallel\alpha} \gtrsim 0.25\,v_A$, where $w_{\parallel\alpha}$ is the parallel alpha-particle thermal speed and $v_A$ is the Alfvén speed. We derive analytic expressions for the instability thresholds of these waves, which show, e.g., how the minimum unstable alpha-particle beam speed depends upon $w_{\parallel\alpha}/v_A$, the degree of alpha-particle temperature anisotropy, and the alpha-to-proton temperature ratio. We validate our analytical results using numerical solutions to the full hot-plasma dispersion relation. Consistent with previous work, we find that temperature anisotropy allows A/IC waves and FM/W waves to become unstable at significantly lower values of the alpha-particle beam speed $U_\alpha$ than in the isotropic-temperature case. Likewise, differential flow lowers the minimum temperature anisotropy needed to excite A/IC or FM/W waves relative to the case in which $U_\alpha = 0$. We discuss the relevance of our results to alpha particles in the solar wind near 1 AU.
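To see where the solar wind sits relative to the limit $w_{\parallel\alpha} \gtrsim 0.25\,v_A$, the two speeds can be estimated from standard formulas: $v_A = B/\sqrt{\mu_0 n_p m_p}$ (proton mass density only) and $w_{\parallel\alpha} = \sqrt{2 k_B T_{\parallel\alpha}/m_\alpha}$. The Python sketch below evaluates them for illustrative values of B, n_p and T that we assume, not values taken from the paper.

```python
import math

MU_0 = 4e-7 * math.pi          # vacuum permeability, H/m (classical defined value)
K_B = 1.380649e-23             # Boltzmann constant, J/K
M_P = 1.67262192e-27           # proton mass, kg
M_ALPHA = 6.6446573e-27        # alpha-particle mass, kg


def alfven_speed(b_tesla: float, n_p_per_m3: float) -> float:
    """v_A = B / sqrt(mu_0 * n_p * m_p), using the proton mass density only."""
    return b_tesla / math.sqrt(MU_0 * n_p_per_m3 * M_P)


def parallel_thermal_speed(t_par_kelvin: float, mass: float) -> float:
    """w_par = sqrt(2 k_B T_par / m)."""
    return math.sqrt(2.0 * K_B * t_par_kelvin / mass)


# Assumed solar-wind values near 1 AU: B = 5 nT, n_p = 5 cm^-3, T_par,alpha = 4e5 K.
v_a = alfven_speed(5e-9, 5e6)
w_alpha = parallel_thermal_speed(4e5, M_ALPHA)
ratio = w_alpha / v_a
print(f"v_A = {v_a / 1e3:.1f} km/s, w_par,alpha / v_A = {ratio:.2f}")
```

For these assumed inputs the ratio is well above 0.25, i.e. inside the regime the paper focuses on; different field strengths or temperatures shift it accordingly.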
Development of embedded real-time and high-speed vision platform

NASA Astrophysics Data System (ADS)

Ouyang, Zhenxing; Dong, Yimin; Yang, Hua

2015-12-01

Currently, high-speed vision platforms are widely used in many applications, such as robotics and the automation industry. However, traditional high-speed vision platforms require a personal computer (PC) for human-computer interaction, and the PC's large size makes it unsuitable for compact systems. Therefore, this paper develops an embedded real-time and high-speed vision platform, ER-HVP Vision, which can work entirely without a PC. In this new platform, an embedded CPU-based board is designed to replace the PC, and a DSP and FPGA board is developed to implement parallel image algorithms in the FPGA and sequential image algorithms in the DSP. Hence, ER-HVP Vision, measuring 320 mm × 250 mm × 87 mm, can be used in more compact settings. Experimental results show that real-time detection and counting of a moving target at 200 fps with 512 × 512 pixels is feasible on this newly developed vision platform.
Testing of Face-milled Spiral Bevel Gears at High-speed and Load

NASA Technical Reports Server (NTRS)

Handschuh, Robert F.

2001-01-01

Spiral bevel gears are important drive-system components of rotorcraft (helicopters) currently in use. In this application, the spiral bevel gears are required to transmit very high torque at high rotational speed. Available experimental data on their thermal and structural operating characteristics are relatively scarce compared with those for parallel-axis gears. An ongoing test program at NASA Glenn Research Center has investigated their operational behavior over the last ten years at operating conditions found in aerospace applications. This paper summarizes the results of the tests conducted on face-milled spiral bevel gears. Data from the pinion member (temperature and stress) were taken at conditions from slow-roll to 14400 rpm and up to 537 kW (720 hp). The results show that operating temperature is affected by the location at which the lubricating jet is injected and by the imposed operating conditions. In addition, the stress measured from slow-roll to very high rotational speed, at various torque levels, indicated little dynamic effect over the rotational speeds tested.
VLSI neuroprocessors

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.

1994-01-01

Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware-implementable parallel algorithms, low-power and high-speed VLSI designs and building-block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high-throughput map-data classification using feedforward neural networks, terrain-based tactical movement planning using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph presentation focuses on two application-specific processors which solve the computation-intensive tasks of resource allocation (weapon-target assignment) and terrain-based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network.
Hardware realization leads to a speed-up of two to four orders of magnitude over conventional techniques and enables multiple (many-to-many) assignments not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a speed-up of four orders of magnitude over optimized software approaches has been demonstrated.
Demonstration of an optoelectronic interconnect architecture for a parallel modified signed-digit adder and subtracter

NASA Astrophysics Data System (ADS)

Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.

1996-06-01

A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed-digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtractor is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.
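MSD arithmetic is faster than binary addition because MSD addition needs no carry chain. The optoelectronic space-position encoding is not reproduced here.

The following is a minimal software sketch under two assumptions, neither taken from the paper:

- the common radix-2 MSD digit set {-1, 0, 1};
- a standard two-stage transfer rule.

Under this rule, each output digit depends only on the operand digits at its own position and the two positions beneath it. The function names are hypothetical.

```python
def msd_value(digits):
    """Integer value of an MSD number stored least-significant digit first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


def msd_add(x, y):
    """Carry-free addition of two radix-2 MSD numbers (digits in {-1, 0, 1}).

    Digits are stored least-significant first. Each output digit depends only
    on operand digits at its own position and the two positions below it, so
    in hardware every position could be computed simultaneously.
    """
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    t = [0] * (n + 1)  # t[i]: transfer digit arriving at position i
    w = [0] * n        # w[i]: interim sum digit kept at position i
    for i in range(n):
        s = x[i] + y[i]  # position sum in {-2, ..., 2}
        # If both lower operand digits are nonnegative, the transfer into
        # position i can only be 0 or 1; otherwise it can only be -1 or 0.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if s == 2:
            t[i + 1], w[i] = 1, 0
        elif s == -2:
            t[i + 1], w[i] = -1, 0
        elif s == 0:
            t[i + 1], w[i] = 0, 0
        elif s == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        else:  # s == -1
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
    # w[i] + t[i] always stays in {-1, 0, 1}, so nothing propagates further.
    return [w[i] + t[i] for i in range(n)] + [t[n]]


def msd_sub(x, y):
    """Subtraction: negate every digit of y, then add."""
    return msd_add(x, [-d for d in y])
```

For example, `msd_add([1, 1, 1], [1, 1, 1])` returns `[0, 1, 1, 1]`, i.e. 7 + 7 = 14. Subtraction reuses the adder because negating an MSD number only flips the sign of each digit.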
Full speed ahead for software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolfe, A.

1986-03-10

Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program to run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.
High-Throughput Industrial Coatings Research at The Dow Chemical Company.

PubMed

Kuo, Tzu-Chi; Malvadkar, Niranjan A; Drumright, Ray; Cesaretti, Richard; Bishop, Matthew T

2016-09-12

At The Dow Chemical Company, high-throughput research is an active area for developing new industrial coatings products. Using the principles of automation (i.e., using robotic instruments), parallel processing (i.e., preparing, processing, and evaluating samples in parallel), and miniaturization (i.e., reducing sample size), Dow has developed high-throughput tools for synthesizing, formulating, and applying coating compositions. In addition, high-throughput workflows for measuring various coating properties, such as cure speed, hardness development, scratch resistance, impact toughness, resin compatibility, pot-life, and surface defects, among others, have also been developed in-house. These workflows correlate well with the traditional coatings tests, but they do not necessarily mimic those tests. The use of such high-throughput workflows in combination with smart experimental designs allows accelerated discovery and commercialization.
New functionalities of potassium tantalate niobate deflectors enabled by the coexistence of pre-injected space charge and composition gradient

NASA Astrophysics Data System (ADS)

Zhu, Wenbin; Chao, Ju-Hung; Chen, Chang-Jiang; Campbell, Adrian L.; Henry, Michael G.; Yin, Stuart Shizhuo; Hoffman, Robert C.

2017-10-01

In most beam steering applications, such as 3D printing and in vivo imaging, one of the essential challenges has been high-resolution, high-speed, multi-dimensional optical beam scanning. Although pre-injected space charge controlled potassium tantalate niobate (KTN) deflectors can achieve speeds in the nanosecond regime, they deflect in only one dimension. In order to develop a high-resolution, high-speed, multi-dimensional KTN deflector, we studied the deflection behavior of KTN deflectors in the case of coexisting pre-injected space charge and composition gradient. We find that such coexistence can enable new functionalities of KTN crystal based electro-optic deflectors. When the direction of the composition gradient is parallel to the direction of the external electric field, the zero-deflection position can be shifted, which can reduce the beam distortion induced by the internal electric field and thus enhance the resolution. When the direction of the composition gradient is perpendicular to the direction of the external electric field, two-dimensional beam scanning can be achieved with a single piece of KTN crystal, resulting in a compact, high-speed two-dimensional deflector. Theoretical analyses and experiments were both conducted, and their results are consistent with each other. These new functionalities can expedite the use of KTN deflectors in many applications, such as high-speed 3D printing, high-speed, high-resolution imaging, and free-space broadband optical communication.
A Systems Approach to Scalable Transportation Network Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S

2006-01-01

Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach to designing a simulator is presented that has memory and speed efficiency as its goals from the outset and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.
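The abstract contrasts time-stepped and discrete-event models. The following minimal serial discrete-event sketch in Python illustrates that difference. It is not SCATTER's design: the function name, the route and travel-time format, and the absence of any congestion model or parallel synchronization are all simplifying assumptions. Simulated time advances directly from one event to the next, so idle intervals cost nothing.

```python
import heapq


def simulate(routes, travel_time):
    """Event-driven vehicle movement on a road network (no congestion model).

    routes: {vehicle_id: (departure_time, [node0, node1, ...])}
    travel_time: {(from_node, to_node): link travel time}
    Returns (arrival time at the final node per vehicle, number of events processed).
    """
    events = []  # min-heap of (time, vehicle_id, index of node just reached)
    for vid, (t0, _route) in routes.items():
        heapq.heappush(events, (t0, vid, 0))
    arrivals, processed = {}, 0
    while events:
        # Jump straight to the next event instead of advancing a fixed time step.
        t, vid, k = heapq.heappop(events)
        processed += 1
        route = routes[vid][1]
        if k == len(route) - 1:
            arrivals[vid] = t
        else:
            link = (route[k], route[k + 1])
            heapq.heappush(events, (t + travel_time[link], vid, k + 1))
    return arrivals, processed
```

Take `routes = {"car1": (0.0, ["A", "B", "C"]), "car2": (1.0, ["B", "C"])}`, with link times of 2.0 for A to B and 3.0 for B to C. After five processed events, car1 arrives at 5.0 and car2 at 4.0.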
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, C.

Almost every computer architect dreams of achieving high system performance with low implementation costs. A multigauge machine can reconfigure its data-path width, provide parallelism, achieve better resource utilization, and sometimes trade computational precision for increased speed. A simple experimental method is used here to capture the main characteristics of multigauging. The measurements indicate near-optimal speedups. Applying these ideas to the design of parallel processors incurs low costs and provides flexibility. Several operational aspects of designing a multigauge machine are discussed as well. Thus, this research reports technical, economic, and operational feasibility studies of multigauging.
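Trading precision for parallelism by reconfiguring data-path width has a familiar software analogue: SIMD-within-a-register (SWAR) arithmetic, in which one wide word is treated as several narrow lanes. The sketch below is that analogue, not the multigauge hardware studied in the paper. The 32-bit word, the four 8-bit lanes, and the function names are illustrative assumptions.

```python
LANE_BITS = 8
LANES = 4
HIGH = 0x80808080  # top bit of each 8-bit lane
LOW = 0x7F7F7F7F   # remaining 7 bits of each lane


def pack(values):
    """Pack four 8-bit lane values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (LANE_BITS * i)) & 0xFF for i in range(LANES)]


def add_lanes(a, b):
    """Add four independent 8-bit lanes (mod 256) with one 32-bit addition.

    The low 7 bits of each lane are added normally; since 127 + 127 < 256,
    no carry crosses into the next lane. The top bit of each lane is then
    fixed up with XOR, which discards that lane's own carry-out.
    """
    return (((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF
```

Narrowing the lanes to 8 bits gives up range, since sums wrap modulo 256. In exchange, one word operation performs four additions, which is the precision-for-speed trade the abstract describes.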
Obsessive-compulsive tendencies are associated with a focused information processing strategy.

PubMed

Soref, Assaf; Dar, Reuven; Argov, Galit; Meiran, Nachshon

2008-12-01

The study examined the hypothesis that obsessive-compulsive (OC) tendencies are related to a reliance on a focused and serial, rather than a parallel and speed-oriented, information processing style. Ten students with high OC tendencies and 10 students with low OC tendencies performed the flanker task, in which they were required to quickly classify a briefly presented target letter (S or H) flanked by compatible (e.g., SSSSS) or incompatible (e.g., HHSHH) noise letters. Participants received 4 blocks of 100 trials each, two with 50% compatible trials and two with 80% compatible trials, and were informed of the probability of compatible trials before the beginning of each block. As predicted, high OC participants, compared with low OC participants, had slower overall reaction time (RT) and a lower tendency toward parallel processing (defined as incompatible-trial RT minus compatible-trial RT). Low OC participants, more than high OC participants, tended to adjust their focused/parallel processing, including a shift toward parallel processing in blocks with 80% compatible trials and in trials following compatible trials. Implications of these results for the cognitive theory and therapy of OCD are discussed.
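The study's parallel-processing index is a simple difference of mean reaction times. Below is a minimal Python sketch, assuming trials are recorded as (condition, RT in ms) pairs. The function name and data format are hypothetical, and any per-block or per-participant breakdown is left out.

```python
from statistics import mean


def flanker_effect(trials):
    """Mean incompatible-trial RT minus mean compatible-trial RT.

    trials: iterable of (condition, rt_ms), where condition is "compatible"
    or "incompatible". Under the study's definition, a larger difference
    indicates a greater tendency toward parallel processing.
    """
    trials = list(trials)  # allow generators, since the data are scanned twice
    compatible = [rt for cond, rt in trials if cond == "compatible"]
    incompatible = [rt for cond, rt in trials if cond == "incompatible"]
    return mean(incompatible) - mean(compatible)
```

For instance, compatible RTs of 400 and 420 ms and incompatible RTs of 450 and 470 ms give an index of 50 ms. These are hypothetical numbers, not data from the study.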
High performance data transfer

NASA Astrophysics Data System (ADS)

Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.

2017-10-01

The exponentially increasing need for high-speed data transfer is driven by big data and cloud computing, together with the needs of data-intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software, which has been under development since 2013 to meet these growing needs by providing high-performance data transfer and encryption in a scalable, balanced, easy-to-deploy and easy-to-use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the-shelf components to test the ZX scalability and its ability to balance services using multiple cores and links. The PoCs are based on SSD flash storage managed by a parallel file system. Each cluster occupies 4 rack units. Using the PoCs, we have achieved almost 200 Gbps memory-to-memory between clusters over two 100 Gbps links, and 70 Gbps parallel-file-to-parallel-file with encryption over a 5000-mile 100 Gbps link.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.

1999-10-14

Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among the processors of an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large-scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three-dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed-ups for fixed problem size, a class of problems of immediate practical importance.
High-speed parallel implementation of a modified PBR algorithm on DSP-based EH topology

NASA Astrophysics Data System (ADS)

Rajan, K.; Patnaik, L. M.; Ramakrishna, J.

1997-08-01

Algebraic Reconstruction Technique (ART) is a long-established method for solving the problem of three-dimensional (3-D) reconstruction from projections in electron microscopy and radiology. In medical applications, direct 3-D reconstruction is at the forefront of investigation. The simultaneous iterative reconstruction technique (SIRT) is an ART-type algorithm with the potential of generating, in a few iterations, tomographic images of a quality comparable to that of convolution backprojection (CBP) methods. Pixel-based reconstruction (PBR) is similar to SIRT reconstruction, and it has been shown that PBR algorithms give better-quality pictures than SIRT algorithms. In this work, we propose a few modifications to the PBR algorithms. The modified algorithms are shown to give better-quality pictures than PBR algorithms. The PBR algorithm and the modified PBR algorithms are highly compute-intensive. Not many attempts have been made to reconstruct objects in the true 3-D sense because of the high computational overhead. In this study, we have developed parallel two-dimensional (2-D) and 3-D reconstruction algorithms based on modified PBR. We attempt to solve the two problems encountered by the PBR and modified PBR algorithms, i.e., the long computational time and the large memory requirements, by parallelizing the algorithm on a multiprocessor system. We investigate possible task and data partitioning schemes that exploit the potential parallelism in the PBR algorithm while minimizing the memory requirement. We have implemented an extended hypercube (EH) architecture for the high-speed execution of the 3-D reconstruction algorithm, using commercially available fast floating-point digital signal processor (DSP) chips as the processing elements (PEs) and dual-port random access memories (DPR) as channels between the PEs.
We discuss and compare the performance of the PBR algorithm on an IBM 6000 RISC workstation, on a Silicon Graphics Indigo 2 workstation, and on an EH system. The results show that an EH(3,1) using DSP chips as PEs executes the modified PBR algorithm about 100 times faster than an IBM 6000 RISC workstation. We have also executed the algorithms on a 4-node IBM SP2 parallel computer. The results show that the execution time of the algorithm on an EH(3,1) is shorter than that on a 4-node IBM SP2 system. The speed-up of an EH(3,1) system with eight PEs and one network controller is approximately 7.85.
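To put the reported speed-up in context, it can be converted into parallel efficiency using the usual definition E = S/p. Which processor count p to use is an assumption: the source does not say whether the network controller should be counted, so both choices are shown.

```latex
E = \frac{S}{p}, \qquad
E_{p=8} = \frac{7.85}{8} \approx 0.98, \qquad
E_{p=9} = \frac{7.85}{9} \approx 0.87
```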
Parallel network simulations with NEURON.

PubMed

Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

2006-10-01

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) connection delay, measured from presynaptic spike generation to postsynaptic spike delivery. Three published network models with very different spike patterns exhibit superlinear speedup on Beowulf clusters, demonstrating that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high-speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. When one model was scaled up from 500 to 40,000 realistic cells, it exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and a communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes it practical to run large network simulations that could otherwise not be explored.
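The synchronization rule in the first sentence works for a simple reason. A spike generated on one processor cannot influence a cell on another processor until at least the minimum interprocessor delay has elapsed. Each processor can therefore integrate for that long on its own before spikes are exchanged.

Below is a minimal Python sketch of how the interval and exchange schedule follow from a connection list. The helper names and the (source host, target host, delay) format are hypothetical and do not reflect NEURON's actual API.

```python
import math


def exchange_interval(connections):
    """Minimum delay over connections whose source and target are on different hosts.

    connections: iterable of (source_host, target_host, delay_ms). Spikes
    between cells on the same host are delivered locally, so they do not
    constrain how long hosts may integrate independently.
    """
    return min(delay for src, dst, delay in connections if src != dst)


def exchange_times(interval, tstop):
    """Times at which all hosts pause to exchange spikes, ending exactly at tstop."""
    steps = math.ceil(tstop / interval)
    return [min((k + 1) * interval, tstop) for k in range(steps)]
```

Consider the connections `[(0, 1, 2.0), (0, 0, 0.5), (1, 0, 1.5)]`. The 0.5 ms delay is ignored because it stays on host 0, so the interval is 1.5 ms. A 6 ms run then exchanges spikes at 1.5, 3.0, 4.5 and 6.0 ms.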
Parallel Network Simulations with NEURON

PubMed Central

Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

2009-01-01

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) connection delay, measured from presynaptic spike generation to postsynaptic spike delivery. Three published network models with very different spike patterns exhibit superlinear speedup on Beowulf clusters, demonstrating that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high-speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. When one model was scaled up from 500 to 40,000 realistic cells, it exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and a communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes it practical to run large network simulations that could otherwise not be explored. PMID:16732488
More IMPATIENT: A Gridding-Accelerated Toeplitz-based Strategy for Non-Cartesian High-Resolution 3D MRI on GPUs

PubMed Central

Gai, Jiading; Obeid, Nady; Holtrop, Joseph L.; Wu, Xiao-Long; Lam, Fan; Fu, Maojing; Haldar, Justin P.; Hwu, Wen-mei W.; Liang, Zhi-Pei; Sutton, Bradley P.

2013-01-01

Several recent methods have been proposed to obtain significant speed-ups in MRI image reconstruction by leveraging the computational power of GPUs. Previously, we implemented a GPU-based image reconstruction technique called the Illinois Massively Parallel Acquisition Toolkit for Image reconstruction with ENhanced Throughput in MRI (IMPATIENT MRI) for reconstructing data collected along arbitrary 3D trajectories. In this paper, we improve IMPATIENT by using a gridding approach to accelerate the computation of various data structures needed by the previous routine, removing its computational bottlenecks. Further, we enhance the routine with capabilities for off-resonance correction and multi-sensor parallel imaging reconstruction. By incorporating optimized gridding into our iterative reconstruction scheme, the improved GPU implementation achieves speed-ups of more than a factor of 200 compared with the previous accelerated GPU code. PMID:23682203
Parallelization of the Coupled Earthquake Model

NASA Technical Reports Server (NTRS)

Block, Gary; Li, P. Peggy; Song, Yuhe T.

2007-01-01

This Web-based tsunami simulation system allows users to remotely run a model on JPL's supercomputers for a given undersea earthquake. At the time of this report, tsunamis had never before been predicted over the Internet. This new code directly couples the earthquake model and the ocean model on parallel computers and improves simulation speed. Seismometers can only detect information from earthquakes; they cannot detect whether or not a tsunami may occur as a result of the earthquake. When earthquake-tsunami models are coupled with the improved computational speed of modern high-performance computers and constrained by remotely sensed data, they can provide early warnings for coastal regions at risk. The software is capable of testing NASA's satellite observations of tsunamis. It has been successfully tested for several historical tsunamis, has passed all alpha and beta testing, and is well documented for users.
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

NASA Technical Reports Server (NTRS)

Bailey, David H.; Ferguson, Helaman R. P.

1988-01-01

Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
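The Newton component mentioned in the abstract can be illustrated with the classical iteration X_{k+1} = X_k (2I - A X_k). All of its work is matrix multiplication, so it parallelizes well.

This is a minimal pure-Python sketch, not Bailey and Ferguson's Strassen-Newton code. The Strassen block recursion is omitted. The scaled-transpose starting guess, the tolerance, the iteration cap and the function names are assumptions.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def newton_inverse(a, tol=1e-12, max_iter=100):
    """Approximate inv(a) with the Newton iteration X <- X (2I - A X)."""
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_inf = max(sum(abs(v) for v in row) for row in a)               # max row sum
    # Scaled transpose: a standard starting guess that converges for nonsingular A.
    x = [[a[j][i] / (norm_1 * norm_inf) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        ax = matmul(a, x)
        residual = max(abs((1.0 if i == j else 0.0) - ax[i][j])
                       for i in range(n) for j in range(n))
        if residual < tol:
            break
        # Both products are ordinary matrix multiplications, which is where
        # parallel hardware (or a fast Strassen-style multiply) would be used.
        correction = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                      for i in range(n)]
        x = matmul(x, correction)
    return x
```

For `[[4, 1], [2, 3]]` the result approaches `[[0.3, -0.1], [-0.2, 0.4]]`. Once the residual is small, each step roughly squares it. This self-correcting behavior is consistent with the abstract's use of Newton iterations to improve numerical stability.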
Low Temperature Performance of High-Speed Neural Network Circuits

NASA Technical Reports Server (NTRS)

Duong, T.; Tran, M.; Daud, T.; Thakoor, A.

1995-01-01

Artificial neural networks, derived from their biological counterparts, offer a new and enabling computing paradigm especially suitable for tasks such as image and signal processing with feature classification/object recognition, global optimization, and adaptive control. When implemented in fully parallel electronic hardware, such networks offer a speed advantage of orders of magnitude. The basic building blocks of this architecture are processing elements called neurons, implemented as nonlinear operational amplifiers with a sigmoidal transfer function. These neurons are interconnected through weighted connections called synapses, implemented using circuitry for weight storage and multiplication in an analog, digital, or hybrid scheme.
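In software terms, each neuron described above computes a sigmoid of a weighted sum of its inputs. Below is a minimal Python sketch, assuming a logistic sigmoid with an adjustable gain. The abstract does not specify the exact transfer curve of the op-amp neurons, and the function names are hypothetical.

```python
import math


def sigmoid(z, gain=1.0):
    """Logistic transfer function, written to avoid overflow for large |z|."""
    z *= gain
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(inputs, weights, biases, gain=1.0):
    """One layer of neurons: output i is sigmoid(sum_j w_ij * x_j + b_i).

    In fully parallel hardware every synapse multiplies at once; here the
    weighted sums are simply evaluated one neuron at a time.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b, gain)
            for row, b in zip(weights, biases)]
```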

Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.

PubMed

Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V

2010-06-01

Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphics processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040-atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Parallel algorithm of VLBI software correlator under multiprocessor environment

NASA Astrophysics Data System (ADS)

Zheng, Weimin; Zhang, Dong

2007-11-01

The correlator is the key signal processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the massive volumes of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is both data-intensive and computation-intensive. This paper presents the algorithms of two parallel software correlators for multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-parallel technology, and runs on SMP (Symmetric Multiprocessor) servers. Another high-speed prototype correlator, using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm, is realized on a small Beowulf cluster platform. Both correlators feature a flexible structure and scalability, and both can correlate data from 10 stations.
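A 10-station array has 10 x 9 / 2 = 45 baselines. Each baseline can be correlated independently, which is what makes the thread and MPI parallelism described above effective.

Below is a minimal Python sketch of that decomposition, assuming a time-domain (XF-style) lag correlator. The paper does not say whether its correlators are XF or FX, and the function names are hypothetical. Python threads stand in for Pthreads/MPI workers here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def cross_correlate(x, y, max_lag):
    """Lag (XF-style) cross-correlation r[lag] = sum_n x[n] * y[n + lag]."""
    return {
        lag: sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        for lag in range(-max_lag, max_lag + 1)
    }


def correlate_all(streams, max_lag, workers=4):
    """Correlate every station pair (baseline); baselines are independent tasks."""
    baselines = list(combinations(range(len(streams)), 2))

    def work(pair):
        i, j = pair
        return cross_correlate(streams[i], streams[j], max_lag)

    # Threads only illustrate the task decomposition: CPython's GIL limits the
    # speed-up, so a real correlator would use processes, MPI ranks, or native code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(baselines, pool.map(work, baselines)))
```

If station 1 records station 0's signal three samples late, the (0, 1) baseline's correlation peaks at lag 3.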
Automated target recognition and tracking using an optical pattern recognition neural network

NASA Technical Reports Server (NTRS)

Chao, Tien-Hsin

1991-01-01

The ongoing development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that integrates an innovative optical parallel processor with a feature-extraction-based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets regardless of their scale, rotation, perspective, and various deformations. Once fully developed, the OPRNN system can be effectively utilized for automated spacecraft recognition and tracking, which will contribute to the success of the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With the inherent advantages of parallel processing capability and shift invariance, multiple objects can be simultaneously recognized and tracked using this multichannel correlator. This target tracking capability can be greatly enhanced by utilizing a powerful feature-extraction-based neural network training algorithm such as the neocognitron. The OPRNN, currently under investigation at JPL, is constructed with an optical multichannel correlator whose holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10^14 analog connections/s, enabling the OPRNN to outperform its state-of-the-art electronic counterpart by at least two orders of magnitude.
High-Speed Systolic Array Testbed.

DTIC Science & Technology

1987-10-01

applications since the concept was introduced by H.T. Kung in 1978. This highly parallel architecture of nearest-neighbor data communication and...must be addressed. For instance, should bit-serial or bit-parallel computation be utilized? Does the dynamic range of the candidate applications or...numerical stability of the algorithms used require computations in fixed-point and integer format or the architecturally more complex and slower floating
P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

PubMed

Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

2017-03-14

A growing number of studies have used whole genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping the bisulfite-treated sequence to the whole genome has been the main method to study DNA cytosine methylation. However, most current tools suffer from inaccuracy and long computation times. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve this problem. Using optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt predicted DNA methylation status on large-scale datasets, slow speed and low temporal-spatial efficiency remained problems. To solve the problems of Smith-Waterman dynamic programming and low temporal-spatial efficiency, we further designed a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves deep acceleration on the CPU and Intel Xeon Phi heterogeneous platform, taking full advantage of multi-core (CPU) and many-core (Phi) processors.
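Smith-Waterman dynamic programming is the bottleneck the abstract targets. A common way to parallelize it is the anti-diagonal wavefront. Every cell H[i][j] depends only on its upper, left, and upper-left neighbors, so all cells with the same i + j can be computed at once.

The minimal Python sketch below visits cells in that wavefront order (still serially) and returns the best local alignment score. The linear gap penalty and the match/mismatch/gap values are assumptions, not P-Hint-Hunt's parameters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, filling the DP matrix by anti-diagonals."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):  # anti-diagonal index d = i + j
        # Cells on one anti-diagonal depend only on the two previous diagonals,
        # so this inner loop could run fully in parallel.
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

For example, `smith_waterman("GATTACA", "TTAC")` returns 8, from four consecutive matches.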
Interval Management with Spacing to Parallel Dependent Runways (IMSPIDR) Experiment and Results

NASA Technical Reports Server (NTRS)

Baxley, Brian T.; Swieringa, Kurt A.; Capron, William R.

2012-01-01

An area in aviation operations that may offer an increase in efficiency is the use of continuous descent arrivals (CDA), especially during dependent parallel runway operations. However, variations in aircraft descent angle and speed can cause inaccuracies in estimated time of arrival calculations, requiring an increase in the size of the buffer between aircraft. This in turn reduces airport throughput and limits the use of CDAs during high-density operations, particularly to dependent parallel runways. The Interval Management with Spacing to Parallel Dependent Runways (IMSPiDR) concept uses a trajectory-based spacing tool onboard the aircraft to achieve, by the runway, the spacing interval assigned by air traffic control behind the previous aircraft. This paper describes the first experiment and results of this concept at NASA Langley. Pilots flew CDAs to the Dallas Fort-Worth airport using airspeed calculations from the spacing tool to achieve either a Required Time of Arrival (RTA) or Interval Management (IM) spacing interval at the runway threshold. Results indicate that flight crews were able to land aircraft on the runway with a mean error of 2 seconds, and a standard deviation of less than 4 seconds, relative to the time assigned by air traffic control, even in the presence of forecast wind error and large time delay. Statistically significant differences in delivery precision and number of speed changes as a function of stream position were observed; however, there was no trend to the difference, and the error did not increase during the operation. The two areas the flight crews indicated as not acceptable were the additional number of speed changes required during the wind shear event and the issuing of an IM clearance via data link while at low altitude. A number of refinements and future spacing algorithm capabilities were also identified.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

NASA Technical Reports Server (NTRS)

Abdi, Frank

1996-01-01

A computational structural/material analysis and design tool that would meet industry's future demand for expedience and reduced cost is presented. This software, 'GENOA', is dedicated to parallel, high-speed analysis for the probabilistic evaluation of high-temperature composite response in aerospace systems. The development is based on detailed integration and modification of diverse specialized analysis techniques and mathematical models, combining their latest capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of multiple processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved in the development were: (1) using parallel processing and static/dynamic load-balancing optimization to make the complex simulation of structure, material and processing of high-temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and to increase convergence rates through high- and low-level processor assignment; (4) creating the framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid, and distributed-workstation computers; and (5) market evaluation. The results of the Phase-2 effort provide a good basis for continuation and warrant a Phase-3 government and industry partnership.
GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

NASA Astrophysics Data System (ADS)

Srinivasa, K. G.; Shree Devi, B. N.

2017-10-01

String searching in documents has become a tedious task with the evolution of Big Data. The generation of large data sets demands a high-performance search algorithm in areas such as text mining and information retrieval. The popularity of GPUs for general-purpose computing has been increasing across many applications, so it is of great interest to exploit the thread parallelism of a GPU to provide a high-performance search algorithm. This paper proposes an optimized new approach to the N-gram model for string search in a number of lengthy documents, together with its GPU implementation. The algorithm uses GPGPUs to search for strings in many documents, employing character-level N-gram matching with a parallel Score Table approach and search using the CUDA API. The Score Table, which stores the frequencies of the N-grams in a document, makes the search independent of the document's length and allows faster access to the frequency values, thus decreasing the search complexity. The GPU's large number of threads is exploited to pre-process the trigrams of a document in parallel for Score Table creation and to search a huge number of documents in parallel, speeding up the whole search process even for large pattern sizes. Experiments were carried out on many documents of varied length, with search strings from the standard Lorem Ipsum text, on NVIDIA's GeForce GT 540M GPU with 96 cores. The results show that the parallel approach to Score Table creation and searching gives a good speedup over the same approach executed serially.
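A serial CPU sketch of the Score Table idea may help: each document is reduced once to a table of character-trigram frequencies, and a query then only touches the pattern's own trigrams, so lookup cost does not grow with document length. The abstract does not give the exact scoring rule, so the filter below (every trigram of the pattern must occur in the document at least as often as in the pattern) is an assumption, and `score_table`, `may_contain` and `search` are hypothetical names. In the GPU version described above, table construction and the per-document check would be spread over threads rather than run in a loop.

```python
from collections import Counter

N = 3  # trigrams, as pre-processed in the abstract


def score_table(text: str, n: int = N) -> Counter:
    """Character-level n-gram frequency table for one document."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def may_contain(table: Counter, pattern: str, n: int = N) -> bool:
    """Hypothetical scoring rule: every n-gram of the pattern must occur in the
    document at least as often as in the pattern. This is a filter, so it can
    report false positives (n-grams present but not contiguous)."""
    if len(pattern) < n:
        raise ValueError("pattern must be at least n characters long")
    needed = score_table(pattern, n)
    return all(table[gram] >= count for gram, count in needed.items())


def search(docs, pattern):
    """Return indices of candidate documents. Tables are built once per
    document; on a GPU each document would get its own threads."""
    tables = [score_table(d) for d in docs]
    return [i for i, t in enumerate(tables) if may_contain(t, pattern)]


docs = ["lorem ipsum dolor sit amet",
        "consectetur adipiscing elit",
        "sed do eiusmod tempor"]
print(search(docs, "ipsum"))
```

Because `Counter` returns 0 for a missing key, a pattern trigram absent from a document simply fails the comparison without any special casing.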
Aircraft Configuration and Flight Crew Compliance with Procedures While Conducting Flight Deck Based Interval Management (FIM) Operations

NASA Technical Reports Server (NTRS)

Shay, Rick; Swieringa, Kurt A.; Baxley, Brian T.

2012-01-01

Flight deck based Interval Management (FIM) applications using ADS-B are being developed to improve both the safety and the capacity of the National Airspace System (NAS). FIM is expected to improve the safety and efficiency of the NAS by giving pilots the technology and procedures to precisely achieve an interval behind the preceding aircraft by a specific point. Concurrently but independently, Optimized Profile Descents (OPDs) are being developed to help reduce fuel consumption and noise; however, the range of speeds available when flying an OPD decreases the precision with which aircraft are delivered to the runway. This requires adding a spacing buffer between aircraft, which reduces system throughput. FIM addresses this problem by providing pilots with speed guidance to achieve a precise interval behind another aircraft, even while flying optimized descents. The Interval Management with Spacing to Parallel Dependent Runways (IMSPiDR) human-in-the-loop experiment employed 24 commercial pilots to explore the use of FIM equipment to conduct spacing operations behind two aircraft arriving at parallel runways while flying an OPD during high-density operations. This paper describes the impact of variations in pilot operations, in particular how pilots configured the aircraft, their compliance with FIM operating procedures, and their response to changes in the FIM speed. An example of a pilot using the displayed FIM speeds incorrectly is also discussed. Finally, this paper examines the relationship between achieving airline operational goals for individual aircraft and the need for ATC to deliver aircraft to the runway with greater precision. The results show that aircraft can fly an OPD and conduct FIM operations to dependent parallel runways, enabling operational goals to be achieved efficiently while maintaining system throughput.
Directly measuring of thermal pulse transfer in one-dimensional highly aligned carbon nanotubes

PubMed Central

Zhang, Guang; Liu, Changhong; Fan, Shoushan

2013-01-01

Using a simple and precise instrument system, we directly measured the thermophysical properties of one-dimensional highly aligned carbon nanotubes (CNTs). A CNT-based macroscopic material, super aligned carbon nanotube (SACNT) buckypaper, was measured in our experiment. We defined a new one-dimensional parameter, the “thermal transfer speed”, to characterize the thermal damping mechanisms in the SACNT buckypapers. Our results indicated that SACNT buckypapers with different densities have markedly different thermal transfer speeds. Furthermore, we found that the thermal transfer speed of high-density SACNT buckypapers may have a pronounced damping factor along the direction in which the CNTs are aligned. The anisotropic thermal diffusivities of the SACNT buckypapers could be calculated from the thermal transfer speeds. The thermal diffusivity clearly increases as the buckypaper density increases. For parallel SACNT buckypapers, the thermal diffusivity could be as high as 562.2 ± 55.4 mm²/s. The thermal conductivities of these SACNT buckypapers were also calculated using the equation k = C_p α ρ, where C_p is the specific heat capacity, α the thermal diffusivity and ρ the density. PMID:23989589
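The conductivity formula k = C_p α ρ is easy to get wrong by a factor of 10⁶ because the diffusivity above is quoted in mm²/s while C_p and ρ are normally in SI units. A small helper keeps the conversion explicit. The abstract reports only the diffusivity, so the heat capacity and density in the usage line are hypothetical placeholders, not measured values from the paper.

```python
def thermal_conductivity(cp_j_per_kg_k: float,
                         alpha_mm2_per_s: float,
                         rho_kg_per_m3: float) -> float:
    """k = C_p * alpha * rho in W/(m*K), with alpha given in mm^2/s."""
    alpha_m2_per_s = alpha_mm2_per_s * 1e-6  # 1 mm^2 = 1e-6 m^2
    return cp_j_per_kg_k * alpha_m2_per_s * rho_kg_per_m3


# Diffusivity from the abstract; C_p and rho below are hypothetical placeholders.
k = thermal_conductivity(cp_j_per_kg_k=700.0, alpha_mm2_per_s=562.2, rho_kg_per_m3=500.0)
print(f"k = {k:.1f} W/(m*K) for the assumed C_p and rho")
```

Since k is linear in each factor, the ±55.4 mm²/s uncertainty on α carries over proportionally (about ±10%) to k when C_p and ρ are treated as exact.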
A NOVEL HIGH-SPEED METHOD FOR THE GENERATION OF 4-ARYLDIHYDROPYRIMIDINE COMPOUND LIBRARIES USING A MICROWAVE-ASSISTED BIGINELLI CONDENSATION PROTOCOL

EPA Science Inventory

In this presentation we report the application of microwave-assisted chemistry to the parallel synthesis of 4-aryl-3,4-dihydropyrimidin-2(1H)-ones employing a solventless Biginelli multicomponent condensation protocol. The novel method employs neat mixtures of β-ketoesters, aryl ...
Neuromorphic Hardware Architecture Using the Neural Engineering Framework for Pattern Recognition.

PubMed

Wang, Runchun; Thakur, Chetan Singh; Cohen, Gregory; Hamilton, Tara Julia; Tapson, Jonathan; van Schaik, Andre

2017-06-01

We present a hardware architecture that uses the neural engineering framework (NEF) to implement large-scale neural networks on field programmable gate arrays (FPGAs) for massively parallel real-time pattern recognition. NEF is a framework capable of synthesising large-scale cognitive systems from subnetworks, and we have previously presented an FPGA implementation of the NEF that successfully performs nonlinear mathematical computations. That work was based on a compact digital neural core consisting of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. We have now scaled this approach up to build a pattern recognition system by combining identical neural cores. As a proof of concept, we have developed a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture and hardware optimisations presented offer a high-speed, resource-efficient means of performing neuromorphic, massively parallel pattern recognition and classification tasks.
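Time-multiplexing, as described above, means one physical neuron datapath is reused for every virtual neuron in a core, with each neuron's state held in memory and swept through in turn. The sketch below models that with state arrays and a single update routine. The neuron model (non-leaky integrate-and-fire clamped at zero), the threshold and the class name are assumptions for illustration; the NEF input current gain·(encoder·x) + bias follows the framework's usual form, and decoding of spike rates back into values is omitted.

```python
import numpy as np

NUM_VIRTUAL = 64  # virtual neurons per core, as in the abstract


class TimeMultiplexedCore:
    """Hypothetical model: one shared update routine serves NUM_VIRTUAL
    neurons whose state lives in arrays (standing in for on-chip memory)."""

    def __init__(self, gains, biases, encoders, threshold=1.0):
        self.gains = np.asarray(gains, dtype=float)
        self.biases = np.asarray(biases, dtype=float)
        self.encoders = np.asarray(encoders, dtype=float)
        self.threshold = threshold
        self.potential = np.zeros(len(self.gains))
        self.spike_counts = np.zeros(len(self.gains), dtype=int)

    def _physical_neuron(self, v, current):
        """The single shared datapath. Assumed model: integrate without
        leak, clamp at zero, spike and subtract the threshold."""
        v = max(v + current, 0.0)
        if v >= self.threshold:
            return v - self.threshold, True
        return v, False

    def tick(self, x):
        """One time step: sweep serially over all virtual neurons."""
        for i in range(len(self.gains)):
            current = self.gains[i] * self.encoders[i] * x + self.biases[i]
            self.potential[i], spiked = self._physical_neuron(self.potential[i], current)
            self.spike_counts[i] += int(spiked)


core = TimeMultiplexedCore(gains=[0.5] * NUM_VIRTUAL,
                           biases=[0.0] * NUM_VIRTUAL,
                           encoders=[1.0] * 32 + [-1.0] * 32)
for _ in range(10):
    core.tick(1.0)
print(core.spike_counts[:4], core.spike_counts[-4:])
```

The trade-off is the one the abstract implies: a core spends 64 datapath cycles per time step but needs only one neuron's worth of logic, and throughput is recovered by instantiating many identical cores.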
Method and apparatus for data sampling

DOEpatents

Odell, D.M.C.

1994-04-19

A method and apparatus for sampling radiation detector outputs and determining event data from the collected samples are described. The method uses high-speed sampling of the detector output, conversion of the samples to digital values, and discrimination of the digital values so that the digital values representing detected events are determined. The high-speed sampling and digital conversion are performed by an A/D sampler that samples the detector output at a rate high enough to produce numerous digital samples for each detected event. The digital discrimination identifies those digital samples that are not representative of detected events. The sampling and discrimination also provide for temporary or permanent storage, either serially or in parallel, to a digital storage medium. 6 figures.
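Because the sampler produces many digital samples per detected event, discrimination can be done in software on runs of samples rather than on single values. The sketch below uses one simple rule, a level threshold plus a minimum run length, to separate event samples from noise. The patent abstract does not specify its discrimination criterion, so both the rule and the `find_events` name are assumptions.

```python
def find_events(samples, threshold, min_samples=2):
    """Hypothetical discriminator: return inclusive (start, end) indices of
    runs of samples above `threshold` lasting at least `min_samples`.
    Shorter runs are treated as noise, not as detected events."""
    events = []
    start = None
    for i, s in enumerate(samples):
        if s > threshold:
            if start is None:
                start = i          # a run above threshold begins
        elif start is not None:
            if i - start >= min_samples:
                events.append((start, i - 1))
            start = None           # run ended (kept or discarded)
    # Close a run that reaches the end of the buffer.
    if start is not None and len(samples) - start >= min_samples:
        events.append((start, len(samples) - 1))
    return events


samples = [0, 1, 5, 7, 6, 1, 0, 9, 0, 8, 9, 2]
print(find_events(samples, threshold=4))
```

Here the single-sample spike at index 7 is rejected, which is the kind of decision that becomes possible only when the sampling rate yields several samples per real event.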
Parallel MR Imaging with Accelerations Beyond the Number of Receiver Channels Using Real Image Reconstruction.

PubMed

Ji, Jim; Wright, Steven

2005-01-01

Parallel imaging using multiple phased-array coils and receiver channels has become an effective approach to high-speed magnetic resonance imaging (MRI). To obtain high spatiotemporal resolution, k-space is subsampled and later interpolated using data from multiple channels. Higher subsampling factors result in faster image acquisition. However, the subsampling factors are upper-bounded by the number of parallel channels. Phase constraints have previously been proposed to overcome this limitation, with some success. In this paper, we demonstrate that in certain applications it is possible to obtain acceleration factors of potentially up to twice the number of channels by using a real-image constraint. Data acquisition and processing methods to manipulate and estimate the image phase information are presented for improving image reconstruction. In vivo brain MRI experimental results show that accelerations of up to 6 are feasible with 4-channel data.
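Why a real-image constraint can double the usable acceleration is easiest to see in a SENSE-style unfolding of one aliased pixel group. With C coils and acceleration R, each coil sees one complex sum of R pixels. For a complex image, that gives C complex equations in R complex unknowns, so R ≤ C. If the image phase has already been estimated and removed (assumed here), the unknowns are real, and splitting each coil equation into real and imaginary parts gives 2C real equations, so R ≤ 2C. This is a generic illustration of the constraint, not the authors' reconstruction; `unfold_real` is a hypothetical name.

```python
import numpy as np


def unfold_real(coil_values, sensitivities):
    """Unfold one aliased pixel group assuming a real-valued image.

    coil_values:   (C,) complex aliased values, one per coil.
    sensitivities: (C, R) complex coil sensitivities at the R overlapping pixels.
    Returns the R real pixel values (least-squares solution).
    """
    # Real and imaginary parts of each coil equation become separate rows.
    a = np.vstack([sensitivities.real, sensitivities.imag])   # (2C, R)
    b = np.concatenate([coil_values.real, coil_values.imag])  # (2C,)
    x, *_ = np.linalg.lstsq(a, b, rcond=None)
    return x


rng = np.random.default_rng(0)
coils, accel = 2, 4  # R = 2C: impossible for a complex image, solvable for a real one
sens = rng.normal(size=(coils, accel)) + 1j * rng.normal(size=(coils, accel))
x_true = rng.normal(size=accel)
aliased = sens @ x_true
print(unfold_real(aliased, sens))
```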
SNSPD with parallel nanowires (Conference Presentation)

NASA Astrophysics Data System (ADS)

Ejrnaes, Mikkel; Parlato, Loredana; Gaggero, Alessandro; Mattioli, Francesco; Leoni, Roberto; Pepe, Giampiero; Cristiano, Roberto

2017-05-01

Superconducting nanowire single-photon detectors (SNSPDs) have shown promise in applications such as quantum communication and computation, quantum optics, imaging, metrology and sensing. They offer a low dark count rate, high efficiency, a broadband response, low timing jitter, a high repetition rate, and no need for gated-mode operation. Several SNSPD designs have been proposed in the literature. Here, we discuss the so-called parallel-nanowire configurations. They were introduced with the aim of improving SNSPD properties such as detection efficiency, speed, signal-to-noise ratio, or photon-number resolution. Although apparently similar, the various parallel designs are not the same. No single design improves all of these properties together; each design has its own characteristics, with specific advantages and drawbacks. In this work, we discuss the various designs, outlining their peculiarities and possible improvements.
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox

NASA Astrophysics Data System (ADS)

Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas

In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The review evaluates several aspects of performance, with particular emphasis on parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD benchmark case is introduced and used to carry out this review, with particular attention to clusters with up to 8192 cores. Some problems in the parallel implementation were detected and corrected. The theoretical complexities with respect to the number of elements, the polynomial degree, and communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speedup on machines with thousands of cores and is ready to make the step to next-generation petaflop machines.
City transport of the future - the high speed pedestrian conveyor. Part 1: ergonomic considerations of accelerators, decelerators and transfer sections.

PubMed

Browning, A C

1974-12-01

In this article, an uncommon form of passenger transport is considered: the moving pavement, or pedestrian conveyor, running at speeds of up to 16 km/h. There are very few relevant ergonomic data for such devices, and some specific laboratory experiments have been carried out using 1000 subjects to represent the general public. It is concluded that while high-speed pedestrian conveyors are quite feasible, stations along them are likely to be large. The most attractive type is a set of parallel surfaces moving at different speeds, with handholds provided in the form of poles. This type could be extremely convenient for certain locations but will probably have to be restricted to fairly fit adults carrying little luggage, and would find applications where a large number of people need to travel in the same direction. Part 2, Ergonomic considerations of complete conveyor systems, will follow.
Control of a small working robot on a large flexible manipulator for suppressing vibrations

NASA Technical Reports Server (NTRS)

Lee, Soo Han

1991-01-01

The short-term objective of this research is to complete the experimental configuration of the Small Articulated Robot (SAM) and to derive the actuator dynamics of the Robotic Arm, Large and Flexible (RALF). To control vibrations, SAM must have a larger bandwidth than that of the vibrations. The bandwidth of SAM is determined by three factors: structural rigidity, controller processing speed, and motor speed. The structural rigidity was increased to a reasonably high value by attaching aluminum angles at weak points and replacing thin side plates with thicker ones. High controller processing speed was achieved by using parallel processors (three 68000 processors, three interface boards, and one main processor, an IBM XT). The maximum joint speed and acceleration of SAM are known to be about 4 rad/s and 15 rad/s², respectively. Hence SAM can move only 0.04 rad at 3 Hz, which is the natural frequency of RALF. This will be checked by experiment.
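The 0.04 rad figure follows from the joint limits if the motion at 3 Hz is taken to be sinusoidal (an assumption; the abstract does not say how the number was derived). For x(t) = A sin(ωt) with ω = 2πf, the peak velocity is Aω and the peak acceleration is Aω², so the largest feasible amplitude is the smaller of v_max/ω and a_max/ω².

```python
import math


def max_sinusoidal_amplitude(freq_hz: float, v_max: float, a_max: float) -> float:
    """Largest A such that A*sin(2*pi*f*t) respects both joint limits.
    Peak velocity is A*w and peak acceleration is A*w**2, with w = 2*pi*f."""
    w = 2 * math.pi * freq_hz
    return min(v_max / w, a_max / w ** 2)


# Joint limits from the abstract: about 4 rad/s and 15 rad/s^2; RALF mode at 3 Hz.
amp = max_sinusoidal_amplitude(3.0, v_max=4.0, a_max=15.0)
print(f"{amp:.3f} rad")  # prints 0.042 rad
```

At 3 Hz the acceleration limit binds (15/(6π)² ≈ 0.042 rad) well before the velocity limit (4/(6π) ≈ 0.21 rad), which suggests that motor torque rather than speed is the bottleneck for vibration suppression at this frequency.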
Parallelization of a blind deconvolution algorithm

NASA Astrophysics Data System (ADS)

Matson, Charles L.; Borelli, Kathy J.

2006-09-01

It is often of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function, information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative, they can take minutes to days to deblur an image, depending on how many frames of data are used and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.
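In multi-frame blind deconvolution, each iteration evaluates how well the current object estimate f and per-frame blur estimates h_k reproduce the data, typically through a cost such as Σ_k ‖g_k − f ⊛ h_k‖². Each frame's term is independent, which is the simplest place to parallelize. The sketch below distributes the per-frame terms over a thread pool. It shows frame-level parallelism only, which is coarser than the sub-frame scheme the abstract mentions. It evaluates the cost but not the gradients or updates a full algorithm needs, and it assumes circular convolution via FFTs. The function names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def frame_residual(args):
    """Squared error of one frame, ||g_k - f (*) h_k||^2, using circular
    convolution via FFTs (assumption). The PSF has the same shape as the object."""
    frame, obj_ft, psf = args
    model = np.fft.ifft2(obj_ft * np.fft.fft2(psf)).real
    return float(np.sum((frame - model) ** 2))


def multiframe_cost(frames, obj, psfs, workers=4):
    """Sum of per-frame residuals, with frames evaluated concurrently.
    Threads keep the sketch portable; a cluster would use processes or MPI ranks."""
    obj_ft = np.fft.fft2(obj)  # shared by all frames, computed once
    tasks = [(g, obj_ft, h) for g, h in zip(frames, psfs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(frame_residual, tasks))


rng = np.random.default_rng(0)
obj = rng.random((16, 16))
psfs = [p / p.sum() for p in rng.random((3, 16, 16))]  # normalized blur kernels
frames = [np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(h)).real for h in psfs]
print(multiframe_cost(frames, obj, psfs))
```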
Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

NASA Astrophysics Data System (ADS)

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-05-01

The incorporation of hyperspectral sensors aboard airborne and satellite platforms is producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data they generate. In several applications, however, it is highly desirable to have the desired information calculated quickly enough for practical use. High computing performance is particularly important in homeland defense and security applications, in which swift decisions often involve the detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. To speed up hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. The techniques cover four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. To explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.
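Spectral unmixing, class (2) above, parallelizes naturally because each pixel's abundances can be estimated independently once the endmember spectra are known. The sketch below uses unconstrained linear least squares and splits the image into row blocks, one task per block. The choice of unconstrained unmixing, the row-block partitioning and the function names are assumptions for illustration, not the algorithms evaluated on Thunderhead.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def unmix_block(endmembers, pixels):
    """Unconstrained least-squares abundances for a block of pixels.
    endmembers: (bands, m); pixels: (bands, n). Returns (m, n)."""
    abundances, *_ = np.linalg.lstsq(endmembers, pixels, rcond=None)
    return abundances


def parallel_unmix(endmembers, cube, workers=4):
    """cube: (rows, cols, bands). Each worker unmixes one block of rows."""
    rows, cols, bands = cube.shape
    blocks = np.array_split(cube, min(workers, rows), axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda b: unmix_block(endmembers, b.reshape(-1, bands).T), blocks)
        flat = np.concatenate(list(results), axis=1)  # (m, rows*cols), row-major order
    return flat.T.reshape(rows, cols, -1)


rng = np.random.default_rng(0)
bands, m = 20, 3
E = rng.random((bands, m))              # hypothetical endmember spectra
true_ab = rng.random((8, 5, m))         # hypothetical abundance maps
cube = true_ab @ E.T                    # noise-free linear mixing
print(parallel_unmix(E, cube).shape)
```

Because blocks are concatenated in their original order, the abundance maps come back in the same pixel layout as the input cube, with no inter-worker communication beyond the final gather.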

Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

USGS Publications Warehouse

Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

2011-01-01

We parallelized the GeoClaw code on a one-level grid using OpenMP in March 2011 to meet the urgent need to simulate near-shore tsunami waves from the 2011 Tohoku event, and achieved over 75% of the potential speedup on an eight-core Dell Precision T7500 workstation [1]. After submitting that work to SC11, the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his Ph.D. thesis. In this paper, we show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speedup obtained by combining the advantages of each approach with adaptive mesh refinement (AMR), demonstrating that GeoClaw can run efficiently on many-core systems. We also show a novel simulation of the Tohoku 2011 tsunami waves inundating the Sendai airport and the Fukushima nuclear power plants, in which a finest grid spacing of 20 meters is achieved through 4-level AMR. This simulation yields quite good predictions of the wave heights and travel times of the tsunami waves. © 2011 IEEE.
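One way to interpret "75% of the potential speedup on eight cores" is through Amdahl's law, S = 1/(f + (1 − f)/p), where f is the serial fraction. Reading the figure as a speedup of 0.75 × 8 = 6 (an assumption, since the abstract gives only the percentage) and inverting the law gives the implied serial fraction and the speedup ceiling it would impose on more cores. Amdahl's law ignores memory-bandwidth and load-imbalance effects, so this is only a rough bound.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: S = 1 / (f + (1 - f) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def implied_serial_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: f = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


cores = 8
speedup = 0.75 * cores  # assumption: "75% of the potential speedup" read as S = 6
f = implied_serial_fraction(speedup, cores)
print(f"serial fraction ~ {f:.3f}")          # prints serial fraction ~ 0.048
print(f"Amdahl ceiling 1/f ~ {1 / f:.0f}x")  # prints Amdahl ceiling 1/f ~ 21x
```

Since the abstract says "over 75%", the true serial fraction would be somewhat below 1/21 under this model.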
High-speed free-space based reconfigurable card-to-card optical interconnects with broadcast capability.

PubMed

Wang, Ke; Nirmalathas, Ampalavanapillai; Lim, Christina; Skafidas, Efstratios; Alameh, Kamal

2013-07-01

In this paper, we propose and experimentally demonstrate a free-space based high-speed reconfigurable card-to-card optical interconnect architecture with broadcast capability, which is required for control functionalities and efficient parallel computing applications. Experimental results show that 10 Gb/s data can be broadcast to all receiving channels for up to 30 cm with a worst-case receiver sensitivity better than -12.20 dBm. In addition, arbitrary multicasting with the same architecture is also investigated. 10 Gb/s reconfigurable point-to-point link and multicast channels are simultaneously demonstrated with a measured receiver sensitivity power penalty of ~1.3 dB due to crosstalk.
A seismic reflection image for the base of a tectonic plate.

PubMed

Stern, T A; Henrys, S A; Okaya, D; Louie, J N; Savage, M K; Lamb, S; Sato, H; Sutherland, R; Iwasaki, T

2015-02-05

Plate tectonics successfully describes the surface of Earth as a mosaic of moving lithospheric plates. But it is not clear what happens at the base of the plates, the lithosphere-asthenosphere boundary (LAB). The LAB has been well imaged with converted teleseismic waves, whose 10-40-kilometre wavelength controls the structural resolution. Here we use explosion-generated seismic waves (of about 0.5-kilometre wavelength) to form a high-resolution image for the base of an oceanic plate that is subducting beneath North Island, New Zealand. Our 80-kilometre-wide image is based on P-wave reflections and shows an approximately 15° dipping, abrupt, seismic wave-speed transition (less than 1 kilometre thick) at a depth of about 100 kilometres. The boundary is parallel to the top of the plate and seismic attributes indicate a P-wave speed decrease of at least 8 ± 3 per cent across it. A parallel reflection event approximately 10 kilometres deeper shows that the decrease in P-wave speed is confined to a channel at the base of the plate, which we interpret as a sheared zone of ponded partial melts or volatiles. This is independent, high-resolution evidence for a low-viscosity channel at the LAB that decouples plates from mantle flow beneath, and allows plate tectonics to work.
INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging

NASA Astrophysics Data System (ADS)

Larkman, David J.; Nunes, Rita G.

2007-04-01

Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time-consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to reducing acquisition time were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section (SENSE, SMASH, g-SMASH and GRAPPA) selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. Potential failure modes and how to recognize their associated artefacts are shown. Well-established applications, including angiography, cardiac imaging and applications using echo planar imaging, are reviewed, and we discuss what makes a good application for parallel imaging. Finally, active research areas in which parallel imaging is used to improve data quality by repairing artefacted images are also reviewed.
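The g-factor mentioned above enters the standard SENSE noise relation SNR_R = SNR_full / (g·√R). The √R term is the unavoidable cost of acquiring R times fewer samples, and g ≥ 1 is the extra noise amplification from unfolding with imperfectly distinct coil sensitivities. A minimal helper makes the trade-off concrete. The function name is hypothetical, and the numbers in the usage line are illustrative, not taken from the review.

```python
import math


def accelerated_snr(snr_full: float, g: float, r: float) -> float:
    """SNR after parallel-imaging acceleration: SNR_full / (g * sqrt(R))."""
    if r < 1 or g < 1:
        raise ValueError("expected acceleration R >= 1 and g-factor g >= 1")
    return snr_full / (g * math.sqrt(r))


# Illustrative values: the same R = 4 with an ideal coil array and with g = 1.5.
print(accelerated_snr(100.0, 1.0, 4.0), accelerated_snr(100.0, 1.5, 4.0))
```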
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

PubMed

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
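The parallelization strategy described above (equal shares of photons per processor, each drawing from its own statistically independent random stream, with the results summed at the end) can be sketched on a toy transport problem. Here photons enter a slab and either interact or escape, so the answer can be checked against 1 − e^(−μd). NumPy's `SeedSequence.spawn` stands in for SPRNG, and a thread pool stands in for MPI ranks. Both substitutions and the toy physics are assumptions, not SIMIND's implementation.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def simulate_photons(n_photons, rng, mu, thickness):
    """Toy transport: photons enter a slab normally; count those whose sampled
    free path (exponential, mean 1/mu) ends inside the slab."""
    paths = rng.exponential(scale=1.0 / mu, size=n_photons)
    return int(np.count_nonzero(paths < thickness))


def parallel_interaction_fraction(total_photons, n_workers, mu, thickness, seed=12345):
    # Equal partition of photons; any remainder goes to the first workers.
    base, extra = divmod(total_photons, n_workers)
    counts = [base + (1 if i < extra else 0) for i in range(n_workers)]
    # One independent random stream per worker (SPRNG plays this role in the paper).
    streams = [np.random.default_rng(s)
               for s in np.random.SeedSequence(seed).spawn(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = pool.map(simulate_photons, counts, streams,
                        [mu] * n_workers, [thickness] * n_workers)
        return sum(hits) / total_photons  # the "reduce" step


frac = parallel_interaction_fraction(200_000, n_workers=8, mu=1.0, thickness=1.0)
print(frac, 1 - np.exp(-1.0))
```

Spawning child seeds from one parent keeps runs reproducible for a fixed seed and worker count, which matters when comparing simulated and measured images.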
Multivariable speed synchronisation for a parallel hybrid electric vehicle drivetrain

NASA Astrophysics Data System (ADS)

Alt, B.; Antritter, F.; Svaricek, F.; Schultalbers, M.

2013-03-01

In this article, a new drivetrain configuration of a parallel hybrid electric vehicle is considered and a novel model-based control design strategy is given. In particular, the control design covers the speed synchronisation task during a restart of the internal combustion engine. The proposed multivariable synchronisation strategy is based on feedforward and decoupled feedback controllers. The performance and the robustness properties of the closed-loop system are illustrated by nonlinear simulation results.
Dual-thread parallel control strategy for ophthalmic adaptive optics.

PubMed

Yu, Yongxin; Zhang, Yuhua

To improve the speed of ophthalmic adaptive optics and compensate for ocular wavefront aberrations of high temporal frequency, adaptive optics wavefront correction has been implemented with a control scheme comprising two parallel threads: one is dedicated to wavefront detection, and the other performs wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, the adaptive optics system achieved a closed-loop update frequency of up to 110 Hz and demonstrated robust compensation of ocular wave aberrations up to 50 Hz in an adaptive optics scanning laser ophthalmoscope.
Dual-thread parallel control strategy for ophthalmic adaptive optics

PubMed Central

Yu, Yongxin; Zhang, Yuhua

2015-01-01

To improve the speed of ophthalmic adaptive optics and compensate for ocular wavefront aberrations of high temporal frequency, adaptive optics wavefront correction has been implemented with a control scheme comprising two parallel threads: one is dedicated to wavefront detection, and the other performs wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, the adaptive optics system achieved a closed-loop update frequency of up to 110 Hz and demonstrated robust compensation of ocular wave aberrations up to 50 Hz in an adaptive optics scanning laser ophthalmoscope. PMID:25866498
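The dual-thread scheme is a producer/consumer pipeline: while one thread acquires and processes the next sensor frame, the other turns the previous measurement into mirror commands, so the two stages overlap instead of running back to back. The sketch below shows that structure with Python's `threading` and `queue`. The slope vectors, the linear reconstructor matrix and the function names are hypothetical stand-ins for the Shack-Hartmann processing and the actual control law.

```python
import queue
import threading


def detection_thread(frames, out_q):
    """Thread 1: wavefront detection. Each 'frame' here is already a slope
    vector; a real system would compute subaperture centroids first."""
    for slopes in frames:
        out_q.put(slopes)
    out_q.put(None)  # sentinel: no more frames


def reconstruction_thread(in_q, reconstructor, corrections):
    """Thread 2: reconstruction and compensation. Commands are computed with a
    hypothetical linear reconstructor: commands = R @ slopes."""
    while True:
        slopes = in_q.get()
        if slopes is None:
            break
        commands = [sum(r * s for r, s in zip(row, slopes)) for row in reconstructor]
        corrections.append(commands)  # stand-in for driving the deformable mirror


def run_loop(frames, reconstructor):
    q = queue.Queue(maxsize=2)  # small buffer so detection cannot run far ahead
    corrections = []
    t1 = threading.Thread(target=detection_thread, args=(frames, q))
    t2 = threading.Thread(target=reconstruction_thread,
                          args=(q, reconstructor, corrections))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return corrections


print(run_loop([[1, 0], [0, 1], [1, 1]], [[1, 0], [0, 2], [1, 1]]))
```

This sketch processes every frame in order. A latency-critical controller might instead let the reconstruction thread always take the newest measurement and drop stale ones.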
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both redundant data input/output operations and floating-point operations. To further accelerate the simulation, the matrix multiplications were implemented using parallel computing on a graphics processing unit. We used OpenCL, a cross-platform parallel programming language. Numerical experiments show that the combination of these measures can speed up annual daylighting simulations by factors of 101.7 and 28.6 when the sky vector has 146 and 2306 elements, respectively.
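Annual daylighting lends itself to GPU acceleration because, in a daylight-coefficient formulation (assumed here; the abstract says only "matrix multiplications"), illuminance at every sensor for every hour is a coefficient matrix times a sky vector. Stacking all hourly sky vectors into one matrix turns thousands of matrix-vector products into a single matrix-matrix product, which is the kind of dense kernel GPUs handle well. NumPy stands in for OpenCL below, and the array sizes other than the 146-element sky vector are arbitrary.

```python
import numpy as np


def annual_illuminance_loop(dc, sky):
    """Reference: one matrix-vector product per time step.
    dc: (sensors, patches) daylight coefficients; sky: (patches, hours)."""
    return np.stack([dc @ sky[:, t] for t in range(sky.shape[1])], axis=1)


def annual_illuminance_batched(dc, sky):
    """Same result as a single matrix-matrix product (GEMM), the form that
    maps well onto a GPU kernel."""
    return dc @ sky


rng = np.random.default_rng(0)
dc = rng.random((10, 146))   # 10 hypothetical sensors, 146-element sky vector
sky = rng.random((146, 24))  # 24 hypothetical hourly sky vectors
print(annual_illuminance_batched(dc, sky).shape)
```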
Symplectic molecular dynamics simulations on specially designed parallel computers.

PubMed

Borstnik, Urban; Janezic, Dusanka

2005-01-01

We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which treats high-frequency vibrational motion analytically and thus enables longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. Combining these approaches means that fewer steps are needed and each step takes less time, enabling fast MD simulations. We study the computational performance of MD simulations of molecular systems on specialized computers and compare it with standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
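The split-integration idea can be shown on a single degree of freedom. The stiff harmonic part is advanced exactly, as a rotation in phase space, so its high frequency no longer limits the time step, and the remaining weak force is applied numerically as half-step kicks on either side (Strang splitting). This one-dimensional toy only illustrates the splitting. The actual SISM works with the normal modes of whole molecules, and the function names and parameters here are hypothetical.

```python
import math


def split_step(x, v, dt, omega, soft_force):
    """One symplectic split step (unit mass): half kick from the soft force,
    exact harmonic flow for the stiff part, then another half kick."""
    v += 0.5 * dt * soft_force(x)
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    x, v = x * c + (v / omega) * s, -x * omega * s + v * c  # analytic oscillator
    v += 0.5 * dt * soft_force(x)
    return x, v


def energy(x, v, omega, k):
    """Total energy for the stiff spring plus a soft quartic term k*x**4/4."""
    return 0.5 * v * v + 0.5 * omega ** 2 * x * x + 0.25 * k * x ** 4


# omega*dt = 1 is large for a plain Verlet scheme, but the stiff part is exact here.
k = 0.1
x, v = 1.0, 0.0
e0 = energy(x, v, 10.0, k)
for _ in range(2000):
    x, v = split_step(x, v, 0.1, 10.0, lambda q: -k * q ** 3)
print(abs(energy(x, v, 10.0, k) - e0) / e0)
```

With the soft force switched off, the scheme reproduces the harmonic oscillator exactly, up to rounding, for any time step. That is the property that allows longer steps when the fast vibrations are handled analytically.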
Massively parallel processor computer

NASA Technical Reports Server (NTRS)

Fung, L. W. (Inventor)

1983-01-01

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array, is described. It comprises a large number (e.g., 16,384 in a 128 × 128 array) of parallel processing elements operating simultaneously and independently on single-bit slices of a corresponding array of incoming data streams under the control of a single set of instructions. Each processing element comprises a bidirectional data bus in communication with a register for storing single-bit slices, together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetic computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high-speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding bits vertically or horizontally to neighboring processing elements.
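Bit-serial SIMD processing of the kind described above can be mimicked with NumPy. Each operation acts on one bit plane of the whole 128 × 128 array at once, as if every processing element executed the same instruction on its own single-bit slice, and multi-bit arithmetic is built up one plane at a time with a per-element carry bit. The sketch also shows a neighbor shift. The 8-bit word size, the zero-fill edge behavior and the function names are assumptions, not details from the patent.

```python
import numpy as np

BITS = 8     # hypothetical word size
SIZE = 128   # 128 x 128 = 16,384 processing elements, as in the example


def to_bit_planes(values, bits=BITS):
    """Split an array of unsigned ints into bit planes, least significant first."""
    return [((values >> b) & 1).astype(np.uint8) for b in range(bits)]


def from_bit_planes(planes):
    return sum(p.astype(np.uint32) << b for b, p in enumerate(planes))


def bit_serial_add(a_planes, b_planes):
    """Add two arrays one bit plane at a time. Each loop step is one
    'instruction' applied to every element at once; `carry` acts as each
    element's one-bit carry register. The result wraps modulo 2**BITS."""
    carry = np.zeros_like(a_planes[0])
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out


def shift_east(plane):
    """Move every bit to its eastern neighbor; the west edge is zero-filled
    (edge handling is an assumption)."""
    shifted = np.zeros_like(plane)
    shifted[:, 1:] = plane[:, :-1]
    return shifted


rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(SIZE, SIZE))
b = rng.integers(0, 256, size=(SIZE, SIZE))
total = from_bit_planes(bit_serial_add(to_bit_planes(a), to_bit_planes(b)))
print(total[0, :4], (a + b)[0, :4] % 256)
```

An 8-bit add therefore takes eight plane-wide steps regardless of array size, which is how a bit-serial machine trades per-element word width for a very large number of simple elements.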
Traction drive automatic transmission for gas turbine engine driveline

DOEpatents

Carriere, Donald L.

1984-01-01

A transaxle driveline for a wheeled vehicle has a high speed turbine engine and a torque splitting gearset that includes a traction drive unit and a torque converter on a common axis transversely arranged with respect to the longitudinal centerline of the vehicle. The drive wheels of the vehicle are mounted on a shaft parallel to the turbine shaft and carry a final drive gearset for driving the axle shafts. A second embodiment of the final drive gearing produces an overdrive ratio between the output of the first gearset and the axle shafts. A continuously variable range of speed ratios is produced by varying the position of the drive rollers of the traction unit. After starting the vehicle from rest, the transmission is set for operation in the high speed range by engaging a first lockup clutch that joins the torque converter impeller to the turbine for operation as a hydraulic coupling.
Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Menon, Suresh

1996-01-01

Time-accurate simulation of turbulent flames in high-Reynolds-number flows is a challenging task, since both fluid dynamics and combustion must be modeled accurately. Numerically simulating this phenomenon requires very large computer resources (both time and memory). Although current vector supercomputers can provide adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time-integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed-memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may make unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that retains high parallel efficiency on distributed-memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy-viscosity-based subgrid models. This is acceptable for the momentum/energy closure, since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used.
Recently, a new model for turbulent combustion was developed in which combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport, and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure, and this approach provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion, and chemical kinetics; therefore, within each grid cell, a significant amount of computation must be carried out before the large-scale (LES-resolved) effects are incorporated. This approach is therefore uniquely suited for parallel processing and has been implemented on various systems, such as the Intel Paragon, IBM SP-2, Cray T3D, and SGI Power Challenge (PC), using the system-independent Message Passing Interface (MPI) library. In this paper, timing data on these machines are reported along with some characteristic results.
Hardware Implementation of 32-Bit High-Speed Direct Digital Frequency Synthesizer

PubMed Central

Ibrahim, Salah Hasan; Ali, Sawal Hamid Md.; Islam, Md. Shabiul

2014-01-01

The design and implementation of a high-speed direct digital frequency synthesizer are presented. A modified Brent-Kung parallel adder is combined with a pipelining technique to improve the speed of the system. A gated clock technique is proposed to reduce the number of registers in the phase accumulator design. The quarter-wave symmetry technique is used to store only one quarter of the sine wave. The ROM lookup table (LUT) is partitioned into three 4-bit sub-ROMs based on the angular decomposition technique and a trigonometric identity. By exploiting the symmetry of the sine and cosine functions together with XOR logic gates, one sub-ROM block can be removed from the design. These techniques compress the ROM to 368 bits, a ROM compression ratio of 534.2:1, using only two adders, two multipliers, and XOR gates, with a high frequency resolution of 0.029 Hz. These techniques make the direct digital frequency synthesizer an attractive candidate for wireless communication applications. PMID:24991635
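The phase-accumulator and quarter-wave idea can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the authors' hardware: the 8 address bits, the ±511 output amplitude, and all names are hypothetical, and the Brent-Kung adder, gated clock, and angular decomposition into sub-ROMs are not modeled. It shows how the two most significant phase bits select a quadrant, how the table index is mirrored by XOR-ing with an all-ones mask, and how the sign flips in the second half of the period. With an N-bit accumulator the frequency resolution is f_clk / 2^N.

```python
import math

ACC_BITS = 32                         # phase accumulator width (32-bit, as in the title)
LUT_ADDR_BITS = 8                     # hypothetical: 8 phase bits address one full period
QUARTER = 1 << (LUT_ADDR_BITS - 2)    # table entries per quarter wave (64 here)
AMPLITUDE = 511                       # hypothetical signed output amplitude

# Only the first quarter of the sine wave is stored; the half-entry offset makes
# the mirrored quadrants reproduce exactly the same table values.
QUARTER_LUT = [round(AMPLITUDE * math.sin(math.pi / 2 * (i + 0.5) / QUARTER))
               for i in range(QUARTER)]


def sine_from_phase(phase: int) -> int:
    """Map an accumulator phase to a signed sample using quarter-wave symmetry."""
    addr = phase >> (ACC_BITS - LUT_ADDR_BITS)   # truncate phase to the LUT address bits
    quadrant = addr >> (LUT_ADDR_BITS - 2)       # two MSBs select the quadrant
    index = addr & (QUARTER - 1)                 # position inside the quadrant
    if quadrant & 1:
        index ^= QUARTER - 1                     # 2nd/4th quadrant: mirror via XOR (bit complement)
    value = QUARTER_LUT[index]
    return -value if quadrant & 2 else value     # 3rd/4th quadrant: negate


def ddfs(tuning_word: int, n_samples: int) -> list[int]:
    """Accumulate the tuning word modulo 2**ACC_BITS and emit one sample per clock."""
    mask = (1 << ACC_BITS) - 1
    phase, samples = 0, []
    for _ in range(n_samples):
        samples.append(sine_from_phase(phase))
        phase = (phase + tuning_word) & mask
    return samples


def output_frequency(tuning_word: int, f_clk: float) -> float:
    """f_out = tuning_word * f_clk / 2**ACC_BITS; the resolution is f_clk / 2**ACC_BITS."""
    return tuning_word * f_clk / (1 << ACC_BITS)
```

For example, a tuning word of 2^26 steps through one full period in 64 clocks, so `output_frequency(1 << 26, 64.0)` is 1.0.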
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever.
In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
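The Normalize-Transpose law can be loosely illustrated in Python. The sketch below is a simplification under stated assumptions, not SequenceL's actual semantics: the hypothetical function `nt_apply` treats `f` as accepting only scalars, requires list arguments of equal length, normalizes scalar arguments by repeating them, and then applies `f` to each transposed column. Because the per-column calls are independent of one another, they are the units a compiler could schedule in parallel.

```python
from typing import Any, Callable


def nt_apply(f: Callable[..., Any], *args: Any) -> Any:
    """Apply a scalar function f to possibly nested lists, NT-style (simplified sketch)."""
    if not any(isinstance(a, list) for a in args):
        return f(*args)                          # every argument is a scalar: just apply f
    lengths = {len(a) for a in args if isinstance(a, list)}
    if len(lengths) != 1:
        raise ValueError("list arguments must have equal length in this sketch")
    n = lengths.pop()
    # Normalize: repeat scalar arguments so that every argument is a list of length n
    normalized = [a if isinstance(a, list) else [a] * n for a in args]
    # Transpose: the i-th call receives the i-th element of every argument.
    # These n calls do not depend on each other, so they could run in parallel.
    return [nt_apply(f, *column) for column in zip(*normalized)]
```

For example, `nt_apply(lambda x, y: x + y, [1, 2, 3], 10)` returns `[11, 12, 13]`, and a nested first argument such as `[[1, 2], [3, 4]]` paired with `[10, 20]` yields `[[11, 12], [23, 24]]`.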
Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

PubMed

Ruymgaart, A Peter; Elber, Ron

2012-11-13

We report Graphics Processing Unit (GPU) and OpenMP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss the design of the code in detail and illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. The use of water-specific lists allows efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU, compared to code optimized on a single CPU core, for systems larger than 20,000 atoms. This is a fourfold increase over the factor of 10 reported in our initial GPU implementation, which did not include water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints on all bonds, runs in parallel on multiple OpenMP cores or entirely on the GPU. It is based on a Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows the time step to be increased to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is required. The significant speedup of the optimized components shifts the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
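The core of CG SHAKE is a conjugate gradient solve for the Lagrange multipliers. Assembling the constraint matrix from bond geometry is not shown; the sketch below is a generic matrix-free conjugate gradient solver for a symmetric positive-definite system A x = b, where `matvec` is a hypothetical callback that supplies the product A p. It illustrates the solver step only, not the authors' GPU code.

```python
from typing import Callable, Sequence

Vector = list[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0 initially
    p = list(r)                      # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break                    # residual norm small enough
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)  # exact line-search step along p
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * apj for ri, apj in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction: A-conjugate to the previous ones
        p = [ri + (rs_new / rs_old) * pj for ri, pj in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, with the toy system A = [[4, 1], [1, 3]] and b = [1, 2], the solver returns approximately [1/11, 7/11]; in exact arithmetic CG converges in at most n iterations for an n × n system.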
GPU accelerated fuzzy connected image segmentation by using CUDA.

PubMed

Zhuge, Ying; Cao, Yong; Miller, Robert W

2009-01-01

Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Commodity graphics hardware now provides substantial parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments on three data sets of small, medium, and large size demonstrate the efficiency of the parallel algorithm, which achieves speed-up factors of 7.2x, 7.3x, and 14.4x, respectively, over the sequential CPU implementation of the fuzzy connected image segmentation algorithm.
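Fuzzy connectedness gives each pixel the strength of its best path to a seed, where a path is only as strong as its weakest link (lowest affinity). A minimal sequential sketch computes this with a Dijkstra-style max-min propagation on a 2D grid. The Gaussian intensity-difference affinity and the `sigma` parameter are hypothetical choices, and the paper's CUDA parallelization is not reproduced.

```python
import heapq
import math


def fuzzy_connectedness(image: list[list[float]], seed: tuple[int, int],
                        sigma: float = 10.0) -> list[list[float]]:
    """Connectivity of every pixel to `seed`: max over paths of the min affinity."""
    rows, cols = len(image), len(image[0])

    def affinity(p: tuple[int, int], q: tuple[int, int]) -> float:
        # Hypothetical affinity: 1 for equal intensities, decaying with the difference
        diff = image[p[0]][p[1]] - image[q[0]][q[1]]
        return math.exp(-(diff * diff) / (2 * sigma * sigma))

    conn = [[0.0] * cols for _ in range(rows)]
    conn[seed[0]][seed[1]] = 1.0
    heap = [(-1.0, seed)]                          # max-heap via negated strengths
    while heap:
        neg_strength, (r, c) = heapq.heappop(heap)
        strength = -neg_strength
        if strength < conn[r][c]:
            continue                               # stale entry, a stronger path was found
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                # A path's strength is its weakest link
                candidate = min(strength, affinity((r, c), (nr, nc)))
                if candidate > conn[nr][nc]:
                    conn[nr][nc] = candidate
                    heapq.heappush(heap, (-candidate, (nr, nc)))
    return conn
```

Thresholding the returned map then separates the object containing the seed from regions reachable only through weak links.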
Ultrascalable petaflop parallel supercomputer

DOEpatents

Blumrich, Matthias A [Ridgefield, CT]; Chen, Dong [Croton On Hudson, NY]; Chiu, George [Cross River, NY]; Cipolla, Thomas M [Katonah, NY]; Coteus, Paul W [Yorktown Heights, NY]; Gara, Alan G [Mount Kisco, NY]; Giampapa, Mark E [Irvington, NY]; Hall, Shawn [Pleasantville, NY]; Haring, Rudolf A [Cortlandt Manor, NY]; Heidelberger, Philip [Cortlandt Manor, NY]; Kopcsay, Gerard V [Yorktown Heights, NY]; Ohmacht, Martin [Yorktown Heights, NY]; Salapura, Valentina [Chappaqua, NY]; Sugavanam, Krishnan [Mahopac, NY]; Takken, Todd [Brewster, NY]

2010-07-20

A massively parallel supercomputer of petaOPS scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing: a Torus, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be used collaboratively or independently according to the needs or phases of an algorithm to optimize processing performance. A DMA engine is provided to facilitate message passing among the nodes without expending processing resources at the node.
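In a 3D torus each node has six nearest neighbours, with wraparound links closing each ring. The sketch below computes neighbour coordinates and minimal hop counts; the dimensions used are hypothetical, and the patent's collective and barrier networks, routing, and DMA engine are not modeled.

```python
Coord = tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> list[Coord]:
    """Return the six nearest neighbours of `node` in a 3D torus of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]   # wrap around the ring
            neighbors.append((coord[0], coord[1], coord[2]))
    return neighbors


def hop_distance(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hops between two nodes, taking the shorter way around each ring."""
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, n - d)
    return total
```

The wraparound links are what keep the worst-case distance at half of each ring length: in a 4 × 4 × 4 torus, nodes (0, 0, 0) and (3, 3, 3) are only 3 hops apart.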
Optical computing and image processing using photorefractive gallium arsenide

NASA Technical Reports Server (NTRS)

Cheng, Li-Jen; Liu, Duncan T. H.

1990-01-01

Recent experimental results on matrix-vector multiplication and multiple four-wave mixing using GaAs are presented. Attention is given to a simple concept of using two overlapping holograms in GaAs to perform two matrix-vector multiplications in parallel with a common input vector. This concept can be used to construct high-speed, high-capacity, reconfigurable interconnection and multiplexing modules, important for optical computing and neural-network applications.
Full-field transient vibrometry of the human tympanic membrane by local phase correlation and high-speed holography

NASA Astrophysics Data System (ADS)

Dobrev, Ivo; Furlong, Cosme; Cheng, Jeffrey T.; Rosowski, John J.

2014-09-01

Understanding the human hearing process would be helped by quantification of the transient mechanical response of the human ear, including the human tympanic membrane (TM or eardrum). We propose a new hybrid high-speed holographic system (HHS) for acquisition and quantification of the full-field nanometer transient (i.e., >10 kHz) displacement of the human TM. We have optimized and implemented a 2+1 frame local correlation (LC) based phase sampling method in combination with a high-speed (i.e., >40 K fps) camera acquisition system. To our knowledge, there is currently no existing system that provides such capabilities for the study of the human TM. The LC sampling method has a displacement difference of <11 nm relative to measurements obtained by a four-phase step algorithm. Comparisons between our high-speed acquisition system and a laser Doppler vibrometer indicate differences of <10 μs. The high temporal (i.e., >40 kHz) and spatial (i.e., >100 k data points) resolution of our HHS enables parallel measurements of all points on the surface of the TM, which allows quantification of spatially dependent motion parameters, such as modal frequencies and acoustic delays. Such capabilities could allow inferring local material properties across the surface of the TM.

Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

NASA Astrophysics Data System (ADS)

Georgiev, K.; Zlatev, Z.

2010-11-01

The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. The model was originally developed at the National Environmental Research Institute of Denmark. Its computational domain covers Europe and some neighbouring parts of the Atlantic Ocean, Asia and Africa. If DEM is applied on fine grids, its discretization leads to a huge computational problem, so a model such as DEM must be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, results of running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) are compared. The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the cache and hierarchical memories of modern computers is discussed, as well as the performance, speed-ups and efficiency achieved. The parallel code of DEM, created using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the model output are briefly presented.
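Domain partitioning of the kind mentioned above can be illustrated with a one-dimensional split of grid rows among processes. This is a sketch under the assumption of a 1D decomposition and a stencil one cell wide; the DEM's actual partitioning and its MPI boundary exchange are not shown, and the function names are hypothetical.

```python
def partition_rows(n_rows: int, n_procs: int) -> list[range]:
    """Split n_rows grid rows into n_procs contiguous blocks differing in size by at most one."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)   # the first `extra` ranks take one more row
        blocks.append(range(start, start + size))
        start += size
    return blocks


def halo_rows(block: range, n_rows: int) -> list[int]:
    """Rows owned by neighbouring ranks that this rank must receive before each step."""
    halos = []
    if block.start > 0:
        halos.append(block.start - 1)              # last row of the rank above
    if block.stop < n_rows:
        halos.append(block.stop)                   # first row of the rank below
    return halos
```

For example, `partition_rows(10, 3)` yields rows 0 to 3, 4 to 6, and 7 to 9, and the middle rank needs halo rows 3 and 7 from its neighbours.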
A highly efficient multi-core algorithm for clustering extremely large datasets

PubMed Central

2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray-type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
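The authors' Java algorithms follow transactional-memory design principles, which are not reproduced here. As a plainly different and simpler shared-memory scheme, the sketch below splits the k-means assignment step across worker threads and performs the centroid update serially. Because CPython threads share the global interpreter lock, this shows the structure of the parallel step rather than a real speedup; a process pool would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, ...]


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def kmeans(points: list[Point], centroids: list[Point], iterations: int = 10,
           workers: int = 4) -> tuple[list[Point], list[int]]:
    chunk = max(1, len(points) // workers)
    chunks = [points[i:i + chunk] for i in range(0, len(points), chunk)]
    labels: list[int] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # Assignment step: each worker labels its own chunk independently
            parts = pool.map(lambda part: [nearest(p, centroids) for p in part], chunks)
            labels = [label for part in parts for label in part]
            # Update step (serial): each centroid becomes the mean of its members
            new_centroids = []
            for k, old in enumerate(centroids):
                members = [p for p, lab in zip(points, labels) if lab == k]
                if members:
                    new_centroids.append(tuple(sum(d) / len(members) for d in zip(*members)))
                else:
                    new_centroids.append(old)      # keep an empty cluster's centroid
            centroids = new_centroids
    return centroids, labels
```

The assignment step dominates the cost for large data sets and needs no coordination between workers, which is why it is the natural part to parallelize.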
Solar wind helium ions - Observations of the Helios solar probes between 0.3 and 1 AU

NASA Technical Reports Server (NTRS)

Marsch, E.; Rosenbauer, H.; Schwenn, R.; Muehlhaeuser, K.-H.; Neubauer, F. M.

1982-01-01

A Helios solar probe survey of solar wind helium ion velocity distributions and derived parameters between 0.3 and 1 AU is presented. Distributions in high-speed wind are found to generally have small total anisotropies, with some indication that, in the core part, the temperature parallel to the magnetic field is greater than the perpendicular temperature. The anisotropy tends to increase with heliocentric radial distance, and the average dependence of helium ion temperatures on radial distance from the sun is described by a power law. Differential ion speeds of more than 150 km/sec are observed near perihelion, at 0.3 AU. The role of Coulomb collisions in limiting differential ion speeds and the ion temperature ratio is investigated, and it is found that collisions play a distinct role in low-speed wind, by limiting both differential ion velocity and temperature.
Modulated heat pulse propagation and partial transport barriers in chaotic magnetic fields

DOE PAGES

del-Castillo-Negrete, Diego; Blazevski, Daniel

2016-04-01

Direct numerical simulations of the time-dependent parallel heat transport equation modeling heat pulses driven by power modulation in 3-dimensional chaotic magnetic fields are presented. The numerical method is based on the Fourier formulation of a Lagrangian-Green's function method that provides an accurate and efficient technique for the solution of the parallel heat transport equation in the presence of harmonic power modulation. The numerical results presented provide conclusive evidence that even in the absence of magnetic flux surfaces, chaotic magnetic field configurations with intermediate levels of stochasticity exhibit transport barriers to modulated heat pulse propagation. In particular, high-order islands and remnants of destroyed flux surfaces (Cantori) act as partial barriers that slow down or even stop the propagation of heat waves at places where the magnetic field connection length exhibits a strong gradient. The key parameter is $$\gamma=\sqrt{\omega/(2\chi_\parallel)}$$, which determines the length scale, $$1/\gamma$$, of the heat wave penetration along the magnetic field line. For large perturbation frequencies, $$\omega \gg 1$$, or small parallel thermal conductivities, $$\chi_\parallel \ll 1$$, parallel heat transport is strongly damped and the magnetic field partial barriers act as robust barriers where the heat wave amplitude vanishes and its phase speed slows down to a halt. On the other hand, in the limit of small $$\gamma$$, parallel heat transport is largely unimpeded, global transport is observed and the radial amplitude and phase speed of the heat wave remain finite. Results on modulated heat pulse propagation in fully stochastic fields and across magnetic islands are also presented.
In qualitative agreement with recent experiments in LHD and DIII-D, it is shown that the elliptic (O) and hyperbolic (X) points of magnetic islands have a direct impact on the spatio-temporal dependence of the amplitude and the time delay of modulated heat pulses.
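The penetration length 1/γ follows directly from the key parameter γ = sqrt(ω/(2χ∥)) given above. The sketch below evaluates it; it assumes ω and χ∥ are expressed in consistent (for example, normalized) units, and the function name is hypothetical.

```python
import math


def penetration_length(omega: float, chi_parallel: float) -> float:
    """Return 1/gamma, where gamma = sqrt(omega / (2 * chi_parallel))."""
    gamma = math.sqrt(omega / (2.0 * chi_parallel))
    return 1.0 / gamma
```

For instance, quadrupling ω at fixed χ∥ halves 1/γ, consistent with the strong damping described for ω ≫ 1.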
CMOS Image Sensors for High Speed Applications.

PubMed

El-Desouki, Munir; Deen, M Jamal; Fang, Qiyin; Liu, Louis; Tse, Frances; Armstrong, David

2009-01-01

Recent advances in deep submicron CMOS technologies and improved pixel designs have enabled CMOS-based imagers to surpass charge-coupled devices (CCD) imaging technology for mainstream applications. The parallel outputs that CMOS imagers can offer, in addition to complete camera-on-a-chip solutions due to being fabricated in standard CMOS technologies, result in compelling advantages in speed and system throughput. Since there is a practical limit on the minimum pixel size (4 to 5 μm) due to limitations in the optics, CMOS technology scaling can allow for an increased number of transistors to be integrated into the pixel to improve both detection and signal processing. Such smart pixels truly show the potential of CMOS technology for imaging applications allowing CMOS imagers to achieve the image quality and global shuttering performance necessary to meet the demands of ultrahigh-speed applications. In this paper, a review of CMOS-based high-speed imager design is presented and the various implementations that target ultrahigh-speed imaging are described. This work also discusses the design, layout and simulation results of an ultrahigh acquisition rate CMOS active-pixel sensor imager that can take 8 frames at a rate of more than a billion frames per second (fps).
Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

PubMed

Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

2018-04-20

A parallel computation method for large-size Fresnel computer-generated holograms (CGH) is reported. We introduced the method in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A twofold improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-fold improvement in computation speed has been estimated for two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
Shared virtual memory and generalized speedup

NASA Technical Reports Server (NTRS)

Sun, Xian-He; Zhu, Jianping

1994-01-01

Generalized speedup is defined as parallel speed over sequential speed. The generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, it was shown that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application was implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.
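Following the definition above, generalized speedup divides parallel speed by sequential speed. The sketch assumes speed is measured as work per unit time; the function names are hypothetical. When both runs solve the same problem, it coincides with the traditional ratio T1/Tp; the two diverge when the sequential speed is measured on a different amount of work.

```python
def speed(work: float, time: float) -> float:
    """Speed as work completed per unit time (e.g. floating-point operations per second)."""
    return work / time


def generalized_speedup(parallel_work: float, parallel_time: float,
                        seq_work: float, seq_time: float) -> float:
    """Parallel speed divided by sequential speed."""
    return speed(parallel_work, parallel_time) / speed(seq_work, seq_time)


def traditional_speedup(seq_time: float, parallel_time: float) -> float:
    """Sequential time over parallel time for the same problem."""
    return seq_time / parallel_time
```

For example, with invented numbers, 100 units of work done in 10 s in parallel against the same 100 units in 80 s sequentially gives 8.0 under both definitions, whereas a sequential speed measured as 20 units in 4 s gives a generalized speedup of 2.0.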
High-speed high-resolution epifluorescence imaging system using CCD sensor and digital storage for neurobiological research

NASA Astrophysics Data System (ADS)

Takashima, Ichiro; Kajiwara, Riichi; Murano, Kiyo; Iijima, Toshio; Morinaka, Yasuhiro; Komobuchi, Hiroyoshi

2001-04-01

We have designed and built a high-speed CCD imaging system for monitoring neural activity in an exposed animal cortex stained with a voltage-sensitive dye. Two types of custom-made CCD sensors were developed for this system. The type I chip has a resolution of 2664 (H) × 1200 (V) pixels and a wide imaging area of 28.1 × 13.8 mm, while the type II chip has 1776 × 1626 pixels and an active imaging area of 20.4 × 18.7 mm. The CCD arrays were constructed with multiple output amplifiers in order to accelerate the readout rate. The two chips were divided into 24 (type I) or 16 (type II) distinct areas that were driven in parallel. The parallel CCD outputs were digitized by 12-bit A/D converters and then stored in the frame memory. The frame memory was constructed with synchronous DRAM modules, which provided a capacity of 128 MB per channel. On-chip and on-memory binning methods were incorporated into the system; for example, these enabled us to capture 444 × 200 pixel images for periods of 36 seconds at a rate of 500 frames/second. This system was successfully used to visualize neural activity in the cortices of rats, guinea pigs, and monkeys.
Micro-seismic waveform matching inversion based on gravitational search algorithm and parallel computation

NASA Astrophysics Data System (ADS)

Jiang, Y.; Xing, H. L.

2016-12-01

Micro-seismic events induced by water injection, mining activity or oil/gas extraction are quite informative; their interpretation can be applied to reconstruct underground stress and to monitor hydraulic fracturing progress in oil/gas reservoirs. Source characteristics and locations are crucial parameters for these purposes and can be obtained through the waveform matching inversion (WMI) method. It is therefore imperative to develop a WMI algorithm with high accuracy and convergence speed. Heuristic algorithms, a category of nonlinear methods, offer very high convergence speed and a good capacity to escape local minima, and have been applied in many areas (e.g. image processing, artificial intelligence). However, their effectiveness for micro-seismic WMI is still poorly investigated; very few studies address this subject. In this research an advanced heuristic algorithm, the gravitational search algorithm (GSA), is proposed to estimate the focal mechanism (strike, dip and rake angles) and source locations in three dimensions. Unlike traditional inversion methods, the heuristic inversion does not require an approximation of the Green's function. The method directly interacts with a CPU-parallelized finite-difference forward modelling engine and updates the model parameters according to GSA criteria. The effectiveness of this method is tested with synthetic data from a multi-layered elastic model; the results indicate that GSA can be well applied to WMI and has unique advantages. Keywords: Micro-seismicity, Waveform matching inversion, gravitational search algorithm, parallel computation
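The gravitational search algorithm treats candidate solutions as masses: better fitness means heavier mass, heavier agents attract the others, and the gravitational constant decays over the iterations. The sketch below minimizes a generic function `f` in place of the paper's waveform misfit (no forward modelling is included); the function name `gsa_minimize`, the agent count, and the values `g0=100`, `alpha=20` are illustrative assumptions, not the authors' settings.

```python
import math
import random
from typing import Callable, Sequence


def gsa_minimize(f: Callable[[Sequence[float]], float], dim: int,
                 bounds: tuple[float, float], n_agents: int = 20,
                 iterations: int = 100, g0: float = 100.0, alpha: float = 20.0,
                 seed: int = 0) -> tuple[list[float], float]:
    """Minimize f over a box with a basic gravitational search algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    v = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_f = list(x[0]), math.inf
    for t in range(iterations):
        fit = [f(xi) for xi in x]
        for xi, fi in zip(x, fit):
            if fi < best_f:
                best_x, best_f = list(xi), fi          # best position seen so far
        best, worst = min(fit), max(fit)
        if best == worst:
            masses = [1.0 / n_agents] * n_agents
        else:
            raw = [(fi - worst) / (best - worst) for fi in fit]   # best -> 1, worst -> 0
            total = sum(raw)
            masses = [m / total for m in raw]
        g = g0 * math.exp(-alpha * t / iterations)     # gravitational "constant" decays
        k = max(1, round(n_agents - (n_agents - 1) * t / iterations))
        kbest = sorted(range(n_agents), key=lambda j: masses[j], reverse=True)[:k]
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in kbest:                            # only the k heaviest agents attract
                if j == i:
                    continue
                r = math.dist(x[i], x[j])
                # Acceleration = force / own mass, so only the attractor's mass remains
                coef = rng.random() * g * masses[j] / (r + 1e-12)
                for d in range(dim):
                    acc[d] += coef * (x[j][d] - x[i][d])
            for d in range(dim):
                v[i][d] = rng.random() * v[i][d] + acc[d]
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))   # clamp to the box
    return best_x, best_f
```

For example, `gsa_minimize(lambda p: sum(c * c for c in p), 2, (-5.0, 5.0))` searches the 2D sphere function; the returned pair is the best position evaluated and its fitness. In an inversion, each call to `f` would run a forward model, which is the expensive step the paper parallelizes.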
Numerical Simulation of Flow Field Within Parallel Plate Plastometer

NASA Technical Reports Server (NTRS)

Antar, Basil N.

2002-01-01

The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10^4 to 10^9 poise. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures, which have a similar range of viscosity values. The PPP instrument consists of two similar parallel plates, both about 1 inch in diameter, with the upper plate movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced by measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied to the beam. Operating plate speeds measured with the PPP are usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
Applications of High Speed Networks

DTIC Science & Technology

1991-09-01

plished in order to achieve a degree of parallelism by constructing a distributed switch. The type of switch, self-routing, processes the packet...control more than a dozen missiles in flight, and the four Mark 99 target illuminators direct missiles in the terminal phase. The self-contained Phalanx...military installations, weapon system response and expected missile performance against a threat. Projects are already underway transposing of
Special-purpose computer for holography HORN-2

NASA Astrophysics Data System (ADS)

Ito, Tomoyoshi; Eldeib, Hesham; Yoshida, Kenji; Takahashi, Shinya; Yabe, Takashi; Kunugi, Tomoaki

1996-01-01

We designed and built a special-purpose computer for holography, HORN-2 (HOlographic ReconstructioN). HORN-2 calculates light intensity at a speed of 0.3 Gflops per board in single (32-bit floating-point) precision. Each board costs 500,000 Japanese yen (5,000 US dollars). We made three boards; operating them in parallel, we obtain about 1 Gflops.
Playback system designed for X-Band SAR

NASA Astrophysics Data System (ADS)

Yuquan, Liu; Changyong, Dou

2014-03-01

Synthetic aperture radar (SAR) is widely applied because it is independent of daylight and weather. In particular, the X-Band SAR strip map, designed by the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, provides high ground resolution images while offering large spatial coverage and a short acquisition time, so it is promising for multiple applications. When a disaster strikes, emergency response requires radar signal data and images as soon as possible in order to act quickly to reduce losses and save lives. This paper summarizes an X-Band SAR playback processing system designed for disaster response and scientific needs. It describes the SAR data workflow, including payload data transmission and reception. The playback processing system performs signal analysis on the original data, providing SAR level 0 products and quick-look images. A gigabit network ensures efficient transmission of radar signals from the recorder to the computation unit, while multithreaded parallel computing and ping-pong (double-buffered) operation ensure sufficient computation speed. Together, the gigabit network, multithreaded parallel computing, and ping-pong operation provide high-speed data transmission and processing that meet the real-time requirement of SAR data playback.
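Ping-pong operation alternates between two buffers, so that one can be filled while the other is processed. A minimal sketch with a producer thread and two queues of buffer indices follows; `blocks` stands in for data arriving from the recorder and `process` for the SAR processing, and both, like the function name, are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Iterable


def ping_pong(blocks: Iterable[list], process: Callable[[list], Any]) -> list:
    """Fill two alternating buffers in a producer thread while the caller processes the other."""
    buffers: list = [None, None]          # the two ping-pong buffers
    free: queue.Queue = queue.Queue()     # buffer indices the producer may overwrite
    full: queue.Queue = queue.Queue()     # buffer indices ready for processing
    free.put(0)
    free.put(1)

    def producer() -> None:
        for block in blocks:
            idx = free.get()              # wait until a buffer has been released
            buffers[idx] = list(block)    # "acquire" this block into the buffer
            full.put(idx)
        full.put(None)                    # end-of-stream marker

    thread = threading.Thread(target=producer)
    thread.start()
    results = []
    while (idx := full.get()) is not None:
        results.append(process(buffers[idx]))  # process one buffer while the other fills
        free.put(idx)                          # hand the buffer back to the producer
    thread.join()
    return results
```

For example, `ping_pong([[1, 2], [3, 4], [5, 6]], sum)` returns `[3, 7, 11]`. A buffer index is handed back to the producer only after its contents have been processed, so unprocessed data is never overwritten.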
Fast disk array for image storage

NASA Astrophysics Data System (ADS)

Feng, Dan; Zhu, Zhichun; Jin, Hai; Zhang, Jiangling

1997-01-01

A fast disk array is designed for large continuous image storage. It includes a high-speed data architecture and the technology of data striping and organization on the disk array. The high-speed data path, which is constructed from two dual-port RAMs and some control circuitry, is configured to transfer data between a host system and a plurality of disk drives. The bandwidth can exceed 100 MB/s if the data path is based on PCI (peripheral component interconnect). The organization of data stored on the disk array is similar to RAID 4. Data are striped across a plurality of disks, and each striping unit is equal to a track. I/O instructions are performed in parallel on the disk drives. An independent disk is used to store the parity information in the fast disk array architecture. By placing the parity generation circuit directly on the SCSI (or SCSI-2) bus, the parity information can be generated on the fly, with little effect on parallel data writing to the other disks. The fast disk array architecture designed in the paper can meet the demands of image storage.
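In a RAID 4 layout the dedicated parity disk holds the XOR of the data stripe units, so any single lost unit can be rebuilt by XOR-ing the survivors with the parity. The sketch below does this in software on byte strings; the paper computes parity in hardware on the SCSI bus, and the function names and stripe contents here are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, n_disks: int, unit: int) -> list[list[bytes]]:
    """Round-robin `data` across n_disks data disks in fixed-size stripe units."""
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    for i in range(0, len(data), unit):
        disks[(i // unit) % n_disks].append(data[i:i + unit])
    return disks


def parity(units: list[bytes]) -> bytes:
    """Parity unit for one stripe: byte-wise XOR of all (equal-length) data units."""
    return reduce(xor_bytes, units)


def rebuild(surviving: list[bytes], parity_unit: bytes) -> bytes:
    """Recover the single missing data unit of a stripe from the survivors and parity."""
    return reduce(xor_bytes, surviving, parity_unit)
```

Because XOR is its own inverse, the same operation that generates parity on writes also reconstructs a failed disk's unit on reads.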
The energy density distribution of an ideal gas and Bernoulli’s equations

NASA Astrophysics Data System (ADS)

Santos, Leonardo S. F.

2018-05-01

This work discusses the energy density distribution in an ideal gas and the consequences of Bernoulli’s equation and the corresponding relation for compressible fluids. The aim of this work is to study how Bernoulli’s equation determines the energy flow in a fluid, although Bernoulli’s equation does not describe the energy density itself. The model based on molecular dynamics considerations that describes an ideal gas at rest with uniform density is modified to explore the gas in motion with non-uniform density and gravitational effects. The difference between the component of a particle's speed parallel to the gas speed and the gas speed itself is called the ‘parallel random speed’. The pressure due to the ‘parallel random speed’ is called the parallel pressure. The modified model predicts that the energy density is the sum of kinetic and potential gravitational energy densities plus two terms with static and parallel pressures. The application of Bernoulli’s equation and the corresponding relation for compressible fluids in the energy density expression has resulted in two new formulations. For incompressible and compressible gas, the energy density expressions are written as a function of stagnation, static and parallel pressures, without any dependence on kinetic or gravitational potential energy densities. These expressions of the energy density are the main contributions of this work. When the parallel pressure was uniform, the energy density distribution for the incompressible approximation and for compressible gas did not converge to zero in the limit of null static pressure. This result is rather unusual because the temperature tends to zero for null pressure. When the gas was considered incompressible and the parallel pressure was equal to static pressure, the energy density maintained this unusual behaviour with small pressures.
If the parallel pressure was equal to static pressure, the energy density converged to zero for the limit of the null pressure only if the gas was compressible. Only the last situation describes an intuitive behaviour for an ideal gas.
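For reference, the two Bernoulli relations the abstract builds on can be written in their standard textbook forms. These are not the paper's new energy-density expressions, which the abstract does not reproduce. Along a streamline in steady flow, with static pressure $p$, density $\rho$, flow speed $v$, height $z$, gravitational acceleration $g$, stagnation pressure $p_0$, and heat-capacity ratio $\gamma$ (symbols chosen here for illustration):

```latex
% Incompressible flow: the stagnation pressure is constant along a streamline
p_0 = p + \tfrac{1}{2}\rho v^{2} + \rho g z

% Compressible, isentropic ideal gas (specific enthalpy h = \frac{\gamma}{\gamma-1}\frac{p}{\rho})
\frac{\gamma}{\gamma - 1}\,\frac{p}{\rho} + \tfrac{1}{2} v^{2} + g z = \text{const}
```

In the incompressible form, $\tfrac{1}{2}\rho v^{2} = p_0 - p - \rho g z$, which is how a kinetic energy density can be traded for stagnation and static pressures.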
Real-time speckle variance swept-source optical coherence tomography using a graphics processing unit.

PubMed

Lee, Kenneth K C; Mariampillai, Adrian; Yu, Joe X Z; Cadotte, David W; Wilson, Brian C; Standish, Beau A; Yang, Victor X D

2012-07-01

Advances in swept-source laser technology continue to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real time prior to SV calculations in order to reduce decorrelation from stationary structures induced by bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario, human fingernail root imaging, where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second.
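Speckle variance is usually defined per pixel as the variance of the structural intensity over a gate of N frames acquired at the same location, SV = (1/N) Σ (I_i − Ī)². A minimal NumPy sketch of that calculation, assuming already-registered frames (the subpixel registration step is omitted); the function name and the synthetic data are hypothetical. Because every pixel is computed independently, the same arithmetic maps naturally onto one GPU thread per pixel, although this sketch runs on the CPU.

```python
import numpy as np

def speckle_variance(frames: np.ndarray) -> np.ndarray:
    """Per-pixel speckle variance over N co-registered frames.

    frames has shape (N, H, W): N structural (intensity) images of the same
    location. Returns an (H, W) map of SV = (1/N) * sum_i (I_i - mean)^2.
    """
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected an array of shape (N, H, W) with N >= 2")
    mean = frames.mean(axis=0)
    return ((frames - mean) ** 2).mean(axis=0)

# Synthetic example with the gate length from the abstract (N = 4), 512 x 512 frames.
rng = np.random.default_rng(0)
frames = np.full((4, 512, 512), 10.0)  # stationary tissue: identical in every frame
frames[:, 200:210, 200:210] += rng.normal(0.0, 3.0, size=(4, 10, 10))  # decorrelating "flow"
sv = speckle_variance(frames)
```

Stationary pixels give zero variance, while the simulated flow region gives positive values, which is the contrast mechanism SV imaging relies on.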
Application Characterization at Scale: Lessons learned from developing a distributed Open Community Runtime system for High Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Landwehr, Joshua B.; Suetterlein, Joshua D.; Marquez, Andres

2016-05-16

Since 2012, the U.S. Department of Energy’s X-Stack program has been developing solutions, including runtime systems, programming models, languages, compilers, and tools, for Exascale system software to address crucial performance and power requirements. Fine-grain programming models and runtime systems show great potential to utilize the underlying hardware efficiently, and they are therefore essential to many X-Stack efforts. A large number of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better, owing to a decrease in state and control. Nevertheless, current applications have been written to exploit older paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of these new systems, applications need to be adapted to the new paradigms. As part of the applications’ porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high-performance runtime system, called the Open Community Runtime (OCR), using key HPC kernels as its vehicle. This study makes the following contributions: one of the first high-performance, fine-grain, distributed-memory runtime systems implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives running in both intra-node and inter-node environments. Running on a general-purpose cluster, we have found up to a 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.
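To make "fine-grain tasks" concrete, the tiled Cholesky kernel mentioned above decomposes into many small per-tile operations (POTRF on diagonal tiles, TRSM below them, SYRK/GEMM trailing updates). The sketch below is a sequential NumPy illustration, not the OCR API: it executes the tasks in one valid order and records them, whereas a task-based runtime would release each task once the tiles it reads are ready. The function name and tile size are hypothetical.

```python
import numpy as np

def tiled_cholesky(a: np.ndarray, b: int):
    """Right-looking tiled Cholesky factorization.

    Returns (L, tasks): L is lower triangular with a == L @ L.T, and tasks
    lists the fine-grained kernel calls in the order they were executed.
    """
    n = a.shape[0]
    if n % b:
        raise ValueError("matrix size must be a multiple of the tile size")
    t = n // b
    work = np.array(a, dtype=float)  # factor a copy in place, tile by tile

    def tile(i, j):
        # Basic slicing returns a view, so in-place updates modify `work`.
        return work[i * b:(i + 1) * b, j * b:(j + 1) * b]

    tasks = []
    for k in range(t):
        tile(k, k)[:] = np.linalg.cholesky(tile(k, k))  # POTRF on the diagonal tile
        tasks.append(("POTRF", k))
        for i in range(k + 1, t):
            # TRSM: solve L_ik @ L_kk.T = A_ik (general solver used for brevity)
            tile(i, k)[:] = np.linalg.solve(tile(k, k), tile(i, k).T).T
            tasks.append(("TRSM", i, k))
        for i in range(k + 1, t):
            for j in range(k + 1, i + 1):
                # Trailing update: SYRK on diagonal tiles, GEMM elsewhere
                tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
                tasks.append(("SYRK" if i == j else "GEMM", i, j, k))
    return np.tril(work), tasks

rng = np.random.default_rng(1)
m = rng.standard_normal((8, 8))
spd = m @ m.T + 8 * np.eye(8)  # symmetric positive-definite test matrix
L, tasks = tiled_cholesky(spd, b=2)
```

With a 4 × 4 grid of tiles, the factorization already produces 20 separate tasks, and the count grows roughly with the cube of the number of tiles, which is the source of the parallelism a fine-grain runtime can exploit.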
Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.

1987-02-15

For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. The usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32 processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four index integral transformation.
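The four-index transformation converts atomic-orbital integrals (pq|rs) into molecular-orbital integrals (ij|kl) = Σ C_pi C_qj C_rk C_sl (pq|rs). Done naively this costs O(n^8); the standard approach applies four successive "quarter transformations", each contracting one index at O(n^5). A minimal NumPy sketch of that serial scheme, with hypothetical names and random data; it ignores the 8-fold permutational symmetry of real integrals and does not show how the paper distributes the work across hypercube nodes.

```python
import numpy as np

def transform_stepwise(eri: np.ndarray, c: np.ndarray) -> np.ndarray:
    """(pq|rs) -> (ij|kl) as four successive quarter transformations.

    eri: AO integrals of shape (n, n, n, n); c: MO coefficients of shape (n, m).
    Each einsum contracts a single index, so each step costs O(n^5).
    """
    t = np.einsum("pqrs,pi->iqrs", eri, c)   # first quarter transformation
    t = np.einsum("iqrs,qj->ijrs", t, c)     # second
    t = np.einsum("ijrs,rk->ijks", t, c)     # third
    return np.einsum("ijks,sl->ijkl", t, c)  # fourth

rng = np.random.default_rng(2)
n, m = 6, 4
eri = rng.standard_normal((n, n, n, n))
c = rng.standard_normal((n, m))
mo = transform_stepwise(eri, c)
```

Each quarter step is a large dense contraction whose output can be partitioned by index, which is what makes the transformation a natural candidate for distributed-memory parallelization.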
Efficient system interrupt concept design at the microprogramming level

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fakharzadeh, M.M.

1989-01-01

Over the past decade the demand for high-speed super microcomputers has increased tremendously. To satisfy this demand, many high-speed 32-bit microcomputers have been designed. However, the currently available 32-bit systems do not provide an adequate solution to many highly demanding problems, such as multitasking and interrupt-driven applications, both of which require context switching. Systems for these purposes usually incorporate sophisticated software. To be efficient, a high-end microprocessor-based system must satisfy stringent software demands. Although these microprocessors use the latest fabrication technology and run at very high speed, they still suffer from insufficient hardware support for such applications. All too often, this lack is also the primary cause of execution inefficiency. In this dissertation, a microprogrammable control unit and operation unit are considered in an advanced design. An automaton controller is designed for high-speed micro-level interrupt handling. Different stack models are designed for the single-task and multitasking environments. The stacks are used to store various components of the processor state during interrupt calls, procedure calls, and task switching. A universal register file (seven-port, as an example) is designed for high-speed parameter passing and intertask communication in the multitasking environment. In addition, the register file provides a direct path between the ALU and the peripheral data, which is important in real-time control applications. The overall system is a highly parallel architecture with no pipeline and no internal cache memory, which allows the designer to predict the processor's behavior during critical times.
Scalable descriptive and correlative statistics with Titan.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, David C.; Pebay, Philippe Pierre

This report summarizes the existing statistical engines in VTK/Titan and presents the parallel versions thereof that have already been implemented. The ease of use of these parallel engines is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; this theoretical property is then verified with test runs that demonstrate optimal parallel speed-up with up to 200 processors.
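What makes descriptive statistics scale in parallel is that partial results from each processor can be combined exactly, without revisiting the data. A minimal Python sketch using the standard pairwise update for the count, mean, and sum of squared deviations (M2); this illustrates the general technique and is not the VTK/Titan C++ API. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass
class Moments:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    @staticmethod
    def from_data(xs) -> "Moments":
        """Single-pass (Welford) accumulation on one processor's data."""
        acc = Moments()
        for x in xs:
            acc.n += 1
            d = x - acc.mean
            acc.mean += d / acc.n
            acc.m2 += d * (x - acc.mean)
        return acc

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial results, e.g. from two processes."""
        n = self.n + other.n
        if n == 0:
            return Moments()
        d = other.mean - self.mean
        mean = self.mean + d * other.n / n
        m2 = self.m2 + other.m2 + d * d * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        return self.m2 / (self.n - 1)  # unbiased sample variance

data = [float(i % 7) + 0.5 * i for i in range(1000)]
chunks = [data[i::4] for i in range(4)]  # pretend each chunk lives on one process
total = reduce(Moments.merge, (Moments.from_data(c) for c in chunks), Moments())
```

Because `merge` is associative, the partial results can be combined in a tree across processors, so the communication cost grows only logarithmically with the processor count.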

Parallel-Processing CMOS Circuitry for M-QAM and 8PSK TCM

NASA Technical Reports Server (NTRS)

Gray, Andrew; Lee, Dennis; Hoy, Scott; Fisher, Dave; Fong, Wai; Ghuman, Parminder

2009-01-01

There has been some additional development of parts reported in "Multi-Modulator for Bandwidth-Efficient Communication" (NPO-40807), NASA Tech Briefs, Vol. 32, No. 6 (June 2009), page 34. The focus was on 1) the generation of M-order quadrature amplitude modulation (M-QAM) and octonary-phase-shift-keying, trellis-coded modulation (8PSK TCM); 2) the use of square-root raised-cosine pulse-shaping filters; 3) a parallel-processing architecture that enables low-speed complementary metal oxide/semiconductor (CMOS) circuitry to perform the coding, modulation, and pulse-shaping computations at a high rate; and 4) implementation of the architecture in a CMOS field-programmable gate array.
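Two of the listed ingredients can be illustrated in software: square-root raised-cosine (SRRC) pulse-shaping taps, and an output-splitting scheme in which several slower "lanes" each compute every P-th output sample, so that each lane runs at 1/P of the output rate. This is a minimal sketch under stated assumptions (symbol period T = 1, the textbook SRRC closed form, lanes run one after another); it is not the NASA CMOS architecture, and the function names are hypothetical.

```python
import math
import numpy as np

def srrc_taps(beta: float, sps: int, span: int) -> np.ndarray:
    """Square-root raised-cosine impulse response with symbol period T = 1.

    beta: roll-off factor (0 < beta <= 1); sps: samples per symbol;
    span: filter length in symbols. Returns span * sps + 1 taps.
    """
    n = span * sps
    t = np.arange(-(n // 2), n // 2 + 1) / sps
    h = np.empty(len(t))
    for idx, ti in enumerate(t):
        if math.isclose(ti, 0.0, abs_tol=1e-12):
            h[idx] = 1 - beta + 4 * beta / math.pi
        elif math.isclose(abs(ti), 1 / (4 * beta), abs_tol=1e-12):
            a = math.pi / (4 * beta)  # removable singularity of the closed form
            h[idx] = beta / math.sqrt(2) * ((1 + 2 / math.pi) * math.sin(a)
                                            + (1 - 2 / math.pi) * math.cos(a))
        else:
            num = (math.sin(math.pi * ti * (1 - beta))
                   + 4 * beta * ti * math.cos(math.pi * ti * (1 + beta)))
            h[idx] = num / (math.pi * ti * (1 - (4 * beta * ti) ** 2))
    return h

def parallel_fir(x: np.ndarray, h: np.ndarray, lanes: int) -> np.ndarray:
    """Full convolution y = x * h, computed by `lanes` independent lanes.

    Lane p produces only the outputs y[n] with n % lanes == p.
    """
    n_out = len(x) + len(h) - 1
    y = np.zeros(n_out)
    for p in range(lanes):
        for n in range(p, n_out, lanes):
            k = np.arange(max(0, n - len(x) + 1), min(len(h), n + 1))
            y[n] = np.dot(h[k], x[n - k])  # y[n] = sum_k h[k] x[n-k]
    return y

h = srrc_taps(beta=0.25, sps=4, span=8)
symbols = np.array([1, -1, 3, -3, 1, 1, -1, 3], dtype=float)  # e.g. one rail of 16-QAM
x = np.zeros(len(symbols) * 4)
x[::4] = symbols  # upsample to 4 samples per symbol
y = parallel_fir(x, h, lanes=4)
```

Since the lanes never depend on each other's outputs, their results interleave into exactly the serial convolution.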
Enhanced Axial Resolution of Wide-Field Two-Photon Excitation Microscopy by Line Scanning Using a Digital Micromirror Device.

PubMed

Park, Jong Kang; Rowlands, Christopher J; So, Peter T C

2017-01-01

Temporal focusing multiphoton microscopy is a technique for performing highly parallelized multiphoton microscopy while still maintaining depth discrimination. While the conventional wide-field configuration for temporal focusing suffers from sub-optimal axial resolution, line scanning temporal focusing, implemented here using a digital micromirror device (DMD), can provide substantial improvement. The DMD-based line scanning temporal focusing technique dynamically trades off the degree of parallelization, and hence imaging speed, for axial resolution, allowing performance parameters to be adapted to the experimental requirements. We demonstrate this new instrument in calibration specimens and in biological specimens, including a mouse kidney slice.
Enhanced Axial Resolution of Wide-Field Two-Photon Excitation Microscopy by Line Scanning Using a Digital Micromirror Device

PubMed Central

Park, Jong Kang; Rowlands, Christopher J.; So, Peter T. C.

2017-01-01

Temporal focusing multiphoton microscopy is a technique for performing highly parallelized multiphoton microscopy while still maintaining depth discrimination. While the conventional wide-field configuration for temporal focusing suffers from sub-optimal axial resolution, line scanning temporal focusing, implemented here using a digital micromirror device (DMD), can provide substantial improvement. The DMD-based line scanning temporal focusing technique dynamically trades off the degree of parallelization, and hence imaging speed, for axial resolution, allowing performance parameters to be adapted to the experimental requirements. We demonstrate this new instrument in calibration specimens and in biological specimens, including a mouse kidney slice. PMID:29387484
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, J.C.H.; Lung, H.; Katsumata, Y.

1995-12-01

In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the VPP500, Fujitsu's new product line of vector parallel computers. This computer uses Fujitsu's well-known vector processor as its PE, each rated at 1.6 GFLOPS. A high-speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that physical domain decomposition is the best way to solve multigrid (MG) problems on the VPP500.
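The essence of domain decomposition for a multigrid smoother is that each processing element (PE) updates only the points it owns, using one layer of "ghost" (halo) values copied from its neighbours. A minimal 1D sketch under stated assumptions: a weighted-Jacobi smoother for −u'' = f with fixed end values, subdomains processed in a loop instead of on separate PEs, and halo values read directly rather than sent as messages. The NAS MG benchmark itself is 3D with its own operators; all names here are hypothetical.

```python
import numpy as np

def jacobi_sweep(u, f, h, omega=2 / 3):
    """One weighted-Jacobi sweep for -u'' = f; end values stay fixed."""
    new = u.copy()
    new[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return new

def decomposed_sweep(u, f, h, parts, omega=2 / 3):
    """The same sweep, split into `parts` subdomains with one halo point per side."""
    n = len(u)
    bounds = np.linspace(0, n, parts + 1, dtype=int)
    out = u.copy()
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)  # owned range plus halos
        local = jacobi_sweep(u[glo:ghi], f[glo:ghi], h, omega)
        start, stop = max(lo, 1), min(hi, n - 1)   # owned interior points only
        out[start:stop] = local[start - glo:stop - glo]
    return out

n = 65
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)  # exact solution: u = sin(pi x)
u_global = np.zeros(n)
u_split = np.zeros(n)
for _ in range(10):
    u_global = jacobi_sweep(u_global, f, h)
    u_split = decomposed_sweep(u_split, f, h, parts=4)
```

Because the halo gives each subdomain the same neighbour values the global sweep would use, the decomposed result matches the global one exactly; on a real machine, the halo exchange is the only inter-PE communication per sweep.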
Applications of Emerging Parallel Optical Link Technology to High Energy Physics Experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chramowicz, J.; Kwan, S.; Prosser, A.

2011-09-01

Modern particle detectors depend upon optical fiber links to deliver event data to upstream trigger and data processing systems. Future detector systems can benefit from the development of dense arrangements of high-speed optical links emerging from the telecommunications and storage area network market segments. These links support data transfers in each direction at rates up to 120 Gbps in packages that minimize or even eliminate edge connector requirements. Emerging products include a class of devices known as optical engines, which permit assembly of the optical transceivers in close proximity to the electrical interfaces of ASICs and FPGAs that handle the data in parallel electrical format. Such assemblies will reduce the required printed circuit board area and minimize electromagnetic interference and susceptibility. We will present test results of some of these parallel components and report on the development of pluggable FPGA Mezzanine Cards equipped with optical engines, to be provided to collaborators on the Versatile Link Common Project for the HL-LHC at CERN.
The Software Correlator of the Chinese VLBI Network

NASA Technical Reports Server (NTRS)

Zheng, Weimin; Quan, Ying; Shu, Fengchun; Chen, Zhong; Chen, Shanshan; Wang, Weihua; Wang, Guangli

2010-01-01

The software correlator of the Chinese VLBI Network (CVN) has played an irreplaceable role in CVN routine data processing, e.g., in the Chinese lunar exploration project. This correlator will be upgraded to process geodetic and astronomical observation data. In the future, with several new stations joining the network, CVN will carry out crustal movement observations, quick UT1 measurements, astrophysical observations, and deep space exploration activities. For the geodetic or astronomical observations, we need a wide-band 10-station correlator. For spacecraft tracking, a real-time and highly reliable correlator is essential. To meet the scientific and navigation requirements of CVN, two parallel software correlators for multiprocessor environments are under development. A high-speed, 10-station prototype correlator using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm on a computer cluster platform is being developed. Another real-time software correlator for spacecraft tracking adopts thread-parallel technology and runs on SMP (Symmetric Multiprocessing) servers. Both correlators have a flexible structure and are scalable.
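Many software correlators use an "FX" scheme: Fourier-transform short segments of each station's data (F), then multiply one station's spectrum by the conjugate of another's and average over time (X). The abstract does not say which scheme the CVN correlators use, so the following is only a minimal single-baseline sketch, assuming real-valued samples and omitting delay tracking, fringe rotation, and quantization handling. In a Pthreads/MPI design, segments or baselines would be spread across threads and nodes; the names here are hypothetical.

```python
import numpy as np

def fx_correlate(a: np.ndarray, b: np.ndarray, nfft: int) -> np.ndarray:
    """Simplified FX correlation of two stations' sampled signals.

    F step: split each stream into segments of `nfft` samples and FFT them.
    X step: form conj(A) * B per frequency channel and average over segments.
    """
    nseg = min(len(a), len(b)) // nfft
    A = np.fft.fft(a[:nseg * nfft].reshape(nseg, nfft), axis=1)
    B = np.fft.fft(b[:nseg * nfft].reshape(nseg, nfft), axis=1)
    return (np.conj(A) * B).mean(axis=0)

def delay_from_spectrum(spec: np.ndarray) -> int:
    """Lag (in samples) at which the cross-correlation magnitude peaks."""
    lags = np.fft.ifft(spec)
    k = int(np.argmax(np.abs(lags)))
    n = len(spec)
    return k if k <= n // 2 else k - n  # map to a signed lag

rng = np.random.default_rng(3)
station_a = rng.standard_normal(1 << 16)
station_b = np.roll(station_a, 5) + 0.1 * rng.standard_normal(1 << 16)  # b lags a by 5 samples
spectrum = fx_correlate(station_a, station_b, nfft=256)
lag = delay_from_spectrum(spectrum)
```

Each segment's FFT and cross-multiplication are independent of the others, which is why this step parallelizes well across threads or cluster nodes.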
Flexbar 3.0 - SIMD and multicore parallelization.

PubMed

Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut

2017-09-15

High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and, additionally, SIMD vectorization. Both types of parallelism are used to speed up the computation of pairwise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. Availability: https://github.com/seqan/flexbar. Contact: johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press.
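Flexbar detects adapters with pairwise alignment, vectorized with SIMD in version 3.0. As a much simpler stand-in that shows what "adapter trimming" means operationally, the sketch below uses ungapped overlap matching: it cuts the read at the first position where the read's suffix matches the adapter's prefix within an error-rate threshold. This is a swapped-in technique, not Flexbar's algorithm or interface; the function name, thresholds, and sequences are hypothetical.

```python
def trim_adapter(read: str, adapter: str, max_error_rate: float = 0.1,
                 min_overlap: int = 3) -> str:
    """Remove a 3' adapter by ungapped overlap matching.

    Scans start positions left to right and cuts the read at the first
    position where the remaining suffix matches the adapter prefix with at
    most max_error_rate mismatches per overlapping base.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        mismatches = sum(r != a for r, a in zip(read[start:start + overlap], adapter))
        if mismatches <= max_error_rate * overlap:
            return read[:start]
    return read  # no acceptable overlap found: keep the read unchanged

print(trim_adapter("ACGTACGTAGATCGGAAG", "AGATCGGAAGAGC"))  # prints ACGTACGT
```

Each read is processed independently, which is what allows tools like Flexbar to distribute reads across cores; the inner comparison over overlapping bases is the kind of loop that SIMD instructions can vectorize.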
Novel approach for image skeleton and distance transformation parallel algorithms

NASA Astrophysics Data System (ADS)

Qing, Kent P.; Means, Robert W.

1994-05-01

Image Understanding is more important than ever in medical imaging, particularly where real-time automatic inspection, screening, and classification systems are installed. Skeleton and distance transformations are among the common operations that extract useful information from binary images and aid in Image Understanding. The distance transformation describes the objects in an image by labeling every pixel in each object with the distance to its nearest boundary. The skeleton algorithm starts from the distance transformation and finds the set of pixels that have a locally maximum label. The distance algorithm has to scan the entire image several times, depending on the object width. For each pixel, the algorithm must access the neighboring pixels and find the maximum distance from the nearest boundary. It is a computation- and memory-access-intensive procedure. In this paper, we propose a novel parallel approach to the distance transform and skeleton algorithms using the latest VLSI high-speed convolutional chips, such as HNC's ViP. The algorithm speed depends on the object's width: it takes (k + [(k-1)/3]) * 7 milliseconds for a 512 × 512 image, with k being the maximum distance of the largest object. All objects in the image are skeletonized at the same time, in parallel.
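The two steps described above can be sketched in NumPy. Each pass updates every pixel at once from its neighbours, like a convolution, so the number of passes grows with the width of the widest object. This is a minimal illustration under stated assumptions: a 4-neighbour (city-block) distance, pixels outside the image treated as background, and a skeleton defined as object pixels whose label is at least as large as every 4-neighbour's label. It is not HNC's ViP implementation, and the function names are hypothetical.

```python
import numpy as np

def _neighbours(p: np.ndarray):
    """The four shifted views (up, down, left, right) of a 1-pixel-padded array."""
    return [p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]]

def distance_transform(img: np.ndarray) -> np.ndarray:
    """City-block distance from each object pixel to the nearest background pixel."""
    obj = img.astype(bool)
    d = np.where(obj, obj.size, 0)  # obj.size exceeds any possible distance
    while True:
        p = np.pad(d, 1, constant_values=0)  # outside the image counts as background
        nb = np.minimum.reduce(_neighbours(p))
        new = np.where(obj, np.minimum(d, nb + 1), 0)  # one parallel pass
        if np.array_equal(new, d):
            return new
        d = new

def skeleton(d: np.ndarray) -> np.ndarray:
    """Object pixels whose distance label is a local maximum among 4-neighbours."""
    p = np.pad(d, 1, constant_values=0)
    return (d > 0) & (d >= np.maximum.reduce(_neighbours(p)))

img = np.zeros((7, 7), dtype=np.uint8)
img[1:6, 1:6] = 1  # a 5 x 5 square object
d = distance_transform(img)
sk = skeleton(d)
```

For the 5 × 5 square, the labels rise from 1 at the edge to 3 at the centre, and the skeleton is the two diagonals (9 pixels). All objects are labeled and skeletonized simultaneously, as in the abstract.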
Development of high-speed video cameras

NASA Astrophysics Data System (ADS)

Etoh, Takeharu G.; Takehara, Kohsei; Okinaka, Tomoo; Takano, Yasuhide; Ruckelshausen, Arno; Poggemann, Dirk

2001-04-01

Presented in this paper is an outline of the R&D activities on high-speed video cameras that have been carried out at Kinki University for more than ten years and are currently proceeding as an international cooperative project with the University of Applied Sciences Osnabrück and other organizations. Extensive market research has been done (1) on users' requirements for high-speed multi-framing and video cameras, through questionnaires and interviews, and (2) on the current availability of cameras of this sort, through searches of journals and websites. Both support the need to develop a high-speed video camera of more than 1 million fps. A video camera of 4,500 fps with parallel readout was developed in 1991. A video camera with triple sensors was developed in 1996, using the same sensor as the previous camera. The frame rate is 50 million fps for triple-framing and 4,500 fps for triple-light-wave framing, including color image capturing. The idea of a 1 million fps video camera with an ISIS (In-situ Storage Image Sensor) was first proposed in 1993 and has been continuously improved. A test sensor was developed in early 2000 and successfully captured images at 62,500 fps. A prototype ISIS is currently being designed and, hopefully, will be fabricated in the near future. Epoch-making cameras developed by others in the history of high-speed video cameras are also briefly reviewed.
Kinetic treatment of nonlinear magnetized plasma motions - General geometry and parallel waves

NASA Technical Reports Server (NTRS)

Khabibrakhmanov, I. KH.; Galinskii, V. L.; Verheest, F.

1992-01-01

The expansion of kinetic equations in the limit of a strong magnetic field is presented. This gives a natural description of the motions of magnetized plasmas, which are slow compared to the particle gyroperiods and gyroradii. Although the approach is 3D, this very general result is used only to focus on the parallel propagation of nonlinear Alfven waves. The derivative nonlinear Schroedinger-like equation is obtained. Two new terms occur compared to earlier treatments, a nonlinear term proportional to the heat flux along the magnetic field line and a higher-order dispersive term. It is shown that kinetic description avoids the singularities occurring in magnetohydrodynamic or multifluid approaches, which correspond to the degenerate case of sound speeds equal to the Alfven speed, and that parallel heat fluxes cannot be neglected, not even in the case of low parallel plasma beta. A truly stationary soliton solution is derived.
Study of Solid State Drives performance in PROOF distributed analysis system

NASA Astrophysics Data System (ADS)

Panitkin, S. Y.; Ernst, M.; Petkus, R.; Rind, O.; Wenaus, T.

2010-04-01

Solid State Drives (SSDs) are a promising storage technology for High Energy Physics parallel analysis farms. Their combination of low random access time and relatively high read speed is very well suited to situations where multiple jobs concurrently access data located on the same drive. SSDs also have lower energy consumption and higher vibration tolerance than Hard Disk Drives (HDDs), which makes them an attractive choice in many applications, ranging from personal laptops to large analysis farms. The Parallel ROOT Facility (PROOF) is a distributed analysis system that allows one to exploit the inherent event-level parallelism of high energy physics data. PROOF is especially efficient together with distributed local storage systems like Xrootd, when data are distributed over computing nodes. In such an architecture, the I/O performance of the local disk subsystem becomes a critical factor, especially when computing nodes use multi-core CPUs. We will discuss our experience with SSDs in the PROOF environment. We will compare the performance of HDDs with SSDs in I/O-intensive analysis scenarios. In particular, we will discuss how PROOF system performance scales with the number of simultaneously running analysis jobs.
Biomechanical Comparison of Parallel and Crossed Suture Repair for Longitudinal Meniscus Tears.

PubMed

Milchteim, Charles; Branch, Eric A; Maughon, Ty; Hughey, Jay; Anz, Adam W

2016-04-01

Longitudinal meniscus tears are commonly encountered in clinical practice. Meniscus repair devices have been previously tested and presented; however, prior studies have not evaluated repair construct designs head to head. This study compared a new-generation meniscus repair device, SpeedCinch, with a similar established device, Fast-Fix 360, and a parallel repair construct with a crossed construct. Both devices use self-adjusting No. 2-0 ultra-high molecular weight polyethylene (UHMWPE) suture and 2 polyether ether ketone (PEEK) anchors. We hypothesized that crossed suture repair constructs would have higher failure loads and stiffness than simple parallel constructs, and that the newer repair device would perform similarly to the established device. This was a controlled laboratory study. Sutures were placed in an open fashion into the body and posterior horn regions of the medial and lateral menisci in 16 cadaveric knees. Evaluation of 2 repair devices and 2 repair constructs created 4 groups: 2 parallel vertical sutures created with the Fast-Fix 360 (2PFF), 2 crossed vertical sutures created with the Fast-Fix 360 (2XFF), 2 parallel vertical sutures created with the SpeedCinch (2PSC), and 2 crossed vertical sutures created with the SpeedCinch (2XSC). After open placement of the repair construct, each meniscus was explanted and tested to failure on a uniaxial material testing machine. All data were checked for normality of distribution, and 1-way analysis of variance by ranks was chosen to evaluate the statistical significance of differences in maximum failure load and stiffness between groups. Statistical significance was defined as P < .05. The mean maximum failure loads ± 95% CI (range) were 89.6 ± 16.3 N (47.8-125.7 N) (2PFF), 72.1 ± 11.7 N (47.6-103.4 N) (2XFF), 71.9 ± 15.5 N (41.3-109.4 N) (2PSC), and 79.5 ± 25.4 N (30.9-119.1 N) (2XSC). Interconstruct comparison revealed no statistically significant difference among the 4 constructs in maximum failure load (P = .49).
Stiffness values were also similar, with no statistical difference on comparison (P = .28). Both devices in the current study had similar failure load and stiffness when 2 vertical or 2 crossed sutures were tested in cadaveric human menisci. Simple parallel vertical sutures perform similarly to crossed suture patterns at the time of implantation.
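The "1-way analysis of variance by ranks" is the Kruskal-Wallis test: all observations are pooled and ranked, and the statistic H = 12/(N(N+1)) · Σ R_i²/n_i − 3(N+1) compares the groups' rank sums R_i. A minimal sketch of the H statistic, assuming average ranks for ties and omitting the tie correction that statistics libraries usually apply. The example groups are made-up numbers, not the study's measurements. A P value would come from the chi-square distribution with (number of groups − 1) degrees of freedom.

```python
from itertools import chain

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (1-way ANOVA on ranks), no tie correction.

    Tied values receive the average of the 1-based ranks they span.
    """
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])  # fully separated groups
h_identical = kruskal_wallis_h([1, 2], [1, 2])                   # identical groups
```

Fully separated groups give a large H (7.2 here), while groups with identical rank distributions give H = 0. A nonsignificant H, as reported for failure load and stiffness above, means the rank sums did not differ more than chance would explain.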
Massively Parallel Rogue Cell Detection Using Serial Time-Encoded Amplified Microscopy of Inertially Ordered Cells in High-Throughput Flow

DTIC Science & Technology

2012-08-01

techniques and STEAM imager. It couples the high-speed capability of the STEAM imager and differential phase contrast imaging of DIC/Nomarski microscopy ... On 10 TPE chips, we obtained 9 homogenous and strong bonds, the failed bond being due to operator error and presence of air bubbles in the TPE ... instruments, structural dynamics, and microelectromechanical systems (MEMS) via laser-scanning surface vibrometry, and observation of biomechanical motility
Bayer image parallel decoding based on GPU

NASA Astrophysics Data System (ADS)

Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

2012-11-01

In the photoelectrical tracking system, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. To accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA Graphics Processing Units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallel part, and a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce execution time, the task-parallel part is optimized with OpenMP. The data-parallel part is made more efficient by executing it on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared-memory access optimization, coalesced memory access, and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the two-dimensional (2D) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part was more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3- to 5-fold speed increase compared with the serial CPU method.
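A separable 2D inverse wavelet transform can be restructured as batches of independent 1D inverses: first along columns, then along rows. Every output pair in a batch can be computed at the same time. The abstract does not name the wavelet filter, so this sketch uses the orthonormal Haar wavelet as a stand-in, with NumPy vectorization taking the place of CUDA threads. The subband naming (LL, LH, HL, HH) follows one common convention and is an assumption, as are the function names. A matching forward transform is included only to check the round trip.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt_1d(x: np.ndarray, axis: int):
    """One-level orthonormal Haar analysis along `axis` (even length assumed)."""
    x = np.moveaxis(x, axis, -1)
    lo = (x[..., 0::2] + x[..., 1::2]) / SQRT2
    hi = (x[..., 0::2] - x[..., 1::2]) / SQRT2
    return np.moveaxis(lo, -1, axis), np.moveaxis(hi, -1, axis)

def haar_idwt_1d(lo: np.ndarray, hi: np.ndarray, axis: int) -> np.ndarray:
    """Inverse of haar_dwt_1d; every output pair is independent (one thread each on a GPU)."""
    lo = np.moveaxis(lo, axis, -1)
    hi = np.moveaxis(hi, axis, -1)
    out = np.empty(lo.shape[:-1] + (2 * lo.shape[-1],), dtype=np.result_type(lo, hi, float))
    out[..., 0::2] = (lo + hi) / SQRT2
    out[..., 1::2] = (lo - hi) / SQRT2
    return np.moveaxis(out, -1, axis)

def haar_dwt_2d(img):
    lo, hi = haar_dwt_1d(img, axis=1)   # rows
    ll, lh = haar_dwt_1d(lo, axis=0)    # columns of the low-pass half
    hl, hh = haar_dwt_1d(hi, axis=0)    # columns of the high-pass half
    return ll, lh, hl, hh

def haar_idwt_2d(ll, lh, hl, hh):
    """2D inverse as batched 1D inverses: columns first, then rows."""
    lo = haar_idwt_1d(ll, lh, axis=0)
    hi = haar_idwt_1d(hl, hh, axis=0)
    return haar_idwt_1d(lo, hi, axis=1)

rng = np.random.default_rng(5)
img = rng.integers(0, 1 << 16, size=(8, 8)).astype(float)  # stand-in for 16-bit data
bands = haar_dwt_2d(img)
restored = haar_idwt_2d(*bands)
```

Because each 1D pass has no dependencies between its outputs, the work per pass is a single data-parallel launch rather than a nested serial loop.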
Investigation of high-speed shaft bearing loads in wind turbine gearboxes through dynamometer testing

DOE PAGES

Guo, Yi; Keller, Jonathan

2017-11-10

Many wind turbine gearboxes require repair or replacement well before reaching the end of their design life. The most common failure is bearing axial cracks, commonly called white etching cracks (WECs), which typically occur in the inner raceways of the high-speed parallel-stage rolling element bearings. Although the root causes of WECs are debated, one theory is that they are related to routine dynamic operating conditions and occasional transient events prevalent in wind turbines that can result in high bearing stress and sliding of the rolling elements. This paper examined wind turbine gearbox high-speed shaft bearing loads and stresses through modeling and full-scale dynamometer testing. Bearing outer race loads were directly measured and predicted using a variety of modeling tools in normal operations, misaligned conditions, and transient events particularly prone to bearing sliding. Test data and models of bearing loads were well correlated. Neither operational misalignment due to rotor moments nor static generator misalignment affected the bearing loads when compared with pure-torque conditions. Thus, it is not likely that generator misalignment is a causal factor of WECs. In contrast, during transient events, the bearings experienced alternating periods of high stress, torque reversals, and loads below the minimum requisite load at high rotating speeds, while showing indications of sliding, all of which could be related to the formation of WECs.
Investigation of high-speed shaft bearing loads in wind turbine gearboxes through dynamometer testing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guo, Yi; Keller, Jonathan

Many wind turbine gearboxes require repair or replacement well before reaching the end of their design life. The most common failure is bearing axial cracks, commonly called white etching cracks (WECs), which typically occur in the inner raceways of the high-speed parallel-stage rolling element bearings. Although the root causes of WECs are debated, one theory is that they are related to routine dynamic operating conditions and occasional transient events prevalent in wind turbines that can result in high bearing stress and sliding of the rolling elements. This paper examined wind turbine gearbox high-speed shaft bearing loads and stresses through modeling and full-scale dynamometer testing. Bearing outer race loads were directly measured and predicted using a variety of modeling tools in normal operations, misaligned conditions, and transient events particularly prone to bearing sliding. Test data and models of bearing loads were well correlated. Neither operational misalignment due to rotor moments nor static generator misalignment affected the bearing loads when compared with pure-torque conditions. Thus, it is not likely that generator misalignment is a causal factor of WECs. In contrast, during transient events, the bearings experienced alternating periods of high stress, torque reversals, and loads below the minimum requisite load at high rotating speeds, while showing indications of sliding, all of which could be related to the formation of WECs.
Towards Autonomous Modular UAV Missions: The Detection, Geo-Location and Landing Paradigm

PubMed Central

Kyristsis, Sarantis; Antonopoulos, Angelos; Chanialakis, Theofilos; Stefanakis, Emmanouel; Linardos, Christos; Tripolitsiotis, Achilles; Partsinevelos, Panagiotis

2016-01-01

Nowadays, unmanned aerial vehicle (UAV) applications are becoming increasingly demanding, since they require real-time, autonomous, and intelligent functions. To this end, in the present study, a fully autonomous UAV scenario is implemented, including the tasks of area scanning, target recognition, geo-location, monitoring, following, and finally landing on a high-speed moving platform. The underlying methodology includes AprilTag target identification through Graphics Processing Unit (GPU) parallelized processing, image processing, and several optimized location and approach algorithms employing gimbal movement, Global Navigation Satellite System (GNSS) readings, and UAV navigation. For the experiments, a commercial and a custom-made quadcopter prototype were used, representing a high-computational and a low-computational embedded platform alternative, respectively. In addition to successful targeting and following, it is shown that the landing approach can be performed successfully even at high platform speeds. PMID:27827883
Towards Autonomous Modular UAV Missions: The Detection, Geo-Location and Landing Paradigm.

PubMed

Kyristsis, Sarantis; Antonopoulos, Angelos; Chanialakis, Theofilos; Stefanakis, Emmanouel; Linardos, Christos; Tripolitsiotis, Achilles; Partsinevelos, Panagiotis

2016-11-03

Nowadays, unmanned aerial vehicle (UAV) applications are becoming increasingly demanding, since they require real-time, autonomous, and intelligent functions. To this end, in the present study, a fully autonomous UAV scenario is implemented, including the tasks of area scanning, target recognition, geo-location, monitoring, following, and finally landing on a high-speed moving platform. The underlying methodology includes AprilTag target identification through Graphics Processing Unit (GPU) parallelized processing, image processing, and several optimized location and approach algorithms employing gimbal movement, Global Navigation Satellite System (GNSS) readings, and UAV navigation. For the experiments, a commercial and a custom-made quadcopter prototype were used, representing a high-computational and a low-computational embedded platform alternative, respectively. In addition to successful targeting and following, it is shown that the landing approach can be performed successfully even at high platform speeds.
The high performance parallel algorithm for Unified Gas-Kinetic Scheme

NASA Astrophysics Data System (ADS)

Li, Shiyi; Li, Qibing; Fu, Song; Xu, Jinxiu

2016-11-01

A high-performance parallel algorithm for UGKS is developed to simulate three-dimensional internal and external flows on arbitrary grid systems. The physical domain and velocity domain are divided into different blocks and distributed according to a two-dimensional Cartesian topology, with intra-communicators in the physical domain for data exchange and other intra-communicators in the velocity domain for the sum reduction to moment integrals. Numerical results for three-dimensional cavity flow and flow past a sphere agree well with results from existing studies and validate the applicability of the algorithm. The scalability of the algorithm is tested on both small (1-16) and large (729-5832) numbers of processors. The measured speed-up ratio is nearly linear, and thus the efficiency is around 1, which demonstrates the good scalability of the present algorithm.
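A minimal sketch of the two-dimensional decomposition described above, assuming ranks are laid out row-major on a P_phys x P_vel grid. The velocity-domain sum reduction is emulated with a plain sum instead of MPI, and the moment integral is taken as a simple weighted sum of f_i w_i over discrete velocities; all function names are hypothetical and not taken from the UGKS code.

```python
# Sketch only: emulates a 2-D (physical x velocity) Cartesian process grid
# without MPI. All names are hypothetical, not taken from the UGKS code.

def rank_to_coords(rank, p_vel):
    """Row-major layout: row = physical-domain block, column = velocity block."""
    return divmod(rank, p_vel)


def velocity_group(rank, p_vel):
    """Ranks sharing this rank's physical block; they sum-reduce moments."""
    phys, _ = rank_to_coords(rank, p_vel)
    return [phys * p_vel + v for v in range(p_vel)]


def partial_moment(f, weights, lo, hi):
    """Quadrature over this rank's slice [lo, hi) of the discrete velocities."""
    return sum(f[i] * weights[i] for i in range(lo, hi))


def reduced_moment(f, weights, p_vel):
    """Stand-in for an MPI sum reduction over one velocity-domain group."""
    n = len(f)
    bounds = [(v * n // p_vel, (v + 1) * n // p_vel) for v in range(p_vel)]
    return sum(partial_moment(f, weights, lo, hi) for lo, hi in bounds)


# Usage: rank 4 on a grid with 3 velocity blocks per physical block.
print(rank_to_coords(4, 3), velocity_group(4, 3))  # (1, 1) [3, 4, 5]
```

In a real MPI code the two groups would typically come from splitting the Cartesian communicator along each dimension; the point here is only that the partial velocity-space sums add up to the full moment.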
Extended Logic Intelligent Processing System for a Sensor Fusion Processor Hardware

NASA Technical Reports Server (NTRS)

Stoica, Adrian; Thomas, Tyson; Li, Wei-Te; Daud, Taher; Fabunmi, James

2000-01-01

The paper presents the hardware implementation and initial tests of a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) is described, which combines rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor signals in compact low-power VLSI. The ELIPS concept is being developed to demonstrate interceptor functionality, which places particular emphasis on high-speed and low-power requirements. The hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Processing times on the order of microseconds have been demonstrated using our test hardware.

Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost-effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large-scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
A Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

NASA Astrophysics Data System (ADS)

Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

2015-07-01

Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. However, because of the large amounts of data and single-threaded processing, the query speed cannot meet application requirements. Moreover, query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method using a multi-thread parallel technique based on regional division and implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Tests and analysis on real datasets show that this method improves query speed significantly.
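The divide/aggregate/merge pattern can be sketched as follows, assuming point records (x, y, t), a count aggregate, cubic cells of fixed spatial size and time step, and regions formed from stripes of cubes along x. These choices and all names are illustrative assumptions, not the authors' implementation. In CPython the global interpreter lock limits the speed-up of pure-Python threads on CPU-bound work, so this shows the structure rather than the performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def cube_key(x, y, t, cell, tstep):
    """Index of the spatiotemporal cube containing the point."""
    return (int(x // cell), int(y // cell), int(t // tstep))


def aggregate_region(points, cell, tstep):
    """Count points per cube for one region's share of the data."""
    counts = Counter()
    for x, y, t in points:
        counts[cube_key(x, y, t, cell, tstep)] += 1
    return counts


def parallel_aggregate(points, cell, tstep, workers=4):
    """Split by spatial region, aggregate each region in a thread, then merge."""
    regions = [[] for _ in range(workers)]
    for p in points:
        # Every cube falls into exactly one region (stripe of cubes along x).
        regions[int(p[0] // cell) % workers].append(p)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda r: aggregate_region(r, cell, tstep), regions):
            total.update(partial)  # merge step: add per-cube counts
    return total
```

Because regions never share a cube, the merge is a disjoint union; with overlapping partitions, `Counter.update` would still add the counts correctly.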
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.

PubMed

Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F

2018-03-01

Max-trees, or component trees, are graph structures that represent the connected components of an image hierarchically. Nowadays, many application fields rely on images with high dynamic range or floating-point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that current parallel algorithms already perform poorly with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. This structure is then used in a parallel leaf-to-root approach to efficiently compute the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance on both simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava

Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs, and similar technologies. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.
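The idea of exposing fine-grained parallelism across many small, independent filters can be illustrated with a struct-of-arrays batch: the same update arithmetic is applied to every track in lockstep, which is the pattern a SIMD unit vectorizes. Real track fits propagate a multi-parameter state with full covariance matrices; this scalar version is a deliberate simplification, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackBatch:
    """Struct-of-arrays layout: element i of each list belongs to track i."""
    x: List[float]  # state estimates
    p: List[float]  # state variances


def kalman_update(batch, z, r):
    """Scalar Kalman measurement update applied to every track in the batch.

    K = P / (P + R);  x <- x + K (z - x);  P <- (1 - K) P
    """
    for i in range(len(batch.x)):
        k = batch.p[i] / (batch.p[i] + r[i])
        batch.x[i] += k * (z[i] - batch.x[i])
        batch.p[i] *= 1.0 - k
    return batch
```

Because iterations of the loop are independent and touch contiguous arrays, a compiled version can map consecutive tracks onto vector lanes.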
Reference Values for Shear Wave Elastography of Neck and Shoulder Muscles in Healthy Individuals.

PubMed

Ewertsen, Caroline; Carlsen, Jonathan; Perveez, Mohammed Aftab; Schytz, Henrik

2018-01-01

To establish reference values for ultrasound shear-wave elastography of pericranial muscles in healthy individuals (m. trapezius, m. splenius capitis, m. semispinalis capitis, m. sternocleidomastoideus and m. masseter), and to evaluate day-to-day variation in shear-wave speeds and the effect of muscle-fiber pennation, i.e., scanning parallel or perpendicular to the fibers. Ten healthy individuals (5 males and 5 females) had their pericranial muscles examined with shear-wave elastography in two orthogonal planes on two different days, on both the dominant and the non-dominant side. Mean shear-wave speeds from 5 ROIs in each muscle were calculated for each scan plane, side and day. The effects of the different parameters (muscle pennation, gender, dominant vs. non-dominant side and day) were evaluated. The effect of scan plane in relation to muscle pennation was statistically significant (p<0.0001): the mean shear-wave speed when scanning parallel to the muscle fibers was significantly higher than when scanning perpendicular to the fibers. The day-to-day variation was statistically significant (p=0.0258) but not clinically relevant. Shear-wave speeds differed significantly between muscles. Mean shear-wave speeds (m/s) for the muscles in the parallel plane were: masseter 2.45 (SD: +/-0.25), semispinal 3.36 (SD: +/-0.75), splenius 3.04 (SD: +/-0.65), sternocleidomastoid 2.75 (SD: +/-0.23), trapezius 3.20 (SD: +/-0.27) and trapezius lateral 3.87 (SD: +/-3.87). The shear-wave speed variation depended on the direction of scanning. Shear-wave elastography may be a method for evaluating muscle stiffness in patients suffering from chronic neck pain.
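Shear-wave speeds are often converted to stiffness using mu = rho * c^2 (shear modulus), and E is approximately 3 * mu for an isotropic, incompressible, linear-elastic medium. Muscle is anisotropic, which is one reason the scan plane matters, so these conversions are rough. The tissue density of 1000 kg/m^3 is an assumption, and the abstract reports speeds only.

```python
def shear_modulus_kpa(speed_m_s, density_kg_m3=1000.0):
    """mu = rho * c^2, returned in kPa (density is an assumed value)."""
    return density_kg_m3 * speed_m_s ** 2 / 1000.0


def youngs_modulus_kpa(speed_m_s, density_kg_m3=1000.0):
    """E ~ 3 mu, valid only for an isotropic, incompressible, linear-elastic medium."""
    return 3.0 * shear_modulus_kpa(speed_m_s, density_kg_m3)


# Example with the reported masseter mean (parallel plane): about 6.0 kPa and 18.0 kPa.
print(shear_modulus_kpa(2.45), youngs_modulus_kpa(2.45))
```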
Concurrent Cuba

NASA Astrophysics Data System (ADS)

Hahn, T.

2016-10-01

The parallel version of the multidimensional numerical integration package Cuba is presented and achievable speed-ups discussed. The parallelization is based on the fork/wait POSIX functions, needs no extra software installed, imposes almost no constraints on the integrand function, and works largely automatically.
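The fork/wait pattern can be sketched in Python with `os.fork`, `os.pipe` and `os.waitpid` (POSIX only). This is not Cuba's code: the integrand, the midpoint rule on [0, 1] and the worker count are hypothetical stand-ins, chosen so that each child evaluates an independent share of the samples and sends its partial sum back to the parent.

```python
import os
import struct


def f(x):
    """Hypothetical integrand; in Cuba the integrand is user-supplied."""
    return x * x


def partial_sum(lo, hi, n):
    """Midpoint-rule contribution of cells lo..hi-1 on [0, 1] with n cells."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(lo, hi)) * h


def forked_integral(n=100000, workers=4):
    pipes, pids = [], []
    for w in range(workers):
        lo, hi = w * n // workers, (w + 1) * n // workers
        r, wfd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: compute its share, send 8 bytes, exit immediately
            try:
                os.close(r)
                os.write(wfd, struct.pack("d", partial_sum(lo, hi, n)))
            finally:
                os._exit(0)
        os.close(wfd)  # parent keeps only the read end
        pipes.append(r)
        pids.append(pid)
    total = 0.0
    for r, pid in zip(pipes, pids):
        data = os.read(r, 8)  # an 8-byte pipe write is atomic
        os.close(r)
        os.waitpid(pid, 0)  # the "wait" half: reap the child
        total += struct.unpack("d", data)[0]
    return total
```

Using processes rather than threads means the integrand needs no thread-safety guarantees, which is in the spirit of the "almost no constraints on the integrand" remark.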
Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system

NASA Astrophysics Data System (ADS)

Duran, Ahmet; Tuncel, Mehmet

2016-10-01

It is important to have a scalable parallel numerical parameter optimization algorithm for dynamical systems used in financial applications, where time limitations are crucial. We use Message Passing Interface (MPI) parallel programming and present such a new parallel algorithm for parameter estimation. As an example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series on up to 512 cores (see [10]). Unlike [10], in this work we consider more extensive financial market situations, for example the presence of low volatility, high volatility, and a stock market price at a discount or premium to its net asset value of varying magnitude. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least-squares error and the maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.
Processing large remote sensing image data sets on Beowulf clusters

USGS Publications Warehouse

Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

2003-01-01

High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
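The abstract does not say which smoothing filter was used. As a hypothetical example, a centered moving average over a time series can be split across nodes if each chunk is shipped with k "halo" samples on each side; every node then reproduces exactly the windows the serial filter would use. The function names and the edge handling (truncated windows) are assumptions.

```python
def moving_average(x, k):
    """Centered moving average of width 2k+1; edges use a truncated window."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def chunked_moving_average(x, k, chunks):
    """Each 'node' smooths its chunk plus k halo samples per side,
    then keeps only the samples it owns."""
    n = len(x)
    out = []
    for c in range(chunks):
        start, stop = c * n // chunks, (c + 1) * n // chunks
        lo, hi = max(0, start - k), min(n, stop + k)
        local = moving_average(x[lo:hi], k)
        out.extend(local[start - lo:stop - lo])
    return out
```

The halo width equals the filter half-width, so the only extra data movement per node is 2k samples, independent of chunk size.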
Design Sketches For Optical Crossbar Switches Intended For Large-Scale Parallel Processing Applications

NASA Astrophysics Data System (ADS)

Hartmann, Alfred; Redfield, Steve

1989-04-01

This paper discusses the design of large-scale (1000 x 1000) optical crossbar switching networks for use in parallel processing supercomputers. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performance of alternative multiple-access channel communications protocols (unslotted and slotted ALOHA, and carrier sense multiple access (CSMA)) is compared with the performance of the classic arbitrated-bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
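For context on the ALOHA comparison, the textbook throughput formulas under Poisson offered load G (frames per frame time) are S = G e^(-2G) for unslotted ALOHA and S = G e^(-G) for slotted ALOHA. These are the standard idealized results, not necessarily the model used in the paper.

```python
import math


def pure_aloha_throughput(g):
    """Unslotted ALOHA: success needs no other start within a two-frame
    vulnerable window, so S = G * exp(-2G)."""
    return g * math.exp(-2.0 * g)


def slotted_aloha_throughput(g):
    """Slotted ALOHA: the vulnerable window shrinks to one slot, S = G * exp(-G)."""
    return g * math.exp(-g)


# Peaks: about 0.184 at G = 0.5 (unslotted) and about 0.368 at G = 1 (slotted).
print(pure_aloha_throughput(0.5), slotted_aloha_throughput(1.0))
```

Slotting doubles the peak channel utilization at the cost of global slot synchronization, a small instance of the implementation-versus-speed trade-off the abstract describes.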
Solution of task related to control of swiss-type automatic lathe to get planes parallel to part axis

NASA Astrophysics Data System (ADS)

Tabekina, N. A.; Chepchurov, M. S.; Evtushenko, E. I.; Dmitrievsky, B. S.

2018-05-01

The work addresses the automation of a machining process, namely turning, to produce parts with planes parallel to the part's axis of rotation without using special tools. According to the results, equipping the lathe with a high-speed electromechanical drive to control its operative movements makes it possible to produce planes parallel to the part axis. The method is based on a mathematical model expressed as a functional dependency between the conveying velocity of the driven element and time, which describes the operative movements of the lathe along the entire tool path. Using this model of tool movement, it was found that the conveying velocity varies from its maximum value to zero, which allows the drive to be reversed. A scheme for placing the tool relative to the workpiece has been proposed for unidirectional movement of the driven element at high conveying velocity. The control method for CNC machines can be used to produce geometrically complex parts on the lathe without special milling tools.
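Why the drive velocity passes through zero can be seen from a simple geometric model. This model is our assumption, not the authors' published one. If the part turns by theta = omega * t and the flat lies at distance h from the axis, the tool tip must sit at radius r(theta) = h / cos(theta). Its radial velocity is dr/dt = h * omega * sin(theta) / cos(theta)^2. This is zero at the middle of the flat, grows toward the edges, and changes sign, so the drive must reverse.

```python
import math


def tool_radius(theta, h):
    """Radial tool position keeping the cutting point on a flat at distance h."""
    return h / math.cos(theta)


def tool_velocity(theta, h, omega):
    """d/dt [h / cos(omega t)] = h * omega * sin(theta) / cos(theta)**2."""
    return h * omega * math.sin(theta) / math.cos(theta) ** 2


def flat_half_angle(h, part_radius):
    """The flat spans |theta| <= acos(h / R) on a part of radius R."""
    return math.acos(h / part_radius)
```

For example, with h = 5 and R = 10 (arbitrary units), the flat spans about +/-60 degrees of rotation, and the tool returns to the part surface at the edges.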
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.

PubMed

Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

2011-09-07

Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
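The reported figures can be checked directly: 2.58 h is 154.8 min, and 154.8 / 3.3 is about 46.9, consistent with the stated 47x speed-up. Dividing by the 100 nodes gives the parallel efficiency relative to ideal linear speed-up. The abstract does not break down where the remaining time goes.

```python
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nodes):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup(t_serial, t_parallel) / nodes


serial_min = 2.58 * 60  # 2.58 h on the local computer, in minutes
cloud_min = 3.3         # 3.3 min on the 100-node cloud cluster

print(round(speedup(serial_min, cloud_min), 1))          # 46.9
print(round(efficiency(serial_min, cloud_min, 100), 2))  # 0.47
```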
DOE Office of Scientific and Technical Information (OSTI.GOV)

Busbey, A.B.

Seismic Processing Workshop, a program by Parallel Geosciences of Austin, TX, is discussed in this column. The program is a high-speed, interactive seismic processing and computer analysis system for the Apple Macintosh II family of computers. Also reviewed in this column are three products from Wilkerson Associates of Champaign, IL. SubSide is an interactive program for basin subsidence analysis; MacFault and MacThrustRamp are programs for modeling faults.
Learning in Neural Networks: VLSI Implementation Strategies

NASA Technical Reports Server (NTRS)

Duong, Tuan Anh

1995-01-01

Fully-parallel hardware neural network implementations may be applied to high-speed recognition, classification, and mapping tasks in areas such as vision, or can be used as low-cost self-contained units for tasks such as error detection in mechanical systems (e.g. autos). Learning is required not only to satisfy application requirements, but also to overcome hardware-imposed limitations such as reduced dynamic range of connections.
Implementing An Image Understanding System Architecture Using Pipe

NASA Astrophysics Data System (ADS)

Luck, Randall L.

1988-03-01

This paper describes PIPE and how it can be used to implement an image understanding system. Image understanding is the process of developing a description of an image in order to make decisions about its contents. The tasks of image understanding are generally split into low-level vision and high-level vision. Low-level vision is performed by PIPE, a high-performance parallel processor with an architecture specifically designed for processing video images at up to 60 fields per second. High-level vision is performed by one of several types of serial or parallel computers, depending on the application. An additional processor called ISMAP performs the conversion from iconic image space to symbolic feature space. ISMAP plugs into one of PIPE's slots and is memory-mapped into the high-level processor; thus it forms the high-speed link between the low- and high-level vision processors. The mechanisms for bottom-up, data-driven processing and top-down, model-driven processing are discussed.
Theory and implementation of a very high throughput true random number generator in field programmable gate array

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Yonggang, E-mail: wangyg@ustc.edu.cn; Hui, Cong; Liu, Chong

The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
Theory and implementation of a very high throughput true random number generator in field programmable gate array.

PubMed

Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao

2016-04-01

The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
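With 64 units delivering 7.68 Gbps in total, each unit contributes 7.68 Gbps / 64 = 120 Mbps. Assuming throughput scales linearly with the number of parallel channels, as the abstract suggests, the unit count for a target rate follows directly. The helper names and the 10 Gbps target are hypothetical.

```python
import math

TOTAL_GBPS = 7.68  # reported aggregate throughput
UNITS = 64         # reported number of parallel generator units


def per_unit_mbps(total_gbps, units):
    return total_gbps * 1000.0 / units


def units_needed(target_gbps, unit_mbps):
    """Assumes linear scaling across independent channels."""
    return math.ceil(target_gbps * 1000.0 / unit_mbps)


print(per_unit_mbps(TOTAL_GBPS, UNITS))  # about 120 Mbps per unit
print(units_needed(10.0, 120.0))         # 84 units for a hypothetical 10 Gbps target
```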
The Xpress Transfer Protocol (XTP): A tutorial (expanded version)

NASA Technical Reports Server (NTRS)

Sanders, Robert M.; Weaver, Alfred C.

1990-01-01

The Xpress Transfer Protocol (XTP) is a reliable, real-time, lightweight transfer layer protocol. Current transport layer protocols such as DoD's Transmission Control Protocol (TCP) and ISO's Transport Protocol (TP) were not designed for the next generation of high-speed, interconnected reliable networks such as the fiber distributed data interface (FDDI) and gigabit-per-second wide area networks. Unlike all previous transport layer protocols, XTP is being designed to be implemented in hardware as a VLSI chip set. By streamlining the protocol, combining the transport and network layers, and exploiting the increased speed and parallelization possible with a VLSI implementation, XTP will be able to provide the end-to-end data transmission rates demanded in high-speed networks without compromising reliability and functionality. This paper describes the operation of the XTP protocol, in particular its error, flow and rate control; internetworking addressing mechanisms; and multicast support features, as defined in the XTP Protocol Definition Revision 3.4.
Rapid and highly integrated FPGA-based Shack-Hartmann wavefront sensor for adaptive optics system

NASA Astrophysics Data System (ADS)

Chen, Yi-Pin; Chang, Chia-Yuan; Chen, Shean-Jen

2018-02-01

In this study, a field programmable gate array (FPGA)-based Shack-Hartmann wavefront sensor (SHWS) programmed in LabVIEW is developed that can be highly integrated into customized applications such as an adaptive optics system (AOS) for real-time wavefront measurement. A Camera Link frame grabber with an embedded FPGA is adopted to increase the sensor's speed of response, since Camera Link offers the highest data transmission bandwidth. Instead of waiting for a whole frame to be captured by the FPGA, the Shack-Hartmann algorithm is implemented as parallel processing blocks so that wavefront reconstruction proceeds synchronously with image data transmission. In addition, we design a mechanism to control the deformable mirror in the same FPGA and verify the Shack-Hartmann sensor speed by controlling the frequency of the deformable mirror's dynamic surface deformation. Currently, this FPGA-based SHWS design achieves a 266 Hz cycle rate, limited by the camera frame rate, and leaves 40% of the logic slices free for additional flexible design.
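The abstract does not specify how spot positions are computed. Assuming the common center-of-gravity method, per-subaperture intensity sums can be accumulated as each image row arrives, so a subaperture's centroid is ready as soon as its last row has been received rather than after the whole frame. Slopes would then follow from each centroid's displacement from its reference position divided by the lenslet focal length (not shown). Names and the square-subaperture layout are assumptions.

```python
def streaming_centroids(rows, sub):
    """Center-of-gravity spot positions for sub x sub-pixel subapertures,
    accumulated row by row from an iterable of image rows."""
    sums = {}  # (subaperture row, subaperture col) -> [sum I, sum I*x, sum I*y]
    for y, row in enumerate(rows):  # rows arrive one at a time
        sy = y // sub
        for x, v in enumerate(row):
            acc = sums.setdefault((sy, x // sub), [0.0, 0.0, 0.0])
            acc[0] += v
            acc[1] += v * x
            acc[2] += v * y
    centroids = {}
    for key, (s, mx, my) in sums.items():
        if s > 0:  # skip dark subapertures
            centroids[key] = (mx / s, my / s)  # absolute pixel coordinates (x, y)
    return centroids
```

In hardware the same three accumulators per subaperture map naturally onto parallel blocks fed by the pixel stream.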
A Stochastic Spiking Neural Network for Virtual Screening.

PubMed

Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L

2018-04-01

Virtual screening (VS) has become a key computational tool in early drug design, and screening performance is highly relevant because of the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, hardware implementations of spiking neural networks (SNNs) are emerging as a computing technique that can be applied to parallelize processes that are normally costly in computing time and power. SNNs therefore represent an attractive alternative for performing time-consuming processing tasks such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm, achieving a speed improvement of two orders of magnitude over USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays, allowing a highly parallelized USR implementation. The results show that, owing to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we conclude that the proposed architecture is a feasible methodology for efficiently enhancing time-consuming data-mining processes such as 3-D molecular similarity search.
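For reference, a plain software version of USR (the kind of implementation the hardware is compared against) can be sketched as follows. Four reference points are used: the molecular centroid (ctd), the atom closest to it (cst), the atom farthest from it (fct), and the atom farthest from fct (ftf). Each contributes three moments of its atom-distance distribution, giving 12 descriptors. Similarity is 1 / (1 + mean absolute descriptor difference). Here the third moment is taken as the skewness; treat that normalization, and the function names, as assumptions.

```python
import math


def _moments(dists):
    """Mean, variance and skewness of a distance distribution."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    skew = third / var ** 1.5 if var > 0 else 0.0
    return [mean, var, skew]


def usr_descriptor(coords):
    """12 shape descriptors from a list of (x, y, z) atom coordinates."""
    ctd = tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc += _moments([math.dist(c, ref) for c in coords])
    return desc


def usr_similarity(a, b):
    """1 / (1 + mean absolute difference); equals 1.0 for identical shapes."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because the descriptors depend only on interatomic distances, they are invariant to translation and rotation, so no alignment step is needed. That independence between comparisons is what makes screening millions of compounds easy to parallelize.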
High speed real-time wavefront processing system for a solid-state laser system

NASA Astrophysics Data System (ADS)

Liu, Yuan; Yang, Ping; Chen, Shanqiu; Ma, Lifang; Xu, Bing

2008-03-01

A high-speed real-time wavefront processing system for a solid-state laser beam cleanup system has been built. This system consists of a Core 2 industrial PC (IPC) running Linux with the real-time Linux (RT-Linux) operating system (OS), a PCI image grabber and a D/A card. The phase aberrations of the output beam from solid-state lasers often vary rapidly because of intracavity thermal effects and environmental influences. To compensate for these phase aberrations successfully, a high-speed real-time wavefront processing system is presented. Compared with earlier systems, this system can effectively improve speed. In the new system, image data acquisition, control voltage output and the reconstructor control algorithm are treated as real-time tasks in kernel space, while the display of wavefront information and human-machine interaction are treated as non-real-time tasks in user space. Parallel processing of the real-time tasks in Symmetric Multi-Processing (SMP) mode is the main strategy for improving speed. In this paper, the performance and efficiency of this wavefront processing system are analyzed. Open-loop experimental results show that the sampling frequency of this system reaches 3300 Hz, and that the system handles phase aberrations from solid-state lasers well.

Gear Design Effects on the Performance of High Speed Helical Gear Trains as Used in Aerospace Drive Systems

NASA Technical Reports Server (NTRS)

Handschuh, R.; Kilmain, D.; Ehinger, R.; Sinusas, E.

2013-01-01

The performance of high-speed helical gear trains is of particular importance for tiltrotor aircraft drive systems. These drive systems are used to provide speed reduction/torque multiplication from the gas turbine output shaft and provide the necessary offset between these parallel shafts in the aircraft. Four different design configurations have been tested in the NASA Glenn Research Center, High Speed Helical Gear Train Test Facility. The design configurations included the current aircraft design, current design with isotropic superfinished gear surfaces, double helical design (inward and outward pumping), increased pitch (finer teeth), and an increased helix angle. All designs were tested at multiple input shaft speeds (up to 15,000 rpm) and applied power (up to 5,000 hp). Also two lubrication, system-related, variables were tested: oil inlet temperature (160 to 250 °F) and lubricating jet pressure (60 to 80 psig). Experimental data recorded from these tests included power loss of the helical system under study, the temperature increase of the lubricant from inlet to outlet of the drive system and fling off temperatures (radially and axially). Also, all gear systems were tested with and without shrouds around the gears. The empirical data resulting from this study will be useful to the design of future helical gear train systems anticipated for next generation rotorcraft drive systems.
Gear Design Effects on the Performance of High Speed Helical Gear Trains as Used in Aerospace Drive Systems

NASA Technical Reports Server (NTRS)

Handschuh, R.; Kilmain, C.; Ehinger, R.; Sinusas, E.

2013-01-01

The performance of high-speed helical gear trains is of particular importance for tiltrotor aircraft drive systems. These drive systems are used to provide speed reduction/torque multiplication from the gas turbine output shaft and provide the necessary offset between these parallel shafts in the aircraft. Four different design configurations have been tested in the NASA Glenn Research Center, High Speed Helical Gear Train Test Facility. The design configurations included the current aircraft design, current design with isotropic superfinished gear surfaces, double helical design (inward and outward pumping), increased pitch (finer teeth), and an increased helix angle. All designs were tested at multiple input shaft speeds (up to 15,000 rpm) and applied power (up to 5,000 hp). Also two lubrication, system-related, variables were tested: oil inlet temperature (160 to 250 °F) and lubricating jet pressure (60 to 80 psig). Experimental data recorded from these tests included power loss of the helical system under study, the temperature increase of the lubricant from inlet to outlet of the drive system and fling off temperatures (radially and axially). Also, all gear systems were tested with and without shrouds around the gears. The empirical data resulting from this study will be useful to the design of future helical gear train systems anticipated for next generation rotorcraft drive systems.
Radiation effects in reconfigurable FPGAs

NASA Astrophysics Data System (ADS)

Quinn, Heather

2017-04-01

Field-programmable gate arrays (FPGAs) are co-processing hardware used in image and signal processing. FPGAs are programmed with custom implementations of an algorithm. These implementations are highly parallel hardware designs that are faster than software implementations. This flexibility and speed have made FPGAs attractive for many space programs that need in situ, high-speed signal processing for data categorization and data compression. However, most commercial FPGAs are affected by the space radiation environment. Problems with total ionizing dose (TID) have restricted the use of flash-based FPGAs, and static random access memory (SRAM)-based FPGAs must be mitigated to suppress errors from single-event upsets. This paper reviews radiation effects issues in reconfigurable FPGAs and discusses methods for mitigating these problems. With careful design, it is possible to use these components effectively and resiliently.
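The abstract does not name a specific mitigation. Triple modular redundancy (TMR) with majority voting is one widely used approach for SRAM-based FPGAs: logic is triplicated and a voter masks an upset in any single copy. The bit-level voter below is an illustrative sketch of that idea, not the paper's method. The helper names are ours.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit is 1 iff at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def tmr_read(copies):
    """Vote across three redundant copies of a word and report which disagreed."""
    a, b, c = copies
    voted = majority3(a, b, c)
    faulty = [i for i, v in enumerate(copies) if v != voted]
    return voted, faulty


# Copy 2 has a flipped bit (0b101 -> 0b111); the vote masks it.
print(tmr_read([5, 5, 7]))  # (5, [2])
```

Voting masks an upset but does not repair the corrupted configuration, which is why TMR is usually paired with periodic configuration scrubbing.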
Research on the adaptive optical control technology based on DSP

NASA Astrophysics Data System (ADS)

Zhang, Xiaolu; Xue, Qiao; Zeng, Fa; Zhao, Junpu; Zheng, Kuixing; Su, Jingqin; Dai, Wanjun

2018-02-01

Adaptive optics is a real-time technique that uses a high-speed control system to compensate for wavefront errors caused by atmospheric turbulence. However, the randomness and instantaneity of atmospheric changes make the design of adaptive optical systems very difficult. The large number of complex real-time operations leads to large delays, a seemingly insurmountable problem. To solve this problem, hardware operation and a parallel processing strategy are proposed, and a high-speed adaptive optical control system based on a DSP is developed. A hardware counter is used to check the system. The results show that the system can complete a closed-loop control cycle in 7.1 ms, improving the control bandwidth of the adaptive optical system. Using this system, wavefront measurement and closed-loop experiments were carried out, with good results.
High-speed spectral domain optical coherence tomography using non-uniform fast Fourier transform

PubMed Central

Chan, Kenny K. H.; Tang, Shuo

2010-01-01

The useful imaging range in spectral domain optical coherence tomography (SD-OCT) is often limited by the depth-dependent sensitivity fall-off. Processing SD-OCT data with the non-uniform fast Fourier transform (NFFT) can improve the sensitivity fall-off at maximum depth by more than 5 dB, while decreasing processing time 30-fold compared with the fast Fourier transform with cubic spline interpolation. NFFT can also improve the local signal-to-noise ratio (SNR) and reduce image artifacts introduced in post-processing. Combined with parallel processing, NFFT is shown to be able to process up to 90k A-lines per second. High-speed SD-OCT imaging is demonstrated at a camera-limited 100 frames per second on an ex vivo squid eye. PMID:21258551
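SD-OCT spectrometers typically sample the spectrum evenly in wavelength, which is uneven in wavenumber k = 2π/λ, so a plain FFT first needs resampling. The NFFT is a fast approximation of the nonuniform discrete Fourier transform, evaluated directly on the nonuniform samples. The sketch below computes that sum directly in O(NM) time, as a slow reference rather than the paper's fast implementation; the spectrometer range, pixel count, and reflector depth in the usage section are hypothetical.

```python
import numpy as np


def ndft(samples, positions, n_out):
    """Direct nonuniform DFT: F[m] = sum_n s_n * exp(-2j*pi*m*x_n).

    samples   : values s_n taken at nonuniform positions
    positions : positions x_n rescaled to [0, 1]
    n_out     : number of uniformly spaced output bins (m = 0 .. n_out-1)
    Cost is O(len(samples) * n_out); an NFFT approximates the same sum faster.
    """
    s = np.asarray(samples, dtype=complex)
    x = np.asarray(positions, dtype=float)
    m = np.arange(n_out)[:, None]                 # output bins as a column
    return np.exp(-2j * np.pi * m * x[None, :]) @ s


if __name__ == "__main__":
    # Hypothetical spectrometer: 1024 pixels spaced linearly in wavelength.
    lam = np.linspace(800e-9, 880e-9, 1024)
    k = 2 * np.pi / lam                           # nonuniform in wavenumber
    x = (k - k.min()) / (k.max() - k.min())       # rescale to [0, 1]
    spectrum = np.cos(2 * k * 100e-6)             # one reflector, 100 um deep
    a_line = np.abs(ndft(spectrum - spectrum.mean(), x, 512))
    print("peak depth bin:", int(np.argmax(a_line)))
```

When the positions are uniform (x_n = n/N), the sum reduces to the ordinary DFT, which makes a convenient correctness check against `np.fft.fft`.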
Box schemes and their implementation on the iPSC/860

NASA Technical Reports Server (NTRS)

Chattot, J. J.; Merriam, M. L.

1991-01-01

Research on algorithms for efficiently solving fluid flow problems on massively parallel computers is continued in the present paper. Attention is given to the implementation of a box scheme on the iPSC/860, a massively parallel computer with a peak speed of 10 Gflops and a memory of 128 Mwords. A domain decomposition approach to parallelism is used.
Full-field transient vibrometry of the human tympanic membrane by local phase correlation and high-speed holography

PubMed Central

Dobrev, Ivo; Furlong, Cosme; Cheng, Jeffrey T.; Rosowski, John J.

2014-01-01

Understanding the human hearing process would be helped by quantification of the transient mechanical response of the human ear, including the human tympanic membrane (TM or eardrum). We propose a new hybrid high-speed holographic system (HHS) for acquisition and quantification of the full-field nanometer transient (i.e., >10 kHz) displacement of the human TM. We have optimized and implemented a 2+1 frame local correlation (LC) based phase sampling method in combination with a high-speed (i.e., >40 kfps) camera acquisition system. To our knowledge, there is currently no existing system that provides such capabilities for the study of the human TM. The LC sampling method has a displacement difference of <11 nm relative to measurements obtained by a four-phase step algorithm. Comparisons between our high-speed acquisition system and a laser Doppler vibrometer indicate differences of <10 μs. The high temporal (i.e., >40 kHz) and spatial (i.e., >100 k data points) resolution of our HHS enables parallel measurements of all points on the surface of the TM, which allows quantification of spatially dependent motion parameters, such as modal frequencies and acoustic delays. Such capabilities could allow inferring local material properties across the surface of the TM. PMID:25191832
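The abstract benchmarks the LC method against a four-phase step algorithm. The standard four-step formula is sketched below; it is the reference method, not the authors' LC method. It assumes frames I_j = A + B cos(φ + jπ/2) for j = 0..3 and, for the displacement conversion, illumination and observation normal to the surface (phase sensitivity 4π/λ); both are illustrative assumptions.

```python
import numpy as np


def four_step_phase(i0, i1, i2, i3):
    """Wrapped phase from four frames shifted by 0, pi/2, pi and 3*pi/2.

    With I_j = A + B*cos(phi + j*pi/2):
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi).
    """
    return np.arctan2(np.asarray(i3) - i1, np.asarray(i0) - i2)


def phase_to_displacement(dphi, wavelength):
    """Out-of-plane displacement, assuming normal illumination and viewing
    (phase sensitivity 4*pi / wavelength)."""
    return wavelength * dphi / (4 * np.pi)
```

The functions work element-wise on whole frames, so applying them to image arrays gives a full-field phase map; the phase change between two deformation states still has to be unwrapped before conversion.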
Advances and challenges in cryo ptychography at the Advanced Photon Source.

PubMed

Deng, J; Vine, D J; Chen, S; Nashed, Y S G; Jin, Q; Peterka, T; Vogt, S; Jacobsen, C

Ptychography has emerged as a nondestructive tool to quantitatively study extended samples at a high spatial resolution. In this manuscript, we report on recent developments from our team. We have combined cryo ptychography and fluorescence microscopy to provide simultaneous views of ultrastructure and elemental composition, we have developed multi-GPU parallel computation to speed up ptychographic reconstructions, and we have implemented fly-scan ptychography to allow for faster data acquisition. We conclude with a discussion of future challenges in high-resolution 3D ptychography.
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm

NASA Astrophysics Data System (ADS)

Backes, Werner; Wetzel, Susanne

In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schnorr-Euchner algorithm for today's multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread version and about 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm, compared with the traditional non-parallel algorithm.
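For orientation, here is the textbook LLL algorithm (Lenstra, Lenstra, Lovász) with exact rational Gram-Schmidt coefficients and the usual δ = 3/4. It is serial and, for clarity, recomputes Gram-Schmidt after every basis change; the paper's contribution, a multi-threaded floating-point Schnorr-Euchner variant, is not reproduced here. Function names are hypothetical, and the input rows are assumed linearly independent.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return (orthogonalised vectors b*, coefficients mu) in exact arithmetic."""
    n = len(basis)
    bstar, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], bstar[j]) / dot(bstar[j], bstar[j])
            v = [a - mu[i][j] * b for a, b in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, mu


def lll_reduce(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of an integer basis (rows are basis vectors)."""
    b = [list(v) for v in basis]
    n = len(b)
    bstar, mu = gram_schmidt(b)
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for all j < k.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bstar, mu = gram_schmidt(b)
        # Lovasz condition decides between advancing and swapping.
        lhs = dot(bstar[k], bstar[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            bstar, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b


if __name__ == "__main__":
    print(lll_reduce([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))
```

Every operation is an integer row operation or a swap, so the reduced basis spans the same lattice; the size-reduction and Lovász checks are what a parallel variant must preserve while distributing the work.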
Implementation of Temperature Sequential Controller on Variable Speed Drive

NASA Astrophysics Data System (ADS)

Cheong, Z. X.; Barsoum, N. N.

2008-10-01

There are many pump and motor installations with quite extensive speed variation, such as sago conveyors, heating, ventilation and air conditioning (HVAC), and water pumping systems. A common solution for these applications is to run several fixed-speed motors in parallel, with flow control accomplished by turning the motors on and off. This control method causes high in-rush current and adds a risk of damage from pressure transients. This paper explains the design and implementation of a temperature-based speed control system for use in the industrial and commercial sectors. Advanced temperature speed control can be achieved using the ABB ACS800 variable speed drive's direct torque sequential control macro, a programmable logic controller, and a temperature transmitter. The direct torque sequential control macro (DTC-SC) is based on controlling torque and flux using stator flux field orientation over seven preset constant speeds. Because the ambient temperature is continuously compared with the reference temperatures, the electromagnetic torque responds quickly to the motor state, and the drive is able to maintain constant speeds. Experimental tests were carried out using an ABB ACS800-U1-0003-2 to validate the effectiveness and dynamic response of the ABB ACS800 under temperature variation, loads, and mechanical shocks.
Microwave conductance properties of aligned multiwall carbon nanotube textile sheets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, Brian L.; Martinez, Patricia; Zakhidov, Anvar A.

2015-07-06

Understanding the conductance properties of multi-walled carbon nanotube (MWNT) textile sheets in the microwave regime is essential for their potential use in high-speed and high-frequency applications. To expand current knowledge, complex high-frequency conductance measurements from 0.01 to 50 GHz, at temperatures from 4.2 K to 300 K and in magnetic fields up to 2 T, were made on textile sheets of highly aligned MWNTs with strand alignment oriented both parallel and perpendicular to the microwave electric field polarization. Sheets were drawn from 329 μm and 520 μm high MWNT forests, which resulted in different DC resistance anisotropies. For all samples, the microwave conductance can be modeled approximately by a shunt capacitance in parallel with a frequency-independent conductance, with no inductive contribution. This is consistent with diffusive Drude conduction as the primary transport mechanism up to 50 GHz. Further, the microwave conductance is found to be essentially independent of both temperature and magnetic field.
Nonvolatile “AND,” “OR,” and “NOT” Boolean logic gates based on phase-change memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Y.; Zhong, Y. P.; Deng, Y. F.

2013-12-21

Electronic devices or circuits that can implement both logic and memory functions are regarded as building blocks for future massively parallel computing beyond the von Neumann architecture. Here we propose phase-change memory (PCM)-based nonvolatile logic gates capable of AND, OR, and NOT Boolean logic operations, verified in SPICE simulations and circuit experiments. The logic operations are performed in parallel, and the results can be stored directly in the states of the logic gates, facilitating the combination of computing and memory in the same circuit. These results are encouraging for ultralow-power, high-speed nonvolatile logic circuit design based on novel memory devices.
Collective network for computer structures

DOEpatents

Blumrich, Matthias A; Coteus, Paul W; Chen, Dong; Gara, Alan; Giampapa, Mark E; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd E; Steinmacher-Burow, Burkhard D; Vranas, Pavlos M

2014-01-07

A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
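The patent performs reductions inside the network hardware. The logical combining pattern can be illustrated with a small software simulation, assuming a binary tree over P nodes and an associative operation; the function name and round counting are hypothetical, and a real router topology would differ.

```python
import operator
from typing import Callable, List, Tuple


def tree_allreduce(values: List[float],
                   op: Callable = operator.add) -> Tuple[List[float], int]:
    """Simulate a binary-tree reduce followed by a broadcast over P >= 1 nodes.

    values[i] is node i's local contribution; op must be associative.
    Returns every node's final value and the number of communication rounds.
    """
    buf = list(values)
    p = len(buf)
    stride, up_rounds = 1, 0
    # Reduce phase: in each round, node i absorbs the partial result of
    # node i + stride; after ceil(log2 P) rounds node 0 holds the total.
    while stride < p:
        for i in range(0, p, 2 * stride):
            if i + stride < p:
                buf[i] = op(buf[i], buf[i + stride])
        stride *= 2
        up_rounds += 1
    # Broadcast phase: the root's result travels back down the same tree,
    # taking the same number of rounds.
    return [buf[0]] * p, 2 * up_rounds


if __name__ == "__main__":
    print(tree_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))  # prints ([36, 36, 36, 36, 36, 36, 36, 36], 6)
```

The point of the tree is that the round count grows as 2·ceil(log2 P) rather than linearly in P, which is why dedicated collective hardware pays off at massively parallel scale.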
Boltzmann Transport Code Update: Parallelization and Integrated Design Updates

NASA Technical Reports Server (NTRS)

Heinbockel, J. H.; Nealy, J. E.; DeAngelis, G.; Feldman, G. A.; Chokshi, S.

2003-01-01

The ongoing effort to develop a website for radiation analysis is expected to increase usage of the High Charge and Energy Transport Code HZETRN. It would be desirable to perform the requested calculations quickly and efficiently. This raised the question, "Could the implementation of parallel processing speed up the required calculations?" To answer this question, two modifications of the HZETRN computer code were created. The first modification used a shield of Al(2219), then polyethylene, then Al(2219). The modified Fortran code was labeled 1SSTRN.F. The second modification considered a shield of CO2 and Martian regolith. This modified Fortran code was labeled MARSTRN.F.
Collective network for computer structures

DOEpatents

Blumrich, Matthias A [Ridgefield, CT]; Coteus, Paul W [Yorktown Heights, NY]; Chen, Dong [Croton On Hudson, NY]; Gara, Alan [Mount Kisco, NY]; Giampapa, Mark E [Irvington, NY]; Heidelberger, Philip [Cortlandt Manor, NY]; Hoenicke, Dirk [Ossining, NY]; Takken, Todd E [Brewster, NY]; Steinmacher-Burow, Burkhard D [Wernau, DE]; Vranas, Pavlos M [Bedford Hills, NY]

2011-08-16

A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
Photonic reservoir computing: a new approach to optical information processing

NASA Astrophysics Data System (ADS)

Vandoorne, Kristof; Fiers, Martin; Verstraeten, David; Schrauwen, Benjamin; Dambre, Joni; Bienstman, Peter

2010-06-01

Despite ever increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made by the introduction of the new concept of reservoir computing. This is a methodology coming from the field of machine learning and neural networks that has been successfully used in several pattern classification problems, like speech and image recognition. Thus far, most implementations have been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We propose using a network of coupled Semiconductor Optical Amplifiers (SOA) and show in simulation that it could be used as a reservoir by comparing it to conventional software implementations using a benchmark speech recognition task. In spite of the differences with classical reservoir models, the performance of our photonic reservoir is comparable to that of conventional implementations and sometimes slightly better. As our implementation uses coherent light for information processing, we find that phase tuning is crucial to obtain high performance. In parallel we investigate the use of a network of photonic crystal cavities. The coupled mode theory (CMT) is used to investigate these resonators. A new framework is designed to model networks of resonators and SOAs. The same network topologies are used, but feedback is added to control the internal dynamics of the system. By adjusting the readout weights of the network in a controlled manner, we can generate arbitrary periodic patterns.
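To make the reservoir idea concrete in software, here is a minimal echo state network sketch: a fixed random recurrent network is driven by the input, and only a linear readout is trained by ridge regression, mirroring the abstract's point that the readout weights are what gets adjusted. The reservoir size, spectral radius, input scaling, and the sine-prediction task are assumptions chosen for illustration, not the paper's SOA or photonic-crystal models.

```python
import numpy as np


def run_reservoir(u, n_res=100, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Drive a fixed random tanh reservoir with input u; return all states."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=n_res)
    w = rng.normal(size=(n_res, n_res))
    # Rescale recurrent weights to the chosen spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        x = np.tanh(w @ x + w_in * u_t)
        states[t] = x
    return states


def _with_bias(states):
    return np.hstack([states, np.ones((states.shape[0], 1))])


def train_readout(states, targets, ridge=1e-6):
    """Ridge regression for the linear readout: (X^T X + r I)^-1 X^T y."""
    x = _with_bias(states)
    return np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ targets)


def predict(states, w_out):
    return _with_bias(states) @ w_out
```

In use, the first stretch of states (the washout, while the reservoir forgets its zero initial condition) is discarded before training; only `train_readout` touches learnable parameters, which is what makes a fixed physical reservoir such as a photonic network practical.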
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.

PubMed

Higginson, J S; Neptune, R R; Anderson, F C

2005-09-01

Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
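The abstract does not spell out the SPAN heuristics, so the following is only a generic simulated annealing sketch on a quadratic test function. It proposes a small batch of neighbors per step; because those evaluations are independent, they are the natural candidates for distribution across processors. The `mapper` hook, batch size, and cooling schedule are hypothetical choices, not SPAN's.

```python
import math
import random


def quadratic(x):
    """Simple test objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


def anneal(f, x0, t0=1.0, cooling=0.995, steps=3000, step_size=0.5,
           batch=4, mapper=map, seed=0):
    """Generic simulated annealing with a batch of neighbours per step.

    `mapper` evaluates f over the batch; pass an executor's .map to
    evaluate candidates concurrently (hypothetical parallelisation hook).
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(steps):
        cands = [[v + rng.gauss(0.0, step_size) for v in x] for _ in range(batch)]
        for c, fc in zip(cands, mapper(f, cands)):
            # Metropolis rule: accept improvements, and worse moves with
            # probability exp(-(fc - fx) / t).
            if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
                x, fx = c, fc
                break  # continue from the first accepted candidate
        if fx < fbest:
            best, fbest = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best, fbest


if __name__ == "__main__":
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(anneal(quadratic, [5.0, -3.0], mapper=pool.map))
```

Threads only help when the objective releases the GIL (for example, a simulation in native code); for pure-Python objectives a process pool or message passing would be needed, which is the interprocessor communication cost SPAN was designed to minimize.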
Hypercube Expert System Shell - Applying Production Parallelism.

DTIC Science & Technology

1989-12-01

possible processor organizations, or interconnection methods, for parallel architectures. The following are examples of commonly used interconnection...this timing analysis because match speed-up available from production parallelism is proportional to the average number of affected productions (11:5
Preparation of uniaxially aligned TiO2 ultrafine fibers by electrospinning.

PubMed

Nien, Yu-Hsun; Tsai, Yan-Sheng; Wang, Jia-Yi; Syu, Shu-Ping

2012-11-01

TiO2 nanofibers are often produced by electrospinning using a collector consisting of two parallel electrodes. In this work, a high speed rotating drum was used as a collector to produce uniaxially aligned TiO2 ultrafine fibers. The apparatus to manufacture uniaxially aligned TiO2 ultrafine fiber consisted of a high-speed roller, a high-voltage power supply, a controllable syringe pump and a syringe. Titanium (IV) isopropoxide and polyvinylpyrrolidone were used as precursor and auxiliary, respectively. Titanium (IV) isopropoxide and polyvinylpyrrolidone were well mixed with other essential reagents to form the polymer solution. The polymer solution was poured into the syringe and pumped at various flow rates. The electrospun ultrafine fibers collected on the roller were heat treated up to 600 degrees C and the uniaxially aligned TiO2 ultrafine fibers were formed and characterized using scanning electron microscope and X-ray diffraction.
Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash.

PubMed

Pelletier, Mathew G

2008-02-08

One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information as to the level of trash contamination of the cotton lint. This research examines the use of programmable graphic processing units (GPU) as an alternative to the PC's traditional use of the central processing unit (CPU). The use of the GPU as an alternative computation platform allowed the machine vision system to gain a significant improvement in processing time. By improving the processing time, this research seeks to address the lack of availability of rapid trash sensing systems and thus alleviate a situation in which the current systems view the cotton lint either well before, or after, the cotton is cleaned. This extended lag/lead time currently imposed on the cotton trash cleaning control systems is responsible for system operators utilizing a very large dead-band safety buffer in order to ensure that the cotton lint is not under-cleaned. Unfortunately, the utilization of a large dead-band buffer results in the majority of the cotton lint being over-cleaned, which in turn causes lint fiber damage as well as significant losses of the valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained through the use of a trash sensor tightly coupled to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows for implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphic processing unit "GPU", for processing of the cotton trash images, a speed-up of over 6.5 times over optimized code running on the PC's central processing unit "CPU" was gained.
The new parallel algorithm operating on the GPU was able to process a 1024x1024 image in less than 17 ms. At this improved speed, the image processing system's performance should now be sufficient to provide a system capable of real-time feedback control in tight cooperation with the cleaning equipment.
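The reported timing translates into throughput as follows (simple arithmetic on the abstract's numbers; because the processing time is an upper bound, both results are lower bounds):

$$
\frac{1024 \times 1024\ \text{pixels}}{17\ \text{ms}} = \frac{1\,048\,576\ \text{pixels}}{0.017\ \text{s}} \approx 6.17 \times 10^{7}\ \text{pixels/s},
\qquad
\frac{1\ \text{frame}}{17\ \text{ms}} \approx 58.8\ \text{frames/s}.
$$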

High-performance computing in image registration

NASA Astrophysics Data System (ADS)

Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro

2012-10-01

Thanks to recent technological advances, a large variety of image data with varying geometric, radiometric, and temporal resolution is at our disposal. In many applications, processing such images requires high-performance computing techniques in order to deliver timely responses, e.g., for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphics Processing Unit (GPU) programming, and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging task of processing large amounts of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes, and environmental monitoring. For the image alignment procedure, sets of corresponding feature points need to be extracted automatically in order to compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a high degree of automation and speed is mandatory. The details of the implemented operations (named LARES), which exploit parallel architectures and GPUs, are presented. The innovative aspects of the implementation are (i) its effectiveness on a large variety of unorganized and complex datasets, (ii) its capability to work with high-resolution images, and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and discussed.
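The abstract does not say which matching strategy LARES uses. A common baseline, assumed here, is brute-force nearest-neighbor matching of feature descriptors with a ratio test; the all-pairs distance matrix is the data-parallel part that maps naturally onto a GPU. The sketch runs with NumPy on the CPU, and the 0.8 ratio threshold and function name are illustrative assumptions.

```python
import numpy as np


def match_features(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.

    desc_a: (Na, D) descriptors, desc_b: (Nb, D) descriptors with Nb >= 2.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    # All pairwise squared Euclidean distances, shape (Na, Nb).
    d2 = (np.sum(desc_a ** 2, axis=1)[:, None]
          + np.sum(desc_b ** 2, axis=1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    d2 = np.maximum(d2, 0.0)              # clip tiny negatives from rounding
    order = np.argsort(d2, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    # Keep a match only if the best candidate is clearly better than the runner-up.
    keep = np.sqrt(d2[rows, best]) < ratio * np.sqrt(d2[rows, second])
    return [(int(i), int(best[i])) for i in rows[keep]]
```

The surviving correspondences would then feed a robust estimator of the aligning transformation; the ratio test discards ambiguous matches before that step.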
On the generation of double layers from ion- and electron-acoustic instabilities

NASA Astrophysics Data System (ADS)

Fu, Xiangrong; Cowee, Misa M.; Gary, S. Peter; Winske, Dan

2016-03-01

A plasma double layer (DL) is a nonlinear electrostatic structure that carries a uni-polar electric field parallel to the background magnetic field due to local charge separation. Past studies showed that DLs observed in space plasmas are mostly associated with the ion acoustic instability. Recent Van Allen Probes observations of parallel electric field structures traveling much faster than the ion acoustic speed have motivated a computational study to test the hypothesis that a new type of DLs—electron acoustic DLs—generated from the electron acoustic instability are responsible for these electric fields. Nonlinear particle-in-cell simulations yield negative results, i.e., the hypothetical electron acoustic DLs cannot be formed in a way similar to ion acoustic DLs. Linear theory analysis and the simulations show that the frequencies of electron acoustic waves are too high for ions to respond and maintain charge separation required by DLs. However, our results do show that local density perturbations in a two-electron-component plasma can result in unipolar-like electric field structures that propagate at the electron thermal speed, suggesting another potential explanation for the observations.
The "c" Equivalence Principle and the Correct form of Writing Maxwell's Equations

ERIC Educational Resources Information Center

Heras, Jose A.

2010-01-01

It is well known that the speed c_u is obtained in the process of defining SI units via action-at-a-distance forces, like the force between two static charges and the force between two long and parallel currents. The speed c_u is then physically different from the observed speed of propagation c associated with…
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods

NASA Astrophysics Data System (ADS)

Xie, Lang; Luo, Yi-han; Bao, Qi-liang

2013-08-01

GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms designed specifically for the GPU hardware architecture is of great significance. To address the high computational complexity and poor real-time performance of blind image restoration, this paper analyzes and improves the midfrequency-based algorithm for blind image restoration. Furthermore, a midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. By combining the algorithm's data intensiveness and data parallelism with the GPU's single-instruction, multiple-thread (SIMT) execution model, a new parallel midfrequency-based algorithm for blind image restoration suited to GPU stream computing is proposed. In this algorithm, the GPU is used to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. For better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimizing data access and host-device communication. The kernel parallelism structure is determined by the decomposition of the filtering data so as to sustain the transfer rate and work around the memory bandwidth limitation. The results show that the new algorithm significantly increases the operational speed and effectively improves the real-time performance of image restoration, especially for high-resolution images.
Variable-Complexity Multidisciplinary Optimization on Parallel Computers

NASA Technical Reports Server (NTRS)

Grossman, Bernard; Mason, William H.; Watson, Layne T.; Haftka, Raphael T.

1998-01-01

This report covers work conducted under grant NAG1-1562 for the NASA High Performance Computing and Communications Program (HPCCP) from December 7, 1993, to December 31, 1997. The objective of the research was to develop new multidisciplinary design optimization (MDO) techniques that exploit parallel computing to reduce the computational burden of aircraft MDO. The design of the High-Speed Civil Transport (HSCT) aircraft was selected as a test case to demonstrate the utility of our MDO methods. The three major tasks of this research grant were: (1) development of parallel multipoint approximation methods for the aerodynamic design of the HSCT, (2) use of parallel multipoint approximation methods for structural optimization of the HSCT, and (3) mathematical and algorithmic development, including support for integrating parallel computation into items (1) and (2). These tasks have been accomplished with the development of a response surface methodology that incorporates multi-fidelity models. For the aerodynamic design, we were able to optimize with up to 20 design variables using hundreds of expensive Euler analyses together with thousands of inexpensive linear theory simulations. We have thereby demonstrated the application of CFD to a large aerodynamic design problem. For predicting structural weight, we were able to combine hundreds of structural optimizations of refined finite element models with thousands of optimizations based on coarse models. Computations have been carried out on the Intel Paragon with up to 128 nodes. The parallel computation allowed us to perform combined aerodynamic-structural optimization using state-of-the-art models of a complex aircraft configuration.
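A quadratic response surface is a least-squares polynomial fit to sampled analyses. The sketch below fits one with NumPy and, as one simple multi-fidelity scheme (an assumption; the report's exact formulation may differ), fits the surface to the difference between a few expensive evaluations and a cheap model and adds it back as a correction. Function names are hypothetical, and the cheap model is assumed to accept a 2-D array of design points.

```python
import itertools
import numpy as np


def quad_features(x):
    """Design matrix with columns 1, x_i and x_i * x_j (i <= j)."""
    x = np.atleast_2d(x)
    n = x.shape[1]
    cols = [np.ones(len(x))]
    cols += [x[:, i] for i in range(n)]
    cols += [x[:, i] * x[:, j]
             for i, j in itertools.combinations_with_replacement(range(n), 2)]
    return np.column_stack(cols)


def fit_response_surface(x, y):
    """Least-squares quadratic response surface; returns a callable model."""
    coef, *_ = np.linalg.lstsq(quad_features(x), y, rcond=None)
    return lambda xq: quad_features(xq) @ coef


def corrected_model(cheap, x_hi, y_hi):
    """Additive multi-fidelity correction: cheap(x) + RS fitted to
    (expensive - cheap) at the few points where the expensive model ran."""
    delta = fit_response_surface(x_hi, y_hi - cheap(x_hi))
    return lambda xq: cheap(xq) + delta(xq)
```

A quadratic surface in n variables has (n + 1)(n + 2)/2 coefficients, so with 20 design variables it needs at least 231 well-spread samples, which is consistent in scale with fitting to hundreds of expensive analyses.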
Parallel processing via a dual olfactory pathway in the honeybee.

PubMed

Brill, Martin F; Rosenbaum, Tobias; Reus, Isabelle; Kleineidam, Christoph J; Nawrot, Martin P; Rössler, Wolfgang

2013-02-06

In their natural environment, animals face complex and highly dynamic olfactory input. Thus, vertebrates as well as invertebrates require fast and reliable processing of olfactory information. Parallel processing has been shown to improve processing speed and power in other sensory systems and is characterized by the extraction of different stimulus parameters along parallel sensory information streams. Honeybees possess an elaborate olfactory system with unique neuronal architecture: a dual olfactory pathway comprising a medial projection-neuron (PN) antennal lobe (AL) protocerebral output tract (m-APT) and a lateral PN AL output tract (l-APT) connecting the olfactory lobes with higher-order brain centers. We asked whether this neuronal architecture serves parallel processing and employed a novel technique for simultaneous multiunit recordings from both tracts. The results revealed response profiles from a large number of PNs of both tracts to floral, pheromonal, and biologically relevant odor mixtures tested over multiple trials. PNs from both tracts responded to all tested odors, but with different characteristics indicating parallel processing of similar odors. Both PN tracts were activated by widely overlapping response profiles, which is a requirement for parallel processing. The l-APT PNs had broad response profiles suggesting generalized coding properties, whereas the responses of m-APT PNs were comparatively weaker and less frequent, indicating higher odor specificity. Comparison of response latencies within and across tracts revealed odor-dependent latencies. We suggest that parallel processing via the honeybee dual olfactory pathway provides enhanced odor processing capabilities serving sophisticated odor perception and the olfactory demands associated with the complex olfactory world of this social insect.
Multicore Challenges and Benefits for High Performance Scientific Computing

DOE PAGES

Nielsen, Ida M. B.; Janssen, Curtis L.

2008-01-01

Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction-level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve application performance requires exposing extreme levels of software parallelism. Here we discuss the architecture of parallel computers constructed from many multicore chips, as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We illustrate these ideas with a hybrid distributed-memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
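Only the thread level of the hybrid model fits in a few lines. The sketch below splits the rows of A into contiguous blocks and computes each block of C = AB as a separate thread-pool task; in a hybrid code, each message-passing process would presumably own such a row block and split it among its threads. The function name and block layout are assumptions, and it relies on NumPy's BLAS-backed matmul typically releasing the GIL so that the threads can overlap.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np


def blocked_matmul(a, b, n_threads=4):
    """C = A @ B, with rows of A split into contiguous blocks, one task each."""
    a, b = np.asarray(a), np.asarray(b)
    c = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    bounds = np.linspace(0, a.shape[0], n_threads + 1).astype(int)

    def work(lo, hi):
        # Each task writes a disjoint slice of C, so no locking is needed.
        c[lo:hi] = a[lo:hi] @ b

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, bounds[:-1], bounds[1:]))  # list() re-raises errors
    return c
```

Row blocks keep each task's output contiguous and independent; a distributed version would additionally have to move or replicate B, which is where the message-passing layer comes in.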
A robot arm simulation with a shared memory multiprocessor machine

NASA Technical Reports Server (NTRS)

Kim, Sung-Soo; Chuang, Li-Ping

1989-01-01

A parallel processing scheme for a single chain robot arm is presented for high speed computation on a shared memory multiprocessor. A recursive formulation that is derived from a virtual work form of the d'Alembert equations of motion is utilized for robot arm dynamics. A joint drive system that consists of a motor rotor and gears is included in the arm dynamics model, in order to take into account gyroscopic effects due to the spinning of the rotor. The fine grain parallelism of mechanical and control subsystem models is exploited, based on independent computation associated with bodies, joint drive systems, and controllers. Efficiency and effectiveness of the parallel scheme are demonstrated through simulations of a telerobotic manipulator arm. Two different mechanical subsystem models, i.e., with and without gyroscopic effects, are compared, to show the trade-off between efficiency and accuracy.
High-speed holographic correlation system by a time-division recording method for copyright content management on the internet

NASA Astrophysics Data System (ADS)

Watanabe, Eriko; Ikeda, Kanami; Kodate, Kashiko

2012-10-01

Using a holographic disc memory, on which a huge amount of data can be stored, we constructed an ultra-high-speed, all-optical correlation system. In this method, however, multiplex recording is restricted to "one page" per "one spot." In addition, signal information must be normalized to data of the same size, even if the object data are smaller. This makes the system difficult to apply to part of an object data scene (i.e., partial scene searching and template matching) while maintaining high accessibility and programmability. In this paper, we develop a holographic correlation system using a time-division recording method that increases the number of multiplex recordings on the same spot. Assuming that a four-channel detector is used, 15 parallel correlations are achieved by the time-division recording method. Preliminary correlation experiments with the holographic optical disc setup yielded high correlation peaks at a rotational speed of 300 rpm. We also describe combining the optical correlation system with copyright content management that searches the Internet and detects illegal content on video-sharing websites.
A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

NASA Astrophysics Data System (ADS)

Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

2016-05-01

In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
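Why dynamic assignment helps with load balancing can be seen with a small scheduling simulation, assuming tracks of unequal, known cost: under dynamic scheduling each task goes to whichever thread becomes free first, while a static round-robin split fixes the assignment in advance. Costs and function names are hypothetical.

```python
import heapq


def dynamic_makespan(costs, n_threads):
    """Dynamic scheduling: each task goes to the thread that frees up first."""
    finish = [0.0] * n_threads        # min-heap of per-thread finish times
    heapq.heapify(finish)
    for c in costs:
        t = heapq.heappop(finish)     # earliest-available thread
        heapq.heappush(finish, t + c)
    return max(finish)


def static_makespan(costs, n_threads):
    """Static round-robin: task i is pre-assigned to thread i % n_threads."""
    loads = [0.0] * n_threads
    for i, c in enumerate(costs):
        loads[i % n_threads] += c
    return max(loads)


if __name__ == "__main__":
    tracks = [10, 1, 1, 1, 10, 1, 1, 1]   # two long tracks among short ones
    print(static_makespan(tracks, 4), dynamic_makespan(tracks, 4))
```

With this unlucky ordering, round-robin stacks both long tracks on one thread (makespan 20), while dynamic assignment spreads them (makespan 11); in a real code the costs are not known in advance, which is exactly why threads pulling tasks at run time is attractive.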
High precision electric gate for time-of-flight ion mass spectrometers

NASA Technical Reports Server (NTRS)

Sittler, Edward C. (Inventor)

2011-01-01

A time-of-flight mass spectrometer is described that has a chamber with electrodes to generate an electric field in the chamber and electric gating that admits only ions with a predetermined mass and velocity into the electric field. The design uses a row of very thin, parallel-aligned wires that are pulsed in sequence so that the ion can pass through the gap between two parallel plates, which are biased to prevent passage of the ion. By itself, this design can provide high mass resolution and a very precise start pulse for an ion mass spectrometer. Furthermore, an ion passes through the chamber only if it is within one wire diameter of the first wire when that wire is pulsed and has the right speed to be near each of the other wires when they are pulsed.
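The gating condition in the last sentence can be made concrete with an idealized one-dimensional model. Assumptions (not from the patent): ions travel at constant speed along the wire row, each pulse is instantaneous, and all ions are accelerated through the same potential, so speed encodes mass-per-charge. The helper names are hypothetical.

```python
import math

def pulse_schedule(wire_positions, v_target, t0=0.0):
    """Pulse time for each wire so that an ion moving at v_target, which is
    at the first wire at t0, is at every wire when that wire fires."""
    x_first = wire_positions[0]
    return [t0 + (x - x_first) / v_target for x in wire_positions]

def passes_gate(x0, v, wire_positions, pulse_times, wire_diameter):
    """True if an ion at x0 at t=0 moving at speed v is within one wire
    diameter of every wire at the instant that wire is pulsed."""
    for x_w, t_p in zip(wire_positions, pulse_times):
        if abs(x0 + v * t_p - x_w) > wire_diameter:
            return False
    return True

def speed_from_mass(mass_kg, charge_c, accel_volts):
    """Speed after acceleration through accel_volts, from q*V = m*v**2/2."""
    return math.sqrt(2.0 * charge_c * accel_volts / mass_kg)
```

Changing `v_target` (equivalently, the selected m/q at a fixed accelerating voltage) changes which ions survive the whole wire sequence; ions that are too fast, too slow, or mistimed at the first wire fall out of step.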
Relation Between the Generalized Acoustic Analogy and Lilley's Contributions to Aeroacoustics

NASA Technical Reports Server (NTRS)

Goldstein, M. E.

2010-01-01

This paper reviews Lilley's reformulation of Lighthill's equation and shows that it can be obtained as a special case of a much more general acoustic analogy. It also shows how this generalized analogy can be used to eliminate some of the difficulties that arise when more conventional parallel flow analogies are applied to high speed jets. And, finally, some recent applications of these ideas are discussed.
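For reference, the Lighthill equation that Lilley reformulated is usually written as below, where ρ' and p' are density and pressure fluctuations about a uniform ambient state, c₀ is the ambient sound speed, u_i the velocity, and τ_ij the viscous stress tensor. All flow effects, including mean-flow refraction, are lumped into the source term T_ij on the right-hand side, which is the starting point that the more conventional parallel flow analogies and the generalized analogy treat differently.

```latex
\frac{\partial^2 \rho'}{\partial t^2} - c_0^2 \nabla^2 \rho'
  = \frac{\partial^2 T_{ij}}{\partial x_i \, \partial x_j},
\qquad
T_{ij} = \rho u_i u_j + \left(p' - c_0^2 \rho'\right)\delta_{ij} - \tau_{ij}
```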
Large amplitude forcing of a high speed 2-dimensional jet

NASA Technical Reports Server (NTRS)

Bernal, L.; Sarohia, V.

1984-01-01

The effect of large-amplitude forcing on the growth of a high-speed two-dimensional jet was investigated experimentally. Two forcing techniques were used: mass flow oscillations and a mechanical system. The mass flow oscillation tests were conducted at Strouhal numbers from 0.00052 to 0.045 and peak-to-peak amplitudes up to 50 percent of the mean exit velocity. The exit Mach number was varied from 0.15 to 0.8, corresponding to Reynolds numbers from 8,400 to 45,000. The results indicate no significant change in the jet growth rate or centerline velocity decay compared with the undisturbed free jet. The mechanical forcing system consists of two counter-rotating hexagonal cylinders located parallel to the span of the nozzle. Forcing frequencies up to 1,500 Hz were tested, and both symmetric and antisymmetric forcing can be implemented. Antisymmetric forcing produced a significant (75 percent) increase in the jet growth rate at an exit Mach number of 0.25 and a Strouhal number of 0.019. At higher rotational speeds, the jet deflected laterally; a deflection angle of 39 deg with respect to the centerline was measured at the maximum rotational speed.
High-Speed Computation of the Kleene Star in Max-Plus Algebraic System Using a Cell Broadband Engine

NASA Astrophysics Data System (ADS)

Goto, Hiroyuki

This research addresses a high-speed method for computing the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement the method on a Cell Broadband Engine™ (CBE) processor. Since the resulting matrix gives the longest travel times between pairs of nodes, it is often used in scheduling problem solvers for a class of discrete event systems. In particular, this research attempts to achieve a speedup using two approaches, parallelization and SIMDization (Single Instruction, Multiple Data), both of which a CBE processor supports. The former refers to parallel computation using multiple cores, while the latter computes multiple elements with a single instruction. Using an implementation on a Sony PlayStation 3™ equipped with a CBE processor, we found that SIMDization is effective regardless of the system size and the number of processor cores used. We also found that the scalability with multiple cores is remarkable, especially for systems with a large number of nodes. In a numerical experiment with 2000 nodes, we achieved a speedup of 20 times compared with the method without these techniques.
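A naive dense sketch of the max-plus Kleene star for a DAG follows. It is not the CBE implementation, which may exploit the DAG structure differently; the convention assumed here is that `a[i][j]` is the weight of edge i → j, with −∞ meaning "no edge". Within each max-plus product the rows are independent (the part that maps onto multiple cores), and the inner `max` over `k` is the element-wise work that SIMDization targets.

```python
NEG_INF = float("-inf")  # max-plus "zero" element: no edge

def mp_matmul(a, b):
    """Max-plus product: (a ⊗ b)[i][j] = max_k (a[i][k] + b[k][j])."""
    inner = range(len(b))
    return [[max(a[i][k] + b[k][j] for k in inner) for j in range(len(b[0]))]
            for i in range(len(a))]

def mp_identity(n):
    """Max-plus identity: 0 on the diagonal, -inf elsewhere."""
    return [[0.0 if i == j else NEG_INF for j in range(n)] for i in range(n)]

def kleene_star_dag(a):
    """A* = I ⊕ A ⊕ A^2 ⊕ ... ⊕ A^(n-1) for the adjacency matrix of a DAG.

    A*[i][j] is the longest-path weight from i to j. In a DAG with n nodes
    no path has more than n - 1 edges, so the series can stop there.
    """
    n = len(a)
    star = mp_identity(n)
    power = mp_identity(n)
    for _ in range(n - 1):
        power = mp_matmul(power, a)
        # ⊕ is element-wise max
        star = [[max(x, y) for x, y in zip(rs, rp)] for rs, rp in zip(star, power)]
    return star
```

For a three-node chain 0 → 1 → 2 (weights 2 and 3) plus a direct edge 0 → 2 of weight 4, `kleene_star_dag` returns 5 for the pair (0, 2), the longer of the two routes.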
A high-order time-parallel scheme for solving wave propagation problems via the direct construction of an approximate time-evolution operator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Haut, T. S.; Babb, T.; Martinsson, P. G.

2015-06-16

Our manuscript demonstrates a technique for efficiently solving the classical wave equation, the shallow water equations, and, more generally, equations of the form ∂u/∂t = Lu, where L is a skew-Hermitian differential operator. The idea is to explicitly construct an approximation to the time-evolution operator exp(τL) for a relatively large time-step τ. Recently developed techniques for approximating oscillatory scalar functions by rational functions, and accelerated algorithms for computing functions of discretized differential operators, are exploited. Principal advantages of the proposed method include stability even for large time-steps, the possibility to parallelize in time over many characteristic wavelengths, and large speed-ups over existing methods in situations where simulation over long times is required. Numerical examples involving the 2D rotating shallow water equations and the 2D wave equation in an inhomogeneous medium are presented, and the method is compared to the 4th order Runge–Kutta (RK4) method and to the use of Chebyshev polynomials. The new method achieved high accuracy over long-time intervals, with speeds that are orders of magnitude faster than both RK4 and the use of Chebyshev polynomials.
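The stability advantage of applying a precomputed exp(τL) can be seen on a toy problem. For the 2×2 skew-symmetric operator L = [[0, ω], [−ω, 0]], exp(τL) is known in closed form (a rotation); the manuscript instead builds a rational approximation for large discretized operators, so this sketch only illustrates the principle. RK4 is stable on the imaginary axis only for ωτ ≤ 2√2 ≈ 2.83, so with ωτ = 3 it blows up, while the exact propagator preserves the norm at any step size.

```python
import math

def exact_propagator(omega, tau):
    """exp(tau*L) for the skew-symmetric L = [[0, omega], [-omega, 0]]."""
    c, s = math.cos(omega * tau), math.sin(omega * tau)
    return [[c, s], [-s, c]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def rk4_step(omega, tau, u):
    """One classical RK4 step for du/dt = L u."""
    def rhs(v):  # L v
        return [omega * v[1], -omega * v[0]]
    k1 = rhs(u)
    k2 = rhs([u[i] + 0.5 * tau * k1[i] for i in range(2)])
    k3 = rhs([u[i] + 0.5 * tau * k2[i] for i in range(2)])
    k4 = rhs([u[i] + tau * k3[i] for i in range(2)])
    return [u[i] + tau / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]

def evolve(omega, tau, steps, u0):
    """Advance u0 with both the exact propagator and RK4; return both results."""
    prop = exact_propagator(omega, tau)  # built once, reused every step
    u_op, u_rk = list(u0), list(u0)
    for _ in range(steps):
        u_op = matvec(prop, u_op)
        u_rk = rk4_step(omega, tau, u_rk)
    return u_op, u_rk
```

With `evolve(1.0, 3.0, 20, [1.0, 0.0])` the propagator result stays on the unit circle, while each RK4 step multiplies the norm by about 1.5; with a small step such as τ = 0.01 the two agree closely.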
Modulation frequency as a cue for auditory speed perception.

PubMed

Senna, Irene; Parise, Cesare V; Ernst, Marc O

2017-07-12

Unlike vision, the mechanisms underlying auditory motion perception are poorly understood. Here we describe an auditory motion illusion revealing a novel cue to auditory speed perception: the temporal frequency of amplitude modulation (AM-frequency), typical for rattling sounds. Naturally, corrugated objects sliding across each other generate rattling sounds whose AM-frequency tends to directly correlate with speed. We found that AM-frequency modulates auditory speed perception in a highly systematic fashion: moving sounds with higher AM-frequency are perceived as moving faster than sounds with lower AM-frequency. Even more interestingly, sounds with higher AM-frequency also induce stronger motion aftereffects. This reveals the existence of specialized neural mechanisms for auditory motion perception, which are sensitive to AM-frequency. Thus, in spatial hearing, the brain successfully capitalizes on the AM-frequency of rattling sounds to estimate the speed of moving objects. This tightly parallels previous findings in motion vision, where spatio-temporal frequency of moving displays systematically affects both speed perception and the magnitude of the motion aftereffects. Such an analogy with vision suggests that motion detection may rely on canonical computations, with similar neural mechanisms shared across the different modalities. © 2017 The Author(s).
Hardware design and implementation of fast DOA estimation method based on multicore DSP

NASA Astrophysics Data System (ADS)

Guo, Rui; Zhao, Yingxiao; Zhang, Yue; Lin, Qianqiang; Chen, Zengping

2016-10-01

In this paper, we present a high-speed real-time signal processing hardware platform based on a multicore digital signal processor (DSP). The platform offers high-performance computing, low power consumption, large-capacity data storage, and high-speed data transmission, which enable it to meet the constraints of real-time direction of arrival (DOA) estimation. To reduce the high computational complexity of the DOA estimation algorithm, a novel real-valued MUSIC estimator is used. The algorithm is decomposed into several independent steps, and the time consumed by each step is measured. Based on these timing statistics, we present a new parallel processing strategy that distributes the DOA estimation task across the cores of the platform. Experimental results demonstrate that the processing capability of the platform meets the constraints of real-time DOA estimation.
RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

PubMed Central

Wright, Imogen A.; Travers, Simon A.

2014-01-01

The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
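The distinction RAMICS draws between genuine codon-sized indels and frameshifts comes down to whether an indel length is a multiple of three. RAMICS makes this decision inside its profile-HMM alignment; the post-hoc rule below is only an illustration applied to a finished alignment's CIGAR string (operations as defined in the SAM specification). It ignores reading-frame context and the sequencing-error biases the tool models, and `classify_indels` is a hypothetical helper.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def classify_indels(cigar):
    """Label each insertion (I) or deletion (D) in a CIGAR string as a
    codon-sized indel (length divisible by 3) or a frameshift."""
    labels = []
    for length, op in CIGAR_OP.findall(cigar):
        if op in ("I", "D"):
            n = int(length)
            kind = "codon_indel" if n % 3 == 0 else "frameshift"
            labels.append((op, n, kind))
    return labels
```

For example, `classify_indels("50M3I40M2D10M")` reports a 3-base insertion that keeps the frame and a 2-base deletion that shifts it.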
Programmable Illumination and High-Speed, Multi-Wavelength, Confocal Microscopy Using a Digital Micromirror

PubMed Central

Martial, Franck P.; Hartell, Nicholas A.

2012-01-01

Confocal microscopy is routinely used for high-resolution fluorescence imaging of biological specimens. Most standard confocal systems scan a laser across a specimen and collect emitted light passing through a single pinhole to produce an optical section of the sample. Sequential scanning on a point-by-point basis limits the speed of image acquisition, and even the fastest commercial instruments struggle to resolve the temporal dynamics of rapid cellular events such as calcium signals. Various approaches have been introduced that increase the speed of confocal imaging. Nipkow disk microscopes, for example, use arrays of pinholes or slits on a spinning disk to achieve parallel scanning, which significantly increases the speed of acquisition. Here we report the development of a microscope module that utilises a digital micromirror device as a spatial light modulator to provide programmable confocal optical sectioning with a single camera, at high spatial and axial resolution, at speeds limited by the frame rate of the camera. The digital micromirror acts as a solid-state Nipkow disk but with the added ability to change the pinhole size and separation and to control the light intensity on a mirror-by-mirror basis. The use of an arrangement of concave and convex mirrors in the emission pathway instead of lenses overcomes the astigmatism inherent with DMD devices, increases light collection efficiency and ensures image collection is achromatic so that images are perfectly aligned at different wavelengths. Combined with non-laser light sources, this allows low-cost, high-speed, multi-wavelength image acquisition without the need for complex wavelength-dependent image alignment. The micromirror can also be used for programmable illumination allowing spatially defined photoactivation of fluorescent proteins. 
We demonstrate the use of this system for high-speed calcium imaging using both a single wavelength calcium indicator and a genetically encoded, ratiometric, calcium sensor. PMID:22937130
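The micromirror's ability to change pinhole size and separation can be illustrated with a small pattern generator. This is a sketch under assumed conventions (square pinholes on a regular grid, shifted by one pinhole width per frame), and `pinhole_masks` is a hypothetical helper rather than part of the authors' module. Over the full sequence every mirror is switched on exactly once, so the corresponding camera frames together sample the whole field.

```python
def pinhole_masks(width, height, pinhole, spacing):
    """Binary DMD masks for parallel confocal scanning.

    Each mask switches on square pinholes of side `pinhole` mirrors on a
    grid with period `spacing`; successive masks shift the grid so that,
    across the whole sequence, every mirror is on exactly once.
    """
    if spacing % pinhole:
        raise ValueError("spacing must be a multiple of the pinhole size")
    masks = []
    for oy in range(0, spacing, pinhole):
        for ox in range(0, spacing, pinhole):
            mask = [[0] * width for _ in range(height)]
            for y in range(height):
                for x in range(width):
                    if (y - oy) % spacing < pinhole and (x - ox) % spacing < pinhole:
                        mask[y][x] = 1
            masks.append(mask)
    return masks
```

The sequence length is (spacing / pinhole)², so larger separations reduce crosstalk between neighbouring pinholes at the cost of more frames per optical section.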
Programmable illumination and high-speed, multi-wavelength, confocal microscopy using a digital micromirror.

PubMed

Martial, Franck P; Hartell, Nicholas A

2012-01-01


GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.

PubMed

Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A

2016-01-01

In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet-lab experiments need to be performed. Virtual screening of a ligand data model requires large-scale computations, making it a highly time-consuming task. This process can be sped up by implementing parallelized algorithms on a Graphics Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in virtual screening. A ligand-based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems is proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of the results produced by our tool on the GPU is the same as that obtained in a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in the training and prediction phases.
Development of an add-on kit for scanning confocal microscopy (Conference Presentation)

NASA Astrophysics Data System (ADS)

Guo, Kaikai; Zheng, Guoan

2017-03-01

Scanning confocal microscopy is a standard choice for many fluorescence imaging applications in basic biomedical research. It is able to produce optically sectioned images and provides the acquisition versatility to address many samples and application demands. However, scanning a focused point across the specimen limits the speed of image acquisition. As a result, scanning confocal microscopes work well only with stationary samples. Researchers have performed parallel confocal scanning using a digital micromirror device (DMD) to project a scanning multi-point pattern across the sample. DMD-based parallel confocal systems increase the imaging speed while maintaining optical sectioning ability. In this paper, we report the development of an add-on kit for high-speed, low-cost confocal microscopy. By adapting this add-on kit to an existing regular microscope, one can convert it into a confocal microscope without significant hardware modifications. Compared with current DMD-based implementations, the reported approach is able to recover multiple layers along the z axis simultaneously. It may find applications in wafer inspection and 3D metrology of semiconductor circuits. Dissemination of the proposed add-on kit, with a budget under $1000, could also lead to new types of experimental designs for biological research labs, e.g., cytology analysis in cell culture experiments, genetic studies on multicellular organisms, pharmaceutical drug profiling, RNA interference studies, and investigation of microbial communities in environmental systems.
Parallel computing of a climate model on the Dawn 1000 by domain decomposition method

NASA Astrophysics Data System (ADS)

Bi, Xunqiang

1997-12-01

In this paper, the parallel computing of a grid-point, nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massively parallel computer made by the National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. Potential ways to increase the speed-up ratio and to exploit more of the resources of future massively parallel supercomputers are also discussed.
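A two-dimensional domain decomposition assigns each process a contiguous block of longitudes and latitudes. The sketch below computes those blocks for a `px` × `py` process grid; the grid dimensions and function names are hypothetical, since the abstract does not give the IAP model's resolution or its exact partitioning rule. In a real model each process would also exchange halo rows and columns with its neighbours every time step, which is omitted here.

```python
def split_1d(n, parts, index):
    """[start, stop) of block `index` when n points are split into `parts`
    nearly equal contiguous blocks (the first n % parts blocks get one extra)."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop

def subdomain(nlon, nlat, px, py, rank):
    """Longitude and latitude index ranges owned by `rank` on a px-by-py grid."""
    ix, iy = rank % px, rank // px
    return split_1d(nlon, px, ix), split_1d(nlat, py, iy)
```

For instance, 10 points over 3 processes give blocks [0, 4), [4, 7) and [7, 10), so no process holds more than one extra point.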
A real-time MPEG software decoder using a portable message-passing library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment in which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitations, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.
Programming Probabilistic Structural Analysis for Parallel Processing Computer

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

1991-01-01

The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

PubMed

Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper we propose a parallel design and implementation of an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm demonstrates both better edge detection performance and improved time performance.
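Otsu's method picks the grey level that maximizes the between-class variance of a histogram. The sketch below computes it for an 8-bit histogram and then derives Canny's dual thresholds using an assumed rule (high threshold = Otsu threshold, low threshold = half of it); the abstract does not state the paper's exact rule, and `canny_thresholds` is a hypothetical helper. Per-image work of this kind is what an individual map task would run.

```python
def otsu_threshold(hist):
    """Threshold t maximizing between-class variance for a grey-level
    histogram; pixels with value <= t form the background class."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0          # background pixel count and intensity sum
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Counts instead of probabilities scale the variance by total**2,
        # which does not change the argmax.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def canny_thresholds(hist, low_ratio=0.5):
    """Assumed rule: (low, high) = (low_ratio * otsu, otsu)."""
    high = otsu_threshold(hist)
    return low_ratio * high, high
```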
Efficiency of parallel direct optimization

NASA Technical Reports Server (NTRS)

Janies, D. A.; Wheeler, W. C.

2001-01-01

Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
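The speed-up and efficiency figures discussed above follow the usual definitions: speed-up is serial time divided by parallel time, and efficiency is speed-up divided by the number of workers. The sketch below tabulates both from wall-clock timings; the timings in the example are hypothetical and only show how a plateau like the one reported for branch swapping beyond 16 processors appears as falling efficiency.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_workers):
    """Speed-up divided by worker count; 1.0 means ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_workers

def scaling_table(t_serial, timings):
    """timings: {n_workers: wall-clock time}; rows sorted by worker count."""
    return [(n, speedup(t_serial, t), efficiency(t_serial, t, n))
            for n, t in sorted(timings.items())]
```

With a hypothetical serial time of 160 s and 10 s on both 16 and 32 workers, the table shows the same speed-up of 16 but efficiency dropping from 1.0 to 0.5.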
High-Speed Interrogation for Large-Scale Fiber Bragg Grating Sensing

PubMed Central

Hu, Chenyuan; Bai, Wei

2018-01-01

A high-speed interrogation scheme for large-scale fiber Bragg grating (FBG) sensing arrays is presented. This technique employs parallel computing and pipeline control to modulate incident light and demodulate the reflected sensing signal. One electro-optic modulator (EOM) and one semiconductor optical amplifier (SOA) were used to generate a phase delay to filter the reflected spectrum from multiple candidate FBGs with the same optical path difference (OPD). Experimental results showed that the fastest interrogation delay time for the proposed method was only about 27.2 µs for a single FBG interrogation, and the system scanning period was limited only by the optical transmission delay in the sensing fiber, owing to the multiple simultaneous central wavelength calculations. Furthermore, the proposed FPGA-based technique had a verified FBG wavelength demodulation stability of ±1 pm without averaging. PMID:29495263
High-Speed Interrogation for Large-Scale Fiber Bragg Grating Sensing.

PubMed

Hu, Chenyuan; Bai, Wei

2018-02-24

Hardware Neural Network for a Visual Inspection System

NASA Astrophysics Data System (ADS)

Chun, Seungwoo; Hayakawa, Yoshihiro; Nakajima, Koji

The visual inspection of defects in products is heavily dependent on human experience and instinct. In this situation, it is difficult to reduce production costs and to shorten the inspection time, and hence the total process time. Consequently, people involved in this area desire an automatic inspection system. In this paper, we propose a hardware neural network that is expected to provide high-speed operation for the automatic inspection of products. Because neural networks can learn, they are well suited to self-adjusting the classification criteria. To achieve high-speed operation, we use parallel and pipelining techniques. Furthermore, we use a piecewise linear function instead of a conventional activation function in order to save hardware resources. Our proposed hardware neural network achieved 6 GCPS (giga connections per second) and 2 GCUPS (giga connection updates per second), which proved sufficiently fast for our test samples.
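The abstract does not specify its piecewise linear function, so the sketch below only shows the general idea: a sigmoid replaced by linear interpolation between a few breakpoints, with values that hardware could hold in a small lookup table. The knot positions are an arbitrary choice for illustration; with these nine knots the maximum deviation from the logistic sigmoid is roughly 0.02.

```python
import bisect
import math

# Hypothetical breakpoints; values computed once, as a small hardware table might store them.
KNOTS = [-6.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0]
VALUES = [1.0 / (1.0 + math.exp(-k)) for k in KNOTS]

def pwl_sigmoid(x):
    """Piecewise-linear sigmoid: interpolate between knots, constant outside them."""
    if x <= KNOTS[0]:
        return VALUES[0]
    if x >= KNOTS[-1]:
        return VALUES[-1]
    i = bisect.bisect_right(KNOTS, x) - 1
    x0, x1 = KNOTS[i], KNOTS[i + 1]
    y0, y1 = VALUES[i], VALUES[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Each segment needs one multiply and one add, which is the kind of saving over evaluating an exponential that motivates piecewise activations in hardware.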
Internal fluid mechanics research on supercomputers for aerospace propulsion systems

NASA Technical Reports Server (NTRS)

Miller, Brent A.; Anderson, Bernhard H.; Szuch, John R.

1988-01-01

The Internal Fluid Mechanics Division of the NASA Lewis Research Center is combining the key elements of computational fluid dynamics, aerothermodynamic experiments, and advanced computational technology to bring internal computational fluid mechanics (ICFM) to a state of practical application for aerospace propulsion systems. The strategies used to achieve this goal are to: (1) pursue an understanding of flow physics, surface heat transfer, and combustion via analysis and fundamental experiments, (2) incorporate improved understanding of these phenomena into verified 3-D CFD codes, and (3) utilize state-of-the-art computational technology to enhance experimental and CFD research. Presented is an overview of the ICFM program in high-speed propulsion, including work in inlets, turbomachinery, and chemical reacting flows. Ongoing efforts to integrate new computer technologies, such as parallel computing and artificial intelligence, into high-speed aeropropulsion research are described.
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications

NASA Technical Reports Server (NTRS)

Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak

2001-01-01

The Information Power Grid (IPG) concept developed by NASA aims to provide a metacomputing platform for large-scale distributed computations by hiding the intricacies of a highly heterogeneous environment while maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
A rapid parallelization of cone-beam projection and back-projection operator based on texture fetching interpolation

NASA Astrophysics Data System (ADS)

Xie, Lizhe; Hu, Yining; Chen, Yang; Shi, Luyao

2015-03-01

Projection and back-projection are the most computationally expensive parts of Computed Tomography (CT) reconstruction. Parallelization strategies using GPU computing techniques have been introduced. In this paper, we present a new parallelization scheme for both projection and back-projection. The proposed method is based on the CUDA technology developed by NVIDIA Corporation. Instead of building a complex model, we aimed to optimize the existing algorithm and make it suitable for CUDA implementation so as to gain fast computation speed. Besides using texture fetching, which gives faster interpolation, we fixed the number of samples in the projection computation to keep blocks and threads synchronized, which prevents the latency caused by inconsistent computational complexity. Experimental results demonstrate the computational efficiency and imaging quality of the proposed method.
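The role of texture-fetch interpolation in back-projection can be seen in a simplified pixel-driven back-projector. This sketch assumes 2D parallel-beam geometry rather than the paper's cone-beam operators, and `linear_fetch` only emulates what a linearly filtered texture fetch returns (with zeros outside the detector, like a border address mode); it does not follow CUDA's exact texel-coordinate conventions. On a GPU, each pixel's loop body would be one thread, and the interpolation would be done by texture hardware.

```python
import math

def linear_fetch(row, u):
    """Linear interpolation of `row` at fractional index u; 0 outside the array."""
    i0 = math.floor(u)
    frac = u - i0
    def at(i):
        return row[i] if 0 <= i < len(row) else 0.0
    return (1.0 - frac) * at(i0) + frac * at(i0 + 1)

def backproject_parallel(sinogram, angles, n):
    """Unfiltered pixel-driven back-projection onto an n-by-n image
    (2D parallel-beam geometry, detector and image centred on the axis)."""
    c_det = (len(sinogram[0]) - 1) / 2.0
    c_img = (n - 1) / 2.0
    image = [[0.0] * n for _ in range(n)]
    for row, theta in zip(sinogram, angles):
        ct, st = math.cos(theta), math.sin(theta)
        for iy in range(n):
            y = iy - c_img
            for ix in range(n):
                x = ix - c_img
                u = x * ct + y * st + c_det  # detector coordinate seen by this pixel
                image[iy][ix] += linear_fetch(row, u)
    scale = math.pi / len(angles)  # angular integration weight over [0, pi)
    return [[v * scale for v in r] for r in image]
```

Back-projecting a constant sinogram that covers the whole image gives a uniform image, a quick sanity check on the geometry.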
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

PubMed

Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

2016-01-01

Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and runs fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. To cope with the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
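The chunk-and-reduce pattern behind parallel SAM processing can be sketched as follows. This is not cljam's API (cljam is a Clojure library); it is a Python illustration that counts mapped reads per reference sequence, using the SAM text layout (tab-separated fields, with FLAG as the second column and RNAME as the third) and the FLAG bit 0x4 that marks an unmapped segment. Function names and the chunk size are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def count_chunk(lines):
    """Count mapped reads per reference (RNAME) in a chunk of SAM lines."""
    counts = Counter()
    for line in lines:
        if not line or line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if not flag & UNMAPPED:
            counts[rname] += 1
    return counts

def count_mapped(sam_lines, n_workers=4, chunk_size=10000):
    """Split the records into chunks, count each in parallel, merge the counts."""
    chunks = [sam_lines[i:i + chunk_size] for i in range(0, len(sam_lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return total
```

In CPython, threads do not speed up CPU-bound parsing because of the global interpreter lock; swapping in `ProcessPoolExecutor` keeps the same structure while running chunks in separate processes.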
Utilizing GPUs to Accelerate Turbomachinery CFD Codes

NASA Technical Reports Server (NTRS)

MacCalla, Weylin; Kulkarni, Sameer

2016-01-01

GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.
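The conclusion that the parallel portions must account for a large share of the runtime is what Amdahl's law predicts. The abstract does not cite it, so this is offered as context: if a fraction p of runtime is accelerated by a factor s, the overall speed-up is 1 / ((1 − p) + p/s). The optional `overhead` term is a hypothetical extra cost (for example, host-device data transfer) expressed as a fraction of the original runtime; it shows how offloading can make a modular code slower overall.

```python
def amdahl_speedup(parallel_fraction, kernel_speedup, overhead=0.0):
    """Overall speed-up when only `parallel_fraction` of the runtime is
    accelerated by `kernel_speedup`; `overhead` is added offload cost as a
    fraction of the original runtime (a hypothetical extension)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup + overhead)
```

Even an infinitely fast kernel covering half the runtime gives at most a 2× speed-up, and with p = 0.3, s = 10 and an overhead of 0.35 the "accelerated" code runs slower than the original.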
Framework for Parallel Preprocessing of Microarray Data Using Hadoop

PubMed Central

2018-01-01

Nowadays, microarray technology has become one of the popular ways to study gene expression and diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods used to preprocess these data and remove noise. Most preprocessing algorithms are time-consuming and cannot handle large numbers of datasets with thousands of experiments. Parallel processing can be used to address these issues. Hadoop is a well-known distributed computing framework, built on a distributed file system, that provides a parallel environment for running such experiments. In this research, for the first time, the capability of Hadoop and the statistical power of R have been leveraged to parallelize the available RMA preprocessing algorithm to efficiently process microarray data. The experiment was run on a cluster of 5 nodes, each with 16 cores and 16 GB of memory. The study compares the efficiency and performance of RMA parallelized with Hadoop against RMA parallelized with the affyPara package and against sequential RMA. The results show that the proposed approach achieves a higher speed-up than both the sequential and the affyPara approaches. PMID:29796018
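RMA is commonly described as background correction, quantile normalization and median-polish summarization. The quantile-normalization step shows why parallelizing RMA takes more than independent per-array work. Sorting each array is a map step, averaging the k-th smallest values across arrays is a reduce step, and writing the averages back by rank is another map step. The Python sketch below expresses that structure with plain functions. It is not the authors' Hadoop/R implementation, and for simplicity it ignores ties.

```python
def sort_with_ranks(column):
    """Map step: return the sorted values and, for each original position, its rank."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0] * len(column)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return [column[i] for i in order], ranks

def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of probe intensities per array)."""
    mapped = [sort_with_ranks(a) for a in arrays]   # independent per array -> parallelizable
    n = len(arrays[0])
    # Reduce step: mean of the k-th smallest value across all arrays.
    reference = [sum(sorted_vals[k] for sorted_vals, _ in mapped) / len(arrays)
                 for k in range(n)]
    # Map step: replace each value by the reference value at its rank.
    return [[reference[r] for r in ranks] for _, ranks in mapped]
```

After normalization, every array has the same distribution of values (the reference), while each probe keeps its rank within its own array.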
Variable Mach number design approach for a parallel waverider with a wide-speed range based on the osculating cone theory

NASA Astrophysics Data System (ADS)

Zhao, Zhen-tao; Huang, Wei; Li, Shi-Bin; Zhang, Tian-Tian; Yan, Li

2018-06-01

In the current study, a variable Mach number waverider design approach has been proposed based on the osculating cone theory. The design Mach number of the osculating cone constant Mach number waverider with the same volumetric efficiency as the osculating cone variable Mach number waverider has been determined by writing a program to calculate the volumetric efficiencies of waveriders. The CFD approach has been used to verify the effectiveness of the proposed approach. In addition, the performance advantage of the osculating cone variable Mach number waverider is studied through a comparative analysis of aerodynamic performance. The results show that the osculating cone variable Mach number waverider has a higher lift-to-drag ratio throughout the flight profile than the osculating cone constant Mach number waverider, and it has superior low-speed aerodynamic performance while maintaining nearly the same high-speed aerodynamic performance.
Synchronization trigger control system for flow visualization

NASA Technical Reports Server (NTRS)

Chun, K. S.

1987-01-01

The use of cinematography or holographic interferometry for dynamic flow visualization in an internal combustion engine requires a control device that globally synchronizes camera and light source timing at a predefined shaft encoder angle. The device is capable of 0.35 deg resolution for rotational speeds of up to 73 240 rpm. This was achieved by implementing a look-up table (LUT) addressed by the shaft encoder signal, together with appropriate latches. The digital signal processing technique developed here achieves high-speed triggering-angle detection within 25 nsec by directly comparing, bit by bit in parallel, the shaft encoder digital code with a simulated angle reference code, instead of using angle value comparison, which involves more complicated computation steps. To establish synchronization to an AC reference signal whose magnitude varies with the rotating speed, a dynamic peak follow-up synchronization technique has been devised. This method scrutinizes the reference signal and provides the right timing within 40 nsec. Two application examples are described.
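The direct bit comparison and the timing budget can be sketched in Python. The sketch assumes a 10-bit absolute encoder, which gives 1024 codes per revolution or about 0.35 deg per code. That width matches the stated resolution but is not given in the source, and the names below are hypothetical. The trigger fires when every bit of the encoder code equals the corresponding bit of the precomputed reference code. Hardware can do this with one XOR gate per bit feeding a single NOR gate, rather than converting the code to an angle and comparing magnitudes.

```python
ENCODER_BITS = 10                     # assumed encoder width (hypothetical)

def resolution_deg(bits=ENCODER_BITS):
    """Angular size of one encoder code in degrees."""
    return 360.0 / (1 << bits)

def reference_code_for(angle_deg, bits=ENCODER_BITS):
    """Precompute the code for a desired trigger angle (done once, off the fast path)."""
    return int(round(angle_deg / resolution_deg(bits))) % (1 << bits)

def bits_match(encoder_code, reference_code, bits=ENCODER_BITS):
    """Bitwise equality: XOR each bit pair; the trigger fires only if every XOR is 0."""
    mask = (1 << bits) - 1
    return ((encoder_code ^ reference_code) & mask) == 0

def code_period_s(rpm, bits=ENCODER_BITS):
    """Time the shaft stays on one encoder code at a given speed."""
    revs_per_s = rpm / 60.0
    return 1.0 / (revs_per_s * (1 << bits))
```

Under this assumption, one code lasts about 0.8 µs at 73 240 rpm, so a 25 nsec comparison uses only a small fraction of each code interval.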
Field Programmable Gate Array Based Parallel Strapdown Algorithm Design for Strapdown Inertial Navigation Systems

PubMed Central

Li, Zong-Tao; Wu, Tie-Jun; Lin, Can-Long; Ma, Long-Hua

2011-01-01

A new generalized optimum strapdown algorithm with coning and sculling compensation is presented. In this algorithm, the position, velocity and attitude updates are based on a single-speed structure: all computations are executed at a single updating rate that is high enough to accurately account for high-frequency angular-rate and acceleration rectification effects. Unlike existing algorithms, the updating rates of the coning and sculling compensations are independent of the number of gyro incremental angle samples and the number of accelerometer incremental velocity samples. When the output sampling rate of the inertial sensors remains constant, this algorithm allows the updating rate of the coning and sculling compensation to be increased while using more gyro incremental angle and accelerometer incremental velocity samples, which improves system accuracy. To implement the new strapdown algorithm on a single FPGA chip, the algorithm is parallelized and its computational complexity is analyzed. The performance of the proposed parallel strapdown algorithm is tested on the Xilinx ISE 12.3 software platform and the FPGA device XC6VLX550T hardware platform on the basis of fighter data. The results show that, relative to existing implementations on DSP platforms, this parallel strapdown algorithm on the FPGA platform greatly decreases execution time, meeting the system's real-time and high-precision requirements in a highly dynamic environment. PMID:22164058
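As background on what coning compensation computes, the classic two-sample coning correction combines two successive gyro incremental angles Δθ1 and Δθ2 into the rotation vector φ = Δθ1 + Δθ2 + (2/3) Δθ1 × Δθ2. This is a standard textbook form, not the generalized optimum algorithm proposed in this paper. A minimal Python sketch:

```python
def cross(a, b):
    """3-D cross product of two (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def two_sample_rotation_vector(dtheta1, dtheta2):
    """Rotation vector over one update interval from two gyro incremental angles (rad),
    including the two-sample coning correction (2/3) * dtheta1 x dtheta2."""
    c = cross(dtheta1, dtheta2)
    return tuple(a + b + (2.0 / 3.0) * k for a, b, k in zip(dtheta1, dtheta2, c))
```

For rotation about a fixed axis the increments are parallel, so the cross-product term vanishes. The correction only matters when the rotation axis itself moves (coning). Higher-sample variants use more increments per update, which is the trade-off the abstract discusses.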
Idle speed and fuel vapor recovery control system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Orzel, D.V.

1993-06-01

A method for controlling idling speed of an engine via bypass throttle connected in parallel to a primary engine throttle and for controlling purge flow through a vapor recovery system into an air/fuel intake of the engine is described, comprising the steps of: positioning the bypass throttle to decrease any difference between a desired engine idle speed and actual engine idle speed; and decreasing the purge flow when said bypass throttle position is less than a preselected fraction of a maximum bypass throttle position.
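The two claimed steps can be sketched as a single control update in Python. The proportional gain, the throttle limits, the purge step size and the "preselected fraction" (0.2 here) are hypothetical placeholders. The abstract does not specify them or the exact control law.

```python
from dataclasses import dataclass

@dataclass
class IdleState:
    bypass_pos: float   # bypass throttle position, 0..max_bypass
    purge_flow: float   # purge flow command, 0..1

def control_step(state, desired_rpm, actual_rpm,
                 max_bypass=1.0, gain=0.001, min_fraction=0.2, purge_step=0.1):
    """One update: move the bypass throttle to shrink the idle-speed error, then
    decrease purge flow when the bypass opening is below min_fraction of its maximum."""
    error = desired_rpm - actual_rpm              # positive -> engine idling too slowly
    new_pos = state.bypass_pos + gain * error     # open to raise speed, close to lower it
    new_pos = min(max(new_pos, 0.0), max_bypass)  # respect mechanical limits
    new_purge = state.purge_flow
    if new_pos < min_fraction * max_bypass:
        new_purge = max(0.0, new_purge - purge_step)
    return IdleState(new_pos, new_purge)
```

A simple proportional update is used only to show the ordering of the two steps. A production controller would typically add integral action and rate limits.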

Adaptation of superconducting fault current limiter to high-speed reclosing

NASA Astrophysics Data System (ADS)

Koyama, T.; Yanabu, S.

2009-10-01

Using a high-temperature superconductor, we constructed and tested a model superconducting fault current limiter (SFCL). The superconductor can break in some cases because of excessive heat generation, so it is desirable to interrupt the current flowing through the superconductor early. We therefore proposed an SFCL using an electromagnetic repulsion switch composed of a superconductor, a vacuum interrupter and a by-pass coil; its structure is simple. With this equipment, the duration of current flow in the superconductor can easily be reduced to less than 0.5 cycle, while the fault current is also easily limited by the large reactance of the parallel coil. Electric power systems also impose a high-speed reclosing duty after a fault current is interrupted: the back-up breaker is re-closed within 350 ms. The electromagnetic repulsion switch must therefore return to its former state, and the superconductor must recover to the superconducting state, before high-speed reclosing. We then proposed an SFCL using an electromagnetic repulsion switch with a new reclosing function. We also studied the recovery time of the superconductor, because it must return to the superconducting state within 350 ms. In this paper, the recovery-time characteristics of the superconducting wire were investigated. We also combined the superconductor with the electromagnetic repulsion switch and carried out a performance test. As a result, high-speed reclosing within 350 ms was shown to be possible.
An enhanced high-speed multi-digit BCD adder using quantum-dot cellular automata

NASA Astrophysics Data System (ADS)

Ajitha, D.; Ramanaiah, K. V.; Sumalatha, V.

2017-02-01

Quantum-dot cellular automata (QCA) is an emerging nanodevice technology suited to developing high-performance, low-power digital circuits. Even though many efficient arithmetic circuits have been designed using QCA, implementing high-speed circuits in an optimized manner remains a challenge. Among these circuits, one essential structure is a fast parallel multi-digit decimal adder, which is very attractive for future environments. To achieve high speed, a new correction-logic formulation method is proposed for single- and multi-digit BCD adders. The proposed enhanced single-digit BCD adder (ESDBA) is 26% faster than the carry flow adder (CFA)-based BCD adder. Multi-digit operations are also performed by cascading the proposed ESDBA in a novel way. The enhanced multi-digit BCD adder (EMDBA) adds two 4-digit or two 8-digit BCD numbers 50% faster than the CFA-based BCD adder, with a nominal area overhead. Compared with the existing carry look-ahead (CLA)-based BCD adder design, the EMDBA performs 4-digit BCD addition 24% faster with 23% less area, and 8-digit addition 36% faster with 21% less area. The proposed multi-digit adder has a delay of (N − 1) + 3.5 clock cycles, significantly less than the N × (one-digit BCD adder delay) required by the conventional BCD adder method. To our knowledge, this is the first proposal for multi-digit BCD addition using QCA.
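For reference, this is the conventional correction that BCD adders implement: the baseline the paper improves on, not its new correction-logic formulation. Any 4-bit digit sum that exceeds 9 has 6 added to it. This skips the six invalid codes 1010 to 1111, yields a valid BCD digit and generates the decimal carry. A ripple-style multi-digit version in Python:

```python
def bcd_add_digit(a, b, carry_in):
    """Add two BCD digits plus a carry; apply the +6 correction when the sum exceeds 9."""
    s = a + b + carry_in            # binary sum, 0..19
    if s > 9:
        s = (s + 6) & 0xF           # correction: skip the six invalid codes 1010..1111
        return s, 1
    return s, 0

def bcd_add(x_digits, y_digits):
    """Add two equal-length BCD numbers given as digit lists, least-significant first."""
    carry, out = 0, []
    for a, b in zip(x_digits, y_digits):
        d, carry = bcd_add_digit(a, b, carry)
        out.append(d)
    if carry:
        out.append(carry)
    return out
```

In this ripple form, each digit waits for the carry from the digit below, so the delay grows as N times the one-digit delay. That is the conventional cost the proposed (N − 1) + 3.5 cycle design reduces.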
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report a parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily shaped geometries. The local, autonomous nature of the method should lead to straightforward and efficient parallelization. We examine this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to the cores of a dual quad-core shared-memory system (eight processors in total). This message-passing parallelization strategy is directly applicable to cluster computing, which will be the focus of follow-on research. Results on the shared-memory platform indicate nearly ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel and yields excellent speed-up on SMP hardware.
A parallel implementation of an off-lattice individual-based model of multicellular populations

NASA Astrophysics Data System (ADS)

Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe

2015-07-01

As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
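The spatial split can be illustrated with a minimal Python sketch, which is hypothetical and not the authors' code. The domain is cut into equal-width vertical slabs, and each cell is owned by the process whose slab contains its x coordinate. Cells within an interaction cut-off of a slab boundary are listed as halo cells that must be sent to the neighbouring process. The sketch assumes the cut-off is no larger than the slab width.

```python
def owner(x, x_min, x_max, n_procs):
    """Index of the process whose slab contains position x."""
    width = (x_max - x_min) / n_procs
    rank = int((x - x_min) // width)
    return min(max(rank, 0), n_procs - 1)     # clamp cells lying exactly on x_max

def decompose(cells, x_min, x_max, n_procs, cutoff):
    """Assign cells {id: (x, y)} to processes and find halo cells to send to neighbours."""
    width = (x_max - x_min) / n_procs
    owned = {r: [] for r in range(n_procs)}
    halo_out = {r: {"left": [], "right": []} for r in range(n_procs)}
    for cid, (x, _y) in cells.items():
        r = owner(x, x_min, x_max, n_procs)
        owned[r].append(cid)
        lo = x_min + r * width
        hi = lo + width
        if r > 0 and x - lo < cutoff:
            halo_out[r]["left"].append(cid)   # needed by process r - 1
        if r < n_procs - 1 and hi - x < cutoff:
            halo_out[r]["right"].append(cid)  # needed by process r + 1
    return owned, halo_out
```

Equal-width slabs balance the load only when cells are spread uniformly. The dynamic load balancing described above would instead move the slab boundaries as the population grows, which this sketch does not attempt.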
Development of a detachable high speed miniature scanning probe microscope for large area substrates inspection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sadeghian, Hamed (Department of Precision and Microsystems Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft); Herfst, Rodolf

We have developed a high-speed, miniature scanning probe microscope (MSPM) integrated with a Positioning Unit (PU) for accurately positioning the MSPM on a large substrate. This combination enables simultaneous, parallel operation of many units on a large sample for high-throughput measurements. The MSPM measures 19 × 45 × 70 mm³. It contains a one-dimensional flexure stage with counter-balanced actuation for vertical scanning, with a bandwidth of 50 kHz and a z-travel range of more than 2 μm. This stage is mechanically decoupled from the rest of the MSPM by suspending it at specific dynamically determined points. The motion of the probe, which is mounted on top of the flexure stage, is measured by a very compact optical beam deflection (OBD) system. Thermal noise spectrum measurements of short cantilevers show a bandwidth of 2 MHz and a noise of less than 15 fm/√Hz. Fast approach and engagement of the probe to the substrate surface have been achieved by integrating a small stepper actuator and directly monitoring the cantilever response to the approaching surface. The PU has the same width as the MSPM, 45 mm, and can position the MSPM at a pre-chosen position within an area of 275 × 30 mm² to within 100 nm accuracy in a few seconds. During scanning, the MSPM is detached from the PU; this is essential to eliminate mechanical vibration and drift arising from the relatively low-resonance-frequency, low-stiffness structure of the PU. Although the specific implementation of the MSPM described here has been developed as an atomic force microscope, the general architecture is applicable to any form of SPM. This high-speed MSPM is now being used in a parallel SPM architecture for inspection and metrology of large samples such as semiconductor wafers and masks.
A Low-Power High-Speed Smart Sensor Design for Space Exploration Missions

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi

1997-01-01

A low-power, high-speed smart sensor system for space exploration missions is presented, based on a large-format active pixel sensor (APS) integrated with a programmable neural processor. The concept of building an advanced smart sensing system is demonstrated by a system-level microchip design composed of an APS sensor, a programmable neural processor, and an embedded microprocessor in a SOI CMOS technology. This ultra-fast smart sensor system-on-a-chip design mimics what is inherent in biological vision systems. Moreover, it is programmable and capable of performing ultra-fast machine vision processing at all levels, including image acquisition, image fusion, image analysis, scene interpretation, and control functions. The system provides about one tera-operation per second of computing power, a two-order-of-magnitude increase over that of state-of-the-art microcomputers. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and advanced VLSI system-on-a-chip implementation.
A Three-Dimensional Eulerian Code for Simulation of High-Speed Multimaterial Interactions

DTIC Science & Technology

2011-08-01

PDE-based extension. The extension process is done on only the host cells on a particular processor. After extension the parallel communication is ... condensation shocks, explosive debris transport, detonation in heterogeneous media and so on. In these flows complex interactions occur between the ... [A.22] and Ω_ij is the spin tensor. The Jaumann derivative is used to ensure objectivity of the stress tensor with respect to rotation
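For reference, let L be the velocity gradient (L_ij = ∂v_i/∂x_j) and W = (L − Lᵀ)/2 the spin tensor. A common form of the Jaumann stress rate is then dσ/dt − Wσ + σW; sign conventions differ between texts. The NumPy sketch below is an illustration, not the report's code. It evaluates this rate and checks objectivity numerically: a stress state that merely rotates rigidly with the material has a non-zero time derivative but a zero Jaumann rate.

```python
import numpy as np

def spin_tensor(L):
    """Spin tensor: antisymmetric part of the velocity gradient L_ij = dv_i/dx_j."""
    return 0.5 * (L - L.T)

def jaumann_rate(sigma, sigma_dot, L):
    """Jaumann rate of Cauchy stress: sigma_dot - W @ sigma + sigma @ W."""
    W = spin_tensor(L)
    return sigma_dot - W @ sigma + sigma @ W

def rotation_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Rigid rotation about z at angular rate omega; the stress rotates with the body.
omega, t = 2.0, 0.3
sigma0 = np.array([[100.0, 20.0, 0.0], [20.0, -50.0, 5.0], [0.0, 5.0, 10.0]])
R = rotation_z(omega * t)
R_dot = omega * np.array([[-np.sin(omega * t), -np.cos(omega * t), 0.0],
                          [np.cos(omega * t), -np.sin(omega * t), 0.0],
                          [0.0, 0.0, 0.0]])
L = R_dot @ R.T                                   # velocity gradient of a rigid rotation
sigma = R @ sigma0 @ R.T
sigma_dot = R_dot @ sigma0 @ R.T + R @ sigma0 @ R_dot.T
print(np.allclose(jaumann_rate(sigma, sigma_dot, L), 0.0))   # True
```

A constitutive law written in terms of this rate therefore produces no spurious stress change under pure rotation, which is the objectivity property the fragment refers to.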
Rational calculation accuracy in acousto-optical matrix-vector processor

NASA Astrophysics Data System (ADS)

Oparin, V. V.; Tigin, Dmitry V.

1994-01-01

The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.
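The accuracy limitation can be illustrated by modeling an analog multiplier as an exact matrix-vector product whose operands are quantized to a limited number of levels. The NumPy sketch below uses uniform quantization of values in [0, 1] as a stand-in for the processor's finite dynamic range. It is an illustrative error model, not the one analyzed by the authors, and the bit widths are arbitrary.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize values in [0, 1] to 2**bits - 1 steps."""
    levels = (1 << bits) - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def limited_precision_mvm(A, x, bits):
    """Matrix-vector product with both operands quantized, mimicking limited analog accuracy."""
    return quantize(A, bits) @ quantize(x, bits)

rng = np.random.default_rng(0)
A = rng.random((64, 64))
x = rng.random(64)
exact = A @ x
for bits in (4, 6, 8):
    approx = limited_precision_mvm(A, x, bits)
    rel_err = np.max(np.abs(approx - exact)) / np.max(np.abs(exact))
    print(f"{bits} bits: max relative error {rel_err:.2e}")
```

Relaxing the accuracy requirement corresponds to accepting fewer effective bits. In this model that is the knob which trades precision for the simpler architecture and component demands mentioned above.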
Adaptive-optics optical coherence tomography processing using a graphics processing unit.

PubMed

Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T

2014-01-01

Graphics processing units are increasingly being used for scientific computing because of their powerful parallel processing abilities and their moderate price compared with supercomputers and computing grids. In this paper we have used a general-purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving this super-high-resolution technology closer to clinical viability.
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic

Time-domain simulations are heavily used in today's planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies, in order to comply with industry standards. Because of increased modeling complexity, state-of-the-art commercial packages take several times longer than real time to complete a dynamic simulation of a large-scale model. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need to perform security assessment in real time. This paper presents a parallel implementation framework that speeds up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored, such as parallelizing the calculation of generator current injection, identifying fast linear solvers for the network solution, and parallelizing data outputs when interacting with the APIs of the commercial package TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
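The generator current-injection calculation parallelizes well because each generator's injection depends only on its own internal state and its terminal voltage. The Python sketch below uses the classical generator model, I = (E' − V)/(j x'_d), purely as a stand-in for the detailed synchronous machine models in the paper. A thread pool stands in for the HPC techniques used there, and the per-unit values in the test are illustrative.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def current_injection(gen):
    """Classical-model injection I = (E' - V) / (j * x'_d), all quantities in per unit.

    gen = (|E'|, rotor angle delta in rad, complex terminal voltage V, x'_d)."""
    e_mag, delta, v, xd_prime = gen
    e_internal = cmath.rect(e_mag, delta)   # E' = |E'| at angle delta
    return (e_internal - v) / (1j * xd_prime)

def all_injections(generators, n_workers=4):
    """Evaluate every generator independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(current_injection, generators))
```

The injections are then assembled into the network current vector for the linear network solution, which is the step where the paper's fast linear solvers come in.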
Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1998-01-01

With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, and the increasing disparity between memory and processor speeds, data locality overheads are becoming the greatest bottlenecks to realizing the potential high performance of these systems. While parallelization tools and compilers help users port their sequential applications to a DSM system, much time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP) employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source-code-level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
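Trace-driven cache simulation, the technique CPMP builds on, can be sketched with a direct-mapped cache model fed a synthetic address trace. The Python sketch below compares row-order and column-order traversal of a row-major array as a source-level "what-if". The cache size, line size and array shape are hypothetical. Unlike CPMP, the sketch generates the full address trace explicitly.

```python
def simulate_direct_mapped(addresses, cache_bytes=8192, line_bytes=64):
    """Count (hits, misses) of a direct-mapped cache for a byte-address trace."""
    n_lines = cache_bytes // line_bytes
    tags = [None] * n_lines
    hits = misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % n_lines          # which cache line the block maps to
        tag = block // n_lines           # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag
    return hits, misses

def traversal_trace(rows, cols, elem_bytes=8, row_order=True, base=0):
    """Addresses touched when reading a row-major rows x cols array."""
    if row_order:
        return [base + (i * cols + j) * elem_bytes for i in range(rows) for j in range(cols)]
    return [base + (i * cols + j) * elem_bytes for j in range(cols) for i in range(rows)]
```

For a 256 × 256 array of 8-byte elements with these parameters, row order misses once per 64-byte line (8192 misses), whereas column order misses on every access. That is the kind of loop-interchange effect a "what-if" prediction is meant to expose before the code is changed.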
Gray scale operation of a multichannel optical convolver using the Semetex magnetooptic spatial light modulator

NASA Technical Reports Server (NTRS)

Davis, Jeffrey A.; Day, Timothy; Lilly, Roger A.; Taber, Donald B.; Liu, Hua-Kuang

1988-01-01

A new multichannel optical correlator/convolver architecture which uses an acoustooptic light modulator for the input channel and a Semetex magnetooptic spatial light modulator (MOSLM) for the set of parallel reference channels is presented. Details of the anamorphic optical system are discussed. Experimental results illustrate the use of the system as a convolver for performing digital multiplication by analog convolution (DMAC). A limited gray scale capability for data stored by the MOSLM is demonstrated by implementing this DMAC algorithm with trinary logic. Use of the MOSLM allows the number of parallel channels for the convolver to be increased significantly compared with previously reported techniques while retaining the capability for updating both channels at high speeds.
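Digital multiplication by analog convolution rests on the fact that the digit sequence of a product is the convolution of the factors' digit sequences, followed by carry resolution. The optical system computes the convolution in analog form. The Python sketch below is illustrative only: NumPy does the convolution, a loop resolves the carries, and it uses base 10 rather than the binary or trinary encodings an optical system would use.

```python
import numpy as np

def to_digits(n, base=10):
    """Digits of a non-negative integer, least-significant first."""
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            return digits

def dmac_multiply(a, b, base=10):
    """Multiply by convolving digit sequences (the analog step), then resolving carries."""
    conv = np.convolve(to_digits(a, base), to_digits(b, base))   # un-carried partial sums
    result, carry = [], 0
    for s in conv:
        carry, d = divmod(int(s) + carry, base)
        result.append(d)
    while carry:
        carry, d = divmod(carry, base)
        result.append(d)
    return sum(d * base ** i for i, d in enumerate(result))
```

For 123 × 456, the convolution of the digit sequences [3, 2, 1] and [6, 5, 4] gives [18, 27, 28, 13, 4]. Carry resolution turns this into 56088. The analog stage only has to represent these un-carried sums, whose size grows with the number of digits; this is one reason a limited gray scale matters.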
Gray Scale Operation Of A Multichannel Optical Convolver Using The Semetex Magnetooptic Spatial Light Modulator

NASA Astrophysics Data System (ADS)

Davis, Jeffrey A.; Day, Timothy; Lilly, Roger A.; Taber, Donald B.; Liu, Hua-Kuang

1988-02-01

We present a new multichannel optical correlator/convolver architecture which uses an acoustooptic light modulator (AOLM) for the input channel and a Semetex magnetooptic spatial light modulator (MOSLM) for the set of parallel reference channels. Details of the anamorphic optical system are discussed. Experimental results illustrate use of the system as a convolver for performing digital multiplication by analog convolution (DMAC). A limited gray scale capability for data stored by the MOSLM is demonstrated by implementing this DMAC algorithm with trinary logic. Use of the MOSLM allows the number of parallel channels for the convolver to be increased significantly compared with previously reported techniques while retaining the capability for updating both channels at high speeds.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

1994-01-01

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth, high-volume data transfer between the I/O subsystem and thousands of processors. The design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.
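Trace characterizations of this kind typically classify each file's request stream. A request is often called sequential if it starts at a higher file offset than the previous request, and consecutive if it is sequential and starts exactly where the previous request ended. The Python sketch below computes these fractions and the share of small requests from a list of (offset, size) pairs. The definitions and the 4 KB "small" threshold are assumptions for illustration, not necessarily those used in CHARISMA.

```python
def characterize(requests, small_bytes=4096):
    """Fractions of small, sequential and consecutive requests in one file's trace.

    requests: list of (offset, size) in issue order. The first request has no
    predecessor, so the sequential/consecutive fractions use len(requests) - 1.
    """
    if not requests:
        return {"small": 0.0, "sequential": 0.0, "consecutive": 0.0}
    small = sum(1 for _, size in requests if size < small_bytes)
    sequential = consecutive = 0
    for (prev_off, prev_size), (off, _size) in zip(requests, requests[1:]):
        if off > prev_off:                       # moves forward in the file
            sequential += 1
            if off == prev_off + prev_size:      # and starts where the last one ended
                consecutive += 1
    pairs = max(len(requests) - 1, 1)
    return {"small": small / len(requests),
            "sequential": sequential / pairs,
            "consecutive": consecutive / pairs}
```

A high sequential share combined with a low consecutive share would point to strided access. That pattern motivates the collective and strided request interfaces recommended above.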
Massively Parallel Rogue Cell Detection using Serial Time-Encoded Amplified Microscopy of Inertially Ordered Cells in High Throughput Flow

DTIC Science & Technology

2013-06-01

couples the high-speed capability of the STEAM imager and differential phase ... air bubbles in the TPE mix. Moreover, TPE chips were also successfully sealed to other substrates ... dynamics, and microelectromechanical systems (MEMS) via laser-scanning surface vibrometry, and observation
Parallel Event Analysis Under Unix

NASA Astrophysics Data System (ADS)

Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.

The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA, only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.
64 x 64 thresholding photodetector array for optical pattern recognition

NASA Astrophysics Data System (ADS)

Langenbacher, Harry; Chao, Tien-Hsin; Shaw, Timothy; Yu, Jeffrey W.

1993-10-01

A high-performance 32 × 32 peak detector array is introduced. This detector consists of a 32 × 32 array of thresholding photo-transistor cells, manufactured with a standard MOSIS digital 2-micron CMOS process. A built-in thresholding function that performs 1024 thresholding operations in parallel strongly distinguishes this chip from available CCD detectors. This high-speed detector offers response times of 1 to 10 milliseconds, much faster than commercially available CCD detectors operating at a TV frame rate. Its parallel multiple-peak thresholding detection capability makes it particularly suitable for optical correlators and optoelectronically implemented neural networks. The principle of operation, circuit design and performance characteristics are described. An experimental demonstration of correlation peak detection is also provided. Recently, we have also designed and built an advanced version, a 64 × 64 thresholding photodetector array chip. Experimental investigation of using this chip for pattern recognition is ongoing.
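The chip's parallel thresholding can be mimicked in software by comparing every pixel of a correlation plane against a threshold in one element-wise operation. In the NumPy sketch below, the 32 × 32 plane, the synthetic peaks and the threshold value are illustrative. On the chip, each of the 1024 comparisons is performed by its own photo-transistor cell.

```python
import numpy as np

def threshold_peaks(plane, threshold):
    """Binary map of pixels above threshold (all comparisons evaluated element-wise)."""
    return plane > threshold

# Synthetic 32 x 32 correlation plane: low background plus two correlation peaks.
rng = np.random.default_rng(1)
plane = 0.1 * rng.random((32, 32))
plane[8, 12] = 0.9
plane[20, 5] = 0.8
hits = threshold_peaks(plane, 0.5)
print(np.argwhere(hits).tolist())   # [[8, 12], [20, 5]]
```

Because the output is a binary map rather than a full gray-level frame, detecting several correlation peaks does not require reading out and scanning an entire image, unlike a CCD followed by software peak search.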
Supersonic civil airplane study and design: Performance and sonic boom

NASA Technical Reports Server (NTRS)

Cheung, Samson

1995-01-01

Since aircraft configuration plays an important role in aerodynamic performance and sonic boom shape, the configuration of the next-generation supersonic civil transport has to be tailored to meet high aerodynamic performance and low sonic boom requirements. Computational fluid dynamics (CFD) can be used to design airplanes that meet these dual objectives. The work and results in this report are used to support NASA's High Speed Research Program (HSRP). CFD tools and techniques have been developed for general use in sonic boom propagation studies and aerodynamic design. In parallel with the research effort on sonic boom extrapolation, CFD flow solvers have been coupled with a numerical optimization tool to form a design package for aircraft configuration. This CFD optimization package has been applied to configuration design of a low-boom concept and an oblique all-wing concept. A nonlinear unconstrained optimizer for Parallel Virtual Machine has been developed for aerodynamic design and study.
Modeling of High Speed Reacting Flows: Established Practices and Future Challenges

NASA Technical Reports Server (NTRS)

Baurle, R. A.

2004-01-01

Computational fluid dynamics (CFD) has proven to be an invaluable tool for the design and analysis of high-speed propulsion devices. Massively parallel computing, together with the maturation of robust CFD codes, has made it possible to perform simulations of complete engine flowpaths. Steady-state Reynolds-Averaged Navier-Stokes simulations are now routinely used in the scramjet engine development cycle to determine optimal fuel injector arrangements, investigate trends noted during testing, and extract various measures of engine efficiency. Unfortunately, the turbulence and combustion models used in these codes have not changed significantly over the past decade. Hence, the CFD practitioner must often rely heavily on existing measurements (at similar flow conditions) to calibrate model coefficients on a case-by-case basis. This paper provides an overview of the modeled equations typically employed by commercial-quality CFD codes for high-speed combustion applications. Careful attention is given to the approximations employed for each of the unclosed terms in the averaged equation set. The salient features (and shortcomings) of common models used to close these terms are covered in detail, and several academic efforts aimed at addressing these shortcomings are discussed.

Implementation of parallel transmit beamforming using orthogonal frequency division multiplexing--achievable resolution and interbeam interference.

PubMed

Demi, Libertario; Viti, Jacopo; Kusters, Lieneke; Guidi, Francesco; Tortoli, Piero; Mischi, Massimo

2013-11-01

The speed of sound in the human body limits the achievable data acquisition rate of pulsed ultrasound scanners. To overcome this limitation, parallel beamforming techniques are used in ultrasound 2-D and 3-D imaging systems. Different parallel beamforming approaches have been proposed. They may be grouped into two major categories: parallel beamforming in reception and parallel beamforming in transmission. The first category is not optimal for harmonic imaging; the second category may be more easily applied to harmonic imaging. However, inter-beam interference represents an issue. To overcome these shortcomings and exploit the benefit of combining harmonic imaging with a high data acquisition rate, a new approach has recently been presented that relies on orthogonal frequency division multiplexing (OFDM) to perform parallel beamforming in transmission. In this paper, parallel transmit beamforming using OFDM is implemented for the first time on an ultrasound scanner. An advanced open platform for ultrasound research is used to investigate the axial resolution and interbeam interference achievable with parallel transmit beamforming using OFDM. Both fundamental and second-harmonic imaging modalities have been considered. Results show that, for fundamental imaging, axial resolution on the order of 2 mm can be achieved in combination with interbeam interference on the order of -30 dB. For second-harmonic imaging, axial resolution on the order of 1 mm can be achieved in combination with interbeam interference on the order of -35 dB.
The other fiber, the other fabric, the other way

NASA Astrophysics Data System (ADS)

Stephens, Gary R.

1993-02-01

Coaxial cable and distributed switches provide a way to configure high-speed Fiber Channel fabrics. This type of fabric provides a cost-effective alternative to a fabric of optical fibers and centralized cross-point switches. The fabric topology is a simple tree. Products using parallel buses require a significant change to migrate to a serial bus; coaxial cables and distributed switches require a smaller technology shift for these device manufacturers. Each distributed switch permits changes of both medium type and speed. The fabric can grow, and bridge to optical fibers, as needs expand. A distributed fabric permits earlier entry into high-speed serial operation. For very low-cost fabrics, a distributed switch may permit a link configured as a loop, which eliminates half of the ports compared with a switched point-to-point fabric. A fabric of distributed switches can interface to a cross-point switch fabric. The expected sequence of migration is: closed loops, then small closed fabrics, and finally bridges to optical cross-point switch fabrics. This paper presents the concept of distributed fabrics, including address assignment, frame routing, and general operation.
A Design and Development of Multi-Purpose CCD Camera System with Thermoelectric Cooling: Software

NASA Astrophysics Data System (ADS)

Oh, S. H.; Kang, Y. W.; Byun, Y. I.

2007-12-01

We present software that we developed for a multi-purpose CCD camera. The software can be used with all three CCD types made by Kodak: KAF-0401E (768×512), KAF-1602E (1536×1024), and KAF-3200E (2184×1472). For efficient camera control, the software runs as two independent processes: the CCD control program and the temperature/shutter operation program. It is designed for fully automatic as well as manual operation under Linux and is controlled through Linux user signals. We plan to use this software for an all-sky survey system as well as for night-sky monitoring and sky observation. The readout times are about 15 s, 64 s, and 134 s for the KAF-0401E, KAF-1602E, and KAF-3200E, respectively; these times are limited by the data transmission speed of the parallel port. Larger-format CCDs require faster data transmission, so we are considering porting the control software to use a USB port.
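The quoted readout times can be checked against the claim that the parallel port is the bottleneck. The sketch below divides each sensor's pixel count by its readout time. It assumes a constant pixel rate with negligible per-frame overhead, which the abstract does not state.

```python
# Sensor formats and approximate readout times quoted in the abstract.
SENSORS = {
    "KAF-0401E": ((768, 512), 15.0),
    "KAF-1602E": ((1536, 1024), 64.0),
    "KAF-3200E": ((2184, 1472), 134.0),
}


def pixel_rate(width: int, height: int, seconds: float) -> float:
    """Effective pixels transferred per second."""
    return width * height / seconds


def readout_time(width: int, height: int, rate: float) -> float:
    """Readout time for a sensor at a fixed pixel rate (assumes no per-frame overhead)."""
    return width * height / rate


if __name__ == "__main__":
    for name, ((w, h), t) in SENSORS.items():
        print(f"{name}: {w * h:,} px in {t:.0f} s -> {pixel_rate(w, h, t):,.0f} px/s")
```

All three sensors work out to roughly 24,000 to 26,000 pixels per second. A nearly constant rate like this is what a link-limited readout would produce. At about 25,000 px/s, a hypothetical 4096×4096 sensor would need roughly 671 s, which motivates the move to USB.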
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

DOE PAGES

Yim, Won Cheol; Cushman, John C.

2017-07-22

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
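The sketch below illustrates the "query sequence distribution" idea: split a multi-FASTA query file into chunks that are searched independently and whose results are concatenated afterwards. It is not DCBLAST's code. The greedy longest-first balancing by residue count is a choice made for this illustration, and all function names are hypothetical.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse multi-FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_queries(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy longest-first assignment to the chunk with the fewest residues so far."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (residues assigned, chunk index)
    heapq.heapify(heap)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (load + len(seq), idx))
    return chunks


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialise records back to FASTA, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

In a cluster setting, each chunk would be written to its own file and submitted as a separate BLAST+ job through the site's scheduler. Because query sequences are searched independently, the per-chunk outputs can simply be merged.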
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yim, Won Cheol; Cushman, John C.

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations

PubMed Central

Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.

2016-01-01

In recent years, processor clock rates have stagnated while the demand for computing power has continued to grow. This applies particularly to the life sciences and bioinformatics, where new technologies generate rapidly growing volumes of raw data at increasing speed. The number of cores per processor has increased in an attempt to compensate for the small increments in clock rate. This technological shift demands changes in software development, especially in high-performance computing, where parallelization techniques are gaining importance because of the large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing them in different areas of sequence informatics. Furthermore, we provide examples of automatic acceleration for two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should help the reader choose a suitable technique for the problem at hand. We compare four state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC) and discuss their performance and their applicability to selected use cases. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example, but their performance is superior only above a certain problem size because of data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPUs are mature and usually require no additional manual adjustment. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094
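The sketch below shows the data-parallel decomposition behind a k-mer counting use case: partition the reads, count k-mers per partition, then merge the partial counts. The paper's tools (OpenMP, PluTo-SICA, PPCG, OpenACC) work on compiled C/C++ code, so this Python version illustrates only the decomposition. It uses a thread pool for portability. In CPython, real CPU-bound speedups would need `ProcessPoolExecutor` or native code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable, List


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Serial k-mer count; reads shorter than k contribute nothing."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def parallel_count_kmers(reads: List[str], k: int, workers: int = 4) -> Counter:
    """Partition reads across workers and merge the partial counts.

    Splitting by whole reads means no k-mer ever straddles a partition boundary.
    """
    chunks = [reads[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: count_kmers(chunk, k), chunks))
    return reduce(lambda a, b: a + b, partials, Counter())
```

The merge step is a plain sum of counters. That is why this workload maps well onto CPU threads, whereas a GPU offload pays a data-transfer cost that only large inputs amortise.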
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.

PubMed

Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W

2016-01-01

In recent years, processor clock rates have stagnated while the demand for computing power has continued to grow. This applies particularly to the life sciences and bioinformatics, where new technologies generate rapidly growing volumes of raw data at increasing speed. The number of cores per processor has increased in an attempt to compensate for the small increments in clock rate. This technological shift demands changes in software development, especially in high-performance computing, where parallelization techniques are gaining importance because of the large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing them in different areas of sequence informatics. Furthermore, we provide examples of automatic acceleration for two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should help the reader choose a suitable technique for the problem at hand. We compare four state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC) and discuss their performance and their applicability to selected use cases. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example, but their performance is superior only above a certain problem size because of data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPUs are mature and usually require no additional manual adjustment. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

NASA Technical Reports Server (NTRS)

McKissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P., III; O'Connor, Cornelius J.; Syed, Hazari I.

2009-01-01

A Monte Carlo simulation of simultaneous approaches performed by two transport-category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The simulation data showed that most wake encounters occurred near or over the runway, and the aft boundaries of the safe zones were identified for all simulation conditions.
Line-Focused Optical Excitation of Parallel Acoustic Focused Sample Streams for High Volumetric and Analytical Rate Flow Cytometry.

PubMed

Kalb, Daniel M; Fencl, Frank A; Woods, Travis A; Swanson, August; Maestas, Gian C; Juárez, Jaime J; Edwards, Bruce S; Shreve, Andrew P; Graves, Steven W

2017-09-19

Flow cytometry provides highly sensitive multiparameter analysis of cells and particles but has been largely limited to the use of a single focused sample stream. This limits the analytical rate to ∼50K particles/s and the volumetric rate to ∼250 μL/min. Despite the analytical prowess of flow cytometry, there are applications where these rates are insufficient, such as rare cell analysis in high cellular backgrounds (e.g., circulating tumor cells and fetal cells in maternal blood), detection of cells/particles in large dilute samples (e.g., water quality, urine analysis), or high-throughput screening applications. Here we report a highly parallel acoustic flow cytometer that uses an acoustic standing wave to focus particles into 16 parallel analysis points across a 2.3 mm wide optical flow cell. A line-focused laser and wide-field collection optics are used to excite and collect the fluorescence emission of these parallel streams onto a high-speed camera for analysis. With this instrument format and fluorescent microsphere standards, we obtain analysis rates of 100K/s and flow rates of 10 mL/min, while maintaining optical performance comparable to that of a commercial flow cytometer. The results with our initial prototype instrument demonstrate that the integration of key parallelizable components, including the line-focused laser, particle focusing using multinode acoustic standing waves, and a spatially arrayed detector, can increase analytical and volumetric throughputs by orders of magnitude in a compact, simple, and cost-effective platform. Such instruments will be of great value to applications in need of high-throughput yet sensitive flow cytometry analysis.
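The arithmetic below compares the prototype's reported rates with the approximate single-stream limits quoted in the abstract. The per-stream figures assume the load is split evenly across the 16 acoustic focusing nodes, which the abstract does not state explicitly.

```python
STREAMS = 16
# Approximate conventional single-stream limits quoted in the abstract.
SINGLE_STREAM = {"particles_per_s": 50_000, "uL_per_min": 250}
# Prototype results: 100K particles/s and 10 mL/min (= 10,000 uL/min).
PROTOTYPE = {"particles_per_s": 100_000, "uL_per_min": 10_000}


def gain(metric: str) -> float:
    """Prototype rate relative to the single-stream limit."""
    return PROTOTYPE[metric] / SINGLE_STREAM[metric]


def per_stream(metric: str) -> float:
    """Prototype rate per stream, assuming an even split across streams."""
    return PROTOTYPE[metric] / STREAMS


if __name__ == "__main__":
    for m in PROTOTYPE:
        print(f"{m}: gain {gain(m):.0f}x, per stream {per_stream(m):,.0f}")
```

The prototype gives a 2-fold gain in analytical rate and a 40-fold gain in volumetric rate. The "orders of magnitude" in the abstract refers to what further parallelization of the same components could reach.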
Parallel auto-correlative statistics with VTK.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pebay, Philippe Pierre; Bennett, Janine Camille

2013-08-01

This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
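The sketch below shows why an auto-correlative statistic parallelizes well. It assumes the lag-k autocorrelation is estimated as the Pearson correlation of the pairs (x[t], x[t+k]). Each process reduces its slice of pairs to six additive sums, and a single combine step merges them. This illustrates the map-reduce pattern only; it is not VTK's API or necessarily the report's estimator. Raw-sum moments are also less numerically robust than the pairwise update formulas a production engine would likely use.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Sequence


@dataclass(frozen=True)
class LagMoments:
    """Additive raw sums over lagged pairs (a, b) = (x[t], x[t + lag])."""
    n: int = 0
    sa: float = 0.0
    sb: float = 0.0
    saa: float = 0.0
    sbb: float = 0.0
    sab: float = 0.0

    def merge(self, other: "LagMoments") -> "LagMoments":
        return LagMoments(self.n + other.n, self.sa + other.sa, self.sb + other.sb,
                          self.saa + other.saa, self.sbb + other.sbb, self.sab + other.sab)


def partial_moments(x: Sequence[float], lag: int, start: int, stop: int) -> LagMoments:
    """Sums for pairs with t in [start, stop); a worker needs x[start:stop + lag] (a halo of lag samples)."""
    sa = sb = saa = sbb = sab = 0.0
    for t in range(start, stop):
        a, b = x[t], x[t + lag]
        sa += a
        sb += b
        saa += a * a
        sbb += b * b
        sab += a * b
    return LagMoments(stop - start, sa, sb, saa, sbb, sab)


def autocorrelation(m: LagMoments) -> float:
    """Pearson correlation of the lagged pairs from the merged sums."""
    ma, mb = m.sa / m.n, m.sb / m.n
    cov = m.sab / m.n - ma * mb
    va = m.saa / m.n - ma * ma
    vb = m.sbb / m.n - mb * mb
    return cov / math.sqrt(va * vb)


def parallel_autocorrelation(x: Sequence[float], lag: int, n_parts: int) -> float:
    n_pairs = len(x) - lag
    bounds = [n_pairs * i // n_parts for i in range(n_parts + 1)]
    # Each call would run on a separate process or MPI rank in a real engine.
    parts = [partial_moments(x, lag, bounds[i], bounds[i + 1]) for i in range(n_parts)]
    return autocorrelation(reduce(LagMoments.merge, parts, LagMoments()))
```

Only six numbers per process cross the network, independent of the slice length. This is the property that lets this kind of engine scale.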
Portable parallel stochastic optimization for the design of aeropropulsion components

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Rhodes, G. S.

1994-01-01

This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive and require either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initiate the development of an MSO methodology that is portable to a wide variety of hardware platforms while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as a review of portable, parallel programming environments. The second effort was to implement the MSO methodology for a problem using the portable parallel programming environment Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate that the MSO methodology can be applied well to large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times were achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel.
Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. MSO can be applied to many applications, including NASA's High-Speed Civil Transport and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
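The block below translates the reported figures using the conventional definitions of speedup and efficiency, where T_p is wall-clock time on p processors. These definitions are assumed here rather than quoted from the report. The implied speedups for 31 and 50 processors are derived from the quoted efficiencies; they are not separately reported.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

E_{20} \approx \frac{19}{20} = 0.95, \qquad
S_{31} \approx 0.75 \times 31 \approx 23.3, \qquad
S_{50} \approx 0.60 \times 50 = 30
```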
Label-Free Biomedical Imaging Using High-Speed Lock-In Pixel Sensor for Stimulated Raman Scattering

PubMed Central

Mars, Kamel; Kawahito, Shoji; Yasutomi, Keita; Kagawa, Keiichiro; Yamada, Takahiro

2017-01-01

Raman imaging eliminates the need for staining procedures, providing label-free imaging to study biological samples. Recent developments in stimulated Raman scattering (SRS) have achieved fast acquisition speed and hyperspectral imaging. However, detectors suitable for parallel detection at MHz modulation rates have been lacking: such detectors must detect multiple small SRS signals while eliminating the extremely strong offset due to direct laser light. In this paper, we present a complementary metal-oxide semiconductor (CMOS) image sensor using high-speed lock-in pixels for stimulated Raman scattering that can obtain the difference between the Stokes-on and Stokes-off signals at a modulation frequency of 20 MHz in the pixel before readout. The small SRS signal is extracted and amplified in the pixel using a high-speed, large-area lateral electric field charge modulator (LEFM) employing two-step ion implantation, together with an in-pixel pair of low-pass filters, a sample-and-hold circuit, and a switched-capacitor integrator using a fully differential amplifier. A prototype chip was fabricated in a 0.11 μm CMOS image sensor process. SRS spectra and images of stearic acid and 3T3-L1 samples were successfully obtained. The outcomes suggest that hyperspectral and multi-focus SRS imaging at video rate is viable after slight modifications to the pixel architecture and the acquisition system. PMID:29120358
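The sketch below is a digital analogue of the on/off differencing that the sensor performs in analog circuitry inside each pixel. The detected intensity carries a huge constant offset from the direct laser light. A tiny SRS-induced change appears only while the Stokes beam is on, and averaging many modulation cycles of the on-minus-off difference cancels the offset and suppresses noise. The offset, signal, noise, and samples-per-period values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def stokes_gate(n_samples: int, samples_per_period: int) -> List[bool]:
    """Square-wave modulation: first half of each period Stokes-on, second half Stokes-off."""
    half = samples_per_period // 2
    return [(i % samples_per_period) < half for i in range(n_samples)]


def lockin_difference(samples: Sequence[float], gate: Sequence[bool]) -> float:
    """Mean of Stokes-on samples minus mean of Stokes-off samples."""
    on = [s for s, g in zip(samples, gate) if g]
    off = [s for s, g in zip(samples, gate) if not g]
    return sum(on) / len(on) - sum(off) / len(off)


def simulate(n_samples: int = 200_000, samples_per_period: int = 4, offset: float = 1000.0,
             srs: float = 0.05, noise: float = 0.5, seed: int = 0) -> Tuple[List[float], List[bool]]:
    """Hypothetical detector samples: large offset + small gated SRS change + Gaussian noise."""
    rng = random.Random(seed)
    gate = stokes_gate(n_samples, samples_per_period)
    samples = [offset + (srs if g else 0.0) + rng.gauss(0.0, noise) for g in gate]
    return samples, gate


if __name__ == "__main__":
    samples, gate = simulate()
    print("recovered SRS change:", lockin_difference(samples, gate))
```

The offset is 20,000 times larger than the signal in this example, yet it cancels exactly in the difference. The residual error shrinks roughly as the noise divided by the square root of the number of samples.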
Label-Free Biomedical Imaging Using High-Speed Lock-In Pixel Sensor for Stimulated Raman Scattering.

PubMed

Mars, Kamel; Lioe, De Xing; Kawahito, Shoji; Yasutomi, Keita; Kagawa, Keiichiro; Yamada, Takahiro; Hashimoto, Mamoru

2017-11-09

Raman imaging eliminates the need for staining procedures, providing label-free imaging to study biological samples. Recent developments in stimulated Raman scattering (SRS) have achieved fast acquisition speed and hyperspectral imaging. However, detectors suitable for parallel detection at MHz modulation rates have been lacking: such detectors must detect multiple small SRS signals while eliminating the extremely strong offset due to direct laser light. In this paper, we present a complementary metal-oxide semiconductor (CMOS) image sensor using high-speed lock-in pixels for stimulated Raman scattering that can obtain the difference between the Stokes-on and Stokes-off signals at a modulation frequency of 20 MHz in the pixel before readout. The small SRS signal is extracted and amplified in the pixel using a high-speed, large-area lateral electric field charge modulator (LEFM) employing two-step ion implantation, together with an in-pixel pair of low-pass filters, a sample-and-hold circuit, and a switched-capacitor integrator using a fully differential amplifier. A prototype chip was fabricated in a 0.11 μm CMOS image sensor process. SRS spectra and images of stearic acid and 3T3-L1 samples were successfully obtained. The outcomes suggest that hyperspectral and multi-focus SRS imaging at video rate is viable after slight modifications to the pixel architecture and the acquisition system.
The Temporal Dynamics of Visual Search: Evidence for Parallel Processing in Feature and Conjunction Searches

PubMed Central

McElree, Brian; Carrasco, Marisa

2012-01-01

Feature and conjunction searches have been argued to delineate parallel and serial operations in visual processing. The authors evaluated this claim by examining the temporal dynamics of the detection of features and conjunctions. The 1st experiment used a reaction time (RT) task to replicate standard mean RT patterns and to examine the shapes of the RT distributions. The 2nd experiment used the response-signal speed–accuracy trade-off (SAT) procedure to measure discrimination (asymptotic detection accuracy) and detection speed (processing dynamics). Set size affected discrimination in both feature and conjunction searches but affected detection speed only in the latter. Fits of models to the SAT data that included a serial component overpredicted the magnitude of the observed dynamics differences. The authors concluded that both features and conjunctions are detected in parallel. Implications for the role of attention in visual processing are discussed. PMID:10641310
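The sketch below shows the exponential approach-to-a-limit function commonly fitted to speed-accuracy trade-off (SAT) data. It is given here as an assumption, since the exact model variants fitted in this paper are not stated in the abstract. The asymptote lam indexes discrimination, while the rate beta and intercept delta index processing dynamics. In these terms, the abstract's result is that set size changed lam in both search types but changed beta or delta only in conjunction search.

```python
import math


def sat_dprime(t: float, lam: float, beta: float, delta: float) -> float:
    """d'(t) = lam * (1 - exp(-beta * (t - delta))) for t > delta, else 0.

    lam: asymptotic accuracy (discrimination); beta: rate (1/s); delta: intercept (s).
    """
    if t <= delta:
        return 0.0
    return lam * (1.0 - math.exp(-beta * (t - delta)))


def time_to_fraction(p: float, beta: float, delta: float) -> float:
    """Processing time at which d'(t) reaches fraction p (0 < p < 1) of its asymptote."""
    return delta + math.log(1.0 / (1.0 - p)) / beta
```

Because time_to_fraction does not depend on lam, two conditions with equal dynamics reach any given proportion of their asymptote at the same time, whatever their asymptotic accuracy. That is why this parameterization can separate discrimination from speed.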
Optical correlation identification technology applied in underwater laser imaging target identification

NASA Astrophysics Data System (ADS)

Yao, Guang-tao; Zhang, Xiao-hui; Ge, Wei-long

2012-01-01

Underwater laser imaging is an effective method for detecting short-range underwater targets and an important complement to sonar detection. With the development of underwater laser imaging technology and underwater vehicle technology, automatic underwater target identification has attracted increasing attention and remains a difficult research problem in underwater optical imaging information processing. Today, automatic underwater target identification based on optical imaging is usually implemented in software on digital circuits. This approach allows very flexible algorithm implementation and control. However, optical imaging produces 2D or even 3D images that contain large amounts of data, so purely digital electronic processing requires long identification times and struggles to meet real-time requirements. Parallel computer processing can improve identification speed, but it increases complexity, size, and power consumption. This paper attempts to apply optical correlation identification technology to realize automatic underwater target identification. Optical correlation identification exploits the Fourier-transform property of a Fourier lens, which can perform the Fourier transform of image information on a nanosecond timescale, and optical free-space interconnection offers parallel, high-speed, high-capacity, and high-resolution computation; combining these with the computational and control flexibility of digital circuits yields a hybrid optoelectronic identification mode. We derive the theoretical formulation of correlation identification, analyze the principle of optical correlation identification, and write a MATLAB simulation program.
Using single-frame images obtained by underwater range-gated laser imaging, we identify and locate the target at different positions, showing that the method effectively improves the speed and localization efficiency of target identification and providing a preliminary validation of its feasibility.
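The sketch below emulates digitally what the optical correlator does with lenses. The scene is transformed, multiplied by the conjugate spectrum of the target template (a matched filter), and transformed back; the correlation peak marks the target's position. It is written with NumPy rather than the authors' MATLAB code, and the target patch and scene are synthetic.

```python
import numpy as np


def fourier_correlate(scene: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Circular cross-correlation computed in the Fourier domain.

    Transform, multiply by the conjugate template spectrum (matched filter),
    inverse transform: the digital counterpart of a Fourier-lens correlator.
    """
    padded = np.zeros(scene.shape, dtype=float)
    h, w = template.shape
    padded[:h, :w] = template  # template zero-padded to scene size, anchored at the origin
    spectrum = np.fft.fft2(scene) * np.conj(np.fft.fft2(padded))
    return np.real(np.fft.ifft2(spectrum))


def locate(scene: np.ndarray, template: np.ndarray) -> tuple:
    """Return (row, col) of the correlation peak, i.e. the template's top-left corner."""
    corr = fourier_correlate(scene, template)
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    return int(row), int(col)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))   # synthetic target signature
    scene = np.zeros((64, 64))
    scene[20:28, 37:45] = target  # plant the target at row 20, column 37
    print(locate(scene, target))
```

The FFT costs O(N log N) operations on a digital processor. A lens performs the same transform in a single pass of light, which is the speed advantage the abstract describes.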
Constructing a safety and security system by medical applications of a fast face recognition optical parallel correlator

NASA Astrophysics Data System (ADS)

Watanabe, Eriko; Ishikawa, Mami; Ohta, Maiko; Murakami, Yasuo; Kodate, Kashiko

2006-01-01

Medical errors and patient safety have always received a great deal of attention, as errors can be critically life-threatening. Hospitals and medical personnel are trying their utmost to avoid them. Currently in the medical field, patients' records are identified through PINs and ID cards. However, for patients who cannot speak or move, or who suffer from memory disturbances, alternative methods would be more desirable, and in some cases necessary. The authors previously proposed and fabricated a specially designed correlator called FARCO (Fast Face Recognition Optical Correlator), based on the Vanderlugt correlator [1], which operates at a speed of 1000 faces/s [2-4]. Combined with high-speed display devices, four-channel processing could achieve an operational speed as high as 4000 faces/s. Running trial experiments on a 1-to-N identification basis using the optical parallel correlator, we achieved low error rates of 1% FMR and 2.3% FNMR. In this paper, we propose a robust face recognition system using FARCO, focusing on the safety and security of the medical field. We apply our face recognition system to the registration of inpatients, in particular children and infants, before and after medical treatments or operations. The proposed system achieved a higher recognition rate by multiplexing both input and database facial images from moving images. The system was also tested and evaluated for further practical use, with excellent results. Hence, our face recognition system could function effectively as an integral part of a medical system, meeting the essential requirements of safety, security, and privacy.
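The sketch below shows how FMR (false match rate) and FNMR (false non-match rate) figures like those quoted are computed. It assumes each comparison yields a similarity score, for example a correlation-peak value, and that a match is declared when the score meets a threshold. The scores and thresholds are made up for illustration.

```python
from typing import List, Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """FMR: fraction of impostor comparisons accepted; FNMR: fraction of genuine ones rejected.

    A comparison is accepted when its similarity score is >= threshold.
    """
    fmr = sum(score >= threshold for score in impostor) / len(impostor)
    fnmr = sum(score < threshold for score in genuine) / len(genuine)
    return fmr, fnmr


def sweep(genuine: Sequence[float], impostor: Sequence[float],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, FMR, FNMR) for each candidate threshold."""
    return [(t, *error_rates(genuine, impostor, t)) for t in thresholds]
```

Raising the threshold trades FMR for FNMR. A reported pair such as 1% FMR and 2.3% FNMR is therefore one operating point on this curve, not a fixed property of the correlator.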
The artificial retina processor for track reconstruction at the LHC crossing rate

DOE PAGES

Abba, A.; Bedeschi, F.; Citterio, M.; ...

2015-03-16

We present results of an R&D study of a specialized processor capable of precisely reconstructing, in pixel detectors, hundreds of charged-particle tracks from high-energy collisions at a 40 MHz rate. We apply a highly parallel pattern-recognition algorithm, inspired by studies of how the brain processes visual images, and describe in detail an efficient hardware implementation in high-speed, high-bandwidth FPGA devices. This is the first detailed demonstration of reconstruction of offline-quality tracks at 40 MHz, and it makes the device suitable for processing Large Hadron Collider events at the full crossing frequency.
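The sketch below is a heavily simplified, software-only version of an "artificial retina" style track finder. It assumes straight-line tracks y = m*x + q crossing a few hypothetical detector layers. Each cell of a grid in (m, q) acts like a receptor: it sums Gaussian weights of the distances between every hit and the point where the cell's track would cross that layer. Tracks appear as local maxima above a threshold. The layer positions, grid, sigma, and threshold are all illustrative assumptions, not the paper's design. Every cell is computed independently, which is what makes a massively parallel FPGA implementation natural.

```python
import math
from typing import List, Sequence, Tuple

Hit = Tuple[float, float]  # (layer position x, measured coordinate y)

LAYER_X = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical detector layer positions


def hits_for_track(m: float, q: float) -> List[Hit]:
    """Ideal straight-line track y = m*x + q crossing every layer."""
    return [(x, m * x + q) for x in LAYER_X]


def cell_response(hits: Sequence[Hit], m: float, q: float, sigma: float) -> float:
    """Receptor tuned to track (m, q): Gaussian-weighted distance to each hit."""
    return sum(math.exp(-((y - (m * x + q)) ** 2) / (2.0 * sigma ** 2)) for x, y in hits)


def find_tracks(hits: Sequence[Hit], m_grid: Sequence[float], q_grid: Sequence[float],
                sigma: float, threshold: float) -> List[Tuple[float, float]]:
    """Cells whose response exceeds threshold and is a local maximum among 8 neighbours."""
    resp = [[cell_response(hits, m, q, sigma) for q in q_grid] for m in m_grid]
    found = []
    for i, m in enumerate(m_grid):
        for j, q in enumerate(q_grid):
            r = resp[i][j]
            if r < threshold:
                continue
            neighbours = [resp[a][b]
                          for a in range(max(i - 1, 0), min(i + 2, len(m_grid)))
                          for b in range(max(j - 1, 0), min(j + 2, len(q_grid)))
                          if (a, b) != (i, j)]
            if all(r >= n for n in neighbours):
                found.append((m, q))
    return found
```

A real implementation would also interpolate between neighbouring cells to refine the track parameters. It would also use the detector's actual geometry rather than straight lines in two dimensions.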
Hardware accelerator for molecular dynamics: MDGRAPE-2

NASA Astrophysics Data System (ADS)

Susukita, Ryutaro; Ebisuzaki, Toshikazu; Elmegreen, Bruce G.; Furusawa, Hideaki; Kato, Kenya; Kawai, Atsushi; Kobayashi, Yoshinao; Koishi, Takahiro; McNiven, Geoffrey D.; Narumi, Tetsu; Yasuoka, Kenji

2003-10-01

We developed MDGRAPE-2, a hardware accelerator that calculates forces at high speed in molecular dynamics (MD) simulations. MDGRAPE-2 is connected to a PC or a workstation as an extension board. The sustained performance of one MDGRAPE-2 board is 15 Gflops, roughly equivalent to the peak performance of the fastest supercomputer processing element. One board is able to calculate all forces between 10,000 particles in 0.28 s (i.e., 310,000 time steps per day). If 16 boards are connected to one computer and operated in parallel, this calculation speed becomes ~10 times faster. In addition to MD, MDGRAPE-2 can be applied to gravitational N-body simulations, the vortex method, and smoothed particle hydrodynamics in computational fluid dynamics.
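The sketch below shows the direct all-pairs force loop that a GRAPE-style board offloads from its host. It uses the softened inverse-square force of the gravitational N-body case mentioned in the abstract; unit masses and G = 1 are assumptions. The constants at the end check the abstract's arithmetic. 86,400 s divided by 0.28 s is about 308,600 steps per day, consistent with the quoted ~310,000. A ~10-fold gain on 16 boards corresponds to roughly 62.5% parallel efficiency.

```python
import math
from typing import List, Sequence


def pairwise_forces(pos: Sequence[Sequence[float]], eps: float = 0.0) -> List[List[float]]:
    """Direct O(N^2) sum of attractive inverse-square forces (unit masses, G = 1, softening eps)."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                f = d[k] * inv_r3
                forces[i][k] += f  # particle i is pulled toward j
                forces[j][k] -= f  # equal and opposite force on j
    return forces


# Arithmetic from the abstract: 0.28 s per full force evaluation for 10,000 particles.
SECONDS_PER_DAY = 86_400
STEPS_PER_DAY = SECONDS_PER_DAY / 0.28   # about 308,600, quoted as ~310,000
PAIRS_PER_STEP = 10_000 * 9_999 // 2     # distinct pairs in a direct sum
EFFICIENCY_16_BOARDS = 10 / 16           # ~10x speedup on 16 boards
```

The inner loop is a fixed, data-independent pipeline of subtractions, multiplies, and one inverse square root per pair. That is the kind of work dedicated hardware can replicate many times on a board.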
Incoherent beam combining based on the momentum SPGD algorithm

NASA Astrophysics Data System (ADS)

Yang, Guoqing; Liu, Lisheng; Jiang, Zhenhua; Guo, Jin; Wang, Tingfeng

2018-05-01

Incoherent beam combining (ICBC) technology is one of the most promising ways to achieve high-energy, near-diffraction-limited laser output. In this paper, the momentum method is proposed as a modification of the stochastic parallel gradient descent (SPGD) algorithm. The momentum method can efficiently improve the convergence speed of the combining system. An analytical approach is used to explain the principle of the momentum method. Furthermore, the proposed algorithm is verified through both simulations and experiments. The results show that the proposed algorithm not only accelerates the iteration but also maintains the stability of the combining process, confirming its feasibility for beam combining systems.
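The sketch below combines two-sided SPGD with a classical heavy-ball momentum term. The abstract does not give the paper's exact update rule, so this standard form is an assumption. On each iteration, every control channel is perturbed by a random +/-sigma at the same time, the change dJ in the quality metric is measured, and a velocity built from dJ times the perturbation is applied. The quadratic metric in the usage check stands in for a measured quantity, such as combined power in a far-field bucket.

```python
import random
from typing import Callable, List, Sequence


def spgd_momentum(metric: Callable[[Sequence[float]], float], u0: Sequence[float],
                  gain: float, sigma: float, mu: float, iters: int, seed: int = 0) -> List[float]:
    """Maximise `metric` with two-sided SPGD plus heavy-ball momentum.

    Per iteration: du = random +/-sigma on every channel,
    dJ = J(u + du) - J(u - du), v <- mu * v + gain * dJ * du, u <- u + v.
    mu = 0 recovers plain SPGD.
    """
    rng = random.Random(seed)
    u = list(u0)
    v = [0.0] * len(u)
    for _ in range(iters):
        du = [sigma if rng.random() < 0.5 else -sigma for _ in u]
        dj = metric([a + d for a, d in zip(u, du)]) - metric([a - d for a, d in zip(u, du)])
        v = [mu * vi + gain * dj * d for vi, d in zip(v, du)]
        u = [a + vi for a, vi in zip(u, v)]
    return u


if __name__ == "__main__":
    target = [0.3, -0.2, 0.5, 0.1]  # hypothetical optimal control voltages

    def J(u):
        return -sum((a - t) ** 2 for a, t in zip(u, target))

    print(spgd_momentum(J, [0.0] * 4, gain=2.5, sigma=0.1, mu=0.5, iters=2000))
```

The velocity term accumulates consistent gradient estimates across iterations and averages out their random components. This is the intuition behind the faster convergence the abstract reports, provided mu and gain stay small enough for stability.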
The seasonal-cycle climate model

NASA Technical Reports Server (NTRS)

Marx, L.; Randall, D. A.

1981-01-01

The seasonal cycle run, which will become the control run for comparison with runs utilizing codes and parameterizations developed by outside investigators, is discussed. The climate model currently exists in two parallel versions: one running on the Amdahl and the other running on the CYBER 203. These two versions are as nearly identical as machine capability and the requirement for high-speed performance will allow. Developmental changes are made on the Amdahl/CMS version for ease of testing and rapidity of turnaround. The changes are subsequently incorporated into the CYBER 203 version, using vectorization techniques where speed improvement can be realized. The 400-day seasonal cycle run serves as a control run for medium- and long-range climate forecasts as well as for sensitivity studies.

The RABiT: a rapid automated biodosimetry tool for radiological triage. II. Technological developments.

PubMed

Garty, Guy; Chen, Youhua; Turner, Helen C; Zhang, Jian; Lyulko, Oleksandra V; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Lawrence Yao, Y; Brenner, David J

2011-08-01

Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. The RABiT analyses fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cut-off dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. We have developed a new robotic system for lymphocyte processing, making use of upgraded laser power and parallel processing of four capillaries at once. This system has accelerated lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Parallel handling of multiple samples through the use of dedicated, purpose-built robotics and high-speed imaging allows analysis of up to 30,000 samples per day.
The RABiT: A Rapid Automated Biodosimetry Tool For Radiological Triage. II. Technological Developments

PubMed Central

Garty, Guy; Chen, Youhua; Turner, Helen; Zhang, Jian; Lyulko, Oleksandra; Bertucci, Antonella; Xu, Yanping; Wang, Hongliang; Simaan, Nabil; Randers-Pehrson, Gerhard; Yao, Y. Lawrence; Brenner, David J.

2011-01-01

Purpose: Over the past five years the Center for Minimally Invasive Radiation Biodosimetry at Columbia University has developed the Rapid Automated Biodosimetry Tool (RABiT), a completely automated, ultra-high throughput biodosimetry workstation. This paper describes recent upgrades and reliability testing of the RABiT. Materials and methods: The RABiT analyzes fingerstick-derived blood samples to estimate past radiation exposure or to identify individuals exposed above or below a cutoff dose. Through automated robotics, lymphocytes are extracted from fingerstick blood samples into filter-bottomed multi-well plates. Depending on the time since exposure, the RABiT scores either micronuclei or phosphorylation of the histone H2AX, in an automated robotic system, using filter-bottomed multi-well plates. Following lymphocyte culturing, fixation and staining, the filter bottoms are removed from the multi-well plates and sealed prior to automated high-speed imaging. Image analysis is performed online using dedicated image processing hardware. Both the sealed filters and the images are archived. Results: We have developed a new robotic system for lymphocyte processing, making use of upgraded laser power and parallel processing of four capillaries at once. This system has accelerated lymphocyte isolation, the main bottleneck of the RABiT operation, from 12 to 2 sec/sample. Reliability tests have been performed on all robotic subsystems. Conclusions: Parallel handling of multiple samples through the use of dedicated, purpose-built robotics and high-speed imaging allows analysis of up to 30,000 samples per day. PMID:21557703
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

PubMed Central

Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model enables parallel processing for the Canny operator, addressing the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the clear advantage of our method. The proposed algorithm demonstrates both better edge detection performance and improved time performance. PMID:29861711
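The abstract says Otsu's method is used to set Canny's dual threshold but does not give the coupling rule. The sketch below shows Otsu's method itself: pick the gray level that maximizes the between-class variance of an 8-bit histogram. The helper `canny_thresholds` is hypothetical. It uses the Otsu level as the high threshold and a fixed fraction of it as the low threshold, which is a common heuristic assumed here rather than taken from the paper. In a MapReduce setting, each map task could run this per image.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance, treating
    values <= t as class 0 (background) and values > t as class 1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels in class 0
    sum0 = 0   # intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the first maximum
            best_t, best_var = t, var_between
    return best_t


def canny_thresholds(pixels, low_ratio=0.5):
    """Hypothetical helper: Otsu level as Canny's high threshold,
    low_ratio times that level as the low threshold."""
    high = otsu_threshold(pixels)
    return low_ratio * high, high
```

For a cleanly bimodal image, any level between the two modes separates them equally well, and the strict comparison returns the lowest such level.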
Parallel Processing in Visual Search Asymmetry

ERIC Educational Resources Information Center

Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

2004-01-01

The difficulty of visual search may depend on the assignment of the same visual elements as targets and distractors (search asymmetry). Easy C-in-O searches and difficult O-in-C searches are often associated with parallel and serial search, respectively. Here, the time course of visual search was measured for both tasks with speed-accuracy methods. The…
Comparison of Parallel and Series Hybrid Powertrains for Transit Bus Application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Zhiming; Daw, C Stuart; Smith, David E

2016-01-01

The fuel economy and emissions of both conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicate that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar CO and HC tailpipe emissions but were also predicted to have reduced NOx tailpipe emissions compared to the conventional bus in higher speed cycles. For the New York bus cycle (NYBC), which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus, while the parallel hybrid bus had significantly lower tailpipe emissions. All three bus powertrains were found to require periodic active DPF regeneration to maintain PM control. Plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed due to the relatively large battery capacity that is typical of the series hybrid configuration.
High speed civil transport aerodynamic optimization

NASA Technical Reports Server (NTRS)

Ryan, James S.

1994-01-01

This is a report of work in support of the Computational Aerosciences (CAS) element of the Federal HPCC program. Specifically, CFD and aerodynamic optimization are being performed on parallel computers. The long-range goal of this work is to facilitate teraflops-rate multidisciplinary optimization of aerospace vehicles. This year's work is targeted for application to the High Speed Civil Transport (HSCT), one of four CAS grand challenges identified in the HPCC FY 1995 Blue Book. This vehicle is to be a passenger aircraft, with the promise of cutting overseas flight time by more than half. To meet fuel economy, operational costs, environmental impact, noise production, and range requirements, improved design tools are required, and these tools must eventually integrate optimization, external aerodynamics, propulsion, structures, heat transfer, controls, and perhaps other disciplines. The fundamental goal of this project is to contribute to improved design tools for U.S. industry, and thus to the nation's economic competitiveness.
Dynamic strain distribution of FRP plate under blast loading

NASA Astrophysics Data System (ADS)

Saburi, T.; Yoshida, M.; Kubota, S.

2017-02-01

The dynamic strain distribution of a fiber-reinforced plastic (FRP) plate under blast loading was investigated using a Digital Image Correlation (DIC) image analysis method. The test FRP plates were mounted parallel to each other on a steel frame. A 50 g charge of composition C4 explosive was used as the blast loading source and set at the center of the FRP plates. The dynamic behavior of the FRP plate under blast loading was observed with two high-speed video cameras. The two high-speed video image sequences were used to analyze the three-dimensional strain distribution of the FRP by means of the DIC method. A point strain profile extracted from the analyzed strain distribution data was compared with a strain profile measured directly with a strain gauge, showing that the strain profile obtained by the DIC method under blast loading is quantitatively accurate.
High-speed femtosecond pump-probe spectroscopy with a smart pixel detector array.

PubMed

Bourquin, S; Prasankumar, R P; Kärtner, F X; Fujimoto, J G; Lasser, T; Salathé, R P

2003-09-01

A new femtosecond pump-probe spectroscopy technique is demonstrated that permits the high-speed, parallel acquisition of pump-probe measurements at multiple wavelengths. This is made possible by use of a novel, two-dimensional smart pixel detector array that performs amplitude demodulation in real time on each pixel. This detector array not only achieves sensitivities comparable with lock-in amplification but also simultaneously demodulates probe transmission signals at multiple wavelengths, thus permitting rapid time- and wavelength-resolved femtosecond pump-probe spectroscopy. Measurements on a thin sample of bulk GaAs are performed across 58 simultaneous wavelengths. Differential probe transmission changes as small as approximately 2 × 10⁻⁴ can be measured over a 5-ps delay scan in only approximately 3 min. This technology can be applied to a wide range of pump-probe measurements in condensed matter, chemistry, and biology.
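The per-pixel amplitude demodulation described above is, in software terms, a lock-in operation. The signal is mixed with in-phase and quadrature references at the modulation frequency and then low-pass filtered, so only the modulated component survives. The sketch below is a generic digital lock-in, not the detector's on-chip circuit. It assumes the record spans an integer number of reference periods, so a plain average serves as the low-pass filter. `f_ref` and `fs` are illustrative parameters.

```python
import math


def lockin_demodulate(samples, f_ref, fs):
    """Generic digital lock-in: mix with cos/sin references at f_ref and
    average (a boxcar low-pass). Returns the amplitude of the f_ref component.
    Assumes len(samples) covers an integer number of reference periods."""
    n = len(samples)
    i_sum = q_sum = 0.0
    for k, x in enumerate(samples):
        phase = 2.0 * math.pi * f_ref * k / fs
        i_sum += x * math.cos(phase)
        q_sum += x * math.sin(phase)
    # Averages are (A/2)cos(phi) and -(A/2)sin(phi); DC and 2f terms cancel.
    return 2.0 * math.hypot(i_sum / n, q_sum / n)


def demodulate_pixels(pixel_traces, f_ref, fs):
    """Demodulate every pixel (one wavelength channel each) independently,
    which is what makes the operation trivially parallel across the array."""
    return [lockin_demodulate(trace, f_ref, fs) for trace in pixel_traces]
```

Because each pixel's computation is independent, the same reduction can run on all 58 wavelength channels at once. That independence is what the smart pixel array exploits in hardware.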
Processing Device for High-Speed Execution of an Xrisc Computer Program

NASA Technical Reports Server (NTRS)

Ng, Tak-Kwong (Inventor); Mills, Carl S. (Inventor)

2016-01-01

A processing device for high-speed execution of a computer program is provided. A memory module may store one or more computer programs. A sequencer may select one of the computer programs and control execution of the selected program. A register module may store intermediate values associated with a current calculation set, a set of output values associated with a previous calculation set, and a set of input values associated with a subsequent calculation set. An external interface may receive the set of input values from a computing device and provide the set of output values to the computing device. A computation interface may provide a set of operands for computation during processing of the current calculation set. The set of input values is loaded into the register module and the set of output values is unloaded from it in parallel with processing of the current calculation set.
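The register module described above holds three sets at once: the next inputs, the set being computed, and the previous outputs. This is a rotating (triple) buffer. The Python sketch below models that rotation. `RegisterModule` and `cycle` are hypothetical names, not from the patent. In hardware the load, compute, and unload steps overlap in time. Here they run sequentially, but each touches a disjoint bank, so the order does not affect the result.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RegisterModule:
    """Hypothetical three-bank register file: one bank receives the next
    input set, one holds the current calculation set, and one holds the
    previous set's outputs waiting to be unloaded."""
    compute: Callable[[List[float]], List[float]]
    incoming: Optional[List[float]] = None  # inputs for the subsequent set
    current: Optional[List[float]] = None   # values of the current set
    outgoing: Optional[List[float]] = None  # outputs of the previous set

    def cycle(self, new_inputs: Optional[List[float]]) -> Optional[List[float]]:
        unloaded = self.outgoing        # external interface unloads old outputs
        self.incoming = new_inputs      # external interface loads next inputs
        results = self.compute(self.current) if self.current is not None else None
        # Rotate banks: results become outputs, loaded inputs become current.
        self.outgoing, self.current, self.incoming = results, self.incoming, None
        return unloaded
```

Feeding one input set per cycle produces each result two cycles later. Once the pipeline is full, however, one result set is unloaded every cycle while computation never waits on I/O.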
Ion selectivity of the Vibrio alginolyticus flagellar motor.

PubMed Central

Liu, J Z; Dapice, M; Khan, S

1990-01-01

The marine bacterium, Vibrio alginolyticus, normally requires sodium for motility. We found that lithium will substitute for sodium. In neutral pH buffers, the membrane potential and swimming speed of glycolyzing bacteria reached maximal values as sodium or lithium concentration was increased. While the maximal potentials obtained in the two cations were comparable, the maximal swimming speed was substantially lower in lithium. Over a wide range of sodium concentration, the bacteria maintained an invariant sodium electrochemical potential as determined by membrane potential and intracellular sodium measurements. Over this range the increase of swimming speed took Michaelis-Menten form. Artificial energization of swimming motility required imposition of a voltage difference in concert with a sodium pulse. The cation selectivity and concentration dependence exhibited by the motile apparatus depended on the viscosity of the medium. In high-viscosity media, swimming speeds were relatively independent of either ion type or concentration. These facts parallel and extend observations of the swimming behavior of bacteria propelled by proton-powered flagella. In particular, they show that ion transfers limit unloaded motor speed in this bacterium and imply that the coupling between ion transfers and force generation must be fairly tight. PMID:2394685
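The Michaelis-Menten dependence of swimming speed on sodium concentration mentioned above has the standard form shown below. The symbols are introduced here for illustration only: $v_{\max}$ is the saturating speed and $K_m$ is the concentration at half-maximal speed. The abstract does not report fitted values.

```latex
v\bigl([\mathrm{Na}^+]\bigr) = \frac{v_{\max}\,[\mathrm{Na}^+]}{K_m + [\mathrm{Na}^+]},
\qquad v\bigl([\mathrm{Na}^+] = K_m\bigr) = \tfrac{1}{2}\,v_{\max}
```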
Method of multi-mode vibration control for the carbody of high-speed electric multiple unit trains

NASA Astrophysics Data System (ADS)

Gong, Dao; Zhou, Jinsong; Sun, Wenjing; Sun, Yu; Xia, Zhanghui

2017-11-01

A method of multi-mode vibration control for the carbody of high-speed electric multiple unit (EMU) trains, using the onboard and suspended equipment as dynamic vibration absorbers (DVAs), is proposed. The effect of the multi-mode vibration on the ride quality of a high-speed EMU train was studied, and the target modes of vibration control were determined. An equivalent mass identification method was used to determine the equivalent mass for the target modes at the device installation positions. To optimize the vibration acceleration response of the carbody, the natural frequencies and damping ratios of the lateral and vertical vibration were designed based on the theory of dynamic vibration absorption. To realize the optimized design values of the natural frequencies for the lateral and vertical vibrations simultaneously, a new type of vibration absorber was designed in which a Belleville spring and conventional rubber parts are connected in parallel. This design utilizes the negative stiffness of the Belleville spring. Results show that, compared with rigid equipment connections, the proposed method effectively reduces the multi-mode vibration of a carbody in a high-speed EMU train, thereby achieving the control objectives. The ride quality in terms of the lateral and vertical vibration of the carbody is considerably improved. Moreover, the optimal value of the damping ratio is effective in dissipating the vibration energy, which reduces the vibration of both the carbody and the equipment.
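As a rough illustration of the absorber design steps above, the sketch below uses the classical Den Hartog tuning rules. These apply to a damped absorber on an undamped single-degree-of-freedom primary system under harmonic force. They are a textbook starting point, not necessarily the criteria used in the paper. The second helper shows why a negative-stiffness Belleville spring in parallel with rubber helps: parallel stiffnesses add, so a negative element lowers the net stiffness and hence the natural frequency. All numbers are hypothetical.

```python
import math


def den_hartog_tuning(mass_ratio):
    """Classical Den Hartog optimum for absorber mass ratio mu = m_a / M:
    returns (absorber/primary frequency ratio, absorber damping ratio)."""
    mu = mass_ratio
    f_opt = 1.0 / (1.0 + mu)
    zeta_opt = math.sqrt(3.0 * mu / (8.0 * (1.0 + mu) ** 3))
    return f_opt, zeta_opt


def parallel_stiffness(*stiffnesses):
    """Springs in parallel share displacement, so their stiffnesses add;
    a negative-stiffness element therefore reduces the net stiffness."""
    return sum(stiffnesses)


def natural_frequency_hz(stiffness, mass):
    """Undamped natural frequency f = sqrt(k/m) / (2*pi) in Hz."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)
```

For example, equipment with 5% of the modal equivalent mass (`den_hartog_tuning(0.05)`) would be tuned to about 0.952 of the target carbody mode frequency, with a damping ratio near 0.127.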
Multilevel Parallelization of AutoDock 4.2.

PubMed

Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

2011-04-28

Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
High Speed FETs Fabricated in GaAs/AlGaAs Layered Structures Prepared by Molecular Beam Epitaxy.

DTIC Science & Technology

1984-01-01

but proper measures, such as improved ohmic contacts, metal conductors and small geometries are useful. In digital circuit applications in addition to...heterointerface encounter reduced scattering by ionized donors located in AlGaAs layer, the current conducting channel must be parallel to the...ments apply to the velocity saturated MOSFET as well. For the MESFET, in contrast, the transconductance increases with increasing gate biases, since
Neural Network Control of a Parallel Hybrid-Electric Propulsion System for a Small Unmanned Aerial Vehicle

DTIC Science & Technology

2005-01-01

Eppler, or Selig airfoil [147, 148] to be used. Other high lift wings could be used such as the low Reynolds number NASA LRN-I-1010 airfoil used in... Fraser, Airfoils at Low Speeds. Virginia Beach, VA: H.A. Stokely, 1989. [148] R. Eppler, Airfoil Design and Data. Berlin, Germany: Springer-Verlag... Figure 3-4: RMS Error for CMAC Approximation (L=3)... Figure 3-5: CMAC
FIR Filter of DS-CDMA UWB Modem Transmitter

NASA Astrophysics Data System (ADS)

Kang, Kyu-Min; Cho, Sang-In; Won, Hui-Chul; Choi, Sang-Sung

This letter presents low-complexity digital pulse shaping filter structures for a direct sequence code division multiple access (DS-CDMA) ultra-wideband (UWB) modem transmitter with a ternary spreading code. The proposed finite impulse response (FIR) filter structures, which use a look-up table (LUT), reduce the required memory by about 50% to 80% compared with conventional FIR filter structures and are consequently suitable for high-speed parallel data processing.
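With a ternary spreading code, every input sample is -1, 0, or +1. The contribution of a group of taps can therefore be precomputed for all 3^G input patterns and fetched from a table instead of multiplied. The sketch below shows this generic LUT-based FIR idea, checked against direct convolution. The grouping size and table layout are illustrative assumptions. This is not the letter's memory-reduced structure, whose details the abstract does not give.

```python
from itertools import product


def direct_fir(x, taps):
    """Reference convolution: y[n] = sum_k taps[k] * x[n-k], zero initial state."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]


def build_luts(taps, group):
    """Split taps into groups; for each group, precompute the partial sum for
    every ternary input pattern in {-1, 0, +1}^len(group)."""
    luts = []
    for start in range(0, len(taps), group):
        sub = taps[start:start + group]
        table = {pattern: sum(c * s for c, s in zip(sub, pattern))
                 for pattern in product((-1, 0, 1), repeat=len(sub))}
        luts.append((start, len(sub), table))
    return luts


def lut_fir(x, taps, group=3):
    """Multiplier-free FIR for ternary input: one table lookup per tap group."""
    luts = build_luts(taps, group)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for start, size, table in luts:
            # Pattern entry j multiplies tap (start + j), i.e. sample x[n - start - j].
            pattern = tuple(x[n - start - j] if n - start - j >= 0 else 0
                            for j in range(size))
            acc += table[pattern]
        y.append(acc)
    return y
```

Larger groups mean fewer lookups and additions per output but exponentially larger tables (27 entries per group for G = 3). That trade-off is the kind of memory cost LUT-based structures aim to reduce.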
Innovative Growth and Defect Analysis of Group III - Nitrides for High Speed Electronics

DTIC Science & Technology

2008-02-29

nitrides have optical transitions from the infrared into the ultraviolet and are used for light generation with a luminous flux of approximately 100...exist below the detection limit of X-ray diffraction (XRD). It has been shown that metal clusters could cause resonance in the infrared and affect the...plasmonic (Mie) resonances and the specific interband absorption between the parallel bands in metallic indium [Har66]; the latter starts from 0.6
Acceleration of integral imaging based incoherent Fourier hologram capture using graphic processing unit.

PubMed

Jeong, Kyeong-Min; Kim, Hee-Seung; Hong, Sung-In; Lee, Sung-Keun; Jo, Na-Young; Kim, Yong-Soo; Lim, Hong-Gi; Park, Jae-Hyeung

2012-10-08

Speed enhancement of integral imaging based incoherent Fourier hologram capture using a graphics processing unit is reported. The integral imaging based method enables exact hologram capture of real three-dimensional objects under regular incoherent illumination. In our implementation, we apply a parallel computation scheme using the graphics processing unit, accelerating the processing speed. Using the enhanced speed of hologram capture, we also implement a pseudo real-time hologram capture and optical reconstruction system. The overall operation speed is measured to be 1 frame per second.
A parallel and modular deformable cell Car-Parrinello code

NASA Astrophysics Data System (ADS)

Cavazzoni, Carlo; Chiarotti, Guido L.

1999-12-01

We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm, including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90 and makes use of newer programming concepts such as encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree-like dependencies among modules. The modules include not only the variables but also the methods acting on them, in an object-oriented fashion. The modular structure eases code maintenance, development and debugging, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up over a wide range of processor counts, independently of the architecture. Super-linear speed-up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (which, for a fixed problem, increases with the number of processing elements) as a temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ~50 ps.
On the generation of double layers from ion- and electron-acoustic instabilities

DOE PAGES

Fu, Xiangrong; Cowee, Misa M.; Gary, Stephen Peter; ...

2016-03-17

A plasma double layer (DL) is a nonlinear electrostatic structure that carries a unipolar electric field parallel to the background magnetic field due to local charge separation. Past studies showed that DLs observed in space plasmas are mostly associated with the ion acoustic instability. Recent Van Allen Probes observations of parallel electric fields traveling much faster than the ion acoustic speed have motivated a computational study to test the hypothesis that a new type of DL, the electron acoustic DL generated from the electron acoustic instability, is responsible for these electric fields. Nonlinear particle-in-cell simulations yield negative results, i.e., the hypothetical electron acoustic DLs cannot be formed in a way similar to ion acoustic DLs. We find that linear theory analysis and the simulations show that the frequencies of electron acoustic waves are too high for ions to respond and maintain the charge separation required by DLs. However, our results do show that local density perturbations in a two-electron-component plasma can result in unipolar-like electric fields that propagate at the electron thermal speed, suggesting another potential explanation for the observations.
On the generation of double layers from ion- and electron-acoustic instabilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fu, Xiangrong, E-mail: xrfu@lanl.gov; Cowee, Misa M.; Winske, Dan

2016-03-15

A plasma double layer (DL) is a nonlinear electrostatic structure that carries a unipolar electric field parallel to the background magnetic field due to local charge separation. Past studies showed that DLs observed in space plasmas are mostly associated with the ion acoustic instability. Recent Van Allen Probes observations of parallel electric field structures traveling much faster than the ion acoustic speed have motivated a computational study to test the hypothesis that a new type of DL, the electron acoustic DL generated from the electron acoustic instability, is responsible for these electric fields. Nonlinear particle-in-cell simulations yield negative results, i.e., the hypothetical electron acoustic DLs cannot be formed in a way similar to ion acoustic DLs. Linear theory analysis and the simulations show that the frequencies of electron acoustic waves are too high for ions to respond and maintain the charge separation required by DLs. However, our results do show that local density perturbations in a two-electron-component plasma can result in unipolar-like electric field structures that propagate at the electron thermal speed, suggesting another potential explanation for the observations.

RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

PubMed

Wright, Imogen A; Travers, Simon A

2014-07-01

The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
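The distinction drawn above between genuine codon-sized indels and frameshift mutations comes down to whether an indel's length is a multiple of three. The sketch below applies that rule to a finished pairwise alignment, with gaps written as `-`. It is a simple post-hoc classifier for illustration only. RAMICS itself makes this distinction inside its profile hidden Markov model alignment, which is not reproduced here.

```python
import re


def classify_indels(ref_aln, read_aln):
    """Label each gap run in a pairwise alignment as a codon-sized indel
    (length divisible by 3) or a frameshift. Returns sorted tuples of
    (alignment column, 'insertion' | 'deletion', length, label)."""
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    # Gaps in the reference row are insertions in the read;
    # gaps in the read row are deletions from the reference.
    for kind, row in (("insertion", ref_aln), ("deletion", read_aln)):
        for m in re.finditer(r"-+", row):
            length = m.end() - m.start()
            label = "codon" if length % 3 == 0 else "frameshift"
            events.append((m.start(), kind, length, label))
    return sorted(events)
```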
High-speed extended-term time-domain simulation for online cascading analysis of power system

NASA Astrophysics Data System (ADS)

Fu, Chuan

A high-speed extended-term (HSET) time domain simulator (TDS), intended to become a part of an energy management system (EMS), has been newly developed for use in online extended-term dynamic cascading analysis of power systems. HSET-TDS includes the following attributes for providing situational awareness of high-consequence events: (i) online analysis, including n-1 and n-k events, (ii) ability to simulate both fast and slow dynamics for 1-3 hours in advance, (iii) inclusion of rigorous protection-system modeling, (iv) intelligence for corrective action ID, storage, and fast retrieval, and (v) high-speed execution. Very fast online computational capability is the most desired attribute of this simulator. Based on the process of solving the differential-algebraic equations describing the dynamics of power systems, HSET-TDS seeks to develop computational efficiency at each of the following hierarchical levels: (i) hardware, (ii) strategies, (iii) integration methods, (iv) nonlinear solvers, and (v) linear solver libraries. This thesis first describes the Hammer-Hollingsworth 4 (HH4) implicit integration method. Like the trapezoidal rule, HH4 is symmetrically A-stable, but it possesses higher-order precision (h^4) than the trapezoidal rule. Such precision enables larger integration steps and therefore improves simulation efficiency for variable step size implementations. This thesis provides the underlying theory on which we advocate use of HH4 over other numerical integration methods for power system time-domain simulation. Second, motivated by the need to perform high-speed extended-term time domain simulation (HSET-TDS) for online purposes, this thesis presents principles for designing numerical solvers of differential algebraic systems associated with power system time-domain simulation, including DAE construction strategies (Direct Solution Method), integration methods (HH4), nonlinear solvers (Very Dishonest Newton), and linear solvers (SuperLU).
We have implemented a design appropriate for HSET-TDS, and we compare it to various solvers, including the commercial-grade PSSE program, with respect to computational efficiency and accuracy, using as examples the New England 39-bus system, the expanded 8775-bus system, and the PJM 13029-bus system. Third, we have explored a stiffness-decoupling method, intended to be part of a parallel design of time domain simulation software for supercomputers. The stiffness-decoupling method is able to combine the advantages of implicit methods (A-stability) and explicit methods (less computation). With the new stiffness detection method proposed herein, the stiffness can be captured. The expanded 975-bus system is used to test simulation efficiency. Finally, several parallel strategies for supercomputer deployment to simulate power system dynamics are proposed and compared. Design A partitions the task via scale with the stiffness decoupling method, waveform relaxation, and a parallel linear solver. Design B partitions the task via the time axis using a highly precise integration method, the Kuntzmann-Butcher Method - order 8 (KB8). The strategy of partitioning events is designed to partition the whole simulation via the time axis through a simulated sequence of cascading events. Of all the strategies proposed, partitioning cascading events is recommended, since the sub-tasks for each processor are totally independent and therefore minimal communication time is needed.
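HH4 is the two-stage Gauss-Legendre implicit Runge-Kutta method, which is A-stable and fourth-order accurate. The sketch below applies its Butcher tableau to the scalar linear test equation y' = λy, where the implicit stage system is a 2×2 linear solve. It compares the result with the trapezoidal rule. Restricting to the linear test equation is a simplification for illustration. A power-system DAE solver would instead solve the stage equations with a Newton-type iteration.

```python
import math

SQ3 = math.sqrt(3.0)
# Butcher tableau of the two-stage Gauss-Legendre (Hammer-Hollingsworth) method.
A = [[0.25, 0.25 - SQ3 / 6.0],
     [0.25 + SQ3 / 6.0, 0.25]]
B = [0.5, 0.5]


def hh4_step(y, h, lam):
    """One HH4 step for y' = lam*y: solve (I - h*lam*A) k = lam*y*[1, 1]
    exactly by Cramer's rule, then form y + h*(b1*k1 + b2*k2)."""
    z = h * lam
    a11, a12 = 1.0 - z * A[0][0], -z * A[0][1]
    a21, a22 = -z * A[1][0], 1.0 - z * A[1][1]
    rhs = lam * y
    det = a11 * a22 - a12 * a21
    k1 = (rhs * a22 - a12 * rhs) / det
    k2 = (a11 * rhs - a21 * rhs) / det
    return y + h * (B[0] * k1 + B[1] * k2)


def trapezoid_step(y, h, lam):
    """Trapezoidal rule for y' = lam*y (second order, A-stable)."""
    z = h * lam
    return y * (1.0 + z / 2.0) / (1.0 - z / 2.0)


def global_error(step, h, lam=-1.0, t_end=1.0):
    """Integrate from y(0) = 1 to t_end and compare with exp(lam * t_end)."""
    y = 1.0
    for _ in range(round(t_end / h)):
        y = step(y, h, lam)
    return abs(y - math.exp(lam * t_end))
```

Halving h cuts the HH4 error by roughly 16×, consistent with fourth order, versus about 4× for the trapezoidal rule. A single HH4 step with a very stiff z = hλ = -1000 stays bounded, reflecting A-stability.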
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure

NASA Astrophysics Data System (ADS)

Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

2011-09-01

Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on demand from a third party, is a new approach to high-performance computing and is used here to perform ultra-fast MC calculations for radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results are aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of the cloud-based MC simulation is identical to that produced by a single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed-up, cloud computing builds a layer of abstraction for high-performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
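The master/worker distribution above reports output identical to a single-threaded run. One way to get that property is to give every particle history its own random stream keyed by its index, then aggregate integer tallies, which sum exactly in any order. The sketch below uses a toy one-dimensional "depth" tally with hypothetical parameters, and threads stand in for cloud worker nodes. It does not use EGS5 or MPI, and the paper may ensure reproducibility differently.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_BINS = 10           # toy depth bins (hypothetical)
BIN_WIDTH = 0.5       # cm, hypothetical
MEAN_FREE_PATH = 1.5  # cm, hypothetical


def simulate_history(base_seed, i):
    """Toy history: sample an interaction depth from an exponential law.
    Each history has its own RNG stream keyed by (base_seed, i)."""
    rng = random.Random(f"{base_seed}:{i}")
    depth = rng.expovariate(1.0 / MEAN_FREE_PATH)
    return min(int(depth / BIN_WIDTH), N_BINS - 1)


def worker(base_seed, histories):
    """One worker node: tally its assigned histories into integer bins."""
    tally = [0] * N_BINS
    for i in histories:
        tally[simulate_history(base_seed, i)] += 1
    return tally


def run(n_histories, n_workers, base_seed=2010):
    """Master: split history indices across workers, then aggregate."""
    chunks = [range(w, n_histories, n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: worker(base_seed, c), chunks))
    # Integer tallies sum exactly, so the result is independent of n_workers.
    return [sum(col) for col in zip(*partials)]
```

Because the result does not depend on how histories are partitioned, adding nodes changes only the wall-clock time, not the answer.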
Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

Turner, Mark G.; Topp, David A.

1998-01-01

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits two levels of parallelism: each blade row is run in parallel, and each blade row grid is decomposed into several domains that are also run in parallel. Twenty processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit k-ε turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so that the solution in the entire machine is no longer changing. The k-ε equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution match the serial solution exactly (bit for bit). This has helped isolate many small parallel bugs and guarantee that the parallelization was done correctly. The domain decomposition is done only in the axial direction, since the number of points axially is much larger than in the other two directions. This code uses MPI for message passing.
The parallel speed-up of the solver portion (no I/O or body force calculation) for a grid which has 227 points axially.
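Two points above fit in a small sketch: axial-only domain decomposition and the bit-for-bit requirement. Each subdomain updates its own points using one halo (ghost) point copied from each neighbor, and the reassembled result is compared with the serial update using exact equality. The smoothing stencil is a hypothetical stand-in for the solver's explicit update. The grid uses the 227 axial points mentioned above, split 20 ways for illustration. Message passing is simulated by slicing rather than MPI.

```python
def smooth_step(u, r=0.25):
    """Serial explicit update u_i <- u_i + r*(u_{i-1} - 2u_i + u_{i+1}), fixed ends."""
    return [u[0]] + [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                     for i in range(1, len(u) - 1)] + [u[-1]]


def split_axial(n, parts):
    """Contiguous axial index ranges [start, end), sizes differing by at most one."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def decomposed_step(u, parts, r=0.25):
    """Each subdomain copies its points plus one halo point on each side
    (the data MPI would exchange), then applies the identical update."""
    new = list(u)
    for start, end in split_axial(len(u), parts):
        lo, hi = max(start - 1, 0), min(end + 1, len(u))
        local = u[lo:hi]  # owned points plus halo points
        for i in range(start, end):
            if 0 < i < len(u) - 1:
                j = i - lo
                new[i] = local[j] + r * (local[j - 1] - 2.0 * local[j] + local[j + 1])
    return new
```

Because each point is computed from the same operands in the same order as in the serial code, the results match exactly rather than to round-off. A bit-for-bit check therefore exposes any halo or indexing bug immediately.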
Method and apparatus for combinatorial logic signal processor in a digitally based high speed x-ray spectrometer

DOEpatents

Warburton, William K.; Zhou, Zhiquing

1999-01-01

A high-speed, digitally based signal processing system accepts a digitized input signal and detects the presence of step-like pulses in this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system livetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long-time-constant trapezoidal filter for good energy resolution; and a fast channel, which filters the data stream with a short-time-constant trapezoidal filter, detects pulses, inspects for pileup, and captures peak values from the slow channel for good events. A simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and to fully automate all spectrometer functions. Because the method is digitally based, it allows pulses to be binned on time-related values as well as on their amplitudes, if desired.
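A trapezoidal filter can be computed as the difference of two moving sums separated by a gap. The sketch below shows that idea for a slow and a fast channel. The `rise` and `gap` values are hypothetical, and the patent's actual filter lengths, pileup logic and detection thresholds are not modeled:

```python
def trapezoidal_filter(x, rise, gap):
    """Trapezoidal shaping of a sampled signal.

    Output at sample n = (sum of the last `rise` samples)
                       - (sum of `rise` samples ending `rise + gap` samples earlier).
    A step of height A produces a trapezoid whose flat top equals A * rise.
    """
    # prefix[i] = x[0] + ... + x[i-1], so any window sum is a difference of prefixes
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)

    def window_sum(end, length):
        # sum of x[end-length+1 .. end], treating samples before x[0] as zero
        lo = max(end - length + 1, 0)
        return prefix[end + 1] - prefix[lo] if end >= 0 else 0

    return [window_sum(n, rise) - window_sum(n - rise - gap, rise) for n in range(len(x))]

step = [0] * 10 + [5] * 40                       # a step of height 5 arriving at sample 10
slow = trapezoidal_filter(step, rise=8, gap=4)   # long filter: amplitude estimate
fast = trapezoidal_filter(step, rise=2, gap=0)   # short filter: pulse detection
print(max(slow) / 8, max(fast) / 2)              # 5.0 5.0
```

Dividing the flat top by `rise` recovers the step height. The longer window averages more noise away, which is why the slow channel gives the better energy resolution.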
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop

NASA Astrophysics Data System (ADS)

Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.

2018-04-01

The data center is a data processing and application concept proposed in recent years. It is a processing approach based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it makes full use of cluster computing nodes and improves the efficiency of data-parallel applications. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing, it dispatches many computing nodes to process image storage blocks and pyramids in the background, improving the efficiency of image reading and application and meeting the need for concurrent, multi-user, high-speed access to remotely sensed data. By building an actual Hadoop service system, testing the storage efficiency for different image data and multiple users, and analyzing the distributed storage architecture, the study verified the rationality, reliability and superiority of the system design for improving the application efficiency of remote sensing images.
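Building an image pyramid maps naturally onto map and reduce steps. This is a plain-Python sketch, not Hadoop code. It assumes a quadtree pyramid where each coarser level halves resolution, so tile `(level, row, col)` has parent `(level - 1, row // 2, col // 2)`. Pixel resampling is omitted, and the function names are hypothetical:

```python
from collections import defaultdict

def map_tile(key):
    """Map step: emit (parent_key, child_key) for a tile at (level, row, col)."""
    level, row, col = key
    return (level - 1, row // 2, col // 2), key

def reduce_tiles(pairs):
    """Reduce step: group child tiles under the parent tile they will be merged into."""
    groups = defaultdict(list)
    for parent, child in pairs:
        groups[parent].append(child)
    return dict(groups)

def build_level(tile_keys):
    return reduce_tiles(map(map_tile, tile_keys))

level2 = [(2, r, c) for r in range(4) for c in range(4)]   # 4 x 4 tiles
level1 = build_level(level2)                               # 2 x 2 parents, 4 children each
print(sorted(level1))
```

In a real MapReduce job, the framework's shuffle performs the grouping that `reduce_tiles` does here, and each reducer would downsample its four children into one parent tile.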
Design of k-Space Channel Combination Kernels and Integration with Parallel Imaging

PubMed Central

Beatty, Philip J.; Chang, Shaorong; Holmes, James H.; Wang, Kang; Brau, Anja C. S.; Reeder, Scott B.; Brittain, Jean H.

2014-01-01

Purpose: In this work, a new method is described for producing local k-space channel combination kernels using a small amount of low-resolution multichannel calibration data. Additionally, this work describes how these channel combination kernels can be combined with local k-space unaliasing kernels produced by the calibration phase of parallel imaging methods such as GRAPPA, PARS and ARC. Methods: Experiments were conducted to evaluate both the image quality and computational efficiency of the proposed method compared with a channel-by-channel parallel imaging approach with image-space sum-of-squares channel combination. Results: Image quality was comparable overall, with some very minor differences seen in reduced field-of-view imaging. The method was shown to speed up computation by a factor on the order of 3–16× for 32-channel data sets. Conclusion: The proposed method enables high-quality channel combination to occur earlier in the reconstruction pipeline, reducing computational and memory requirements for image reconstruction. PMID:23943602
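The baseline the paper compares against, image-space sum-of-squares combination, takes the root of the summed squared magnitudes across channels at each pixel. A minimal sketch of that baseline only, assuming per-channel images given as flat lists of complex pixels. It is not the proposed k-space kernel method:

```python
import math

def sum_of_squares(channel_images):
    """Combine per-channel complex images pixel by pixel: sqrt(sum_c |I_c|^2)."""
    n_pixels = len(channel_images[0])
    return [math.sqrt(sum(abs(img[p]) ** 2 for img in channel_images))
            for p in range(n_pixels)]

coils = [[3 + 0j, 1j], [4j, 0j]]   # two channels, two pixels each (toy values)
print(sum_of_squares(coils))       # [5.0, 1.0]
```

Because this runs after every channel has been reconstructed separately, its cost scales with the channel count. That is the cost the paper's earlier k-space combination avoids.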
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high-speed parallel array data processing architecture fashioned under a computational envelope approach includes a database memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by an Omega-type connection network. Programs and data are fed from the database memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Programs execute with the processors normally operating quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even in the parallel processing mode, however, the processors are not lock-stepped; each executes its own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
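The abstract names an Omega connection network but does not describe its routing. The sketch below assumes the textbook destination-tag scheme: log2 N stages, each a perfect shuffle followed by a 2x2 exchange switch. It traces one message's port position and does not model contention between messages:

```python
def omega_route(src, dst, n_bits):
    """Destination-tag routing through an Omega network with 2**n_bits ports.

    Each stage applies a perfect shuffle (rotate the port address left by one
    bit), then a 2x2 switch sets the lowest address bit to the next
    destination bit, most significant first. Returns the position after each stage.
    """
    mask = (1 << n_bits) - 1
    pos, path = src, [src]
    for stage in range(n_bits):
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask   # perfect shuffle
        bit = (dst >> (n_bits - 1 - stage)) & 1              # routing tag bit
        pos = (pos & ~1) | bit                               # switch: straight or exchange
        path.append(pos)
    return path

print(omega_route(5, 2, 3))   # [5, 2, 5, 2]
```

After `n_bits` stages every address bit has been replaced by a destination bit, so any source reaches any destination. The network is still blocking, however, because two messages can compete for the same switch output.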
Binary tree eigen solver in finite element analysis

NASA Technical Reports Server (NTRS)

Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.

1993-01-01

This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, whose parallel implementation of a number of associative operations on an arbitrary set of N elements takes O(log2 N) steps, compared with N-1 steps if implemented sequentially. The hardware used to implement the binary tree consists of 32 transputers. The algorithm is written in OCCAM, a high-level language developed alongside the transputer to express parallel programming constructs and to provide communication between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem agrees closely with the theoretical prediction given by the method of recursive doubling.
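Recursive doubling applies an associative operation at doubling distances, so N inputs need ceil(log2 N) parallel steps instead of N-1 sequential ones. A minimal sequential sketch that counts the steps. It computes all prefixes, which is the classic form of the method, and is not the paper's OCCAM code:

```python
import operator

def recursive_doubling_scan(values, op=operator.add):
    """All prefixes x0, x0*x1, ..., for an associative op, by recursive doubling.

    Returns (prefixes, steps); steps == ceil(log2 N).
    """
    x, steps, d = list(values), 0, 1
    while d < len(x):
        # every update reads only the previous step's values, so all of them
        # could run concurrently on separate processors
        x = [x[i] if i < d else op(x[i - d], x[i]) for i in range(len(x))]
        d *= 2
        steps += 1
    return x, steps

print(recursive_doubling_scan(range(1, 9)))   # ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```

Keeping `x[i - d]` as the left operand preserves operand order, so the sketch also works for associative operations that are not commutative, such as matrix products.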
Brian hears: online auditory processing using vectorization over channels.

PubMed

Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain

2011-01-01

The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
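Vectorizing over channels means advancing every filter in the bank with one array operation per time sample. This sketch uses plain NumPy, not the Brian Hears API. It substitutes a one-pole low-pass filter per channel for a real cochlear filter such as a gammatone, and the channel spacing is an assumption:

```python
import numpy as np

def lowpass_bank(signal, fs, cutoffs_hz):
    """Run one first-order low-pass filter per channel, vectorized over channels.

    The time loop is sequential, but each step updates every channel with a
    single array operation, so interpreter overhead is paid once per sample
    rather than once per sample per channel.
    """
    cutoffs = np.asarray(cutoffs_hz, dtype=float)
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoffs / fs)   # per-channel smoothing factor
    state = np.zeros_like(cutoffs)
    out = np.empty((len(signal), len(cutoffs)))
    for t, sample in enumerate(signal):
        state += alpha * (sample - state)                # all channels at once
        out[t] = state
    return out

fs = 44100.0
cf = np.geomspace(20.0, 20000.0, 3000)   # about 3000 channels spanning 20 Hz to 20 kHz
y = lowpass_bank(np.ones(2000), fs, cf)
```

Because each time step is one array expression, the same loop body can be handed to a GPU array library with little change. That is the kind of parallelization the abstract reports.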
Parallel digital modem using multirate digital filter banks

NASA Technical Reports Server (NTRS)

Sadr, Ramin; Vaidyanathan, P. P.; Raphaeli, Dan; Hinedi, Sami

1994-01-01

A new class of architectures for an all-digital modem is presented in this report. This architecture, referred to as the parallel receiver (PRX), employs multirate digital filter banks (DFBs) to demodulate, track, and detect the received symbol stream. The resulting architecture is derived, and specifications are outlined for designing the DFB for the PRX. The key feature of this approach is a processing rate lower than either the Nyquist rate or the symbol rate, without any degradation in the symbol error rate. Because the processing rate can be chosen freely, the designer can select and use digital components independently of the speed of the integrated circuit technology. The PRX architecture is particularly suited to high-data-rate applications, and the modular structure of the parallel signal path accommodates expansion to even higher data rates with ease. Applications of the PRX would include gigabit satellite channels, multiple spacecraft, optical links, interactive cable TV, telemedicine, code division multiple access (CDMA) communications, and others.
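The principle of processing below the input rate can be seen in polyphase decimation. Splitting an FIR filter into M branches lets each branch run at 1/M of the input rate, yet the output is identical to filtering at full rate and then discarding samples. This sketch shows that generic principle, not the PRX filter bank design:

```python
def direct_decimate(x, h, M):
    """Filter x with FIR h at the full rate, then keep every M-th output."""
    n_out = (len(x) + M - 1) // M
    return [sum(h[k] * x[n * M - k] for k in range(len(h)) if 0 <= n * M - k < len(x))
            for n in range(n_out)]

def polyphase_decimate(x, h, M):
    """Same result, but each of the M branches only runs at 1/M of the input rate."""
    n_out = (len(x) + M - 1) // M
    out = [0] * n_out
    for p in range(M):
        hp = h[p::M]                       # branch filter: h[p], h[p+M], h[p+2M], ...
        for n in range(n_out):
            for j, c in enumerate(hp):
                idx = (n - j) * M - p      # branch input: x[mM - p]
                if 0 <= idx < len(x):
                    out[n] += c * x[idx]
    return out

x = list(range(1, 11))
h = [1, 2, 3, 4, 5]
print(direct_decimate(x, h, 2) == polyphase_decimate(x, h, 2))   # True
```

The branches are independent until the final sum, so each one could live on its own slower hardware path. This is the property the abstract highlights when it says components can be chosen independently of circuit speed.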
A simple way to higher speed atomic force microscopy by retrofitting with a novel high-speed flexure-guided scanner

NASA Astrophysics Data System (ADS)

Ouma Alunda, Bernard; Lee, Yong Joong; Park, Soyeun

2018-06-01

A typical line-scan rate for a commercial atomic force microscope (AFM) is about 1 Hz. At such a rate, more than four minutes of scanning time is required to obtain an image of 256 × 256 pixels. Although the control electronics of most commercial AFMs permit faster scan rates, the default piezoelectric X–Y scanners limit the overall speed of the system. This is a direct consequence of manufacturers favoring a large scan range over maximum operating speed in the X–Y scanner. Although some AFM manufacturers offer reduced-scan-area scanners as an option, the speed improvement is not significant, because these scanners do not reduce the scan range enough and are mainly intended to lower the overall cost of the AFM system. In this article, we present a simple parallel-kinematic substitute scanner for a commercial atomic force microscope that provides a higher scanning speed with no other hardware or software upgrade to the original system. Although a reduction in scan area is unavoidable, our modified commercial XE-70 AFM from Park Systems has achieved a line scan rate of over 50 Hz, more than 10 times faster than the original, unmodified system. Our flexure-guided X–Y scanner can serve as a simple drop-in replacement for enhancing the speed of various aging atomic force microscopes.
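The "more than four minutes" figure follows from dividing the number of lines by the line rate. This sketch assumes one trace/retrace pair per line and ignores settling and approach overheads:

```python
def image_time_s(lines, line_rate_hz):
    """Seconds to acquire an image when one line takes 1/line_rate_hz seconds."""
    return lines / line_rate_hz

print(image_time_s(256, 1.0) / 60)   # ≈ 4.27 minutes at 1 Hz
print(image_time_s(256, 50.0))       # 5.12 seconds at 50 Hz
```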
Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S.

1992-01-01

An adaptive, parallel discrete-event-simulation synchronization algorithm, Breathing Time Buckets, was developed in the Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. The algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that adapt naturally while the simulation is in progress. It combines the best of the optimistic and conservative synchronization strategies while avoiding their major disadvantages. It is well suited for modeling communication networks, large-scale war games, simulated aircraft flights, computer equipment, mathematical models, interactive engineering simulations, and information flows.
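The adaptive cycle can be sketched in a single process. Events are processed optimistically until the next pending event would reach the event horizon, which is the earliest timestamp of any event generated during the cycle. Newly generated events are held back until the cycle ends. This is a minimal sketch that assumes events are `(timestamp, payload)` tuples and uses a hypothetical `handler`. In SPEEDES the horizon would be a global minimum across nodes, which is not modeled here:

```python
import heapq

def breathing_time_buckets_cycle(pending, handler):
    """Run one cycle of a sequential Breathing Time Buckets sketch.

    pending: heap of (timestamp, event) tuples.
    handler(t, event) -> list of new (timestamp, event) tuples.
    """
    horizon = float("inf")
    processed, generated = [], []
    while pending and pending[0][0] < horizon:
        t, event = heapq.heappop(pending)
        processed.append((t, event))
        for new in handler(t, event):
            horizon = min(horizon, new[0])   # the cycle "breathes" to this horizon
            generated.append(new)            # held back, not released yet
    for new in generated:                    # end of cycle: commit and release
        heapq.heappush(pending, new)
    return processed, horizon

pending = [(0.0, "a"), (0.5, "b"), (2.0, "c")]
heapq.heapify(pending)
step = lambda t, e: [(t + 1.0, e)]           # each event schedules one follow-up
print(breathing_time_buckets_cycle(pending, step))
print(breathing_time_buckets_cycle(pending, step))
```

The cycle length is not fixed in advance. It stretches or shrinks with the timestamps of the events that are actually generated, which is the adaptive behavior the abstract describes.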
Implementation of High Time Delay Accuracy of Ultrasonic Phased Array Based on Interpolation CIC Filter

PubMed Central

Liu, Peilu; Li, Xinghua; Li, Haopeng; Su, Zhikun; Zhang, Hongxu

2017-01-01

To improve the accuracy of the focusing time delay of an ultrasonic phased array, the original interpolating Cascaded Integrator-Comb (CIC) filter was analyzed and an 8× interpolation CIC filter parallel algorithm was proposed, so that interpolation and multichannel decomposition can be processed simultaneously. We also derived the general formula of the parallel algorithm for an interpolation CIC filter of arbitrary multiple, and established an ultrasonic phased array focusing time delay system based on the 8× interpolation CIC filter parallel algorithm. The improved algorithmic structure reduces additions by 12.5% and multiplications by 29.2%, while computation remains fast. To address the known shortcomings of the CIC filter, we compensated it: the compensated CIC filter's passband is flatter, its transition band is steeper, and its stopband attenuation is higher. Finally, we verified the feasibility of the algorithm on a Field-Programmable Gate Array (FPGA). With a 125 MHz system clock, 8× interpolation filtering and decomposition give a defect-echo time delay accuracy of 1 ns. Both simulation and experimental results show that the proposed algorithm is highly feasible. Because of its fast calculation, small computational load and high resolution, the algorithm is especially suitable for applications requiring high time delay accuracy and fast detection. PMID:29023385
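A CIC interpolator places its comb stages at the low input rate, zero-stuffs by the rate factor R, and then runs its integrator stages at the high rate. The sketch below is a textbook Hogenauer CIC interpolator in integer arithmetic. It does not reproduce the paper's parallel multichannel decomposition or its compensation filter:

```python
def cic_interpolate(x, R, N, M=1):
    """N-stage CIC interpolator with rate change R and differential delay M.

    DC gain is (R*M)**N / R; with N = 2, M = 1 the output ramps linearly
    between input samples (linear interpolation scaled by R).
    """
    y = list(x)
    for _ in range(N):                       # comb stages at the low rate: y[n] - y[n-M]
        y = [y[n] - (y[n - M] if n >= M else 0) for n in range(len(y))]
    up = []
    for v in y:                              # zero-stuff: insert R-1 zeros per sample
        up.append(v)
        up.extend([0] * (R - 1))
    for _ in range(N):                       # integrator stages at the high rate
        acc, out = 0, []
        for v in up:
            acc += v
            out.append(acc)
        up = out
    return up

clock_hz = 125e6
print(1e9 / clock_hz, 1e9 / (clock_hz * 8))  # 8.0 ns native step, 1.0 ns after 8x interpolation
```

The last line reproduces the abstract's arithmetic: a 125 MHz clock gives 8 ns delay steps, and 8× interpolation refines them to 1 ns.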
Implementation of High Time Delay Accuracy of Ultrasonic Phased Array Based on Interpolation CIC Filter.

PubMed

Liu, Peilu; Li, Xinghua; Li, Haopeng; Su, Zhikun; Zhang, Hongxu

2017-10-12

To improve the accuracy of the focusing time delay of an ultrasonic phased array, the original interpolating Cascaded Integrator-Comb (CIC) filter was analyzed and an 8× interpolation CIC filter parallel algorithm was proposed, so that interpolation and multichannel decomposition can be processed simultaneously. We also derived the general formula of the parallel algorithm for an interpolation CIC filter of arbitrary multiple, and established an ultrasonic phased array focusing time delay system based on the 8× interpolation CIC filter parallel algorithm. The improved algorithmic structure reduces additions by 12.5% and multiplications by 29.2%, while computation remains fast. To address the known shortcomings of the CIC filter, we compensated it: the compensated CIC filter's passband is flatter, its transition band is steeper, and its stopband attenuation is higher. Finally, we verified the feasibility of the algorithm on a Field-Programmable Gate Array (FPGA). With a 125 MHz system clock, 8× interpolation filtering and decomposition give a defect-echo time delay accuracy of 1 ns. Both simulation and experimental results show that the proposed algorithm is highly feasible. Because of its fast calculation, small computational load and high resolution, the algorithm is especially suitable for applications requiring high time delay accuracy and fast detection.
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, Fred A.; Morel, Michael R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N and (omega) is of order q. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, the Cray X-MP. A finite element model is divided into m domains, each of which is assumed to contain n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used to map each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees of freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
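The reported speed-ups can be turned into parallel efficiencies. The sketch assumes the usual definition E_p = S_p / p, since the abstract does not state its own formula:

```python
speedups = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}   # reported for the 64-element plate
efficiency = {p: s / p for p, s in speedups.items()}
for p, e in efficiency.items():
    print(f"{p} processors: efficiency {e:.2f}")
```

Under that definition, efficiency falls from about 0.93 on two processors to about 0.45 on eight. This is consistent with the paper's interest in how the number of domains and the front size affect performance.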
Implementation of molecular dynamics and its extensions with the coarse-grained UNRES force field on massively parallel systems; towards millisecond-scale simulations of protein structure, dynamics, and thermodynamics

PubMed Central

Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.

2010-01-01

We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics on massively parallel architectures. In addition to the coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been used to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. On IBM BlueGene/P, about 50% efficiency and a 120-fold speed-up of the fine-grained part were achieved for a single trajectory of a 767-residue protein using 256 processors per trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared with the experimental time scale and therefore enables us to carry out millisecond-scale simulations of proteins with 500 or more amino-acid residues in days of wall-clock time. PMID:20305729
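Fine-grained splitting of an energy evaluation amounts to giving each task a share of the interaction terms and then summing the partial energies, which an MPI reduction would do in the real code. In this plain-Python sketch, the pair potential and the 1-D coordinates are hypothetical stand-ins for UNRES terms, and the round-robin assignment is an assumption:

```python
def pair_energy(a, b):
    """Hypothetical toy pair potential standing in for a UNRES energy term."""
    r = abs(a - b)
    return 1.0 / r ** 12 - 2.0 / r ** 6

def partial_energy(coords, pairs, rank, n_tasks):
    # round-robin split: task `rank` owns every n_tasks-th pair
    return sum(pair_energy(coords[i], coords[j]) for i, j in pairs[rank::n_tasks])

coords = [0.0, 1.1, 2.3, 3.2, 4.6, 5.5]   # toy 1-D positions
pairs = [(i, j) for i in range(len(coords)) for j in range(i + 1, len(coords))]
n_tasks = 4
total = sum(partial_energy(coords, pairs, r, n_tasks) for r in range(n_tasks))  # the "reduce"
serial = sum(pair_energy(coords[i], coords[j]) for i, j in pairs)
print(total, serial)
print(120 / 256)   # 0.46875: the reported speed-up per processor, i.e. "about 50%"
```

The parallel and serial totals agree only to rounding, because floating-point addition is not associative and the summation order differs.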
A high speed buffer for LV data acquisition

NASA Technical Reports Server (NTRS)

Cavone, Angelo A.; Sterlina, Patrick S.; Clemmons, James I., Jr.; Meyers, James F.

1987-01-01

The laser velocimeter (autocovariance) buffer interface is a data acquisition subsystem designed specifically for the acquisition of data from a laser velocimeter. The subsystem acquires data from up to six laser velocimeter components in parallel, measures the times between successive data points for each of the components, establishes and maintains a coincident condition between any two or three components, and acquires data from other instrumentation systems simultaneously with the laser velocimeter data points. The subsystem is designed to control the entire data acquisition process based on initial setup parameters obtained from a host computer and to be independent of the computer during the acquisition. On completion of the acquisition cycle, the interface transfers the contents of its memory to the host under direction of the host via a single 16-bit parallel DMA channel.
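The buffer does two kinds of bookkeeping in hardware: it measures the time between successive data points of a component, and it checks whether samples from different components are coincident. A software analogue of both, assuming sorted timestamps in a common unit and a hypothetical coincidence `window`:

```python
from bisect import bisect_left

def interarrival(times):
    """Time between successive data points of one velocimeter component."""
    return [b - a for a, b in zip(times, times[1:])]

def coincident_pairs(times_a, times_b, window):
    """Pair each sample of component A with the nearest sample of component B
    when the two lie within `window` of each other (both lists sorted)."""
    pairs = []
    for t in times_a:
        k = bisect_left(times_b, t)
        candidates = [times_b[i] for i in (k - 1, k) if 0 <= i < len(times_b)]
        if candidates:
            nearest = min(candidates, key=lambda u: abs(u - t))
            if abs(nearest - t) <= window:
                pairs.append((t, nearest))
    return pairs

a = [0, 10, 20, 30]
b = [1, 14, 29]
print(interarrival(a))              # [10, 10, 10]
print(coincident_pairs(a, b, 2))    # [(0, 1), (30, 29)]
```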
Vector processing efficiency of plasma MHD codes by use of the FACOM 230-75 APU

NASA Astrophysics Data System (ADS)

Matsuura, T.; Tanaka, Y.; Naraoka, K.; Takizuka, T.; Tsunematsu, T.; Tokuda, S.; Azumi, M.; Kurita, G.; Takeda, T.

1982-06-01

In the framework of pipelined vector architecture, the efficiency of vector processing is assessed for plasma MHD codes used in nuclear fusion research. Using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to the parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6, 6 and 4 relative to the highly optimized scalar versions are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of "concurrency within pipelined parallelism" is emphasized.
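One standard way to reason about the limit of vector enhancement is Amdahl's law. This is the textbook relation, not the paper's own analysis. If a fraction f of the scalar run time vectorizes and runs V times faster, the overall speed-up is 1 / ((1 - f) + f / V), which can never exceed 1 / (1 - f):

```python
def vector_speedup(vector_fraction, vector_rate_ratio):
    """Amdahl's law: overall speed-up when `vector_fraction` of scalar run time
    is accelerated by a factor `vector_rate_ratio`."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_rate_ratio)

for f in (0.80, 0.90, 0.95):
    print(f, round(vector_speedup(f, 10.0), 2), round(vector_speedup(f, 1e9), 2))
```

Under this model, an overall factor of 6 requires at least 5/6 (about 83%) of the run time to vectorize, however fast the pipeline is.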
A privacy-preserving parallel and homomorphic encryption scheme

NASA Astrophysics Data System (ADS)

Min, Zhaoe; Yang, Geng; Shi, Jingqi

2017-04-01

To protect data privacy while allowing efficient access to data in multi-node cloud environments, a parallel homomorphic encryption (PHE) scheme is proposed based on the additive homomorphism of the Paillier encryption algorithm. In the proposed PHE algorithm, the plaintext is divided into several blocks and the blocks are encrypted in parallel. Experimental results demonstrate that the encryption algorithm reaches a speed-up ratio of about 7.1 in a MapReduce environment with 16 cores and 4 nodes.
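Block-wise Paillier encryption parallelizes easily because each block is encrypted independently, and the additive homomorphism survives. This is a toy sketch with tiny primes, far too small for real security. It uses a thread pool in place of MapReduce, so the GIL limits actual speed-up here. The block values and the per-block RNG seeding are assumptions:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

P, Q = 1009, 1013                  # toy primes: insecure, for illustration only
N, N2 = P * Q, (P * Q) ** 2
G = N + 1                          # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p-1, q-1)
MU = pow(LAM, -1, N)               # valid because G = N + 1 (Python 3.8+)

def encrypt(m, rng):
    while True:                    # random r coprime with N
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N  # L(u) * mu mod N

def encrypt_blocks(blocks, workers=4, seed=0):
    """Encrypt independent plaintext blocks concurrently (one RNG per block)."""
    rngs = [random.Random(seed + i) for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encrypt, blocks, rngs))

blocks = [12, 345, 6789]           # plaintext already split into blocks < N
cts = encrypt_blocks(blocks)
print([decrypt(c) for c in cts])   # [12, 345, 6789]
print(decrypt(cts[0] * cts[1] % N2))   # 357: ciphertext product decrypts to the sum
```

Swapping the thread pool for a process pool or a cluster job would not change the scheme, because no block depends on any other.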

Optimization of the coherence function estimation for multi-core central processing unit

NASA Astrophysics Data System (ADS)

Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

2017-02-01

The paper considers the use of parallel processing on a multi-core central processing unit to optimize the evaluation of the coherence function, which arises in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its individual nodes. An algorithm is given for evaluating the function for signals represented by digital samples, and the algorithm is analyzed for its software implementation and computational problems. Optimization measures are described, including algorithmic, architectural and compiler optimizations, and their results are assessed for multi-core processors from different manufacturers. The speed-up of parallel execution relative to sequential execution was studied, and results are presented for Intel Core i7-4720HQ and AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization were significantly improved, showing a high degree of parallelism in the constructed calculating functions. The developed software has undergone state registration and will be used as part of a hardware and software solution for rotating machinery fault diagnosis and pipeline leak location with the acoustic correlation method.
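Magnitude-squared coherence is commonly estimated Welch-style, as |Pxy|² / (Pxx Pyy) with the spectra averaged over windowed segments. The per-segment FFTs are independent, which is what makes multi-core execution natural. This sketch uses NumPy and a thread pool. The segment length, 50% overlap, Hann window and worker count are assumptions, not the paper's settings:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def _segment_spectra(x, y, start, nperseg, window):
    xs = np.fft.rfft(window * x[start:start + nperseg])
    ys = np.fft.rfft(window * y[start:start + nperseg])
    return np.abs(xs) ** 2, np.abs(ys) ** 2, np.conj(xs) * ys

def coherence(x, y, nperseg=256, workers=4):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy), Welch averaging."""
    step = nperseg // 2                           # 50% overlap
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, step)
    with ThreadPoolExecutor(max_workers=workers) as pool:   # segments are independent
        parts = list(pool.map(lambda s: _segment_spectra(x, y, s, nperseg, window), starts))
    pxx = sum(p[0] for p in parts)
    pyy = sum(p[1] for p in parts)
    pxy = sum(p[2] for p in parts)
    return np.abs(pxy) ** 2 / (pxx * pyy)

rng = np.random.default_rng(1)
x = rng.standard_normal(8192)
noise = rng.standard_normal(8192)
print(coherence(x, x).min(), coherence(x, noise).mean())   # ~1.0 and close to 0
```

NumPy's FFT releases the GIL for large transforms, so threads can overlap some work. A process pool or compiled code would be needed for the full speed-up the paper measures.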
A 4MP high-dynamic-range, low-noise CMOS image sensor

NASA Astrophysics Data System (ADS)

Ma, Cheng; Liu, Yang; Li, Jing; Zhou, Quan; Chang, Yuchun; Wang, Xinyang

2015-03-01

In this paper we present a 4-megapixel, high-dynamic-range CMOS image sensor with low dark noise and low dark current, which is ideal for high-end scientific and surveillance applications. The pixel design is based on a 4T pinned-photodiode (PPD) structure. During readout of the pixel array, signals are first amplified and then fed to a low-power column-parallel ADC array, which was presented in [1]. Measurement results show that the sensor achieves a dynamic range of 96 dB and a dark noise of 1.47 e- at 24 fps. The dark current is 0.15 e-/pixel/s at -20 °C.
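Dynamic range is commonly defined as DR = 20·log10(full-well capacity / noise floor). The abstract does not give a full-well figure. Assuming that definition and the quoted 1.47 e- as the noise floor, the implied full well is about 93,000 electrons. This is an inference, not a reported value:

```python
def implied_full_well(dr_db, noise_e):
    """Full-well capacity implied by DR = 20*log10(full_well / noise)."""
    return noise_e * 10 ** (dr_db / 20)

print(round(implied_full_well(96, 1.47)))   # roughly 93,000 electrons
```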
Time lens assisted photonic sampling extraction

NASA Astrophysics Data System (ADS)

Petrillo, Keith Gordon

Telecommunication bandwidth demands have increased dramatically in recent years due to Internet-based services such as cloud computing and storage, large file sharing, and video streaming. Additionally, sensing systems such as wideband radar and magnetic resonance imaging systems, as well as the complex modulation formats used to handle large data transfers in telecommunications, require high-speed, high-resolution analog-to-digital converters (ADCs) to interpret the data. Accurately processing and acquiring the information from these systems at next-generation data rates has become challenging for electronic systems. The largest contributors to this electronic bottleneck are bandwidth and timing jitter, which limit speed and reduce accuracy. Optical systems have been shown to offer at least three orders of magnitude more bandwidth, and state-of-the-art mode-locked lasers have reduced timing jitter to thousands of attoseconds. These features have encouraged processing signals without electronics, or using photonics to assist electronics. All-optical signal processing has enabled the processing of telecommunication line rates up to 1.28 Tb/s and high-resolution analog-to-digital converters in the tens of gigahertz. The major drawback of these optical systems is the high cost of their components. Applying all-optical processing techniques such as a time lens and chirped processing can greatly reduce the bandwidth and cost requirements of optical serial-to-parallel converters and push photonically assisted ADCs into the hundreds of gigahertz. In this dissertation, the building blocks of a high-speed photonically assisted ADC are demonstrated, each providing benefits to its own application. A serial-to-parallel converter using a continuously operating time lens as an optical Fourier processor is demonstrated to fully convert a 160-Gb/s optical time-division-multiplexed signal into 16 channels at 10 Gb/s with error-free operation.
Using chirped processing, an optical sample-and-hold concept is demonstrated and analyzed as a resolution improvement to existing photonically assisted ADCs. Simulations indicate that applying a continuously operating time lens to a photonically assisted sampling system can improve photonically sampled systems by an order of magnitude while acquiring properties similar to an optical sample-and-hold system.
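The serial-to-parallel bookkeeping that the time lens performs optically can be mirrored digitally. In a time-division-multiplexed stream, slot i belongs to tributary i mod 16, and each tributary runs at 160/16 = 10 Gb/s. This is only a digital analogy of the demultiplexing pattern and does not model the optics:

```python
def serial_to_parallel(bits, n_channels):
    """Route slot i of a time-interleaved (OTDM-style) stream to channel i % n_channels."""
    return [bits[c::n_channels] for c in range(n_channels)]

line_rate_gbps = 160
n_channels = 16
per_channel_gbps = line_rate_gbps / n_channels   # 10.0 Gb/s per tributary
channels = serial_to_parallel(list(range(64)), n_channels)
print(per_channel_gbps, channels[0])             # 10.0 [0, 16, 32, 48]
```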
Magnetosphere simulations with a high-performance 3D AMR MHD Code

NASA Astrophysics Data System (ADS)

Gombosi, Tamas; Dezeeuw, Darren; Groth, Clinton; Powell, Kenneth; Song, Paul

1998-11-01

BATS-R-US is a high-performance 3D AMR MHD code for space physics applications running on massively parallel supercomputers. In BATS-R-US the electromagnetic and fluid equations are solved with a high-resolution upwind numerical scheme in a tightly coupled manner. The code is very robust and is capable of spanning a wide range of plasma parameters (such as β, acoustic and Alfvénic Mach numbers). Our code is highly scalable: it achieved a sustained performance of 233 GFLOPS on a Cray T3E-1200 supercomputer with 1024 PEs. This talk reports results from the BATS-R-US code for the GGCM (Geospace General Circulation Model) Phase 1 Standard Model Suite. This model suite contains 10 different steady-state configurations: 5 IMF clock angles (north, south, and three equally spaced angles in between) with 2 IMF field strengths for each angle (5 nT and 10 nT). The other parameters are: solar wind speed = 400 km/s; solar wind number density = 5 protons/cc; Hall conductance = 0; Pedersen conductance = 5 S; parallel conductivity = ∞.
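The ten configurations come from crossing five clock angles with two field strengths. Three equally spaced angles between north (0°) and south (180°) are 45°, 90° and 135°. This sketch enumerates the suite and assumes the usual clock-angle convention θ = atan2(By, Bz) in GSM coordinates to split each field into components. The dictionary keys are hypothetical:

```python
import math

clock_angles_deg = [0, 45, 90, 135, 180]   # north, three equally spaced, south
field_strengths_nt = [5, 10]

suite = []
for b in field_strengths_nt:
    for theta in clock_angles_deg:
        th = math.radians(theta)
        # clock angle measured from +Z toward +Y (assumed GSM convention)
        suite.append({"B_nT": b, "clock_deg": theta,
                      "By_nT": b * math.sin(th), "Bz_nT": b * math.cos(th)})

print(len(suite))   # 10
```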
Locally adaptive parallel temperature accelerated dynamics method

NASA Astrophysics Data System (ADS)

Shim, Yunsic; Amar, Jacques G.

2010-03-01

The recently developed temperature-accelerated dynamics (TAD) method [M. Sørensen and A.F. Voter, J. Chem. Phys. 112, 9599 (2000)], along with the more recently developed parallel TAD (parTAD) method [Y. Shim et al., Phys. Rev. B 76, 205439 (2007)], allows one to carry out non-equilibrium simulations over extended time and length scales. The basic idea behind TAD is to speed up transitions by carrying out a high-temperature MD simulation and then using the resulting information to obtain event times at the desired low temperature. In a typical implementation, a fixed high temperature Thigh is used. In general, however, one expects that for each configuration there exists an optimal value of Thigh that depends on the particular transition pathways and activation energies for that configuration. Here we present a locally adaptive high-temperature TAD method in which, instead of using a fixed Thigh, the high temperature is dynamically adjusted to maximize simulation efficiency. Preliminary results on the performance of parTAD simulations of Cu/Cu(100) growth using the locally adaptive Thigh method will also be presented.
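In standard TAD, an event observed at time t_high during the high-temperature run is mapped to the low temperature by t_low = t_high · exp[E_a (1/(k_B T_low) − 1/(k_B T_high))]. This assumes harmonic transition state theory with the same prefactor at both temperatures. A minimal sketch of that extrapolation only. The locally adaptive choice of Thigh is not modeled, and the numbers are illustrative:

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant in eV/K

def tad_low_temperature_time(t_high, e_a, T_high, T_low):
    """Extrapolate an event time observed at T_high to T_low:
    t_low = t_high * exp[E_a * (1/(k_B*T_low) - 1/(k_B*T_high))]."""
    return t_high * math.exp(e_a * (1.0 / (K_B_EV * T_low) - 1.0 / (K_B_EV * T_high)))

# boost for a 0.5 eV barrier when running at 600 K to represent 300 K
print(tad_low_temperature_time(1.0, 0.5, 600.0, 300.0))
```

A higher Thigh gives a larger boost but risks events that would not occur in the same order at the low temperature. That trade-off is why an optimal Thigh per configuration can be expected.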
Operating characteristics of superconducting fault current limiter using 24kV vacuum interrupter driven by electromagnetic repulsion switch

NASA Astrophysics Data System (ADS)

Endo, M.; Hori, T.; Koyama, K.; Yamaguchi, I.; Arai, K.; Kaiho, K.; Yanabu, S.

2008-02-01

Using a high-temperature superconductor, we constructed and tested a model Superconducting Fault Current Limiter (SFCL) that has a vacuum interrupter driven by an electromagnetic repulsion mechanism. We set out to construct a high-voltage-class SFCL and produced an electromagnetic repulsion switch equipped with a 24 kV vacuum interrupter (VI). One problem is that the opening speed becomes slower, because the larger the vacuum interrupter, the heavier its contacts. For this reason, the current flowing in the superconductor may not be interrupted within half a cycle. To solve this problem, the design of the coil connected in parallel must be changed to strengthen the electromagnetic repulsion force when the vacuum interrupter opens. The coil design was therefore changed, and a current-limiting test was conducted to examine whether the problem had been solved. The test used YBCO thin films connected four in series and two in parallel. Each YBCO thin film was 12 cm long, and a 0.1 Ω parallel resistance was connected across each film. As a result, we succeeded in interrupting the current in the superconductor within half a cycle. Furthermore, the series- and parallel-connected YBCO thin films limited the current without failure.
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

PubMed

Dematté, Lorenzo

2012-01-01

Space is a very important aspect of the simulation of biochemical systems, and the need for simulation algorithms able to cope with space has recently become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transport phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior reliably using stochastic methods together with high spatial resolution. To deliver on the promise of systems biology to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, and the most common cases of molecule-surface interaction) on the GPU, computing them in parallel for each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
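The diffusion step is the clearest example of per-molecule parallelism. Each coordinate of each molecule moves by an independent Gaussian displacement with variance 2·D·dt, so one GPU thread per molecule can apply it. This NumPy sketch uses array vectorization to stand in for GPU threads. It is not Smoldyn's code, and the diffusion coefficient and time step are illustrative:

```python
import numpy as np

def diffuse(positions, D, dt, rng):
    """One Brownian dynamics step for every molecule at once.

    Each coordinate moves by a Gaussian displacement with variance 2*D*dt.
    """
    return positions + rng.normal(0.0, np.sqrt(2.0 * D * dt), size=positions.shape)

rng = np.random.default_rng(0)
x0 = np.zeros((100_000, 3))                     # 100,000 molecules in 3-D
x1 = diffuse(x0, D=1.0, dt=1e-3, rng=rng)
msd = np.mean(np.sum((x1 - x0) ** 2, axis=1))  # expected about 6*D*dt = 6e-3
print(msd)
```

The mean squared displacement check (6·D·dt in three dimensions) is a quick sanity test for any diffusion kernel, GPU or CPU.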
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size, so parallelization is needed to speed up a calculation that usually takes a long time. The graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning using CUDA GPU-based parallel computing. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented on its GPUs (graphics processing units).
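In the column-net hypergraph model, which handles rectangular and unsymmetric matrices, each column j is a net joining all rows with a nonzero in that column. With a rowwise partition, x_j must be sent to λ_j − 1 extra parts, where λ_j is the number of parts that net j touches. Partitioners minimize this connectivity-minus-one cost. This plain-Python sketch evaluates that cost and a partitioned CSR product. It does not implement a partitioner or a CUDA kernel, and the example partitions are hand-picked:

```python
def csr_from_dense(A):
    """Compressed sparse row (CSR) arrays for a dense list-of-lists matrix."""
    data, indices, indptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr

def spmv_rows(csr, x, rows):
    """y_i = sum_j A_ij x_j for the rows owned by one part."""
    data, indices, indptr = csr
    return {i: sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in rows}

def column_net_volume(csr, part_of_row, n_cols):
    """Connectivity-minus-one cut of the column-net hypergraph."""
    data, indices, indptr = csr
    parts = [set() for _ in range(n_cols)]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            parts[indices[k]].add(part_of_row[i])
    return sum(len(s) - 1 for s in parts if s)

A = [[1, 0, 2, 0],
     [0, 3, 0, 1],
     [4, 0, 5, 0]]                 # 3 x 4: rectangular, not symmetric
csr = csr_from_dense(A)
x = [1, 1, 1, 1]
good = {0: 0, 1: 1, 2: 0}          # rows 0 and 2 share columns 0 and 2
bad = {0: 0, 1: 1, 2: 1}
y = {}
for p in (0, 1):
    y.update(spmv_rows(csr, x, [i for i, q in good.items() if q == p]))
print(y, column_net_volume(csr, good, 4), column_net_volume(csr, bad, 4))
# {0: 3, 1: 4, 2: 9} 0 2
```

Keeping rows that share columns in the same part drives the communication volume to zero in this toy case. On a GPU, a low volume translates into fewer transfers of x entries between parts.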
Three axis electronic flight motion simulator real time control system design and implementation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Zhiyuan; Miao, Zhonghua, E-mail: zhonghua-miao@163.com; Wang, Xiaohua

2014-12-15

This paper reports a three-axis electronic flight motion simulator, including its modelling, controller design, and hardware implementation. The simulator can be used for inertial navigation testing and for high-precision inertial navigation systems, and it offers good dynamic and static performance. A real-time control system is designed, and several implementation problems are solved, including time unification with a parallel-port interrupt, a high-speed zero-finding method for the rotary inductosyn, and zero-crossing management under continuous rotation. Tests were carried out to show the effectiveness of the proposed real-time control system.
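The abstract does not describe how zero-crossing under continuous rotation is handled. One common approach, shown here only as a hypothetical sketch and not as the authors' method, is to unwrap angle readings that wrap at 360°: a jump of more than half a turn between consecutive samples is treated as a pass through zero. This assumes the sampling rate is high enough that real motion per sample stays below half a turn.

```python
def unwrap_degrees(readings, period=360.0):
    """Convert wrapped angle readings in [0, period) into a continuous angle.

    Hypothetical helper: a jump larger than period / 2 between consecutive
    samples is interpreted as a wrap through zero rather than real motion.
    """
    if not readings:
        return []
    out = [readings[0]]
    offset = 0.0
    for prev, cur in zip(readings, readings[1:]):
        delta = cur - prev
        if delta < -period / 2:    # wrapped forward: 359 -> 0
            offset += period
        elif delta > period / 2:   # wrapped backward: 0 -> 359
            offset -= period
        out.append(cur + offset)
    return out


print(unwrap_degrees([350, 355, 2, 10]))  # continuous through the 0/360 seam
```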
Three axis electronic flight motion simulator real time control system design and implementation.

PubMed

Gao, Zhiyuan; Miao, Zhonghua; Wang, Xuyong; Wang, Xiaohua

2014-12-01

This paper reports a three-axis electronic flight motion simulator, including its modelling, controller design, and hardware implementation. The simulator can be used for inertial navigation testing and for high-precision inertial navigation systems, and it offers good dynamic and static performance. A real-time control system is designed, and several implementation problems are solved, including time unification with a parallel-port interrupt, a high-speed zero-finding method for the rotary inductosyn, and zero-crossing management under continuous rotation. Tests were carried out to show the effectiveness of the proposed real-time control system.
Radio Synthesis Imaging - A High Performance Computing and Communications Project

NASA Astrophysics Data System (ADS)

Crutcher, Richard M.

The National Science Foundation has funded a five-year High Performance Computing and Communications project at the National Center for Supercomputing Applications (NCSA) for the direct implementation of several of the computing recommendations of the Astronomy and Astrophysics Survey Committee (the "Bahcall report"). This paper is a summary of the project goals and a progress report. The project will implement a prototype of the next generation of astronomical telescope systems - remotely located telescopes connected by high-speed networks to very high performance, scalable architecture computers and on-line data archives, which are accessed by astronomers over Gbit/sec networks. Specifically, a data link has been installed between the BIMA millimeter-wave synthesis array at Hat Creek, California and NCSA at Urbana, Illinois for real-time transmission of data to NCSA. Data are automatically archived, and may be browsed and retrieved by astronomers using the NCSA Mosaic software. In addition, an on-line digital library of processed images will be established. BIMA data will be processed on a very high performance distributed computing system, with I/O, user interface, and most of the software system running on the NCSA Convex C3880 supercomputer or Silicon Graphics Onyx workstations connected by HiPPI to the high performance, massively parallel Thinking Machines Corporation CM-5. The very computationally intensive algorithms for calibration and imaging of radio synthesis array observations will be optimized for the CM-5 and new algorithms which utilize the massively parallel architecture will be developed. Code running simultaneously on the distributed computers will communicate using the Data Transport Mechanism developed by NCSA. 
The project will also use the BLANCA Gbit/s testbed network between Urbana and Madison, Wisconsin to connect an Onyx workstation in the University of Wisconsin Astronomy Department to the NCSA CM-5, for development of long-distance distributed computing. Finally, the project is developing 2D and 3D visualization software as part of the international AIPS++ project. This research and development project is being carried out by a team of experts in radio astronomy, algorithm development for massively parallel architectures, high-speed networking, database management, and Thinking Machines Corporation personnel. The development of this complete software, distributed computing, and data archive and library solution to the radio astronomy computing problem will advance our expertise in high performance computing and communications technology and the application of these techniques to astronomical data processing.
AHPCRC (Army High Performance Computing Research Center) Bulletin. Volume 2, Issue 2, 2011

DTIC Science & Technology

2011-01-01

fixed (i.e., no flapping). The simulation was performed at sea level conditions with a pressure of 101 kPa and a density of 1.23 kg/m³. The air speed...Hardening Behavior in Au Nanopillar Microplasticity. IJMCE 5 (3&4) 287–294. (2007) 5. S. J. Plimpton. Fast Parallel Algorithms for Short-Range Molecular...such as crude oil underwater. Scattering is also used for sea floor mapping. For example, communications companies laying underwater fiber optic
A wind-tunnel investigation at high subsonic speeds of the lateral control characteristics of various plain spoiler configurations on a 3-percent-thick 60 degree delta wing

NASA Technical Reports Server (NTRS)

Wiley, Harleth G

1954-01-01

Results are presented of wind-tunnel investigations at Mach numbers of 0.60 to 0.94 and angles of attack from -2 degrees to about 24 degrees to determine the lateral control characteristics of spoilers with various chordwise and spanwise locations, spans, and deflections on a thin 60 degree delta wing with an NACA 65A003 airfoil section parallel to the free stream.
New Methods of Low-Field Magnetic Resonance Imaging for Application to Traumatic Brain Injury

DTIC Science & Technology

2013-02-01

magnet based), the development of novel high-speed parallel imaging detection systems, and work on advanced adaptive reconstruction methods ...signal many times within the acquisition time. We present here a new method for 3D OMRI based on b-SSFP at a constant field of 6.5 mT that provides up...developing injury-sensitive MRI based on the detection of free radicals associated with injury using the Overhauser effect and subsequently imaging that
Optical RISC computer

NASA Astrophysics Data System (ADS)

Guilfoyle, Peter S.; Stone, Richard V.; Hessenbruch, John M.; Zeise, Frederick F.

1993-07-01

A second-generation digital optical computer (DOC II) has been developed that uses a RISC-based operating system as its host. This 32-bit, high-performance (12.8 GByte/s) computing platform demonstrates a number of basic principles inherent to parallel free-space optical interconnects, such as speed (up to 10^12 bit operations per second) and low power (1.2 fJ per bit). Although DOC II is a general-purpose machine, special-purpose applications have been developed and are currently being evaluated on the optical platform.
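As a back-of-the-envelope check on how the two quoted figures relate, and assuming (the abstract does not say so) that the 1.2 fJ per-bit energy applies at the peak rate of 10^12 bit operations per second, the implied switching power is:

```latex
P = E_{\text{bit}} \times R
  = \left(1.2 \times 10^{-15}\,\tfrac{\text{J}}{\text{bit}}\right)
    \times \left(10^{12}\,\tfrac{\text{bit}}{\text{s}}\right)
  = 1.2 \times 10^{-3}\,\text{W}
```

That is roughly a milliwatt for the interconnect switching alone, which illustrates why per-bit energy is emphasized alongside speed.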
Supercomputers Of The Future

NASA Technical Reports Server (NTRS)

Peterson, Victor L.; Kim, John; Holst, Terry L.; Deiwert, George S.; Cooper, David M.; Watson, Andrew B.; Bailey, F. Ron

1992-01-01

The report evaluates the supercomputer needs of five key disciplines: turbulence physics, aerodynamics, aerothermodynamics, chemistry, and mathematical modeling of human vision. It predicts that these fields will require computer speeds greater than 10^18 floating-point operations per second (FLOPS) and memory capacities greater than 10^15 words. Also, new parallel computer architectures and new structured numerical methods will make the necessary speed and capacity available.
Parallelized Stochastic Cutoff Method for Long-Range Interacting Systems

NASA Astrophysics Data System (ADS)

Endo, Eishin; Toga, Yuta; Sasaki, Munetaka

2015-07-01

We present a method of parallelizing the stochastic cutoff (SCO) method, which is a Monte-Carlo method for long-range interacting systems. After interactions are eliminated by the SCO method, we subdivide a lattice into noninteracting interpenetrating sublattices. This subdivision enables us to parallelize the Monte-Carlo calculation in the SCO method. Such subdivision is found by numerically solving the vertex coloring of a graph created by the SCO method. We use an algorithm proposed by Kuhn and Wattenhofer to solve the vertex coloring by parallel computation. This method was applied to a two-dimensional magnetic dipolar system on an L × L square lattice to examine its parallelization efficiency. The result showed that, in the case of L = 2304, the speed of computation increased about 102 times by parallel computation with 288 processors.
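The idea of splitting the lattice into noninteracting sublattices can be sketched with any proper vertex coloring: sites with the same color share no interaction edge, so all sites of one color can be updated concurrently. The paper uses the Kuhn–Wattenhofer distributed coloring algorithm; for brevity the sketch below swaps in a plain sequential greedy coloring. The input graph (the interactions left after the SCO elimination) and the function names are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by a colored neighbor.

    adjacency: dict vertex -> iterable of neighbors (assumed symmetric).
    Vertices sharing a color have no edge between them, so their Monte Carlo
    updates can run in parallel.
    """
    color = {}
    for v in sorted(adjacency):
        taken = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color


def sublattices(color):
    """Group vertices by color; each group is updated in one parallel phase."""
    groups = {}
    for v, c in color.items():
        groups.setdefault(c, []).append(v)
    return [sorted(groups[c]) for c in sorted(groups)]


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # four sites on a cycle
print(sublattices(greedy_coloring(ring)))
```

A Monte Carlo sweep then loops over the groups in sequence and updates all sites within a group in parallel.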
Long-range interactions and parallel scalability in molecular simulations

NASA Astrophysics Data System (ADS)

Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

2007-01-01

Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks, namely InfiniBand, Gigabit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e., communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512, and 2048 lipid molecules were used as test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, for both serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Implementation of a Parallel Kalman Filter for Stratospheric Chemical Tracer Assimilation

NASA Technical Reports Server (NTRS)

Chang, Lang-Ping; Lyster, Peter M.; Menard, R.; Cohn, S. E.

1998-01-01

A Kalman filter for the assimilation of long-lived atmospheric chemical constituents has been developed for two-dimensional transport models on isentropic surfaces over the globe. An important attribute of the Kalman filter is that it calculates error covariances of the constituent fields using the tracer dynamics. Consequently, the current Kalman-filter assimilation is a five-dimensional problem (coordinates of two points and time), and it can only be handled on computers with large memory and high floating point speed. In this paper, an implementation of the Kalman filter for distributed-memory, message-passing parallel computers is discussed. Two approaches were studied: an operator decomposition and a covariance decomposition. The latter was found to be more scalable than the former, and it possesses the property that the dynamical model does not need to be parallelized, which is of considerable practical advantage. This code is currently used to assimilate constituent data retrieved by limb sounders on the Upper Atmosphere Research Satellite. Tests of the code examined the variance transport and observability properties. Aspects of the parallel implementation, some timing results, and a brief discussion of the physical results will be presented.
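One way to see why a covariance decomposition lets the dynamical model stay serial: the Kalman forecast covariance M P M^T can be built by applying the model M to independent vectors twice, first to the columns of P and then to the columns of (M P)^T. For symmetric P this gives M (M P)^T = M P M^T. Each application is independent, so each processor can run the unmodified serial model on its own subset of columns. The sketch below is a hypothetical illustration that assumes a linear model and omits model error Q; the abstract does not give the authors' exact data layout.

```python
def apply_model(model, columns):
    """Apply the (serial) transport model to each column independently.

    In a distributed run, each processor would own a subset of the columns.
    """
    return [model(c) for c in columns]


def transpose(cols):
    """Turn a list of columns into a list of rows (i.e. columns of the transpose)."""
    return [list(r) for r in zip(*cols)]


def forecast_covariance(model, p_columns):
    """Return the columns of M P M^T for a symmetric P given by its columns."""
    w = apply_model(model, p_columns)        # columns of M P
    return apply_model(model, transpose(w))  # columns of M (M P)^T = M P M^T


# Toy linear "model": M = [[1, 1], [0, 1]]; with P = I this yields M M^T.
step = lambda v: [v[0] + v[1], v[1]]
print(forecast_covariance(step, [[1, 0], [0, 1]]))
```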
Particle-in-cell simulation study of the scaling of asymmetric magnetic reconnection with in-plane flow shear

DOE Office of Scientific and Technical Information (OSTI.GOV)

Doss, C. E.; Cassak, P. A., E-mail: Paul.Cassak@mail.wvu.edu; Swisdak, M.

2016-08-15

We investigate magnetic reconnection in systems simultaneously containing asymmetric (anti-parallel) magnetic fields, asymmetric plasma densities and temperatures, and arbitrary in-plane bulk flow of plasma in the upstream regions. Such configurations are common at the high latitudes of Earth's magnetopause and in tokamaks. We investigate the convection speed of the X-line, the scaling of the reconnection rate, and the condition for which the flow suppresses reconnection as a function of upstream flow speeds. We use two-dimensional particle-in-cell simulations to capture the mixing of plasma in the outflow regions better than is possible in fluid modeling. We perform simulations with asymmetric magnetic fields, simulations with asymmetric densities, and simulations with magnetopause-like parameters where both are asymmetric. For flow speeds below the predicted cutoff velocity, we find good scaling agreement with the theory presented in Doss et al. [J. Geophys. Res. 120, 7748 (2015)]. Applications to planetary magnetospheres, tokamaks, and the solar wind are discussed.

The Autumn of break-ups: When Jakobshavn Isbrae lost its floating tongue

NASA Astrophysics Data System (ADS)

Aschwanden, A.; Fahnestock, M. A.; Truffer, M.; Motyka, R. J.

2015-12-01

Capturing the temporal variability in outlet glacier flow remains one of the holy grails of ice sheet modeling. Here we demonstrate progress using the three-dimensional Parallel Ice Sheet Model. Using a first-order calving law and prescribed subshelf basal melt rates, we performed high-resolution (<1 km) hindcasts of the Greenland Ice Sheet for the 1989-2012 period. These hindcasts allow us to study the processes governing ice-shelf thinning, break-up, and the subsequent speed-ups and dynamic thinning. Focussing our analysis on the Jakobshavn basin, we show that our simulations are able to capture the thinning of the floating tongue resulting from increased subshelf basal melt rates. Furthermore, our simulations capture both the magnitude and the timing of the dynamic thinning associated with the loss of the floating tongue, as well as the speed-up. We find little seasonal variation in surface speeds prior to 1995 and strong variations thereafter, in good agreement with observations by Echelmeyer and Harrison (1991) and Joughin et al. (2012).
High-sensitivity, high-speed continuous imaging system

DOEpatents

Watson, Scott A; Bender, III, Howard A

2014-11-18

A continuous imaging system is described for recording low levels of light, typically extending over small distances, at high frame rates and with a large number of frames. Photodiode pixels disposed in an array having a chosen geometry, each pixel having a dedicated amplifier, analog-to-digital converter, and memory, provide parallel operation of the system. When combined with a plurality of scintillators responsive to a selected source of radiation, in a scintillator array, the light from each scintillator being directed to a single corresponding photodiode in close proximity or lens-coupled thereto, embodiments of the present imaging system may provide images of x-ray, gamma ray, proton, and neutron sources with high efficiency.
A Dynamic Range Enhanced Readout Technique with a Two-Step TDC for High Speed Linear CMOS Image Sensors.

PubMed

Gao, Zhiyuan; Yang, Congjie; Xu, Jiangtao; Nie, Kaiming

2015-11-06

This paper presents a dynamic range (DR) enhanced readout technique with a two-step time-to-digital converter (TDC) for high-speed linear CMOS image sensors. A multi-capacitor, self-regulated capacitive trans-impedance amplifier (CTIA) structure is employed to extend the dynamic range. The gain of the CTIA is adjusted automatically by asynchronously switching different capacitors onto the integration node according to the output voltage. A column-parallel ADC based on a two-step TDC is used to improve the conversion rate; the conversion is divided into a coarse phase and a fine phase. An error calibration scheme is also proposed to correct quantization errors caused by propagation delay skew within -T_clk to +T_clk. A linear CMOS image sensor pixel array is designed in a 0.13 μm CMOS process to verify this DR-enhanced high-speed readout technique. Post-simulation results indicate that the dynamic range of the readout circuit is 99.02 dB and that, after calibration, the ADC achieves 60.22 dB SNDR and 9.71 bit ENOB at a conversion rate of 2 MS/s, improvements of 14.04 dB and 2.4 bits over the SNDR and ENOB without calibration.
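The coarse/fine split can be sketched as follows. A coarse counter counts whole reference-clock periods, a fine stage resolves the leftover fraction of a period, and the two codes are concatenated. The clock period and fine resolution are hypothetical parameters (the abstract gives neither), and the paper's skew calibration is not modeled. The second helper applies the standard relation ENOB = (SNDR − 1.76 dB) / 6.02 dB; with the reported 60.22 dB it returns about 9.71 bits, consistent with the stated ENOB.

```python
def two_step_tdc(interval, t_clk, fine_bits):
    """Quantize a time interval with a coarse counter plus a fine stage.

    Coarse phase: number of whole reference-clock periods in the interval.
    Fine phase: the remaining fraction of a period, split into
    2**fine_bits steps. The result concatenates the two codes.
    """
    coarse = int(interval // t_clk)
    residue = interval - coarse * t_clk
    fine = int(residue / (t_clk / 2 ** fine_bits))
    fine = min(fine, 2 ** fine_bits - 1)  # guard against rounding at the edge
    return (coarse << fine_bits) | fine


def enob_from_sndr(sndr_db):
    """Standard conversion from SNDR (dB) to effective number of bits."""
    return (sndr_db - 1.76) / 6.02


print(two_step_tdc(3.3, t_clk=1.0, fine_bits=3))  # coarse 3, fine 2
print(round(enob_from_sndr(60.22), 2))
```

The design point is that the coarse counter covers the full range cheaply while the fine stage only needs to span one clock period, which is what allows the higher conversion rate.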
High-speed visualization of fuel spray impingement in the near-wall region using a DISI injector

NASA Astrophysics Data System (ADS)

Kawahara, N.; Kintaka, K.; Tomita, E.

2017-02-01

We used a multi-hole injector to spray isooctane under atmospheric conditions and observed droplet impingement behaviors. It is generally known that droplet impact regimes such as splashing, deposition, or bouncing are governed by the Weber number. However, owing to its complexity, little has been reported on the microscopic visualization of poly-dispersed spray. During the spray impingement process, a large number of droplets approach, hit, and then interact with the wall, which makes it difficult to focus on a single droplet and observe its impingement. We overcame this difficulty using high-speed microscopic visualization. The spray/wall interaction processes were recorded by a high-speed camera (Shimadzu HPV-X2) with a long-distance microscope, and several impinging microscopic droplets were captured. After optimizing the magnification and frame rate, the atomization behaviors (splashing and deposition) were recorded. We then processed the images to determine droplet parameters such as diameter, velocity, and impingement angle. Based on this information, the critical threshold between splashing and deposition was investigated in terms of the normal and parallel components of the Weber number with respect to the wall. The results suggest that, on a dry wall, the critical normal Weber number should be set to 300.
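The split of the Weber number We = ρv²d/σ into wall-normal and wall-parallel parts can be sketched by evaluating it with each velocity component. The angle convention here (measured from the wall surface, so 90° is head-on) and all numerical inputs are assumptions for illustration. The 300 threshold is the dry-wall critical normal Weber number suggested in the abstract.

```python
import math


def weber_components(density, velocity, diameter, surface_tension, angle_deg):
    """Return (We_normal, We_parallel) for a droplet hitting a wall.

    angle_deg is measured from the wall surface (90 = head-on); this
    convention is an assumption. We = rho * v**2 * d / sigma is evaluated
    with the wall-normal and wall-parallel velocity components.
    """
    theta = math.radians(angle_deg)
    v_n = velocity * math.sin(theta)
    v_t = velocity * math.cos(theta)
    scale = density * diameter / surface_tension
    return scale * v_n ** 2, scale * v_t ** 2


def regime(we_normal, critical=300.0):
    """Classify impact on a dry wall by the critical normal Weber number."""
    return "splashing" if we_normal > critical else "deposition"


# Illustrative inputs only (SI units), not measured values from the study.
we_n, we_t = weber_components(690.0, 20.0, 20e-6, 0.019, 60.0)
print(round(we_n, 1), round(we_t, 1), regime(we_n))
```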
Impact of Cricothyroid Muscle Contraction on Vocal Fold Vibration: Experimental Study with High-Speed Videoendoscopy.

PubMed

Ishikawa, Camila Cristina; Pinheiro, Thais Gonçalves; Hachiya, Adriana; Montagnoli, Arlindo Neto; Tsuji, Domingos Hiroshi

2017-05-01

The aim of this study was to evaluate the effects of cricothyroid muscle contraction on vocal fold vibration, as evaluated with high-speed videoendoscopy, and to identify one or more aspects of vocal fold vibration that could be used as an irrefutable indicator of unilateral cricothyroid muscle paralysis. This was an experimental study employing excised human larynges. Twenty freshly excised human larynges were evaluated during artificially produced vibration. Each larynx was assessed in three situations: bilateral cricothyroid muscle contraction, unilateral cricothyroid muscle contraction, and no contraction of either cricothyroid muscle. The following parameters were evaluated by high-speed videoendoscopy: fundamental frequency, periodicity, amplitude of vocal fold vibration, and phase symmetry between the vocal folds. Although neither unilateral nor bilateral cricothyroid muscle contraction altered the periodicity of vibration or the occurrence of phase asymmetry, there was a significant decrease in fundamental frequency in parallel with decreasing longitudinal tension. We also found an increase in vibration amplitude of right and left vocal folds, which were similar in terms of their behavior for this parameter in the various situations studied. Our results suggest that differences in vibration amplitude and phase symmetry between vocal folds are not reliable indicators of unilateral cricothyroid muscle paralysis. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Comparison of Parallel and Series Hybrid Power Trains for Transit Bus Applications

DOE PAGES

Gao, Zhiming; Daw, C. Stuart; Smith, David E.; ...

2016-08-01

The fuel economy and emissions of conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicated that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar carbon monoxide and hydrocarbon tailpipe emissions but were also predicted to have reduced tailpipe emissions of nitrogen oxides compared with the conventional bus in higher-speed cycles. For the New York bus cycle, which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus; the parallel hybrid bus had significantly lower tailpipe emissions. All three bus power trains were found to require periodic active diesel particulate filter regeneration to maintain control of particulate matter. Finally, plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed because of the relatively large battery capacity that is typical of the series hybrid configuration.
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems

NASA Technical Reports Server (NTRS)

Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.

2000-01-01

The developed methodology aims to improve the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. These algorithms divide the flow domain into many subdomains, called blocks, and solve the governing equations over these blocks. The dynamic load-balancing problem is defined as the efficient distribution of the blocks among the available processors over a computation lasting several hours. In environments with computers of different architectures, operating systems, CPU speeds, memory sizes, loads, and network speeds, balancing the loads and managing the communication between processors becomes crucial. Load-balancing software tools for mutually dependent parallel processes have been created to make efficient use of an advanced computing environment and algorithms. These tools are dynamic because the computing environment changes during execution. More recently, these tools were extended to a second operating system, NT. In this paper, the problems associated with this application are discussed. The developed algorithms were also combined with the load-sharing capability of LSF to make efficient use of workstation clusters for parallel computing. Finally, results are presented from running the NASA code ADPAC to demonstrate the developed tools for dynamic load balancing.
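A simple way to picture speed-aware block distribution is a largest-first greedy heuristic: each block, taken in order of decreasing cost, goes to the processor that would finish it earliest given that processor's relative speed and current load. This is a swapped-in textbook heuristic, not the authors' algorithm. It ignores the inter-block communication cost that the abstract stresses, and the block costs and speeds are hypothetical inputs (in a dynamic setting they would be re-measured and the assignment recomputed).

```python
def assign_blocks(block_costs, proc_speeds):
    """Greedy largest-first assignment of blocks to heterogeneous processors.

    block_costs: work units per block; proc_speeds: work units per second.
    Returns (assignment block -> processor, finish time per processor).
    """
    finish = [0.0] * len(proc_speeds)
    assignment = {}
    order = sorted(range(len(block_costs)), key=lambda b: -block_costs[b])
    for b in order:
        # pick the processor on which this block would complete earliest
        p = min(range(len(proc_speeds)),
                key=lambda q: finish[q] + block_costs[b] / proc_speeds[q])
        finish[p] += block_costs[b] / proc_speeds[p]
        assignment[b] = p
    return assignment, finish


# Two processors, the first twice as fast as the second.
print(assign_blocks([4, 4, 2], [2.0, 1.0]))
```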
Comparison of Parallel and Series Hybrid Power Trains for Transit Bus Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Zhiming; Daw, C. Stuart; Smith, David E.

The fuel economy and emissions of conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicated that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar carbon monoxide and hydrocarbon tailpipe emissions but were also predicted to have reduced tailpipe emissions of nitrogen oxides compared with the conventional bus in higher-speed cycles. For the New York bus cycle, which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus; the parallel hybrid bus had significantly lower tailpipe emissions. All three bus power trains were found to require periodic active diesel particulate filter regeneration to maintain control of particulate matter. Finally, plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed because of the relatively large battery capacity that is typical of the series hybrid configuration.
Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S; Seal, Sudip K

2011-01-01

In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive, or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make large-scale parallel processing necessary. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to dramatically increase the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. A parallel speedup of over 5,500 and other runtime performance metrics are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing population sizes exceeding several hundred million individuals in the largest cases are successfully exercised to verify model scalability.
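The combination of reverse computation with a small amount of incremental state saving can be illustrated with a single event handler. Invertible operations such as counter increments are undone by running their inverse during a rollback. A non-invertible step, here a `min()` that clamps the number of new infections, loses information, so only that one value is saved. The `Region` class and its fields are hypothetical and do not reproduce the paper's epidemic model.

```python
class Region:
    """Hypothetical per-region state in an optimistic epidemic simulation."""

    def __init__(self, susceptible, infected):
        self.susceptible = susceptible
        self.infected = infected
        self.events = 0   # reversible: undone by decrementing, nothing saved
        self.saved = []   # incremental state-saving log

    def infect(self, n):
        """Forward handler: move up to n people from susceptible to infected."""
        moved = min(n, self.susceptible)  # min() is not invertible...
        self.saved.append(moved)          # ...so save just this one value
        self.susceptible -= moved
        self.infected += moved
        self.events += 1

    def reverse_infect(self):
        """Reverse handler: undo the most recent infect() during rollback."""
        moved = self.saved.pop()
        self.susceptible += moved
        self.infected -= moved
        self.events -= 1


r = Region(susceptible=5, infected=0)
r.infect(3)
r.infect(4)           # clamped: only 2 susceptible remain
r.reverse_infect()    # roll back the straggler-affected event
print(r.susceptible, r.infected)
```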
Parallel detection experiment of fluorescence confocal microscopy using DMD.

PubMed

Wang, Qingqing; Zheng, Jihong; Wang, Kangni; Gui, Kun; Guo, Hanming; Zhuang, Songlin

2016-05-01

A parallel-detection fluorescence confocal microscopy (PDFCM) system based on a Digital Micromirror Device (DMD) is reported in this paper, with the aim of realizing simultaneous multi-channel imaging and improving detection speed. The DMD added to the PDFCM system takes the place of the single traditional pinhole in the confocal system and divides the laser source into multiple excitation beams. The DMD-based PDFCM imaging system was set up experimentally. A multi-channel image of the fluorescence signal from a potato cell sample was acquired by parallel lateral scanning to verify the feasibility of introducing the DMD into a fluorescence confocal microscope. In addition, to characterize the microscope, the depth response curve was also acquired. The experimental results show that, in contrast to conventional microscopy, the DMD-based PDFCM system has higher axial resolution and faster detection speed, which may bring potential benefits in biological and medical analysis. SCANNING 38:234-239, 2016. © 2015 Wiley Periodicals, Inc.
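The way a DMD can stand in for a single pinhole can be sketched as a sequence of binary mirror masks. Each mask switches on a sparse grid of mirrors (each acting as one pinhole, producing one excitation beam), and shifting the grid through every offset within one pitch lets the parallel lateral scan cover every mirror exactly once. The square-grid layout, the pitch, and the function names are assumptions for illustration; the abstract does not specify the pattern used.

```python
def pinhole_mask(rows, cols, pitch, shift_r, shift_c):
    """Binary DMD mask with 'on' mirrors on a square grid of the given pitch,
    offset by (shift_r, shift_c). Each on-mirror acts as one pinhole."""
    return [[1 if (r - shift_r) % pitch == 0 and (c - shift_c) % pitch == 0 else 0
             for c in range(cols)]
            for r in range(rows)]


def scan_sequence(rows, cols, pitch):
    """All pitch * pitch shifted masks; together they light every mirror once."""
    return [pinhole_mask(rows, cols, pitch, dr, dc)
            for dr in range(pitch) for dc in range(pitch)]


masks = scan_sequence(6, 6, pitch=3)
print(len(masks), sum(map(sum, masks[0])))  # 9 frames, 4 pinholes per frame
```

A larger pitch reduces crosstalk between neighboring pinholes at the cost of more frames per full scan, which is the basic speed/quality trade-off of multi-pinhole confocal designs.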
Optical interconnection using polyimide waveguide for multichip module

NASA Astrophysics Data System (ADS)

Koyanagi, Mitsumasa

1996-01-01

We have developed a parallel processor system with 152 RISC processor chips dedicated to Monte Carlo analysis. This system has a ring-bus architecture. Computer simulation indicates that the system should reach a performance of several Gflops. However, it was found that the data transfer speed of the bus must be increased substantially to raise the performance further. We therefore propose introducing optical interconnection into the parallel processor system to increase the data transfer speed of the buses. A double ring-bus architecture is employed in this new parallel processor system with optical interconnection. Free-space optical interconnection and an optical waveguide are used for the optical ring-bus. Thin polyimide film was used to form the optical waveguide, and a relatively low propagation loss was achieved in it. In addition, it was confirmed that the propagation direction of signal light can easily be changed using a micro-mirror.
An efficient parallel-processing method for transposing large matrices in place.

PubMed

Portnoff, M R

1999-01-01

We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Cate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented, as well as an extension for matrices large enough to require virtual memory.
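For the square case, the block-wise idea can be sketched with a generic tiled in-place transpose. Diagonal tiles are transposed internally, and each off-diagonal tile (I, J) with I < J is swapped element by element with its mirror tile (J, I). Each tile pair touches only cache-sized regions and is independent of the others, so pairs could be distributed across processors. This is a common textbook scheme, not Portnoff's two-procedure algorithm (which also handles rectangular matrices and is not detailed in the abstract). The tile size is a hypothetical tuning parameter.

```python
def transpose_in_place(a, block=2):
    """Transpose a square matrix (list of lists) in place, tile by tile.

    Only tiles on or above the diagonal are visited; each element pair
    (i, j), (j, i) with i < j is swapped exactly once.
    """
    n = len(a)
    for bi in range(0, n, block):
        for bj in range(bi, n, block):
            for i in range(bi, min(bi + block, n)):
                # on a diagonal tile, visit only its upper triangle
                start = i + 1 if bi == bj else bj
                for j in range(start, min(bj + block, n)):
                    a[i][j], a[j][i] = a[j][i], a[i][j]
    return a


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(transpose_in_place(m, block=2))
```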
Smart photodetector arrays for error control in page-oriented optical memory

NASA Astrophysics Data System (ADS)

Schaffer, Maureen Elizabeth

1998-12-01

Page-oriented optical memories (POMs) have been proposed to meet high speed, high capacity storage requirements for input/output intensive computer applications. This technology offers the capability for storage and retrieval of optical data in two-dimensional pages resulting in high throughput data rates. Since currently measured raw bit error rates for these systems fall several orders of magnitude short of industry requirements for binary data storage, powerful error control codes must be adopted. These codes must be designed to take advantage of the two-dimensional memory output. In addition, POMs require an optoelectronic interface to transfer the optical data pages to one or more electronic host systems. Conventional charge coupled device (CCD) arrays can receive optical data in parallel, but the relatively slow serial electronic output of these devices creates a system bottleneck thereby eliminating the POM advantage of high transfer rates. Also, CCD arrays are "unintelligent" interfaces in that they offer little data processing capabilities. The optical data page can be received by two-dimensional arrays of "smart" photo-detector elements that replace conventional CCD arrays. These smart photodetector arrays (SPAs) can perform fast parallel data decoding and error control, thereby providing an efficient optoelectronic interface between the memory and the electronic computer. This approach optimizes the computer memory system by combining the massive parallelism and high speed of optics with the diverse functionality, low cost, and local interconnection efficiency of electronics. In this dissertation we examine the design of smart photodetector arrays for use as the optoelectronic interface for page-oriented optical memory. We review options and technologies for SPA fabrication, develop SPA requirements, and determine SPA scalability constraints with respect to pixel complexity, electrical power dissipation, and optical power limits. 
Next, we examine data modulation and error correction coding for the purpose of error control in the POM system. These techniques are adapted, where possible, for 2D data and evaluated as to their suitability for a SPA implementation in terms of BER, code rate, decoder time and pixel complexity. Our analysis shows that differential data modulation combined with relatively simple block codes known as array codes provides a powerful means to achieve the desired data transfer rates while reducing error rates to industry requirements. Finally, we demonstrate the first smart photodetector array designed to perform parallel error correction on an entire page of data and satisfy the sustained data rates of page-oriented optical memories. Our implementation integrates a monolithic PN photodiode array and differential input receiver for optoelectronic signal conversion with a cluster error correction code using 0.35-μm CMOS. This approach provides high sensitivity, low electrical power dissipation, and fast parallel correction of 2 × 2-bit cluster errors in an 8 × 8-bit code block to achieve corrected output data rates scalable to 102 Gbps in the current technology, increasing to 1.88 Tbps in 0.1-μm CMOS.
Deep Neural Network Emulation of a High-Order, WENO-Limited, Space-Time Reconstruction

NASA Astrophysics Data System (ADS)

Norman, M. R.; Hall, D. M.

2017-12-01

Deep Neural Networks (DNNs) have been used to emulate a number of processes in atmospheric models, including radiation and even so-called super-parameterization of moist convection. In each scenario, the DNN provides a good representation of the process even for inputs that have not been encountered before. More notably, they provide an emulation at a fraction of the cost of the original routine, giving speed-ups of 30× and even up to 200× compared to the runtime costs of the original routines. However, to our knowledge there has not been an investigation into using DNNs to emulate the dynamics. The most likely reason for this is that dynamics operators are typically both linear and low cost, meaning they cannot be sped up by a non-linear DNN emulation. However, there exist high-cost non-linear space-time dynamics operators that significantly reduce the number of parallel data transfers necessary to complete an atmospheric simulation. The WENO-limited Finite-Volume method with ADER-DT time integration is a prime example of this - needing only two parallel communications per large, fully limited time step. However, it comes at a high cost in terms of computation, which is why many would hesitate to use it. This talk investigates DNN emulation of the WENO-limited space-time finite-volume reconstruction procedure - the most expensive portion of this method, which densely clusters a large amount of non-linear computation. Different training techniques and network architectures are tested, and the accuracy and speed-up of each is given.
Towards green high capacity optical networks

NASA Astrophysics Data System (ADS)

Glesk, I.; Mohd Warip, M. N.; Idris, S. K.; Osadola, T. B.; Andonovic, I.

2011-09-01

The demand for fast, secure, energy-efficient, high-capacity networks is growing. It is fuelled by transmission bandwidth needs which will support, among other things, the rapid penetration of multimedia applications empowering smart consumer electronics and e-businesses. All of the above create unparalleled needs for networking solutions. These solutions must offer high-speed, low-cost, "on demand" mobile connectivity, and they should also be ecologically friendly and have a low carbon footprint. The first answer to the bandwidth needs was the deployment of fibre optic technologies in transport networks. It then quickly became obvious that electronics, whose bandwidth is inferior to that of optical fibre, would continue to limit the maximum implementable serial data rates. A new solution was found by introducing parallelism into data transport in the form of Wavelength Division Multiplexing (WDM), which has dramatically improved the aggregate throughput of optical networks. However, with these advancements a new bottleneck has emerged at fibre endpoints, where data routers must process the incoming and outgoing traffic. Here, even with massive and power-hungry electronic parallelism, today's routers (still relying upon bandwidth-limiting electronics) do not offer the processing speeds that networks demand. In this paper we discuss some novel, unconventional approaches to network scalability that lead to energy savings via advanced optical signal processing. We also investigate energy savings based on advanced network management through node hibernation, proposed for optical IP networks. Hibernation reduces the overall power consumption of the network by forming virtual network reconfigurations through selective node grouping and through link segmentation and partitioning.
A PC parallel port button box provides millisecond response time accuracy under Linux.

PubMed

Stewart, Neil

2006-02-01

For psychologists, it is sometimes necessary to measure people's reaction times to the nearest millisecond. This article describes how to use the PC parallel port to receive signals from a button box to achieve millisecond response time accuracy. The workings of the parallel port, the corresponding port addresses, and a simple Linux program for controlling the port are described. A test of the speed and reliability of button box signal detection is reported. If the reader is moderately familiar with Linux, this article should provide sufficient instruction for him or her to build and test his or her own parallel port button box. This article also describes how the parallel port could be used to control an external apparatus.
Improving significantly the failure strain and work hardening response of LPSO-strengthened Mg-Y-Zn-Al alloy via hot extrusion speed control

NASA Astrophysics Data System (ADS)

Tan, Xinghe; Chee, Winston; Chan, Jimmy; Kwok, Richard; Gupta, Manoj

2017-07-01

The effect of hot extrusion speed on the microstructure and mechanical properties of MgY1.06Zn0.76Al0.42 (at%) alloy strengthened by the novel long-period stacking ordered (LPSO) phase was systematically investigated. Increasing the extrusion speed accelerated dynamic recrystallization of α-Mg via particle-stimulated nucleation and grain growth in the alloy. The intensive recrystallization and grain growth events weakened the conventional basal texture and Hall-Petch strengthening in the alloy. This led to a significant improvement in its failure strain, from 4.9% to 19.6%. The critical strengthening contribution of the LPSO phase is known for imparting high strength to the alloy. When the alloy was extruded at higher speed, this contribution was greatly undermined by the parallel competition from texture weakening and the adverse Hall-Petch effect. Interestingly, no work hardening was observed in the alloy extruded at lower speed. This was discussed in terms of its ultra-fine grained microstructure, which promoted a steady-state defect density in which dislocation annihilation balances the generation of new dislocations during plastic deformation. One approach to improving the work hardening response of the alloy, to prevent unstable deformation and abrupt failure in service, is to increase the grain diameter by judiciously increasing the extrusion speed.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB, because the underlying parallel operations are often opaque to the user. This can limit the potential for improvement of serial code even for so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed, such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize these computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated, but only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated, but the goal of linear speedup through the addition of processors was not achieved due to the PC architecture.
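The sketch below is a rough Python analogue of the column-wise decomposition described above. It is not the authors' MATLAB spmd code. It computes a forward-difference Jacobian with one independent task per column. The helper names (`fd_column`, `jacobian_fd`, `residual`), the step `h` and the use of Python threads are assumptions for illustration. With pure-Python residuals, the global interpreter lock will usually prevent real speedup, so a process pool or a residual that releases the lock would be needed in practice.

```python
from concurrent.futures import ThreadPoolExecutor


def fd_column(f, x, fx, j, h):
    """Forward-difference approximation of column j: (f(x + h*e_j) - f(x)) / h."""
    xp = list(x)
    xp[j] += h
    fxp = f(xp)
    return [(a - b) / h for a, b in zip(fxp, fx)]


def jacobian_fd(f, x, h=1e-6, workers=4):
    """Numerical Jacobian J[i][j] = d f_i / d x_j; each column is an independent task."""
    fx = f(x)  # evaluated once and shared by every column task
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cols = list(pool.map(lambda j: fd_column(f, x, fx, j, h), range(len(x))))
    # Transpose the list of columns into a row-major matrix.
    return [[cols[j][i] for j in range(len(x))] for i in range(len(fx))]


def residual(x):
    """Hypothetical 2x2 nonlinear system: f0 = x0^2 + x1, f1 = x0 * x1."""
    return [x[0] ** 2 + x[1], x[0] * x[1]]


J = jacobian_fd(residual, [1.0, 2.0])
# The analytic Jacobian at (1, 2) is [[2, 1], [2, 1]].
```

Each column needs only one extra residual evaluation, so the work splits into `len(x)` independent tasks. That independence is why this computation is usually described as embarrassingly parallel.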
Proposed hardware architectures of particle filter for object tracking

NASA Astrophysics Data System (ADS)

Abd El-Halym, Howida A.; Mahmoud, Imbaby Ismail; Habib, SED

2012-12-01

In this article, efficient hardware architectures for the particle filter (PF) are presented. We propose three different architectures for Sequential Importance Resampling Filter (SIRF) implementation. The first architecture is a two-step sequential PF machine. Particle sampling, weight and output calculations are carried out in parallel during the first step, followed by sequential resampling in the second step. For the weight computation step, a piecewise linear function is used instead of the classical exponential function. This decreases the complexity of the architecture without degrading the results. The second architecture speeds up the resampling step via a parallel, rather than a serial, architecture. This second architecture targets a balance between hardware resources and the speed of operation. The third architecture implements the SIRF as a distributed PF composed of several processing elements and a central unit. All the proposed architectures are described in VHDL, synthesized using the Xilinx environment, and verified using the ModelSim simulator. Synthesis results confirmed the resource reduction and speed-up advantages of our architectures.
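The software sketch below shows one SIRF iteration in the two-step order described above. Sampling, weighting and the output estimate happen per particle in the parallelizable step, and resampling follows sequentially. Several details are assumptions rather than the paper's hardware design: the 1-D random-walk model, the piecewise-linear knot values standing in for `exp(-d)`, and the choice of systematic resampling. All function names are hypothetical.

```python
import random


def pwl_weight(d, knots=((0.0, 1.0), (1.0, 0.37), (2.0, 0.14), (4.0, 0.0))):
    """Piecewise-linear stand-in for exp(-d), d >= 0. Knot values are illustrative."""
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if d <= x1:
            return y0 + (y1 - y0) * (d - x0) / (x1 - x0)
    return 0.0  # beyond the last knot the weight is clamped to zero


def systematic_resample(weights, u):
    """Return len(weights) particle indices using a single offset u in [0, 1)."""
    n = len(weights)
    total = sum(weights)
    indices, i = [], 0
    cumulative = weights[0] / total
    for k in range(n):
        position = (u + k) / n
        while position > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices


def sir_step(particles, z, rng, process_std=1.0, meas_scale=1.0):
    """One SIR iteration for a hypothetical 1-D random-walk target observed directly."""
    # Step 1 (parallelizable per particle): sample, weight, and form the output.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    weights = [pwl_weight(abs(z - p) / meas_scale) for p in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # degenerate case: fall back to uniform weights
    total = sum(weights)
    estimate = sum(w * p for w, p in zip(weights, predicted)) / total
    # Step 2 (sequential): resample in proportion to the weights.
    chosen = systematic_resample(weights, rng.random())
    return [predicted[i] for i in chosen], estimate


rng = random.Random(0)
particles = [0.0] * 100
for z in (0.5, 1.0, 1.5):
    particles, estimate = sir_step(particles, z, rng)
```

The piecewise-linear weight needs only a comparison, a subtraction and a multiply per segment. That is the kind of saving over an exponential unit that the abstract points to.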

Cost-effective data storage/archival subsystem for functional PACS

NASA Astrophysics Data System (ADS)

Chen, Y. P.; Kim, Yongmin

1993-09-01

Not the least of the requirements of a workable PACS is the ability to store and archive vast amounts of information. A medium-size hospital will generate between 1 and 2 TBytes of data annually on a fully functional PACS. A high-speed image transmission network coupled with a comparably high-speed central data storage unit can make local memory and magnetic disks in the PACS workstations less critical and, in an extreme case, unnecessary. Under these circumstances, the capacity and performance of the central data storage subsystem and database are critical in determining the response time at the workstations, thus significantly affecting clinical acceptability. The central data storage subsystem must provide enough capacity to store about ten days' worth of images. That is five days' worth of new studies plus, on average, about one comparison study for each new study. It must also supply images to the requesting workstation in a timely fashion. The database must provide fast retrieval responses to users' requests for images. This paper analyzes the advantages and disadvantages of multiple parallel transfer disks versus RAID disks for the short-term central data storage subsystem. It also compares an optical disk jukebox with a digital recorder tape subsystem for the long-term archive. Finally, it presents an example of a high-performance, cost-effective PACS data storage/archival unit that integrates both RAID disks and a high-speed digital tape subsystem.
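To make the short-term capacity requirement concrete, the small sketch below converts the annual volume quoted above into the size of a ten-day central store. It assumes images arrive at a uniform daily rate and uses decimal units (1 TB = 1000 GB). Both assumptions are mine, not the paper's.

```python
DAYS_PER_YEAR = 365


def short_term_capacity_gb(annual_tb, days_held=10):
    """Central-store capacity for `days_held` days of images, assuming uniform daily volume.

    Ten days = five days of new studies plus about one comparison study per new study.
    Decimal units are assumed (1 TB = 1000 GB).
    """
    daily_gb = annual_tb * 1000 / DAYS_PER_YEAR
    return daily_gb * days_held


for annual in (1.0, 2.0):
    print(f"{annual:.0f} TB/year -> about {short_term_capacity_gb(annual):.0f} GB")
```

Under these assumptions the 1 to 2 TB/year range corresponds to roughly 27 to 55 GB of short-term storage. Real traffic is bursty, so a deployed system would need headroom above these averages.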
Real-time polarization-sensitive optical coherence tomography data processing with parallel computing

PubMed Central

Liu, Gangjun; Zhang, Jun; Yu, Lingfeng; Xie, Tuqiang; Chen, Zhongping

2010-01-01

With the increase of the A-line speed of optical coherence tomography (OCT) systems, real-time processing of acquired data has become a bottleneck. The shared-memory parallel computing technique is used to process OCT data in real time. The real-time processing power of a quad-core personal computer (PC) is analyzed. It is shown that the quad-core PC could provide real-time OCT data processing ability of more than 80K A-lines per second. A real-time, fiber-based, swept source polarization-sensitive OCT system with 20K A-line speed is demonstrated with this technique. The real-time 2D and 3D polarization-sensitive imaging of chicken muscle and pig tendon is also demonstrated. PMID:19904337
An Automated Parallel Image Registration Technique Based on the Correlation of Wavelet Features

NASA Technical Reports Server (NTRS)

LeMoigne, Jacqueline; Campbell, William J.; Cromp, Robert F.; Zukor, Dorothy (Technical Monitor)

2001-01-01

With the increasing importance of multiple platform/multiple remote sensing missions, fast and automatic integration of digital data from disparate sources has become critical to the success of these endeavors. Our work utilizes maxima of wavelet coefficients to form the basic features of a correlation-based automatic registration algorithm. Our wavelet-based registration algorithm is tested successfully with data from the National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) and the Landsat Thematic Mapper (TM), which differ by translation and/or rotation. By the choice of high-frequency wavelet features, this method is similar to an edge-based correlation method, but by exploiting the multi-resolution nature of a wavelet decomposition, our method achieves higher computational speeds for comparable accuracies. This algorithm has been implemented on a Single Instruction Multiple Data (SIMD) massively parallel computer, the MasPar MP-2, as well as on the Cray T3D, the Cray T3E and a Beowulf cluster of Pentium workstations.
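The sketch below is a simplified, one-dimensional illustration of correlation-based registration on wavelet-maxima features. It is not the authors' algorithm. It recovers only an integer translation, uses a single scale, and searches exhaustively, so the coarse-to-fine multi-resolution search that gives the paper its speed advantage is omitted. An undecimated Haar detail is used here, an assumption of this sketch, because the decimated transform is shift-variant. All function names are hypothetical.

```python
def haar_detail(x, scale=1):
    """Undecimated Haar detail: difference of adjacent block sums, divided by 2*scale."""
    return [(sum(x[n + scale:n + 2 * scale]) - sum(x[n:n + scale])) / (2 * scale)
            for n in range(len(x) - 2 * scale + 1)]


def keep_maxima(d, frac=0.5):
    """Keep only strong, edge-like coefficients (|d| >= frac * max |d|); zero the rest."""
    peak = max((abs(v) for v in d), default=0.0)
    return [v if peak > 0 and abs(v) >= frac * peak else 0.0 for v in d]


def correlation(a, b, s):
    """Unnormalized cross-correlation of feature vectors a and b at integer shift s."""
    return sum(a[n] * b[n + s] for n in range(len(a)) if 0 <= n + s < len(b))


def register_1d(ref, moving, max_shift, scale=1):
    """Shift s that best aligns `moving` with `ref`, i.e. moving[n + s] ~ ref[n]."""
    fa = keep_maxima(haar_detail(ref, scale))
    fb = keep_maxima(haar_detail(moving, scale))
    # Every candidate shift is scored independently, so this search parallelizes trivially.
    return max(range(-max_shift, max_shift + 1), key=lambda s: correlation(fa, fb, s))


ref = [0.0] * 40
ref[10:15] = [1.0] * 5
ref[25:28] = [2.0] * 3
moving = [0.0] * 3 + ref[:-3]  # ref translated 3 samples to the right
shift = register_1d(ref, moving, max_shift=8)
```

Each candidate shift is scored independently. That independence is what lets a search like this be spread over many processing elements, for example on a SIMD machine.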
A superconducting direct-current limiter with a power of up to 8 MVA

NASA Astrophysics Data System (ADS)

Fisher, L. M.; Alferov, D. F.; Akhmetgareev, M. R.; Budovskii, A. I.; Evsin, D. V.; Voloshin, I. F.; Kalinov, A. V.

2016-12-01

A resistive switching superconducting fault current limiter (SFCL) for DC networks with a nominal voltage of 3.5 kV and a nominal current of 2 kA was developed, produced, and tested. The SFCL has two main units: an assembly of superconducting modules and a high-speed vacuum circuit breaker. The assembly of superconducting modules consists of nine (3 × 3) parallel-series connected modules. Each module contains four parallel-connected 2G high-temperature superconducting (HTS) tapes. The results of SFCL tests in the short-circuit emulation mode with a maximum current rise rate of 1300 A/ms are presented. The SFCL is capable of limiting the current at a level of 7 kA and interrupting it 8 ms after the current-limiting mode begins. The average temperature of the HTS tapes increases to 210 K during the current-limiting mode. After the current is interrupted, the superconductivity recovery time does not exceed 1 s.
Toward real-time diffuse optical tomography: accelerating light propagation modeling employing parallel computing on GPU and CPU

NASA Astrophysics Data System (ADS)

Doulgerakis, Matthaios; Eggebrecht, Adam; Wojtkiewicz, Stanislaw; Culver, Joseph; Dehghani, Hamid

2017-12-01

Parameter recovery in diffuse optical tomography is a computationally expensive algorithm, especially when used for large and complex volumes, as in the case of human brain functional imaging. The modeling of light propagation, also known as the forward problem, is the computational bottleneck of the recovery algorithm, whereby the lack of a real-time solution is impeding practical and clinical applications. The objective of this work is the acceleration of the forward model, within a diffusion approximation-based finite-element modeling framework, employing parallelization to expedite the calculation of light propagation in realistic adult head models. The proposed methodology is applicable for modeling both continuous wave and frequency-domain systems, with the results demonstrating a 10-fold speed increase when GPU architectures are available, while maintaining high accuracy. It is shown that, for a very high-resolution finite-element model of the adult human head with ~600,000 nodes, consisting of heterogeneous layers, light propagation can be calculated at ~0.25 s/excitation source.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Electron hole tracking PIC simulation

NASA Astrophysics Data System (ADS)

Zhou, Chuteng; Hutchinson, Ian

2016-10-01

An electron hole is a coherent BGK mode solitary wave. Electron holes are observed to travel at high velocities relative to bulk plasmas. The kinematics of a 1-D electron hole is studied using a novel Particle-In-Cell simulation code with fully kinetic ions. A hole tracking technique enables us to follow the trajectory of a fast-moving solitary hole and study quantitatively hole acceleration and coupling to ions. The electron hole signal is detected and the simulation domain moves by a carefully designed feedback control law to follow its propagation. This approach has the advantage that the length of the simulation domain can be significantly reduced to several times the hole width, which makes high resolution simulations tractable. We observe a transient at the initial stage of hole formation when the hole accelerates to several times the cold-ion sound speed. Artificially imposing slow ion speed changes on a fully formed hole causes its velocity to change even when the ion stream speed in the hole frame greatly exceeds the ion thermal speed, so there are no reflected ions. The behavior that we observe in numerical simulations agrees very well with our analytic theory of hole momentum conservation and energization effects we call "jetting". The work was partially supported by the NSF/DOE Basic Plasma Science Partnership under Grant DE-SC0010491. Computer simulations were carried out on the MIT PSFC parallel AMD Opteron/Infiniband cluster Loki.
Visualizing Special Relativity: The Field of An Electric Dipole Moving at Relativistic Speed

ERIC Educational Resources Information Center

Smith, Glenn S.

2011-01-01

The electromagnetic field is determined for a time-varying electric dipole moving with a constant velocity that is parallel to its moment. Graphics are used to visualize this field in the rest frame of the dipole and in the laboratory frame when the dipole is moving at relativistic speed. Various phenomena from special relativity are clearly…
Self-propulsion of Leidenfrost Drops between Non-Parallel Structures.

PubMed

Luo, Cheng; Mrinal, Manjarik; Wang, Xiang

2017-09-20

In this work, we explored self-propulsion of a Leidenfrost drop between non-parallel structures. A theoretical model was first developed to determine conditions for liquid drops to start moving away from the corner of two non-parallel plates. These conditions were then simplified for the case of a Leidenfrost drop. Furthermore, ejection speeds and travel distances of Leidenfrost drops were derived using a scaling law. Subsequently, the theoretical models were validated by experiments. Finally, three new devices have been developed to manipulate Leidenfrost drops in different ways.
Speeding up parallel processing

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1988-01-01

In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. A group at Sandia National Laboratories recently published widely publicized results showing speedups on a 1024-node hypercube of over 500 for three fixed-size problems and over 1000 for three scalable problems. These results have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
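The two regimes mentioned above can be compared numerically. Amdahl's law gives the speedup of a fixed-size problem with serial fraction `s` on `n` processors as 1 / (s + (1 - s) / n). For problems that grow with the machine, the scaled-speedup formulation usually credited to Gustafson gives n - s(n - 1). The function names below are illustrative. Inverting Amdahl's law shows how small the serial fraction must be for a speedup of 500 on 1024 nodes.

```python
def amdahl_speedup(serial_fraction, n):
    """Fixed-size speedup on n processors: 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def scaled_speedup(serial_fraction, n):
    """Scaled (fixed-time) speedup when the parallel work grows with n: n - s * (n - 1)."""
    return n - serial_fraction * (n - 1)


def serial_fraction_for(speedup, n):
    """Invert Amdahl's law: the serial fraction s that yields `speedup` on n processors."""
    return (n / speedup - 1.0) / (n - 1)


s = serial_fraction_for(500, 1024)  # about 0.001, i.e. roughly 0.1% serial work
print(f"serial fraction for 500x on 1024 nodes: {s:.5f}")
print(f"Amdahl limit with that fraction: {1 / s:.0f}x")
print(f"scaled speedup with that fraction: {scaled_speedup(s, 1024):.0f}x")
```

With a fixed problem size, even 0.1% serial work caps the achievable speedup below 1000. When the parallel work grows with the machine, the same serial fraction leaves the scaled speedup close to the processor count.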
A Subsystem Test Bed for Chinese Spectral Radioheliograph

NASA Astrophysics Data System (ADS)

Zhao, An; Yan, Yihua; Wang, Wei

2014-11-01

The Chinese Spectral Radioheliograph (CSRH) is a solar-dedicated radio interferometric array. It will produce images of the Sun simultaneously in the decimetre and centimetre wave ranges, with high spatial, temporal and spectral resolution. Digital processing of the intermediate frequency (IF) signal is an important part of a radio telescope. This paper describes a flexible, high-speed digital down conversion (DDC) system for the CSRH. The system processes the IF signal with complex mixing, parallel filtering and extraction (decimation) algorithms. Its design incorporates canonic-signed digit coding and a bit-plane method to improve program efficiency. The DDC system is intended to be a subsystem test bed for simulation and testing for CSRH. Software algorithms for simulation and FPGA-based hardware-language algorithms were written. They use fewer hardware resources while achieving high performance, such as processing a high-speed data flow (1 GHz) with 10 MHz spectral resolution. An experiment with the test bed is illustrated using geostationary satellite data observed on March 20, 2014. Because the algorithms on the FPGA are easy to alter, the data can be recomputed with different digital signal processing algorithms to select the optimum algorithm.
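The three DDC stages named above (complex mixing, filtering, decimation) can be illustrated with the serial floating-point reference model below. It is a sketch under stated assumptions, not the CSRH FPGA design. A simple moving-average (boxcar) filter stands in for a properly designed low-pass filter. The parallel filter structure, canonic-signed-digit coefficients and bit-plane method are not modeled. The sample rate, tone frequency and parameter values are hypothetical.

```python
import cmath
import math


def ddc(samples, fs, f_lo, taps, decim):
    """Serial reference model of a digital down-converter.

    1. Complex mixing: multiply by exp(-j*2*pi*f_lo*n/fs) to shift f_lo to 0 Hz.
    2. Low-pass filtering: a `taps`-point moving average (boxcar FIR) as a stand-in
       for a properly designed filter.
    3. Decimation: keep every `decim`-th filtered sample.
    """
    mixed = [x * cmath.exp(-2j * math.pi * f_lo * n / fs) for n, x in enumerate(samples)]
    filtered, acc = [], 0j
    for n, v in enumerate(mixed):
        acc += v
        if n >= taps:
            acc -= mixed[n - taps]  # keep a running sum over the last `taps` samples
        if n >= taps - 1:
            filtered.append(acc / taps)
    return filtered[::decim]


fs, f0 = 1000.0, 100.0  # hypothetical sample rate and tone frequency (Hz)
tone = [math.cos(2 * math.pi * f0 * n / fs) for n in range(200)]
baseband = ddc(tone, fs, f_lo=f0, taps=10, decim=5)
```

Mixing a 100 Hz cosine with a 100 Hz local oscillator leaves a DC term of amplitude 0.5 plus an image at -200 Hz. Ten samples at 1 kHz span exactly two periods of that image, so the boxcar filter cancels it and every decimated output is about 0.5.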
Further development of a robust workup process for solution-phase high-throughput library synthesis to address environmental and sample tracking issues.

PubMed

Kuroda, Noritaka; Hird, Nick; Cork, David G

2006-01-01

During further improvement of a high-throughput, solution-phase synthesis system, new workup tools and apparatus for parallel liquid-liquid extraction and evaporation have been developed. A combination of in-house design and collaboration with external manufacturers has been used to address (1) environmental issues concerning solvent emissions and (2) sample tracking errors arising from manual intervention. A parallel liquid-liquid extraction unit, containing miniature high-speed magnetic stirrers for efficient mixing of organic and aqueous phases, has been developed for use on a multichannel liquid handler. Separation of the phases is achieved by dispensing them into a newly patented filter tube containing a vertical hydrophobic porous membrane, which allows only the organic phase to pass into collection vials positioned below. The vertical positioning of the membrane overcomes the hitherto dependence on the use of heavier-than-water, bottom-phase, organic solvents such as dichloromethane, which are restricted due to environmental concerns. Both small (6-mL) and large (60-mL) filter tubes were developed for parallel phase separation in library and template synthesis, respectively. In addition, an apparatus for parallel solvent evaporation was developed to (1) remove solvent from the above samples with highly efficient recovery and (2) avoid the movement of individual samples between their collection on a liquid handler and registration to prevent sample identification errors. The apparatus uses a diaphragm pump to achieve a dynamic circulating closed system with a heating block for the rack of 96 sample vials and an efficient condenser to trap the solvents. Solvent recovery is typically >98%, and convenient operation and monitoring has made the apparatus the first choice for removal of volatile solvents.
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers demonstrate the accuracy and speed of the method.
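The sketch below shows how column heights let a Choleski factorization skip zeros outside the band. It is a serial Python illustration, not the authors' code. The parallel outer DO-loop, the loop unrolling and the optional skipping of zeros inside the band are not shown. For readability it stores dense arrays, whereas real variable-band (skyline) storage would pack only the entries below each column height. The function names are hypothetical.

```python
import math


def column_heights(a):
    """first[j]: row of the first nonzero entry in column j of the upper triangle."""
    return [next(i for i in range(j + 1) if a[i][j] != 0.0 or i == j)
            for j in range(len(a))]


def skyline_cholesky(a):
    """Return (U, first) with U^T U = A, touching only entries inside the skyline."""
    n = len(a)
    first = column_heights(a)
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(first[j], j):
            k0 = max(first[i], first[j])  # skip zeros above either column height
            s = a[i][j] - sum(u[k][i] * u[k][j] for k in range(k0, i))
            u[i][j] = s / u[i][i]
        d = a[j][j] - sum(u[k][j] ** 2 for k in range(first[j], j))
        if d <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        u[j][j] = math.sqrt(d)
    return u, first


def skyline_solve(a, b):
    """Solve A x = b: forward substitution with U^T, then back substitution with U."""
    u, first = skyline_cholesky(a)
    n = len(b)
    y = [0.0] * n
    for j in range(n):
        y[j] = (b[j] - sum(u[k][j] * y[k] for k in range(first[j], j))) / u[j][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(u[i][k] * x[k] for k in range(i + 1, n) if first[k] <= i)
        x[i] = (y[i] - tail) / u[i][i]
    return x


A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
x = skyline_solve(A, [6.0, 12.0, 18.0, 19.0])  # exact solution is [1, 2, 3, 4]
```

Fill-in during the factorization stays within the envelope defined by the column heights. That is why starting each inner sum at `max(first[i], first[j])` loses nothing.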
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.

PubMed

Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar

2017-03-01

This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using the maximum likelihood approach. D-Phylo exploits the search capability of k-means while avoiding its main limitation of getting stuck at locally conserved motifs. The authors tested the behaviour of D-Phylo on several real-life datasets with up to 15 processors. The platform was an Amazon Linux Amazon Machine Image (Hardware Virtual Machine) on an i2.4xlarge instance, with six central processing units, 122 GiB memory, an 8 × 800 solid-state drive Elastic Block Store volume and high network performance. Distributing the clusters evenly over all the processors makes it possible to achieve near-linear speedup when a large number of processors is used.
A high performance parallel computing architecture for robust image features

NASA Astrophysics Data System (ADS)

Zhou, Renyan; Liu, Leibo; Wei, Shaojun

2014-03-01

A design of parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian Blob Detector and Haar Response Calculation, which are the most computing-intensive stages of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network and dedicated hardware for SURF descriptors, this architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS of general pixel processing at a 100 MHz clock and up to 118 fps in VGA (640 × 480) image feature detection. The proposed architecture is stand-alone and scalable, so it can easily be migrated to a VLSI implementation.
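Both SURF stages named above are built from box filters evaluated on an integral image, which makes any box sum cost four lookups regardless of its size. The software sketch below shows the integral image and the x/y Haar wavelet responses. It is not the cellular-network hardware from the article. The function names and the test image are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img[r][c] for r < y and c < x (one padding row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Sum of the w-by-h box whose top-left pixel is (x, y): four lookups, any size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_x(ii, x, y, s):
    """Haar wavelet response in x over a 2s-by-2s window: right half minus left half."""
    return box_sum(ii, x + s, y, s, 2 * s) - box_sum(ii, x, y, s, 2 * s)


def haar_y(ii, x, y, s):
    """Haar wavelet response in y over a 2s-by-2s window: bottom half minus top half."""
    return box_sum(ii, x, y + s, 2 * s, s) - box_sum(ii, x, y, 2 * s, s)


edge = [[0, 0, 1, 1] for _ in range(4)]  # a vertical step edge
ii = integral_image(edge)
print(haar_x(ii, 0, 0, 2), haar_y(ii, 0, 0, 2))  # prints "8 0"
```

Every output pixel needs the same handful of lookups into the integral image. That uniformity is what makes these stages a natural fit for an array of simple processors.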
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1997-01-01

While parallel processing promises to speed up applications by several orders of magnitude, the performance actually achieved still depends on several factors, including the multiprocessor architecture, system software, data distribution and alignment, and the methods used to partition the application and map its components onto the architecture. The existence of the Gordon Bell Prize, awarded at Supercomputing every year, suggests that while good performance can be attained for real applications on general-purpose multiprocessors, the large investment in manpower and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and delay of obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, and interactive parallelization tools. The success of these methodologies, in turn, depends on the proper application of data dependency analysis, program structure recognition and transformation, and performance prediction, as well as on the exploitation of user-supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, the transition of the underlying hardware and system software has forced scientists to spend a large effort migrating and recoding their applications. Various attempts to use software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at the University of Greenwich. We chose CAPTOOL for three reasons:
1. CAPTOOL accepts a FORTRAN 77 program as input, which suggests its potential applicability to the large collection of legacy codes currently in use.
2. CAPTOOL employs domain decomposition to obtain parallelism. Although not handling every kind of parallelism may seem unappealing, many NASA applications in computational aerosciences and in earth and space sciences are amenable to domain decomposition.
3. CAPTOOL generates code for the wide variety of environments used across NASA centers, from MPI/PVM on networks of workstations to the IBM SP2 and Cray T3D.
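To make the idea of domain decomposition concrete, here is a minimal Python sketch. It is not CAPTOOL's generated code: a 1-D grid is split into contiguous blocks, each block is updated with a Jacobi smoothing step, and before every step each block receives one "halo" value from each neighbor, which is the exchange a message-passing version (for example, one using MPI) would perform. The names `partition`, `jacobi_serial`, and `jacobi_decomposed` are hypothetical, and the subdomains are simulated sequentially in one process rather than on separate processors.

```python
from typing import List, Optional, Tuple


def partition(n: int, p: int) -> List[Tuple[int, int]]:
    """Split grid indices 0..n-1 into p contiguous [lo, hi) blocks whose sizes differ by at most one."""
    assert 1 <= p <= n, "need at least one grid point per subdomain"
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for r in range(p):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def jacobi_serial(u: List[float], steps: int) -> List[float]:
    """Reference: replace each interior point by the mean of its two neighbors; endpoints stay fixed."""
    u = list(u)
    for _ in range(steps):
        u = [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]
    return u


def jacobi_decomposed(u: List[float], steps: int, p: int) -> List[float]:
    """Same update, but each of p subdomains owns one block and sees only its neighbors' edge values."""
    n = len(u)
    bounds = partition(n, p)
    local = [list(u[lo:hi]) for lo, hi in bounds]
    for _ in range(steps):
        # Halo exchange: each block receives the last value of its left neighbor
        # and the first value of its right neighbor (None at the physical boundary).
        halos: List[Tuple[Optional[float], Optional[float]]] = [
            (local[r - 1][-1] if r > 0 else None, local[r + 1][0] if r < p - 1 else None)
            for r in range(p)
        ]
        new_local = []
        for r, (lo, hi) in enumerate(bounds):
            left, right = halos[r]
            # Extended block = [left halo] + owned points + [right halo]
            ext = ([left] if left is not None else []) + local[r] + ([right] if right is not None else [])
            off = 1 if left is not None else 0
            block = []
            for j in range(hi - lo):
                g = lo + j  # global index of this owned point
                if g == 0 or g == n - 1:
                    block.append(local[r][j])  # fixed physical boundary value
                else:
                    block.append(0.5 * (ext[off + j - 1] + ext[off + j + 1]))
            new_local.append(block)
        local = new_local
    # Gather the blocks back into one global array
    return [v for block in local for v in block]
```

Because each subdomain needs only its neighbors' boundary values, the decomposed sweep reproduces the serial result exactly; on a distributed-memory machine the halo assembly would become paired send/receive calls between processes.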
High-voltage isolation transformer for sub-nanosecond rise time pulses constructed with annular parallel-strip transmission lines.

PubMed

Homma, Akira

2011-07-01

A novel annular parallel-strip transmission line was devised to construct high-voltage, high-speed pulse isolation transformers. These transmission lines readily achieve stable high-voltage operation and good impedance matching between the primary and secondary circuits. The time constant of the transformer's step response was calculated by introducing a simple low-frequency equivalent circuit model. The results show that the relation between the time constant and the low cut-off frequency of the transformer conforms to the theory of general first-order linear time-invariant systems. They also show that a test transformer built from the new transmission lines can transmit pulses with a rise time of about 600 ps across a dc potential difference of more than 150 kV, with an insertion loss of -2.5 dB. The measured effective time constant of 12 ns agreed exactly with the theoretically predicted value. For practical applications involving the delivery of synchronized trigger signals to a dc high-voltage electron gun station, the transformer described in this paper offers advantages over methods that use fiber optic cables for signal transfer. It is also free of the jitter and breakdown problems that invariably occur in active circuit components.
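As a worked illustration, assume the transformer's low-frequency behavior is a first-order high-pass response (the general first-order linear time-invariant case the abstract refers to). A step of height $V_0$ then droops with time constant $\tau$, and the low cut-off frequency follows directly from $\tau$. Plugging in the measured 12 ns gives the implied cut-off; the -2.5 dB insertion loss is converted to an amplitude ratio for scale. These numbers are derived here under that assumption, not quoted from the paper.

```latex
\begin{aligned}
H(s) &= \frac{s\tau}{1 + s\tau}, \qquad v_{\mathrm{out}}(t) = V_0\, e^{-t/\tau}\, u(t) \quad \text{for a step of height } V_0,\\
f_L &= \frac{1}{2\pi\tau} = \frac{1}{2\pi\,(12\times 10^{-9}\ \mathrm{s})} \approx 13.3\ \mathrm{MHz},\\
\frac{V_{\mathrm{out}}}{V_{\mathrm{in}}} &= 10^{-2.5/20} \approx 0.75 \quad \text{for a } -2.5\ \mathrm{dB} \text{ insertion loss}.
\end{aligned}
```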
Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

PubMed

Wilson, J Adam; Williams, Justin C

2009-01-01

The clock speeds of modern computer processors have nearly plateaued over the past 5 years. Consequently, neural prosthetic systems that must process large quantities of data in a short time face a bottleneck: it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. In this study, therefore, a method was developed that uses the processing capabilities of a graphics card [graphics processing unit (GPU)] for real-time neural signal processing in a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which can run many operations in parallel and can potentially greatly increase the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), then calculating the power spectral density of every channel using an autoregressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and their speed was compared with both the current implementation and a central processing unit (CPU)-based implementation that uses multi-threading. GPU processing yielded significant performance gains: the current implementation processed 1000 channels of 250 ms of data in 933 ms, whereas the new GPU method took only 27 ms, a speedup of nearly 35 times.
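The two stages that were moved to the GPU can be sketched on the CPU in plain Python to show what is being computed. A spatial filter is a matrix product of an (outputs × channels) weight matrix with a (channels × samples) data block, and an autoregressive spectrum is obtained by fitting AR coefficients per channel and evaluating the AR model's transfer function. This is an illustrative sketch, not the authors' CUDA code: the AR fit here uses the Yule-Walker equations solved by Levinson-Durbin recursion, which may differ from the estimator used in the study; a common-average-reference filter stands in as an example spatial filter; and all function names, including the synthetic-signal helper, are hypothetical.

```python
import cmath
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def spatial_filter(weights: Matrix, data: Matrix) -> Matrix:
    """Return weights @ data for weights (n_out x n_ch) and data (n_ch x n_samples)."""
    n_samples = len(data[0])
    return [
        [sum(w * data[c][t] for c, w in enumerate(w_row)) for t in range(n_samples)]
        for w_row in weights
    ]


def common_average_reference(n_ch: int) -> Matrix:
    """Example spatial filter: subtract the mean of all channels from each channel."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n_ch for j in range(n_ch)] for i in range(n_ch)]


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation estimate r[0..max_lag] of the mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def yule_walker(x: List[float], order: int) -> Tuple[List[float], float]:
    """Fit x[t] = sum_k a[k] * x[t-k] + e[t] via Levinson-Durbin; return (a[1..order], noise variance)."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err  # reflection coefficient
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k  # prediction-error variance shrinks at each order
    return a[1:], err


def ar_psd(coeffs: List[float], noise_var: float, freqs: List[float], fs: float) -> List[float]:
    """Evaluate noise_var / |1 - sum_k a_k exp(-j 2 pi f k / fs)|^2 at each frequency f (Hz)."""
    out = []
    for f in freqs:
        denom = 1 - sum(a * cmath.exp(-2j * math.pi * f * (k + 1) / fs) for k, a in enumerate(coeffs))
        out.append(noise_var / abs(denom) ** 2)
    return out


def synthetic_ar1(n: int, phi: float, seed: int = 0) -> List[float]:
    """Hypothetical test signal: x[t] = phi * x[t-1] + unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

For example, fitting `yule_walker(synthetic_ar1(3000, 0.9), 1)` should recover a coefficient close to 0.9, and `ar_psd` then shows far more power near 0 Hz than near the Nyquist frequency. Each output row of `spatial_filter` and each channel's AR fit are independent of the others, which is the kind of data parallelism a GPU implementation can exploit.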
