Graphics Processor Units (GPUs)
NASA Technical Reports Server (NTRS)
Wyrwas, Edward J.
2017-01-01
This presentation will include information about Graphics Processor Units (GPUs) technology, NASA Electronic Parts and Packaging (NEPP) tasks, The test setup, test parameter considerations, lessons learned, collaborations, a roadmap, NEPP partners, results to date, and future plans.
Weather information network including graphical display
NASA Technical Reports Server (NTRS)
Leger, Daniel R. (Inventor); Burdon, David (Inventor); Son, Robert S. (Inventor); Martin, Kevin D. (Inventor); Harrison, John (Inventor); Hughes, Keith R. (Inventor)
2006-01-01
An apparatus for providing weather information onboard an aircraft includes a processor unit and a graphical user interface. The processor unit processes weather information after it is received onboard the aircraft from a ground-based source, and the graphical user interface provides a graphical presentation of the weather information to a user onboard the aircraft. Preferably, the graphical user interface includes one or more user-selectable options for graphically displaying at least one of convection information, turbulence information, icing information, weather satellite information, SIGMET information, significant weather prognosis information, and winds aloft information.
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Parallel processor-based raster graphics system architecture
Littlefield, Richard J.
1990-01-01
An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.
Software Graphics Processing Unit (sGPU) for Deep Space Applications
NASA Technical Reports Server (NTRS)
McCabe, Mary; Salazar, George; Steele, Glen
2015-01-01
A graphics processing capability will be required for deep space missions and must include a range of applications, from safety-critical vehicle health status to telemedicine for crew health. However, preliminary radiation testing of commercial graphics processing cards suggest they cannot operate in the deep space radiation environment. Investigation into an Software Graphics Processing Unit (sGPU)comprised of commercial-equivalent radiation hardened/tolerant single board computers, field programmable gate arrays, and safety-critical display software shows promising results. Preliminary performance of approximately 30 frames per second (FPS) has been achieved. Use of multi-core processors may provide a significant increase in performance.
Real-time digital holographic microscopy using the graphic processing unit.
Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi
2008-08-04
Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which allows use as a highly parallel processor. The computational speed of the Fresnel diffraction using the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from holograms whose size is 512 x 512 grids in 24 frames per second.
NASA Astrophysics Data System (ADS)
Laracuente, Nicholas; Grossman, Carl
2013-03-01
We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College
A High Performance VLSI Computer Architecture For Computer Graphics
NASA Astrophysics Data System (ADS)
Chin, Chi-Yuan; Lin, Wen-Tai
1988-10-01
A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the modern computer graphics demands, e.g. high resolution, realistic animation, real-time display etc.. All processors share a global memory which are partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcasted to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigurated to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection, can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
General purpose molecular dynamics simulations fully implemented on graphics processing units
NASA Astrophysics Data System (ADS)
Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.
2008-05-01
Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of fast 30 processor core distributed memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters and discuss implications for the future.
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.
Han, Bing; Taha, Tarek M
2010-04-01
There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
Analysis of Interactive Graphics Display Equipment for an Automated Photo Interpretation System.
1982-06-01
System provides the hardware and software for a range of graphics processor tasks. The IMAGE System employs the RSX- II M real - time operating . system in...One hard copy unit serves up to four work stations. The executive program of the IMAGE system is the DEC RSX- 11 M real - time operating system . In...picture controller. The PDP 11/34 executes programs concurrently under the RSX- I IM real - time operating system . Each graphics program consists of a
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Numerical simulation of air hypersonic flows with equilibrium chemical reactions
NASA Astrophysics Data System (ADS)
Emelyanov, Vladislav; Karpenko, Anton; Volkov, Konstantin
2018-05-01
The finite volume method is applied to solve unsteady three-dimensional compressible Navier-Stokes equations on unstructured meshes. High-temperature gas effects altering the aerodynamics of vehicles are taken into account. Possibilities of the use of graphics processor units (GPUs) for the simulation of hypersonic flows are demonstrated. Solutions of some test cases on GPUs are reported, and a comparison between computational results of equilibrium chemically reacting and perfect air flowfields is performed. Speedup of solution on GPUs with respect to the solution on central processor units (CPUs) is compared. The results obtained provide promising perspective for designing a GPU-based software framework for practical applications.
Konstantinidis, Evdokimos I; Frantzidis, Christos A; Pappas, Costas; Bamidis, Panagiotis D
2012-07-01
In this paper the feasibility of adopting Graphic Processor Units towards real-time emotion aware computing is investigated for boosting the time consuming computations employed in such applications. The proposed methodology was employed in analysis of encephalographic and electrodermal data gathered when participants passively viewed emotional evocative stimuli. The GPU effectiveness when processing electroencephalographic and electrodermal recordings is demonstrated by comparing the execution time of chaos/complexity analysis through nonlinear dynamics (multi-channel correlation dimension/D2) and signal processing algorithms (computation of skin conductance level/SCL) into various popular programming environments. Apart from the beneficial role of parallel programming, the adoption of special design techniques regarding memory management may further enhance the time minimization which approximates a factor of 30 in comparison with ANSI C language (single-core sequential execution). Therefore, the use of GPU parallel capabilities offers a reliable and robust solution for real-time sensing the user's affective state. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Quantum optimal control with automatic differentiation using graphics processors
NASA Astrophysics Data System (ADS)
Leung, Nelson; Abdelhafez, Mohamed; Chakram, Srivatsan; Naik, Ravi; Groszkowski, Peter; Koch, Jens; Schuster, David
We implement quantum optimal control based on automatic differentiation and harness the acceleration afforded by graphics processing units (GPUs). Automatic differentiation allows us to specify advanced optimization criteria and incorporate them into the optimization process with ease. We will describe efficient techniques to optimally control weakly anharmonic systems that are commonly encountered in circuit QED, including coupled superconducting transmon qubits and multi-cavity circuit QED systems. These systems allow for a rich variety of control schemes that quantum optimal control is well suited to explore.
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling
NASA Astrophysics Data System (ADS)
Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.
2017-06-01
Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
System Level RBDO for Military Ground Vehicles using High Performance Computing
2008-01-01
platform. Only the analyses that required more than 24 processors were conducted on the Onyx 350 due to the limited number of processors on the...optimization constraints varied. The queues set the number of processors and number of finite element code licenses available to the analyses. sgi ONYX ...3900: unix 24 MIPS R16000 PROCESSORS 4 IR2 GRAPHICS PIPES 4 IR3 GRAPHICS PIPES 24 GBYTES MEMORY 36 GBYTES LOCAL DISK SPACE sgi ONYX 350: unix 32 MIPS
Apparatus and method for implementing power saving techniques when processing floating point values
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Young Moon; Park, Sang Phill
An apparatus and method are described for reducing power when reading and writing graphics data. For example, one embodiment of an apparatus comprises: a graphics processor unit (GPU) to process graphics data including floating point data; a set of registers, at least one of the registers of the set partitioned to store the floating point data; and encode/decode logic to reduce a number of binary 1 values being read from the at least one register by causing a specified set of bit positions within the floating point data to be read out as 0s rather than 1s.
NASA Technical Reports Server (NTRS)
Hirt, E. F.; Fox, G. L.
1982-01-01
Two specific NASTRAN preprocessors and postprocessors are examined. A postprocessor for dynamic analysis and a graphical interactive package for model generation and review of resuls are presented. A computer program that provides response spectrum analysis capability based on data from NASTRAN finite element model is described and the GIFTS system, a graphic processor to augment NASTRAN is introduced.
CPU architecture for a fast and energy-saving calculation of convolution neural networks
NASA Astrophysics Data System (ADS)
Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan
2017-06-01
One of the most difficult problem in the use of artificial neural networks is the computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, for the conventional user only remains the state of the art method, which is the use of a graphic processing unit (GPU) as a computational basis. Although these processors are well suited for large matrix computations, they need massive energy. Therefore a new processor on the basis of a field programmable gate array (FPGA) has been developed and is optimized for the application of deep learning. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper to an organic farming application). The power consumption is only a fraction of a GPU application and should therefore be well suited for energy-saving applications.
NASA Astrophysics Data System (ADS)
Grzeszczuk, A.; Kowalski, S.
2015-04-01
Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia for increase speed of graphics by usage of parallel mode for processes calculation. The success of this solution has opened technology General-Purpose Graphic Processor Units (GPGPUs) for applications not coupled with graphics. The GPGPUs system can be applying as effective tool for reducing huge number of data for pulse shape analysis measures, by on-line recalculation or by very quick system of compression. The simplified structure of CUDA system and model of programming based on example Nvidia GForce GTX580 card are presented by our poster contribution in stand-alone version and as ROOT application.
Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists.
Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco
2013-01-01
Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programing parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low cost graphic cards (graphic processor units) without any specific programing effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.
Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists
Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco
2013-01-01
Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programing parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low cost graphic cards (graphic processor units) without any specific programing effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior. PMID:23653617
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
The development of an engineering computer graphics laboratory
NASA Technical Reports Server (NTRS)
Anderson, D. C.; Garrett, R. E.
1975-01-01
Hardware and software systems developed to further research and education in interactive computer graphics were described, as well as several of the ongoing application-oriented projects, educational graphics programs, and graduate research projects. The software system consists of a FORTRAN 4 subroutine package, in conjunction with a PDP 11/40 minicomputer as the primary computation processor and the Imlac PDS-1 as an intelligent display processor. The package comprises a comprehensive set of graphics routines for dynamic, structured two-dimensional display manipulation, and numerous routines to handle a variety of input devices at the Imlac.
Komarov, Ivan; D'Souza, Roshan M
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
2015-06-01
5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.
2012-01-01
We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient’s skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures. PMID:24027616
Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R
2012-02-23
We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.
Architectures for single-chip image computing
NASA Astrophysics Data System (ADS)
Gove, Robert J.
1992-04-01
This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
NASA Technical Reports Server (NTRS)
Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.
2012-01-01
In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
Exact diagonalization of quantum lattice models on coprocessors
NASA Astrophysics Data System (ADS)
Siro, T.; Harju, A.
2016-10-01
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
High-performance computing on GPUs for resistivity logging of oil and gas wells
NASA Astrophysics Data System (ADS)
Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.
2017-10-01
We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
Watanabe, Yuuki; Takahashi, Yuhei; Numazawa, Hiroshi
2014-02-01
We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel-shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations
NASA Astrophysics Data System (ADS)
Hause, Benjamin; Parker, Scott
2012-10-01
We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core I7 gaming PC with a NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC will be presented. We will also discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.
Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units
NASA Astrophysics Data System (ADS)
Corral, Roque; Gisbert, Fernando; Pueblas, Jesus
2017-02-01
The implementation of an edge-based three-dimensional Reynolds Average Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processor unit (CPU) have been used to enhance code scalability. The code is written using a mixture of C++ and OpenCL, to allow the execution of the source code on GPUs. The Message Passage Interface (MPI) library is used to allow the parallel execution of the solver on multiple GPUs. A comparative study of the solver parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded due to the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with an equivalent computational power.
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG
NASA Astrophysics Data System (ADS)
Griffiths, M. K.; Fedun, V.; Erdélyi, R.
2015-03-01
Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Fast generation of computer-generated hologram by graphics processing unit
NASA Astrophysics Data System (ADS)
Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi
2009-02-01
A cylindrical hologram is well known to be viewable in 360 deg. This hologram depends high pixel resolution.Therefore, Computer-Generated Cylindrical Hologram (CGCH) requires huge calculation amount.In our previous research, we used look-up table method for fast calculation with Intel Pentium4 2.8 GHz.It took 480 hours to calculate high resolution CGCH (504,000 x 63,000 pixels and the average number of object points are 27,000).To improve quality of CGCH reconstructed image, fringe pattern requires higher spatial frequency and resolution.Therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of CGCH (912,000 x 108,000 pixels), we employ Graphics Processing Unit (GPU).It took 4,406 hours to calculate high resolution CGCH on Xeon 3.4 GHz.Since GPU has many streaming processors and a parallel processing structure, GPU works as the high performance parallel processor.In addition, GPU gives max performance to 2 dimensional data and streaming data.Recently, GPU can be utilized for the general purpose (GPGPU).For example, NVIDIA's GeForce7 series became a programmable processor with Cg programming language.Next GeForce8 series have CUDA as software development kit made by NVIDIA.Theoretically, calculation ability of GPU is announced as 500 GFLOPS. From the experimental result, we have achieved that 47 times faster calculation compared with our previous work which used CPU.Therefore, CGCH can be generated in 95 hours.So, total time is 110 hours to calculate and print the CGCH.
Overview of implementation of DARPA GPU program in SAIC
NASA Astrophysics Data System (ADS)
Braunreiter, Dennis; Furtek, Jeremy; Chen, Hai-Wen; Healy, Dennis
2008-04-01
This paper reviews the implementation of DARPA MTO STAP-BOY program for both Phase I and II conducted at Science Applications International Corporation (SAIC). The STAP-BOY program conducts fast covariance factorization and tuning techniques for space-time adaptive process (STAP) Algorithm Implementation on Graphics Processor unit (GPU) Architectures for Embedded Systems. The first part of our presentation on the DARPA STAP-BOY program will focus on GPU implementation and algorithm innovations for a prototype radar STAP algorithm. The STAP algorithm will be implemented on the GPU, using stream programming (from companies such as PeakStream, ATI Technologies' CTM, and NVIDIA) and traditional graphics APIs. This algorithm will include fast range adaptive STAP weight updates and beamforming applications, each of which has been modified to exploit the parallel nature of graphics architectures.
High-performance dynamic quantum clustering on graphics processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wittek, Peter, E-mail: peterwittek@acm.org
2013-01-15
Clustering methods in machine learning may benefit from borrowing metaphors from physics. Dynamic quantum clustering associates a Gaussian wave packet with the multidimensional data points and regards them as eigenfunctions of the Schroedinger equation. The clustering structure emerges by letting the system evolve and the visual nature of the algorithm has been shown to be useful in a range of applications. Furthermore, the method only uses matrix operations, which readily lend themselves to parallelization. In this paper, we develop an implementation on graphics hardware and investigate how this approach can accelerate the computations. We achieve a speedup of up tomore » two magnitudes over a multicore CPU implementation, which proves that quantum-like methods and acceleration by graphics processing units have a great relevance to machine learning.« less
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
Vector generator scan converter
Moore, James M.; Leighton, James F.
1990-01-01
High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O (input/output) channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardward for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold.
Vector generator scan converter
Moore, J.M.; Leighton, J.F.
1988-02-05
High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardware for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold. 7 figs.
Efficient Sorting on the Tilera Manycore Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Villa, Oreste
e present an efficient implementation of the radix sort algo- rithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is com- posed of 64 tiles interconnected through multiple fast Networks- on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor’s sustained performance. We discuss the overall throughput reached by ourmore » radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the art implemen- tations on x86 processors and graphic processing units.« less
Visual Media Reasoning - Terrain-based Geolocation
2015-06-01
the drawings, specifications, or other data does not license the holder or any other person or corporation ; or convey any rights or permission to...3.4 Alternative Metric Investigation This section describes a graphics processor unit (GPU) based implementation in the NVIDIA CUDA programming...utilizing 2 concurrent CPU cores, each controlling a single Nvidia C2075 Tesla Fermi CUDA card. Figure 22 shows a comparison of the CPU and the GPU powered
NASA Astrophysics Data System (ADS)
Sewell, Stephen
This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphic Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general purpose graphic processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphic processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
Line-by-line spectroscopic simulations on graphics processing units
NASA Astrophysics Data System (ADS)
Collange, Sylvain; Daumas, Marc; Defour, David
2008-01-01
We report here on software that performs line-by-line spectroscopic simulations on gases. Elaborate models (such as narrow band and correlated-K) are accurate and efficient for bands where various components are not simultaneously and significantly active. Line-by-line is probably the most accurate model in the infrared for blends of gases that contain high proportions of H 2O and CO 2 as this was the case for our prototype simulation. Our implementation on graphics processing units sustains a speedup close to 330 on computation-intensive tasks and 12 on memory intensive tasks compared to implementations on one core of high-end processors. This speedup is due to data parallelism, efficient memory access for specific patterns and some dedicated hardware operators only available in graphics processing units. It is obtained leaving most of processor resources available and it would scale linearly with the number of graphics processing units in parallel machines. Line-by-line simulation coupled with simulation of fluid dynamics was long believed to be economically intractable but our work shows that it could be done with some affordable additional resources compared to what is necessary to perform simulations on fluid dynamics alone. Program summaryProgram title: GPU4RE Catalogue identifier: ADZY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADZY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 62 776 No. of bytes in distributed program, including test data, etc.: 1 513 247 Distribution format: tar.gz Programming language: C++ Computer: x86 PC Operating system: Linux, Microsoft Windows. Compilation requires either gcc/g++ under Linux or Visual C++ 2003/2005 and Cygwin under Windows. It has been tested using gcc 4.1.2 under Ubuntu Linux 7.04 and using Visual C++ 2005 with Cygwin 1.5.24 under Windows XP. RAM: 1 gigabyte Classification: 21.2 External routines: OpenGL ( http://www.opengl.org) Nature of problem: Simulating radiative transfer on high-temperature high-pressure gases. Solution method: Line-by-line Monte-Carlo ray-tracing. Unusual features: Parallel computations are moved to the GPU. Additional comments: nVidia GeForce 7000 or ATI Radeon X1000 series graphics processing unit is required. Running time: A few minutes.
GPU: the biggest key processor for AI and parallel processing
NASA Astrophysics Data System (ADS)
Baji, Toru
2017-07-01
Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I
2015-01-01
This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of the current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048(2) and 4096(2) at interactive and semi interactive frame-rates using an NVIDIA GeForce 970 GTX device.
Transputer parallel processing at NASA Lewis Research Center
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1989-01-01
The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
Particle-In-Cell simulations of high pressure plasmas using graphics processing units
NASA Astrophysics Data System (ADS)
Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter
2009-10-01
Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Particularly plasmas at very low gas pressures are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming -- certain stability conditions has to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. The simulations take up to very much time to run on standard computers and require the help of computer clusters or super computers. Recent advances in the field of graphics processing units (GPUs) provides every personal computer with a highly parallel multi processor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
SIMD Optimization of Linear Expressions for Programmable Graphics Hardware
Bajaj, Chandrajit; Ihm, Insung; Min, Jungki; Oh, Jinsang
2009-01-01
The increased programmability of graphics hardware allows efficient graphical processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how to fully exploit the SIMD computing capacities offered by modern graphics processors. Linear expressions in the form of ȳ = Ax̄ + b̄, where A is a matrix, and x̄, ȳ and b̄ are vectors, constitute one of the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that enables efficient shader codes to be generated for evaluating linear expressions. It is shown that performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that the presented technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications, including integrating differential equations and solving a sparse linear system of equations using iterative methods. PMID:19946569
Parallel Computer System for 3D Visualization Stereo on GPU
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.
Proton Testing of nVidia GTX 1050 GPU
NASA Technical Reports Server (NTRS)
Wyrwas, E. J.
2017-01-01
Single-Event Effects (SEE) testing was conducted on the nVidia GTX 1050 Graphics Processor Unit (GPU); herein referred to as device under test (DUT). Testing was conducted at Massachusetts General Hospitals (MGH) Francis H. Burr Proton Therapy Center on April 9th, 2017 using 200-MeV protons. This testing trip was purposed to provide a baseline assessment of the radiation susceptibility of the DUT as no previous testing had been conducted on this component.
Graphics Processing Unit (GPU) Acceleration of the Goddard Earth Observing System Atmospheric Model
NASA Technical Reports Server (NTRS)
Putnam, Williama
2011-01-01
The Goddard Earth Observing System 5 (GEOS-5) is the atmospheric model used by the Global Modeling and Assimilation Office (GMAO) for a variety of applications, from long-term climate prediction at relatively coarse resolution, to data assimilation and numerical weather prediction, to very high-resolution cloud-resolving simulations. GEOS-5 is being ported to a graphics processing unit (GPU) cluster at the NASA Center for Climate Simulation (NCCS). By utilizing GPU co-processor technology, we expect to increase the throughput of GEOS-5 by at least an order of magnitude, and accelerate the process of scientific exploration across all scales of global modeling, including: The large-scale, high-end application of non-hydrostatic, global, cloud-resolving modeling at 10- to I-kilometer (km) global resolutions Intermediate-resolution seasonal climate and weather prediction at 50- to 25-km on small clusters of GPUs Long-range, coarse-resolution climate modeling, enabled on a small box of GPUs for the individual researcher After being ported to the GPU cluster, the primary physics components and the dynamical core of GEOS-5 have demonstrated a potential speedup of 15-40 times over conventional processor cores. Performance improvements of this magnitude reduce the required scalability of 1-km, global, cloud-resolving models from an unfathomable 6 million cores to an attainable 200,000 GPU-enabled cores.
NASA Astrophysics Data System (ADS)
O'Connor, A. S.; Justice, B.; Harris, A. T.
2013-12-01
Graphics Processing Units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. These are general-purpose parallel processors with support for a variety of programming interfaces, including industry standard languages such as C. GPU implementations of algorithms that are well suited for parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant improvements in speeds for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Components Analsyis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x - 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA based GPU processing frame work for accelerating ENVI and IDL processes that can best take advantage of parallelization. Testing Exelis VIS has performed shows that orthorectification can take as long as two hours with a WorldView1 35,0000 x 35,000 pixel image. With GPU orthorecification, the same orthorectification process takes three minutes. By speeding up image processing, imagery can successfully be used by first responders, scientists making rapid discoveries with near real time data, and provides an operational component to data centers needing to quickly process and disseminate data.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system
NASA Astrophysics Data System (ADS)
Chow, James C. L.; Lam, Phil; Jaffray, David A.
2012-02-01
This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR_GET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Chenhao; Song, Shuaiwen; Wang, Jing
2017-02-06
The performance of 3D rendering of Graphics Processing Unit that convents 3D vector stream into 2D frame with 3D image effects significantly impact users’ gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPUmore » for efficient 3D rendering.« less
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performances as well as increasing multi-core processor and cluster-based parallelism demand for flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.
1991-07-31
INTELLIGENT SCSI DMV-719 MAS MIL CONTROLLER DY-4 SYSTEMS BYTE-WIDE MEMORY CARD DMV-536 MEM MIL DY-4 SYSTEMS POWER SUPPLY UNIT DMV-870 PWR MIL P age No. 5 06/10...FORCE COMPUTERS PROCESSOR CPU-386 SERIES SBC COM FORCE COMPUTERS ADVANCED SYSTEM CONTROL ASCU -1/2 SBC COM UNITI FORCE COMPUTERS GRAPHICS CONTROLLER AGC...RECORD VENDOR: JANZ COMPUTER AG DIVISION: VENDOR ADDRESS: Im Doerener Feld 3 D-4790 Paderborn Germany MARKETING: Johannes Kunz TECHNICAL: Arnulf
Stereo and IMU-Assisted Visual Odometry for Small Robots
NASA Technical Reports Server (NTRS)
2012-01-01
This software performs two functions: (1) taking stereo image pairs as input, it computes stereo disparity maps from them by cross-correlation to achieve 3D (three-dimensional) perception; (2) taking a sequence of stereo image pairs as input, it tracks features in the image sequence to estimate the motion of the cameras between successive image pairs. A real-time stereo vision system with IMU (inertial measurement unit)-assisted visual odometry was implemented on a single 750 MHz/520 MHz OMAP3530 SoC (system on chip) from TI (Texas Instruments). Frame rates of 46 fps (frames per second) were achieved at QVGA (Quarter Video Graphics Array i.e. 320 240), or 8 fps at VGA (Video Graphics Array 640 480) resolutions, while simultaneously tracking up to 200 features, taking full advantage of the OMAP3530's integer DSP (digital signal processor) and floating point ARM processors. This is a substantial advancement over previous work as the stereo implementation produces 146 Mde/s (millions of disparities evaluated per second) in 2.5W, yielding a stereo energy efficiency of 58.8 Mde/J, which is 3.75 better than prior DSP stereo while providing more functionality.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava
2017-01-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particlemore » tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.« less
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2017-08-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
High-speed real-time animated displays on the ADAGE (trademark) RDS 3000 raster graphics system
NASA Technical Reports Server (NTRS)
Kahlbaum, William M., Jr.; Ownbey, Katrina L.
1989-01-01
Techniques which may be used to increase the animation update rate of real-time computer raster graphic displays are discussed. They were developed on the ADAGE RDS 3000 graphic system in support of the Advanced Concepts Simulator at the NASA Langley Research Center. These techniques involve the use of a special purpose parallel processor, for high-speed character generation. The description of the parallel processor includes the Barrel Shifter which is part of the hardware and is the key to the high-speed character rendition. The final result of this total effort was a fourfold increase in the update rate of an existing primary flight display from 4 to 16 frames per second.
Accelerating image recognition on mobile devices using GPGPU
NASA Astrophysics Data System (ADS)
Bordallo López, Miguel; Nykänen, Henri; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku
2011-01-01
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. The use of Graphic Processing Units for computing is very well suited for parallel processing and the addition of programmable stages and high precision arithmetic provide for opportunities to implement energy-efficient complete algorithms. At the moment the first mobile graphics accelerators with programmable pipelines are available, enabling the GPGPU implementation of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture features and boosting. The solution is based on the Local Binary Pattern (LBP) features and makes use of the GPU on the pre-processing and feature extraction phase. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.
Zierke, Stephanie; Bakos, Jason D
2010-04-12
Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burge, S.W.
This report describes the FORCE2 flow program input, output, and the graphical post-processor. The manual describes the steps for creating the model, executing the programs and processing the results into graphical form. The FORCE2 post-processor was developed as an interactive program written in FORTRAN-77. It uses the Graphical Kernel System (GKS) graphics standard recently adopted by International Organization for Standardization, ISO, and American National Standards Institute, ANSI, and, therefore, can be used with many terminals. The post-processor vas written with Calcomp subroutine calls and is compatible with Tektkonix terminals and Calcomp and Nicolet pen plotters. B&W has been developing themore » FORCE2 code as a general-purpose tool for flow analysis of B&W equipment. The version of FORCE2 described in this manual was developed under the sponsorship of ASEA-Babcock as part of their participation in the joint R&D venture, ``Erosion of FBC Heat Transfer Tubes,`` and is applicable to the analyses of bubbling fluid beds. This manual is the principal documentation for program usage and is segmented into several sections to facilitate usage. In Section 2.0 the program is described, including assumptions, capabilities, limitations and uses, program status and location, related programs and program hardware and software requirements. Section 3.0 is a quick user`s reference guide for preparing input, executing FORCE2, and using the post-processor. Section 4.0 is a detailed description of the FORCE2 input. In Section 5.0, FORCE2 output is summarized. Section 6.0 contains a sample application, and Section 7.0 is a detailed reference guide.« less
QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors
Gudyś, Adam; Deorowicz, Sebastian
2014-01-01
Multiple sequence alignment is a crucial task in a number of biological analyses like secondary structure prediction, domain searching, phylogeny, etc. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In the paper we present QuickProbs, the variant of MSAProbs customised for graphics processors. We selected the two most time consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on quad-core PC equipped with high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than original CPU-parallel MSAProbs. Additional tests performed on several protein families from Pfam database give overall speed-up of 6.7. Compared to other algorithms like MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, thus the package is suitable for graphics processors produced by all major vendors. PMID:24586435
The control data "GIRAFFE" system for interactive graphic finite element analysis
NASA Technical Reports Server (NTRS)
Park, S.; Brandon, D. M., Jr.
1975-01-01
The Graphical Interface for Finite Elements (GIRAFFE) general purpose interactive graphics application package was described. This system may be used as a pre/post processor for structural analysis computer programs. It facilitates the operations of creating, editing, or reviewing all the structural input/output data on a graphics terminal in a time-sharing mode of operation. An application program for a simple three-dimensional plate problem was illustrated.
A Programming Framework for Scientific Applications on CPU-GPU Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Owens, John
2013-03-24
At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiplemore » parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.« less
Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit
Lawrie, David S.
2017-01-01
Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689
Stochastic DT-MRI connectivity mapping on the GPU.
McGraw, Tim; Nadar, Mariappan
2007-01-01
We present a method for stochastic fiber tract mapping from diffusion tensor MRI (DT-MRI) implemented on graphics hardware. From the simulated fibers we compute a connectivity map that gives an indication of the probability that two points in the dataset are connected by a neuronal fiber path. A Bayesian formulation of the fiber model is given and it is shown that the inversion method can be used to construct plausible connectivity. An implementation of this fiber model on the graphics processing unit (GPU) is presented. Since the fiber paths can be stochastically generated independently of one another, the algorithm is highly parallelizable. This allows us to exploit the data-parallel nature of the GPU fragment processors. We also present a framework for the connectivity computation on the GPU. Our implementation allows the user to interactively select regions of interest and observe the evolving connectivity results during computation. Results are presented from the stochastic generation of over 250,000 fiber steps per iteration at interactive frame rates on consumer-grade graphics hardware.
MAP3D: a media processor approach for high-end 3D graphics
NASA Astrophysics Data System (ADS)
Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris
1999-12-01
Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tasseff, Byron
2016-07-29
NUFLOOD Version 1.x is a surface-water hydrodynamic package designed for the simulation of overland flow of fluids. It consists of various routines to address a wide range of applications (e.g., rainfall-runoff, tsunami, storm surge) and real time, interactive visualization tools. NUFLOOD has been designed for general-purpose computers and workstations containing multi-core processors and/or graphics processing units. The software is easy to use and extensible, constructed in mind for instructors, students, and practicing engineers. NUFLOOD is intended to assist the water resource community in planning against water-related natural disasters.
2012-01-01
atmosphere model, Int. J . High Perform. Comput. Appl. 26 (1) (2012) 74–89. [8] J.M. Dennis, M. Levy, R.D. Nair, H.M. Tufo, T . Voran. Towards and efficient...26] A. Klockner, T . Warburton, J . Bridge, J.S, Hesthaven, Nodal discontinuous galerkin methods on graphics processors, J . Comput. Phys. 228 (21) (2009...mode James F. Kelly, Francis X. Giraldo ⇑ Department of Applied Mathematics, Naval Postgraduate School, Monterey, CA, United States a r t i c l e i n
Detailed description of the HP-9825A HFRMP trajectory processor (TRAJ)
NASA Technical Reports Server (NTRS)
Kindall, S. M.; Wilson, S. W.
1979-01-01
The computer code for the trajectory processor of the HP-9825A High Fidelity Relative Motion Program is described in detail. The processor is a 12-degrees-of-freedom trajectory integrator which can be used to generate digital and graphical data describing the relative motion of the Space Shuttle Orbiter and a free-flying cylindrical payload. Coding standards and flow charts are given and the computational logic is discussed.
Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.
2011-01-01
Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
A Real-Time Capable Software-Defined Receiver Using GPU for Adaptive Anti-Jam GPS Sensors
Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S.; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun
2011-01-01
Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities. PMID:22164116
A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.
Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun
2011-01-01
Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities.
Computer generated hologram from point cloud using graphics processor.
Chen, Rick H-Y; Wilkinson, Timothy D
2009-12-20
Computer generated holography is an extremely demanding and complex task when it comes to providing realistic reconstructions with full parallax, occlusion, and shadowing. We present an algorithm designed for data-parallel computing on modern graphics processing units to alleviate the computational burden. We apply Gaussian interpolation to create a continuous surface representation from discrete input object points. The algorithm maintains a potential occluder list for each individual hologram plane sample to keep the number of visibility tests to a minimum. We experimented with two approximations that simplify and accelerate occlusion computation. It is observed that letting several neighboring hologram plane samples share visibility information on object points leads to significantly faster computation without causing noticeable artifacts in the reconstructed images. Computing a reduced sample set via nonuniform sampling is also found to be an effective acceleration technique.
Programming the Navier-Stokes computer: An abstract machine model and a visual editor
NASA Technical Reports Server (NTRS)
Middleton, David; Crockett, Tom; Tomboulian, Sherry
1988-01-01
The Navier-Stokes computer is a parallel computer designed to solve Computational Fluid Dynamics problems. Each processor contains several floating point units which can be configured under program control to implement a vector pipeline with several inputs and outputs. Since the development of an effective compiler for this computer appears to be very difficult, machine level programming seems necessary and support tools for this process have been studied. These support tools are organized into a graphical program editor. A programming process is described by which appropriate computations may be efficiently implemented on the Navier-Stokes computer. The graphical editor would support this programming process, verifying various programmer choices for correctness and deducing values such as pipeline delays and network configurations. Step by step details are provided and demonstrated with two example programs.
NASA Technical Reports Server (NTRS)
Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David
1987-01-01
The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.
NASA Technical Reports Server (NTRS)
Whetstone, W. D.
1976-01-01
The functions and operating rules of the SPAR system, which is a group of computer programs used primarily to perform stress, buckling, and vibrational analyses of linear finite element systems, were given. The following subject areas were discussed: basic information, structure definition, format system matrix processors, utility programs, static solutions, stresses, sparse matrix eigensolver, dynamic response, graphics, and substructure processors.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1994-01-01
In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
User's manual for the two-dimensional transputer graphics toolkit
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
The user manual for the 2-D graphics toolkit for a transputer based parallel processor is presented. The toolkit consists of a package of 2-D display routines that can be used for the simulation visualizations. It supports multiple windows, double buffered screens for animations, and simple graphics transformations such as translation, rotation, and scaling. The display routines are written in occam to take advantage of the multiprocessing features available on transputers. The package is designed to run on a transputer separate from the graphics board.
NASA Astrophysics Data System (ADS)
Hur, Min Young; Verboncoeur, John; Lee, Hae June
2014-10-01
Particle-in-cell (PIC) simulations have high fidelity in the plasma device requiring transient kinetic modeling compared with fluid simulations. It uses less approximation on the plasma kinetics but requires many particles and grids to observe the semantic results. It means that the simulation spends lots of simulation time in proportion to the number of particles. Therefore, PIC simulation needs high performance computing. In this research, a graphic processing unit (GPU) is adopted for high performance computing of PIC simulation for low temperature discharge plasmas. GPUs have many-core processors and high memory bandwidth compared with a central processing unit (CPU). NVIDIA GeForce GPUs were used for the test with hundreds of cores which show cost-effective performance. PIC code algorithm is divided into two modules which are a field solver and a particle mover. The particle mover module is divided into four routines which are named move, boundary, Monte Carlo collision (MCC), and deposit. Overall, the GPU code solves particle motions as well as electrostatic potential in two-dimensional geometry almost 30 times faster than a single CPU code. This work was supported by the Korea Institute of Science Technology Information.
SPROC: A multiple-processor DSP IC
NASA Technical Reports Server (NTRS)
Davis, R.
1991-01-01
A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool
NASA Astrophysics Data System (ADS)
Barr, David R. W.; Dudek, Piotr
2009-12-01
We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
ERIC Educational Resources Information Center
Osguthorpe, Russell T.; Li Chang, Linda
1988-01-01
A computerized symbol processor system using an Apple IIe computer and a Power Pad graphics tablet was tested with 22 nonspeaking, multiply disabled students. The students were taught to express themselves independently in writing, and they did significantly better than control students on measures of language comprehension and symbol recognition.…
Effective correlator for RadioAstron project
NASA Astrophysics Data System (ADS)
Sergeev, Sergey
This paper presents the implementation of programme FX-correlator for Very Long Baseline Interferometry, adapted for the project "RadioAstron". Software correlator implemented for heterogeneous computing systems using graphics accelerators. It is shown that for the task interferometry implementation of the graphics hardware has a high efficiency. The host processor of heterogeneous computing system, performs the function of forming the data flow for graphics accelerators, the number of which corresponds to the number of frequency channels. So, for the Radioastron project, such channels is seven. Each accelerator is perform correlation matrix for all bases for a single frequency channel. Initial data is converted to the floating-point format, is correction for the corresponding delay function and computes the entire correlation matrix simultaneously. Calculation of the correlation matrix is performed using the sliding Fourier transform. Thus, thanks to the compliance of a solved problem for architecture graphics accelerators, managed to get a performance for one processor platform Kepler, which corresponds to the performance of this task, the computing cluster platforms Intel on four nodes. This task successfully scaled not only on a large number of graphics accelerators, but also on a large number of nodes with multiple accelerators.
Accelerating Pseudo-Random Number Generator for MCNP on GPU
NASA Astrophysics Data System (ADS)
Gong, Chunye; Liu, Jie; Chi, Lihua; Hu, Qingfeng; Deng, Li; Gong, Zhenghu
2010-09-01
Pseudo-random number generators (PRNG) are intensively used in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computation. The PRNG in Monte Carlo N-Particle Transport Code (MCNP) requires long period, high quality, flexible jump and fast enough. In this paper, we implement such a PRNG for MCNP on NVIDIA's GTX200 Graphics Processor Units (GPU) using CUDA programming model. Results shows that 3.80 to 8.10 times speedup are achieved compared with 4 to 6 cores CPUs and more than 679.18 million double precision random numbers can be generated per second on GPU.
Watanabe, Yuuki; Maeno, Seiya; Aoshima, Kenji; Hasegawa, Haruyuki; Koseki, Hitoshi
2010-09-01
The real-time display of full-range, 2048?axial pixelx1024?lateral pixel, Fourier-domain optical-coherence tomography (FD-OCT) images is demonstrated. The required speed was achieved by using dual graphic processing units (GPUs) with many stream processors to realize highly parallel processing. We used a zero-filling technique, including a forward Fourier transform, a zero padding to increase the axial data-array size to 8192, an inverse-Fourier transform back to the spectral domain, a linear interpolation from wavelength to wavenumber, a lateral Hilbert transform to obtain the complex spectrum, a Fourier transform to obtain the axial profiles, and a log scaling. The data-transfer time of the frame grabber was 15.73?ms, and the processing time, which includes the data transfer between the GPU memory and the host computer, was 14.75?ms, for a total time shorter than the 36.70?ms frame-interval time using a line-scan CCD camera operated at 27.9?kHz. That is, our OCT system achieved a processed-image display rate of 27.23 frames/s.
permGPU: Using graphics processing units in RNA microarray association studies.
Shterev, Ivo D; Jung, Sin-Ho; George, Stephen L; Owzar, Kouros
2010-06-16
Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. We have developed a CUDA based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley
2011-05-01
Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
FAST: framework for heterogeneous medical image computing and visualization.
Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank
2015-11-01
Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
GPU Based Software Correlators - Perspectives for VLBI2010
NASA Technical Reports Server (NTRS)
Hobiger, Thomas; Kimura, Moritaka; Takefuji, Kazuhiro; Oyama, Tomoaki; Koyama, Yasuhiro; Kondo, Tetsuro; Gotoh, Tadahiro; Amagai, Jun
2010-01-01
Caused by historical separation and driven by the requirements of the PC gaming industry, Graphics Processing Units (GPUs) have evolved to massive parallel processing systems which entered the area of non-graphic related applications. Although a single processing core on the GPU is much slower and provides less functionality than its counterpart on the CPU, the huge number of these small processing entities outperforms the classical processors when the application can be parallelized. Thus, in recent years various radio astronomical projects have started to make use of this technology either to realize the correlator on this platform or to establish the post-processing pipeline with GPUs. Therefore, the feasibility of GPUs as a choice for a VLBI correlator is being investigated, including pros and cons of this technology. Additionally, a GPU based software correlator will be reviewed with respect to energy consumption/GFlop/sec and cost/GFlop/sec.
A fast CT reconstruction scheme for a general multi-core PC.
Zeng, Kai; Bai, Erwei; Wang, Ge
2007-01-01
Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA) that we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible. The graphics processing unit (GPU) in a current graphic card can only reconstruct images in a reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compilier. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved by several folds using the latest quad-core processors.
A Fast CT Reconstruction Scheme for a General Multi-Core PC
Zeng, Kai; Bai, Erwei; Wang, Ge
2007-01-01
Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA) that we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible. The graphics processing unit (GPU) in a current graphic card can only reconstruct images in a reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compilier. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved by several folds using the latest quad-core processors. PMID:18256731
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G.; Salapura, Valentina
2012-07-24
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
A general graphical user interface for automatic reliability modeling
NASA Technical Reports Server (NTRS)
Liceaga, Carlos A.; Siewiorek, Daniel P.
1991-01-01
Reported here is a general Graphical User Interface (GUI) for automatic reliability modeling of Processor Memory Switch (PMS) structures using a Markov model. This GUI is based on a hierarchy of windows. One window has graphical editing capabilities for specifying the system's communication structure, hierarchy, reconfiguration capabilities, and requirements. Other windows have field texts, popup menus, and buttons for specifying parameters and selecting actions. An example application of the GUI is given.
Real-time lens distortion correction: speed, accuracy and efficiency
NASA Astrophysics Data System (ADS)
Bax, Michael R.; Shahidi, Ramin
2014-11-01
Optical lens systems suffer from nonlinear geometrical distortion. Optical imaging applications such as image-enhanced endoscopy and image-based bronchoscope tracking require correction of this distortion for accurate localization, tracking, registration, and measurement of image features. Real-time capability is desirable for interactive systems and live video. The use of a texture-mapping graphics accelerator, which is standard hardware on current motherboard chipsets and add-in video graphics cards, to perform distortion correction is proposed. Mesh generation for image tessellation, an error analysis, and performance results are presented. It is shown that distortion correction using commodity graphics hardware is substantially faster than using the main processor and can be performed at video frame rates (faster than 30 frames per second), and that the polar-based method of mesh generation proposed here is more accurate than a conventional grid-based approach. Using graphics hardware to perform distortion correction is not only fast and accurate but also efficient as it frees the main processor for other tasks, which is an important issue in some real-time applications.
Scheduling time-critical graphics on multiple processors
NASA Technical Reports Server (NTRS)
Meyer, Tom W.; Hughes, John F.
1995-01-01
This paper describes an algorithm for the scheduling of time-critical rendering and computation tasks on single- and multiple-processor architectures, with minimal pipelining. It was developed to manage scientific visualization scenes consisting of hundreds of objects, each of which can be computed and displayed at thousands of possible resolution levels. The algorithm generates the time-critical schedule using progressive-refinement techniques; it always returns a feasible schedule and, when allowed to run to completion, produces a near-optimal schedule which takes advantage of almost the entire multiple-processor system.
NASA Technical Reports Server (NTRS)
Kindall, S. M.
1980-01-01
The computer code for the trajectory processor (#TRAJ) of the high fidelity relative motion program is described. The #TRAJ processor is a 12-degrees-of-freedom trajectory integrator (6 degrees of freedom for each of two vehicles) which can be used to generate digital and graphical data describing the relative motion of the Space Shuttle Orbiter and a free-flying cylindrical payload. A listing of the code, coding standards and conventions, detailed flow charts, and discussions of the computational logic are included.
Souza, W.R.
1999-01-01
This report documents a graphical display post-processor (SutraPlot) for the U.S. Geological Survey Saturated-Unsaturated flow and solute or energy TRAnsport simulation model SUTRA, Version 2D3D.1. This version of SutraPlot is an upgrade to SutraPlot for the 2D-only SUTRA model (Souza, 1987). It has been modified to add 3D functionality, a graphical user interface (GUI), and enhanced graphic output options. Graphical options for 2D SUTRA (2-dimension) simulations include: drawing the 2D finite-element mesh, mesh boundary, and velocity vectors; plots of contours for pressure, saturation, concentration, and temperature within the model region; 2D finite-element based gridding and interpolation; and 2D gridded data export files. Graphical options for 3D SUTRA (3-dimension) simulations include: drawing the 3D finite-element mesh; plots of contours for pressure, saturation, concentration, and temperature in 2D sections of the 3D model region; 3D finite-element based gridding and interpolation; drawing selected regions of velocity vectors (projected on principal coordinate planes); and 3D gridded data export files. Installation instructions and a description of all graphic options are presented. A sample SUTRA problem is described and three step-by-step SutraPlot applications are provided. In addition, the methodology and numerical algorithms for the 2D and 3D finite-element based gridding and interpolation, developed for SutraPlot, are described. 1
Distributed computation of graphics primitives on a transputer network
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
A method is developed for distributing the computation of graphics primitives on a parallel processing network. Off-the-shelf transputer boards are used to perform the graphics transformations and scan-conversion tasks that would normally be assigned to a single transputer based display processor. Each node in the network performs a single graphics primitive computation. Frequently requested tasks can be duplicated on several nodes. The results indicate that the current distribution of commands on the graphics network shows a performance degradation when compared to the graphics display board alone. A change to more computation per node for every communication (perform more complex tasks on each node) may cause the desired increase in throughput.
[Hardware for graphics systems].
Goetz, C
1991-02-01
In all personal computer applications, be it for private or professional use, the decision of which "brand" of computer to buy is of central importance. In the USA Apple computers are mainly used in universities, while in Europe computers of the so-called "industry standard" by IBM (or clones thereof) have been increasingly used for many years. Independently of any brand name considerations, the computer components purchased must meet the current (and projected) needs of the user. Graphic capabilities and standards, processor speed, the use of co-processors, as well as input and output devices such as "mouse", printers and scanners are discussed. This overview is meant to serve as a decision aid. Potential users are given a short but detailed summary of current technical features.
Wilson, J Adam; Williams, Justin C
2009-01-01
The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Messagemore » Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.« less
Computer-Aided Fabrication of Integrated Circuits
1989-09-30
baseline CMOS process. One result of this effort was the identification of several residual bugs in the PATRAN graphics processor . The vendor promises...virtual memory. The internal Nubus architecture uses a 32-bit LISP processor running at 10 megahertz (100 ns clock period). An ethernet controller is...For different patterns, we need different masks for the photo step, and for dif- ferent micro -structures of the wafers, we need different etching
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.
2010-01-01
Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations
NASA Astrophysics Data System (ADS)
Hause, Benjamin; Parker, Scott; Chen, Yang
2013-10-01
We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and Fortran CUDA. Mixed implementation of both Open-ACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third generation Core I7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speed-ups comparable or better than that of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.
Multilevel Summation of Electrostatic Potentials Using Graphics Processing Units*
Hardy, David J.; Stone, John E.; Schulten, Klaus
2009-01-01
Physical and engineering practicalities involved in microprocessor design have resulted in flat performance growth for traditional single-core microprocessors. The urgent need for continuing increases in the performance of scientific applications requires the use of many-core processors and accelerators such as graphics processing units (GPUs). This paper discusses GPU acceleration of the multilevel summation method for computing electrostatic potentials and forces for a system of charged atoms, which is a problem of paramount importance in biomolecular modeling applications. We present and test a new GPU algorithm for the long-range part of the potentials that computes a cutoff pair potential between lattice points, essentially convolving a fixed 3-D lattice of “weights” over all sub-cubes of a much larger lattice. The implementation exploits the different memory subsystems provided on the GPU to stream optimally sized data sets through the multiprocessors. We demonstrate for the full multilevel summation calculation speedups of up to 26 using a single GPU and 46 using multiple GPUs, enabling the computation of a high-resolution map of the electrostatic potential for a system of 1.5 million atoms in under 12 seconds. PMID:20161132
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G; Salapura, Valentina
2014-12-02
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon
2012-01-01
We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Weber, Walter H.; Mair, H. Douglas; Jansen, Dion
2003-03-01
A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
Beeman, William R.; Obuch, Raymond C.; Brewton, James D.
1996-01-01
This CD-ROM contains files in support of the 1995 USGS National assessment of United States oil and gas resources (DDS-30), which was published separately and summarizes the results of a 3-year study of the oil and gas resources of the onshore and state waters of the United States. The study describes about 560 oil and gas plays in the United States--confirmed and hypothetical, conventional and unconventional. A parallel study of the Federal offshore is being conducted by the U.S. Minerals Management Service. This CD-ROM contains files in multiple formats, so that almost any computer user can import them into word processors and mapping software packages. No proprietary data are released on this CD-ROM. The complete text of DDS-30 is also available, as well as many figures. A companion CD-ROM (DDS-36) includes the tabular data, the programs, and the same text data, but none of the map data.
System analysis of graphics processor architecture using virtual prototyping
NASA Astrophysics Data System (ADS)
Hancock, William R.; Groat, Jeff; Steeves, Todd; Spaanenburg, Henk; Shackleton, John
1995-06-01
Honeywell has been actively involved in the definition of the next generation display processors for military and commercial cockpits. A major concern is how to achieve super graphics workstation performance in avionics application. Most notable are requirements for low volume, low power, harsh environmental conditions, real-time performance and low cost. This paper describes the application of VHDL to the system analysis tasks associated with achieving these goals in a cost effective manner. The paper will describe the top level architecture identified to provide the graphical and video processing power needed to drive future high resolution display devices and to generate more natural panoramic 3D formats. The major discussion, however, will be on the use of VHDL to model the processing elements and customized pipelines needed to realize the architecture and for doing the complex system tradeoff studies necessary to achieve a cost effective implementation. New software tools have been developed to allow 'virtual' prototyping in the VHDL environment. This results in a hardware/software codesign using VHDL performance and functional models. This unique architectural tool allows simulation and tradeoffs within a standard and tightly integrated toolset, which eventually will be used to specify and design the entire system from the top level requirements and system performance to the lowest level individual ASICs. New processing elements, algorithms, and standard graphical inputs can be designed, tested and evaluated without the costly hardware prototyping using the innovative 'virtual' prototyping techniques which are evolving on this project. In addition, virtual prototyping of the display processor does not bind the preliminary design to point solutions as a physical prototype will. when the development schedule is known, one can extrapolate processing elements performance and design the system around the most current technology.
Multimedia architectures: from desktop systems to portable appliances
NASA Astrophysics Data System (ADS)
Bhaskaran, Vasudev; Konstantinides, Konstantinos; Natarajan, Balas R.
1997-01-01
Future desktop and portable computing systems will have as their core an integrated multimedia system. Such a system will seamlessly combine digital video, digital audio, computer animation, text, and graphics. Furthermore, such a system will allow for mixed-media creation, dissemination, and interactive access in real time. Multimedia architectures that need to support these functions have traditionally required special display and processing units for the different media types. This approach tends to be expensive and is inefficient in its use of silicon. Furthermore, such media-specific processing units are unable to cope with the fluid nature of the multimedia market wherein the needs and standards are changing and system manufacturers may demand a single component media engine across a range of products. This constraint has led to a shift towards providing a single-component multimedia specific computing engine that can be integrated easily within desktop systems, tethered consumer appliances, or portable appliances. In this paper, we review some of the recent architectural efforts in developing integrated media systems. We primarily focus on two efforts, namely the evolution of multimedia-capable general purpose processors and a more recent effort in developing single component mixed media co-processors. Design considerations that could facilitate the migration of these technologies to a portable integrated media system also are presented.
A survey of GPU-based medical image computing techniques
Shi, Lin; Liu, Wen; Zhang, Heye; Xie, Yongming
2012-01-01
Medical imaging currently plays a crucial role throughout the entire clinical applications from medical scientific research to diagnostics and treatment planning. However, medical imaging procedures are often computationally demanding due to the large three-dimensional (3D) medical datasets to process in practical clinical applications. With the rapidly enhancing performances of graphics processors, improved programming support, and excellent price-to-performance ratio, the graphics processing unit (GPU) has emerged as a competitive parallel computing platform for computationally expensive and demanding tasks in a wide range of medical image applications. The major purpose of this survey is to provide a comprehensive reference source for the starters or researchers involved in GPU-based medical image processing. Within this survey, the continuous advancement of GPU computing is reviewed and the existing traditional applications in three areas of medical image processing, namely, segmentation, registration and visualization, are surveyed. The potential advantages and associated challenges of current GPU-based medical imaging are also discussed to inspire future applications in medicine. PMID:23256080
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.
Digital system for structural dynamics simulation
NASA Technical Reports Server (NTRS)
Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.
1982-01-01
State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
Advanced graphical user interface for multi-physics simulations using AMST
NASA Astrophysics Data System (ADS)
Hoffmann, Florian; Vogel, Frank
2017-07-01
Numerical modelling of particulate matter has gained much popularity in recent decades. Advanced Multi-physics Simulation Technology (AMST) is a state-of-the-art three dimensional numerical modelling technique combining the eX-tended Discrete Element Method (XDEM) with Computational Fluid Dynamics (CFD) and Finite Element Analysis (FEA) [1]. One major limitation of this code is the lack of a graphical user interface (GUI) meaning that all pre-processing has to be made directly in a HDF5-file. This contribution presents the first graphical pre-processor developed for AMST.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
A fast ultrasonic simulation tool based on massively parallel implementations
NASA Astrophysics Data System (ADS)
Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain
2014-02-01
This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are : multi and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations on the model accuracy and performances measurements are presented.
Design and implementation of highly parallel pipelined VLSI systems
NASA Astrophysics Data System (ADS)
Delange, Alphonsus Anthonius Jozef
A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
PC-CUBE: A Personal Computer Based Hypercube
NASA Technical Reports Server (NTRS)
Ho, Alex; Fox, Geoffrey; Walker, David; Snyder, Scott; Chang, Douglas; Chen, Stanley; Breaden, Matt; Cole, Terry
1988-01-01
PC-CUBE is an ensemble of IBM PCs or close compatibles connected in the hypercube topology with ordinary computer cables. Communication occurs at the rate of 115.2 K-band via the RS-232 serial links. Available for PC-CUBE is the Crystalline Operating System III (CrOS III), Mercury Operating System, CUBIX and PLOTIX which are parallel I/O and graphics libraries. A CrOS performance monitor was developed to facilitate the measurement of communication and computation time of a program and their effects on performance. Also available are CXLISP, a parallel version of the XLISP interpreter; GRAFIX, some graphics routines for the EGA and CGA; and a general execution profiler for determining execution time spent by program subroutines. PC-CUBE provides a programming environment similar to all hypercube systems running CrOS III, Mercury and CUBIX. In addition, every node (personal computer) has its own graphics display monitor and storage devices. These allow data to be displayed or stored at every processor, which has much instructional value and enables easier debugging of applications. Some application programs which are taken from the book Solving Problems on Concurrent Processors (Fox 88) were implemented with graphics enhancement on PC-CUBE. The applications range from solving the Mandelbrot set, Laplace equation, wave equation, long range force interaction, to WaTor, an ecological simulation.
Autonomic Closure for Turbulent Flows Using Approximate Bayesian Computation
NASA Astrophysics Data System (ADS)
Doronina, Olga; Christopher, Jason; Hamlington, Peter; Dahm, Werner
2017-11-01
Autonomic closure is a new technique for achieving fully adaptive and physically accurate closure of coarse-grained turbulent flow governing equations, such as those solved in large eddy simulations (LES). Although autonomic closure has been shown in recent a priori tests to more accurately represent unclosed terms than do dynamic versions of traditional LES models, the computational cost of the approach makes it challenging to implement for simulations of practical turbulent flows at realistically high Reynolds numbers. The optimization step used in the approach introduces large matrices that must be inverted and is highly memory intensive. In order to reduce memory requirements, here we propose to use approximate Bayesian computation (ABC) in place of the optimization step, thereby yielding a computationally-efficient implementation of autonomic closure that trades memory-intensive for processor-intensive computations. The latter challenge can be overcome as co-processors such as general purpose graphical processing units become increasingly available on current generation petascale and exascale supercomputers. In this work, we outline the formulation of ABC-enabled autonomic closure and present initial results demonstrating the accuracy and computational cost of the approach.
The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations
Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka
2011-01-01
Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
Development of an embedded atmospheric turbulence mitigation engine
NASA Astrophysics Data System (ADS)
Paolini, Aaron; Bonnett, James; Kozacik, Stephen; Kelmelis, Eric
2017-05-01
Methods to reconstruct pictures from imagery degraded by atmospheric turbulence have been under development for decades. The techniques were initially developed for observing astronomical phenomena from the Earth's surface, but have more recently been modified for ground and air surveillance scenarios. Such applications can impose significant constraints on deployment options because they both increase the computational complexity of the algorithms themselves and often dictate a requirement for low size, weight, and power (SWaP) form factors. Consequently, embedded implementations must be developed that can perform the necessary computations on low-SWaP platforms. Fortunately, there is an emerging class of embedded processors driven by the mobile and ubiquitous computing industries. We have leveraged these processors to develop embedded versions of the core atmospheric correction engine found in our ATCOM software. In this paper, we will present our experience adapting our algorithms for embedded systems on a chip (SoCs), namely the NVIDIA Tegra that couples general-purpose ARM cores with their graphics processing unit (GPU) technology and the Xilinx Zynq which pairs similar ARM cores with their field-programmable gate array (FPGA) fabric.
Next Generation Security for the 10,240 Processor Columbia System
NASA Technical Reports Server (NTRS)
Hinke, Thomas; Kolano, Paul; Shaw, Derek; Keller, Chris; Tweton, Dave; Welch, Todd; Liu, Wen (Betty)
2005-01-01
This presentation includes a discussion of the Columbia 10,240-processor system located at the NASA Advanced Supercomputing (NAS) division at the NASA Ames Research Center which supports each of NASA's four missions: science, exploration systems, aeronautics, and space operations. It is comprised of 20 Silicon Graphics nodes, each consisting of 512 Itanium II processors. A 64 processor Columbia front-end system supports users as they prepare their jobs and then submits them to the PBS system. Columbia nodes and front-end systems use the Linux OS. Prior to SC04, the Columbia system was used to attain a processing speed of 51.87 TeraFlops, which made it number two on the Top 500 list of the world's supercomputers and the world's fastest "operational" supercomputer since it was fully engaged in supporting NASA users.
Stroboscope Controller for Imaging Helicopter Rotors
NASA Technical Reports Server (NTRS)
Jensen, Scott; Marmie, John; Mai, Nghia
2004-01-01
A versatile electronic timing-and-control unit, denoted a rotorcraft strobe controller, has been developed for use in controlling stroboscopes, lasers, video cameras, and other instruments for capturing still images of rotating machine parts especially helicopter rotors. This unit is designed to be compatible with a variety of sources of input shaftangle or timing signals and to be capable of generating a variety of output signals suitable for triggering instruments characterized by different input-signal specifications. It is also designed to be flexible and reconfigurable in that it can be modified and updated through changes in its control software, without need to change its hardware. Figure 1 is a block diagram of the rotorcraft strobe controller. The control processor is a high-density complementary metal oxide semiconductor, singlechip 8-bit microcontroller. It is connected to a 32K x 8 nonvolatile static random-access memory (RAM) module. Also connected to the control processor is a 32K 8 electrically programmable read-only-memory (EPROM) module, which is used to store the control software. Digital logic support circuitry is implemented in a field-programmable gate array (FPGA). A 240 x 128-dot, 40- character 16-line liquid-crystal display (LCD) module serves as a graphical user interface; the user provides input through a 16-key keypad mounted next to the LCD. A 12-bit digital-to-analog converter (DAC) generates a 0-to-10-V ramp output signal used as part of a rotor-blade monitoring system, while the control processor generates all the appropriate strobing signals. Optocouplers are used to isolate all input and output digital signals, and optoisolators are used to isolate all analog signals. The unit is designed to fit inside a 19-in. (.48-cm) rack-mount enclosure. Electronic components are mounted on a custom printed-circuit board (see Figure 2). Two power-conversion modules on the printedcircuit board convert AC power to +5 VDC and 15 VDC, respectively.
2013-05-25
graphics processors by IBM, AMD, and nVIDIA . They are between general-purpose pro- cessors and special-purpose processors. In Phase II. 3.10 Measure of...particular, Dr. Kevin Irick started a company Silicon Scapes and he has been the CEO. 5 Implications for Related/Future Research We speculate that...final project report in Jan. 2011. At the test and validation stage of the project. FANTOM’s partner at Raytheon quit from his company and hence from
Contemporary issues in HIM. The application layer--III.
Wear, L L; Pinkert, J R
1993-07-01
We have seen document preparation systems evolve from basic line editors through powerful, sophisticated desktop publishing programs. This component of the application layer is probably one of the most used, and most readily identifiable. Ask grade school children nowadays, and many will tell you that they have written a paper on a computer. Next month will be a "fun" tour through a number of other application programs we find useful. They will range from a simple notebook reminder to a sophisticated photograph processor. Application layer: Software targeted for the end user, focusing on a specific application area, and typically residing in the computer system as distinct components on top of the OS. Desktop publishing: A document preparation program that begins with the text features of a word processor, then adds the ability for a user to incorporate outputs from a variety of graphic programs, spreadsheets, and other applications. Line editor: A document preparation program that manipulates text in a file on the basis of numbered lines. Word processor: A document preparation program that can, among other things, reformat sections of documents, move and replace blocks of text, use multiple character fonts, automatically create a table of contents and index, create complex tables, and combine text and graphics.
Editing of EIA coded, numerically controlled, machine tool tapes
NASA Technical Reports Server (NTRS)
Weiner, J. M.
1975-01-01
Editing of numerically controlled (N/C) machine tool tapes (8-level paper tape) using an interactive graphic display processor is described. A rapid technique required for correcting production errors in N/C tapes was developed using the interactive text editor on the IMLAC PDS-ID graphic display system and two special programs resident on disk. The correction technique and special programs for processing N/C tapes coded to EIA specifications are discussed.
Modelling Nonlinear Dynamic Textures using Hybrid DWT-DCT and Kernel PCA with GPU
NASA Astrophysics Data System (ADS)
Ghadekar, Premanand Pralhad; Chopade, Nilkanth Bhikaji
2016-12-01
Most of the real-world dynamic textures are nonlinear, non-stationary, and irregular. Nonlinear motion also has some repetition of motion, but it exhibits high variation, stochasticity, and randomness. Hybrid DWT-DCT and Kernel Principal Component Analysis (KPCA) with YCbCr/YIQ colour coding using the Dynamic Texture Unit (DTU) approach is proposed to model a nonlinear dynamic texture, which provides better results than state-of-art methods in terms of PSNR, compression ratio, model coefficients, and model size. Dynamic texture is decomposed into DTUs as they help to extract temporal self-similarity. Hybrid DWT-DCT is used to extract spatial redundancy. YCbCr/YIQ colour encoding is performed to capture chromatic correlation. KPCA is applied to capture nonlinear motion. Further, the proposed algorithm is implemented on Graphics Processing Unit (GPU), which comprise of hundreds of small processors to decrease time complexity and to achieve parallelism.
Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V
2010-06-01
Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Tsuchimoto, Masashi; Tanimura, Yoshitaka
2015-08-11
A system with many energy states coupled to a harmonic oscillator bath is considered. To study quantum non-Markovian system-bath dynamics numerically rigorously and nonperturbatively, we developed a computer code for the reduced hierarchy equations of motion (HEOM) for a graphics processor unit (GPU) that can treat the system as large as 4096 energy states. The code employs a Padé spectrum decomposition (PSD) for a construction of HEOM and the exponential integrators. Dynamics of a quantum spin glass system are studied by calculating the free induction decay signal for the cases of 3 × 2 to 3 × 4 triangular lattices with antiferromagnetic interactions. We found that spins relax faster at lower temperature due to transitions through a quantum coherent state, as represented by the off-diagonal elements of the reduced density matrix, while it has been known that the spins relax slower due to suppression of thermal activation in a classical case. The decay of the spins are qualitatively similar regardless of the lattice sizes. The pathway of spin relaxation is analyzed under a sudden temperature drop condition. The Compute Unified Device Architecture (CUDA) based source code used in the present calculations is provided as Supporting Information .
NASA Astrophysics Data System (ADS)
Russkova, Tatiana V.
2017-11-01
One tool to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is the parallel technology. A new algorithm oriented to parallel execution on the CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both a vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions including continuous singlelayered and multilayered clouds, and selective molecular absorption are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
ProteinShader: illustrative rendering of macromolecules
Weber, Joseph R
2009-01-01
Background Cartoon-style illustrative renderings of proteins can help clarify structural features that are obscured by space filling or balls and sticks style models, and recent advances in programmable graphics cards offer many new opportunities for improving illustrative renderings. Results The ProteinShader program, a new tool for macromolecular visualization, uses information from Protein Data Bank files to produce illustrative renderings of proteins that approximate what an artist might create by hand using pen and ink. A combination of Hermite and spherical linear interpolation is used to draw smooth, gradually rotating three-dimensional tubes and ribbons with a repeating pattern of texture coordinates, which allows the application of texture mapping, real-time halftoning, and smooth edge lines. This free platform-independent open-source program is written primarily in Java, but also makes extensive use of the OpenGL Shading Language to modify the graphics pipeline. Conclusion By programming to the graphics processor unit, ProteinShader is able to produce high quality images and illustrative rendering effects in real-time. The main feature that distinguishes ProteinShader from other free molecular visualization tools is its use of texture mapping techniques that allow two-dimensional images to be mapped onto the curved three-dimensional surfaces of ribbons and tubes with minimum distortion of the images. PMID:19331660
Using R in Taverna: RShell v1.2
Wassink, Ingo; Rauwerda, Han; Neerincx, Pieter BT; Vet, Paul E van der; Breit, Timo M; Leunissen, Jack AM; Nijholt, Anton
2009-01-01
Background R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical. Findings We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we will describe the architecture of RShell and the new features that are introduced in version 1.2, i.e.: i) Support for R up to and including R version 2.9; ii) Support for persistent sessions to limit data transfer; iii) Support for vector graphics output through PDF; iv)Syntax highlighting of the R code; v) Improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way. PMID:19607662
NASA Technical Reports Server (NTRS)
Chawner, David M.; Gomez, Ray J.
2010-01-01
In the Applied Aerosciences and CFD branch at Johnson Space Center, computational simulations are run that face many challenges. Two of which are the ability to customize software for specialized needs and the need to run simulations as fast as possible. There are many different tools that are used for running these simulations and each one has its own pros and cons. Once these simulations are run, there needs to be software capable of visualizing the results in an appealing manner. Some of this software is called open source, meaning that anyone can edit the source code to make modifications and distribute it to all other users in a future release. This is very useful, especially in this branch where many different tools are being used. File readers can be written to load any file format into a program, to ease the bridging from one tool to another. Programming such a reader requires knowledge of the file format that is being read as well as the equations necessary to obtain the derived values after loading. When running these CFD simulations, extremely large files are being loaded and having values being calculated. These simulations usually take a few hours to complete, even on the fastest machines. Graphics processing units (GPUs) are usually used to load the graphics for computers; however, in recent years, GPUs are being used for more generic applications because of the speed of these processors. Applications run on GPUs have been known to run up to forty times faster than they would on normal central processing units (CPUs). If these CFD programs are extended to run on GPUs, the amount of time they would require to complete would be much less. This would allow more simulations to be run in the same amount of time and possibly perform more complex computations.
Chikkagoudar, Satish; Wang, Kai; Li, Mingyao
2011-05-26
Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
2011-01-01
Background Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/. PMID:21615923
Modelling excitonic-energy transfer in light-harvesting complexes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kramer, Tobias; Kreisbeck, Christoph
The theoretical and experimental study of energy transfer in photosynthesis has revealed an interesting transport regime, which lies at the borderline between classical transport dynamics and quantum-mechanical interference effects. Dissipation is caused by the coupling of electronic degrees of freedom to vibrational modes and leads to a directional energy transfer from the antenna complex to the target reaction-center. The dissipative driving is robust and does not rely on fine-tuning of specific vibrational modes. For the parameter regime encountered in the biological systems new theoretical tools are required to directly compare theoretical results with experimental spectroscopy data. The calculations require tomore » utilize massively parallel graphics processor units (GPUs) for efficient and exact computations.« less
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decompositionmore » corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.« less
Commercial Off-The-Shelf (COTS) Graphics Processing Board (GPB) Radiation Test Evaluation Report
NASA Technical Reports Server (NTRS)
Salazar, George A.; Steele, Glen F.
2013-01-01
Large round trip communications latency for deep space missions will require more onboard computational capabilities to enable the space vehicle to undertake many tasks that have traditionally been ground-based, mission control responsibilities. As a result, visual display graphics will be required to provide simpler vehicle situational awareness through graphical representations, as well as provide capabilities never before done in a space mission, such as augmented reality for in-flight maintenance or Telepresence activities. These capabilities will require graphics processors and associated support electronic components for high computational graphics processing. In an effort to understand the performance of commercial graphics card electronics operating in the expected radiation environment, a preliminary test was performed on five commercial offthe- shelf (COTS) graphics cards. This paper discusses the preliminary evaluation test results of five COTS graphics processing cards tested to the International Space Station (ISS) low earth orbit radiation environment. Three of the five graphics cards were tested to a total dose of 6000 rads (Si). The test articles, test configuration, preliminary results, and recommendations are discussed.
Computer animation of modal and transient vibrations
NASA Technical Reports Server (NTRS)
Lipman, Robert R.
1987-01-01
An interactive computer graphics processor is described that is capable of generating input to animate modal and transient vibrations of finite element models on an interactive graphics system. The results from NASTRAN can be postprocessed such that a three dimensional wire-frame picture, in perspective, of the finite element mesh is drawn on the graphics display. Modal vibrations of any mode shape or transient motions over any range of steps can be animated. The finite element mesh can be color-coded by any component of displacement. Viewing parameters and the rate of vibration of the finite element model can be interactively updated while the structure is vibrating.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stevens, E.J.; McNeilly, G.S.
The existing National Center for Atmospheric Research (NCAR) code in the Hamburg Oceanic Carbon Cycle Circulation Model and the Hamburg Large-Scale Geostrophic Ocean General Circulation Model was modernized and reduced in size while still producing an equivalent end result. A reduction in the size of the existing code from more than 50,000 lines to approximately 7,500 lines in the new code has made the new code much easier to maintain. The existing code in Hamburg model uses legacy NCAR (including even emulated CALCOMP subrountines) graphics to display graphical output. The new code uses only current (version 3.1) NCAR subrountines.
The role of graphics super-workstations in a supercomputing environment
NASA Technical Reports Server (NTRS)
Levin, E.
1989-01-01
A new class of very powerful workstations has recently become available which integrate near supercomputer computational performance with very powerful and high quality graphics capability. These graphics super-workstations are expected to play an increasingly important role in providing an enhanced environment for supercomputer users. Their potential uses include: off-loading the supercomputer (by serving as stand-alone processors, by post-processing of the output of supercomputer calculations, and by distributed or shared processing), scientific visualization (understanding of results, communication of results), and by real time interaction with the supercomputer (to steer an iterative computation, to abort a bad run, or to explore and develop new algorithms).
Scalable Unix tools on parallel processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, W.; Lusk, E.
1994-12-31
The introduction of parallel processors that run a separate copy of Unix on each process has introduced new problems in managing the user`s environment. This paper discusses some generalizations of common Unix commands for managing files (e.g. 1s) and processes (e.g. ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
Programmable architecture for pixel level processing tasks in lightweight strapdown IR seekers
NASA Astrophysics Data System (ADS)
Coates, James L.
1993-06-01
Typical processing tasks associated with missile IR seeker applications are described, and a straw man suite of algorithms is presented. A fully programmable multiprocessor architecture is realized on a multimedia video processor (MVP) developed by Texas Instruments. The MVP combines the elements of RISC, floating point, advanced DSPs, graphics processors, display and acquisition control, RAM, and external memory. Front end pixel level tasks typical of missile interceptor applications, operating on 256 x 256 sensor imagery, can be processed at frame rates exceeding 100 Hz in a single MVP chip.
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
NASA Astrophysics Data System (ADS)
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
Image selection system. [computerized data storage and retrieval system
NASA Technical Reports Server (NTRS)
Knutson, M. A.; Hurd, D.; Hubble, L.; Kroeck, R. M.
1974-01-01
An image selection (ISS) was developed for the NASA-Ames Research Center Earth Resources Aircraft Project. The ISS is an interactive, graphics oriented, computer retrieval system for aerial imagery. An analysis of user coverage requests and retrieval strategies is presented, followed by a complete system description. Data base structure, retrieval processors, command language, interactive display options, file structures, and the system's capability to manage sets of selected imagery are described. A detailed example of an area coverage request is graphically presented.
Implementation of a cone-beam backprojection algorithm on the cell broadband engine processor
NASA Astrophysics Data System (ADS)
Bockenbach, Olivier; Knaup, Michael; Kachelrieß, Marc
2007-03-01
Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operational count and due to the high demand put on the memory subsystem. In the past, solving this problem has lead to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high speed busses. More recently, there have also been attempt to use Graphic Processing Units (GPUs) to perform the backprojection step. Originally aimed at the gaming market, IBM, Toshiba and Sony have introduced the Cell Broadband Engine (CBE) processor, often considered as a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance indeed makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms. In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGAs based boards and high end GPUs. The cone-beam backprojection performance was assessed by backprojecting a full circle scan of 512 projections of 1024x1024 pixels into a volume of size 512x512x512 voxels. It took 3.2 minutes on the PC (single CPU) and is as fast as 13.6 seconds on the Cell.
NASA Astrophysics Data System (ADS)
Hill, C.
2008-12-01
Low cost graphic cards today use many, relatively simple, compute cores to deliver support for memory bandwidth of more than 100GB/s and theoretical floating point performance of more than 500 GFlop/s. Right now this performance is, however, only accessible to highly parallel algorithm implementations that, (i) can use a hundred or more, 32-bit floating point, concurrently executing cores, (ii) can work with graphics memory that resides on the graphics card side of the graphics bus and (iii) can be partially expressed in a language that can be compiled by a graphics programming tool. In this talk we describe our experiences implementing a complete, but relatively simple, time dependent shallow-water equations simulation targeting a cluster of 30 computers each hosting one graphics card. The implementation takes into account the considerations (i), (ii) and (iii) listed previously. We code our algorithm as a series of numerical kernels. Each kernel is designed to be executed by multiple threads of a single process. Kernels are passed memory blocks to compute over which can be persistent blocks of memory on a graphics card. Each kernel is individually implemented using the NVidia CUDA language but driven from a higher level supervisory code that is almost identical to a standard model driver. The supervisory code controls the overall simulation timestepping, but is written to minimize data transfer between main memory and graphics memory (a massive performance bottle-neck on current systems). Using the recipe outlined we can boost the performance of our cluster by nearly an order of magnitude, relative to the same algorithm executing only on the cluster CPU's. Achieving this performance boost requires that many threads are available to each graphics processor for execution within each numerical kernel and that the simulations working set of data can fit into the graphics card memory. As we describe, this puts interesting upper and lower bounds on the problem sizes for which this technology is currently most useful. However, many interesting problems fit within this envelope. Looking forward, we extrapolate our experience to estimate full-scale ocean model performance and applicability. Finally we describe preliminary hybrid mixed 32-bit and 64-bit experiments with graphics cards that support 64-bit arithmetic, albeit at a lower performance.
NASA Astrophysics Data System (ADS)
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L
2012-06-13
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Towards the formal specification of the requirements and design of a processor interface unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
Work to formally specify the requirements and design of a Processor Interface Unit (PIU), a single-chip subsystem providing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system, is described. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance free operation, or both. The approaches that were developed for modeling the PIU requirements and for composition of the PIU subcomponents at high levels of abstraction are described. These approaches were used to specify and verify a nontrivial subset of the PIU behavior. The PIU specification in Higher Order Logic (HOL) is documented in a companion NASA contractor report entitled 'Towards the Formal Specification of the Requirements and Design of a Processor Interfacs Unit - HOL Listings.' The subsequent verification approach and HOL listings are documented in NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit' and NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit - HOL Listings.'
NASTRAN pre and postprocessors using low-cost interactive graphics
NASA Technical Reports Server (NTRS)
Herness, E. D.; Kriloff, H. Z.
1975-01-01
A design for a NASTRAN preprocessor is given to illustrate a typical preprocessor. Several displays of NASTRAN models illustrate the preprocessor's capabilities. A design of a NASTRAN postprocessor is presented along with an example of displays generated by that NASTRAN processor.
Software Reviews: Programs Worth a Second Look.
ERIC Educational Resources Information Center
Classroom Computer Learning, 1989
1989-01-01
Reviews three software programs: (1) "Microsoft Works 2.0": word processing, data processing, and telecommunications, grades 7 and up; (2) "AppleWorks GS": word processor, database, spreadsheet, graphics, and telecommunications, grades 3-12, Apple IIGS; (3) "Choices, Choices: On the Playground, Taking Responsibility":…
Federal Register 2010, 2011, 2012, 2013, 2014
2017-09-21
...Notice is hereby given that the U.S. International Trade Commission has determined not to review the presiding administrative law judge's (``ALJ'') initial determination (``ID'') (Order No. 26) terminating the investigation on the basis of settlement.
Freiberger, Manuel; Egger, Herbert; Liebmann, Manfred; Scharfetter, Hermann
2011-11-01
Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state of the art numerical algorithms and a careful hardware optimized implementation allows to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the non-linear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by factors of about 15 through the use of the graphics hardware without compromising the accuracy in the reconstructed images.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
The 28-entity IGES test file results using ComputerVision CADDS 4X
NASA Technical Reports Server (NTRS)
Kuan, Anchyi; Shah, Saurin; Smith, Kevin
1987-01-01
The investigation was based on the following steps: (1) Read the 28 Entity IGES (Initial Graphics Exchange Specification) Test File into the CAD data base with the IGES post-processor; (2) Make the modifications to the displayed geometries, which should produce the normalized front view and the drawing entity defined display; (3) Produce the drawing entity defined display of the file as it appears in the CAD system after modification to the geometry; (4) Translate the file back to IGES format using IGES pre-processor; (5) Read the IGES file produced by the pre-processor back into the CAD data base; (6) Produce another drawing entity defined display of the CAD display; and (7) Compare the plots resulting from steps 3 and 6 - they should be identical to each other.
Real time 3D structural and Doppler OCT imaging on graphics processing units
NASA Astrophysics Data System (ADS)
Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr
2013-03-01
In this report the application of graphics processing unit (GPU) programming for real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging with implementation of Doppler algorithms for visualization of the flows in capillary vessels is presented. Generally, the time of the data processing of the FdOCT data on the main processor of the computer (CPU) constitute a main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Lately developed GPUs, which offers a very high computational power, give a solution to this problem. Taking advantages of them for massively parallel data processing, allow for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allow for the whole processing with visualization of 2D data consisting of 2000 A-scans generated from 2048 pixels spectra with frame rate about 120 fps. The 3D imaging in the same mode of the volume data build of 220 × 100 A-scans is performed at a rate of about 8 frames per second. In this paper a software architecture, organization of the threads and optimization applied is shown. For illustration the screen shots recorded during real time imaging of the phantom (homogeneous water solution of Intralipid in glass capillary) and the human eye in-vivo is presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Yi; Du, Liang
A system for different electric loads includes sensors structured to sense voltage and current signals for each of the different electric loads; a hierarchical load feature database having a plurality of layers, with one of the layers including a plurality of different load categories; and a processor. The processor acquires voltage and current waveforms from the sensors for a corresponding one of the different electric loads; maps a voltage-current trajectory to a grid including a plurality of cells, each of which is assigned a binary value of zero or one; extracts a plurality of different features from the mapped gridmore » of cells as a graphical signature of the corresponding one of the different electric loads; derives a category of the corresponding one of the different electric loads from the database; and identifies one of a plurality of different electric load types for the corresponding one of the different electric loads.« less
NASA Astrophysics Data System (ADS)
Stark, Julian; Rothe, Thomas; Kieß, Steffen; Simon, Sven; Kienle, Alwin
2016-04-01
Single cell nuclei were investigated using two-dimensional angularly and spectrally resolved scattering microscopy. We show that even for a qualitative comparison of experimental and theoretical data, the standard Mie model of a homogeneous sphere proves to be insufficient. Hence, an accelerated finite-difference time-domain method using a graphics processor unit and domain decomposition was implemented to analyze the experimental scattering patterns. The measured cell nuclei were modeled as single spheres with randomly distributed spherical inclusions of different size and refractive index representing the nucleoli and clumps of chromatin. Taking into account the nuclear heterogeneity of a large number of inclusions yields a qualitative agreement between experimental and theoretical spectra and illustrates the impact of the nuclear micro- and nanostructure on the scattering patterns.
Ramses-GPU: Second order MUSCL-Handcock finite volume fluid solver
NASA Astrophysics Data System (ADS)
Kestener, Pierre
2017-10-01
RamsesGPU is a reimplementation of RAMSES (ascl:1011.007) which drops the adaptive mesh refinement (AMR) features to optimize 3D uniform grid algorithms for modern graphics processor units (GPU) to provide an efficient software package for astrophysics applications that do not need AMR features but do require a very large number of integration time steps. RamsesGPU provides an very efficient C++/CUDA/MPI software implementation of a second order MUSCL-Handcock finite volume fluid solver for compressible hydrodynamics as a magnetohydrodynamics solver based on the constraint transport technique. Other useful modules includes static gravity, dissipative terms (viscosity, resistivity), and forcing source term for turbulence studies, and special care was taken to enhance parallel input/output performance by using state-of-the-art libraries such as HDF5 and parallel-netcdf.
Stochastic first passage time accelerated with CUDA
NASA Astrophysics Data System (ADS)
Pierro, Vincenzo; Troiano, Luigi; Mejuto, Elena; Filatrella, Giovanni
2018-05-01
The numerical integration of stochastic trajectories to estimate the time to pass a threshold is an interesting physical quantity, for instance in Josephson junctions and atomic force microscopy, where the full trajectory is not accessible. We propose an algorithm suitable for efficient implementation on graphical processing unit in CUDA environment. The proposed approach for well balanced loads achieves almost perfect scaling with the number of available threads and processors, and allows an acceleration of about 400× with a GPU GTX980 respect to standard multicore CPU. This method allows with off the shell GPU to challenge problems that are otherwise prohibitive, as thermal activation in slowly tilted potentials. In particular, we demonstrate that it is possible to simulate the switching currents distributions of Josephson junctions in the timescale of actual experiments.
Design and implementation of H.264 based embedded video coding technology
NASA Astrophysics Data System (ADS)
Mao, Jian; Liu, Jinming; Zhang, Jiemin
2016-03-01
In this paper, an embedded system for remote online video monitoring was designed and developed to capture and record the real-time circumstances in elevator. For the purpose of improving the efficiency of video acquisition and processing, the system selected Samsung S5PV210 chip as the core processor which Integrated graphics processing unit. And the video was encoded with H.264 format for storage and transmission efficiently. Based on S5PV210 chip, the hardware video coding technology was researched, which was more efficient than software coding. After running test, it had been proved that the hardware video coding technology could obviously reduce the cost of system and obtain the more smooth video display. It can be widely applied for the security supervision [1].
Stark, Julian; Rothe, Thomas; Kieß, Steffen; Simon, Sven; Kienle, Alwin
2016-04-07
Single cell nuclei were investigated using two-dimensional angularly and spectrally resolved scattering microscopy. We show that even for a qualitative comparison of experimental and theoretical data, the standard Mie model of a homogeneous sphere proves to be insufficient. Hence, an accelerated finite-difference time-domain method using a graphics processor unit and domain decomposition was implemented to analyze the experimental scattering patterns. The measured cell nuclei were modeled as single spheres with randomly distributed spherical inclusions of different size and refractive index representing the nucleoli and clumps of chromatin. Taking into account the nuclear heterogeneity of a large number of inclusions yields a qualitative agreement between experimental and theoretical spectra and illustrates the impact of the nuclear micro- and nanostructure on the scattering patterns.
NASA Astrophysics Data System (ADS)
Moody, Marc; Fisher, Robert; Little, J. Kristin
2014-06-01
Boeing has developed a degraded visual environment navigational aid that is flying on the Boeing AH-6 light attack helicopter. The navigational aid is a two dimensional software digital map underlay generated by the Boeing™ Geospatial Embedded Mapping Software (GEMS) and fully integrated with the operational flight program. The page format on the aircraft's multi function displays (MFD) is termed the Approach page. The existing work utilizes Digital Terrain Elevation Data (DTED) and OpenGL ES 2.0 graphics capabilities to compute the pertinent graphics underlay entirely on the graphics processor unit (GPU) within the AH-6 mission computer. The next release will incorporate cultural databases containing Digital Vertical Obstructions (DVO) to warn the crew of towers, buildings, and power lines when choosing an opportune landing site. Future IRAD will include Light Detection and Ranging (LIDAR) point cloud generating sensors to provide 2D and 3D synthetic vision on the final approach to the landing zone. Collision detection with respect to terrain, cultural, and point cloud datasets may be used to further augment the crew warning system. The techniques for creating the digital map underlay leverage the GPU almost entirely, making this solution viable on most embedded mission computing systems with an OpenGL ES 2.0 capable GPU. This paper focuses on the AH-6 crew interface process for determining a landing zone and flying the aircraft to it.
Fast 2D flood modelling using GPU technology - recent applications and new developments
NASA Astrophysics Data System (ADS)
Crossley, Amanda; Lamb, Rob; Waller, Simon; Dunning, Paul
2010-05-01
In recent years there has been considerable interest amongst scientists and engineers in exploiting the potential of commodity graphics hardware for desktop parallel computing. The Graphics Processing Units (GPUs) that are used in PC graphics cards have now evolved into powerful parallel co-processors that can be used to accelerate the numerical codes used for floodplain inundation modelling. We report in this paper on experience over the past two years in developing and applying two dimensional (2D) flood inundation models using GPUs to achieve significant practical performance benefits. Starting with a solution scheme for the 2D diffusion wave approximation to the 2D Shallow Water Equations (SWEs), we have demonstrated the capability to reduce model run times in ‘real-world' applications using GPU hardware and programming techniques. We then present results from a GPU-based 2D finite volume SWE solver. A series of numerical test cases demonstrate that the model produces outputs that are accurate and consistent with reference results published elsewhere. In comparisons conducted for a real world test case, the GPU-based SWE model was over 100 times faster than the CPU version. We conclude with some discussion of practical experience in using the GPU technology for flood mapping applications, and for research projects investigating use of Monte Carlo simulation methods for the analysis of uncertainty in 2D flood modelling.
VASP-4096: a very high performance programmable device for digital media processing applications
NASA Astrophysics Data System (ADS)
Krikelis, Argy
2001-03-01
Over the past few years, technology drivers for microprocessors have changed significantly. Media data delivery and processing--such as telecommunications, networking, video processing, speech recognition and 3D graphics--is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper presents the architecture of the VASP-4096 processor. VASP-4096 provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovations in the VASP-4096 is the integration of thousands of processing units in a single chip that are capable of support software programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, VASP-4096 integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of Data Memory, and I/O interfaces. The SIMD processing in VASP-4096 implements the ASProCore architecture, which is a proprietary implementation of SIMD processing, operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double-data rate), and a 64- bit 66 MHz PCI interface. VASP-4096, compared with other processors architectures that support media processing, offers true performance scalability, support for deterministic and non-deterministic data processing on a single device, and software programmability that can be re- used in future chip generations.
Large Capacity Missile Carrier (CMX)
1993-12-01
FSU) is emerging from a turbulent period that lasted from about 1990-2005. Some of the great hopes for the emergence of democracy and open markets ...34* AN/ UGC -I43A(V) NST "• OK-455(V) LJHF DAMA "* AN/UYQ-62 C2P VER I Link Processor "* ANIWSC-3(V)3 UHF SAT Transmitter/Receiver "* AN/USC-38 EHF...Graphics System Command, Control & NAVMACS II Communications (C’) AN/ UGC -143A(V) NST OKg455(V) UHF DAMA AN/UYQ-62 C2P VER I Link Processor AN/WSC-3(V
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
NASA Astrophysics Data System (ADS)
Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger
2011-06-01
We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
Symplectic multi-particle tracking on GPUs
NASA Astrophysics Data System (ADS)
Liu, Zhicong; Qiang, Ji
2018-05-01
A symplectic multi-particle tracking model is implemented on the Graphic Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase space structure and reduce non-physical effects in long term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation helps save more than a factor of two total computing time in comparison to the CPU implementation.
Sharp, G C; Kandasamy, N; Singh, H; Folkert, M
2007-10-07
This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm, and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve a substantial speedup--up to 80 times for FDK and 70 times for demons when compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent. Compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.
GPU-based Parallel Application Design for Emerging Mobile Devices
NASA Astrophysics Data System (ADS)
Gupta, Kshitij
A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as compute and communication capabilities of mobile devices improve, we analyze energy implications of processing speech recognition locally (on-chip) and offloading it to servers (in-cloud).
Nearly Interactive Parabolized Navier-Stokes Solver for High Speed Forebody and Inlet Flows
NASA Technical Reports Server (NTRS)
Benson, Thomas J.; Liou, May-Fun; Jones, William H.; Trefny, Charles J.
2009-01-01
A system of computer programs is being developed for the preliminary design of high speed inlets and forebodies. The system comprises four functions: geometry definition, flow grid generation, flow solver, and graphics post-processor. The system runs on a dedicated personal computer using the Windows operating system and is controlled by graphical user interfaces written in MATLAB (The Mathworks, Inc.). The flow solver uses the Parabolized Navier-Stokes equations to compute millions of mesh points in several minutes. Sample two-dimensional and three-dimensional calculations are demonstrated in the paper.
Rubus: A compiler for seamless and extensible parallelism.
Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism
Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Integrated rate isolation sensor
NASA Technical Reports Server (NTRS)
Brady, Tye (Inventor); Henderson, Timothy (Inventor); Phillips, Richard (Inventor); Zimpfer, Doug (Inventor); Crain, Tim (Inventor)
2012-01-01
In one embodiment, a system for providing fault-tolerant inertial measurement data includes a sensor for measuring an inertial parameter and a processor. The sensor has less accuracy than a typical inertial measurement unit (IMU). The processor detects whether a difference exists between a first data stream received from a first inertial measurement unit and a second data stream received from a second inertial measurement unit. Upon detecting a difference, the processor determines whether at least one of the first or second inertial measurement units has failed by comparing each of the first and second data streams to the inertial parameter.
Accelerating Monte Carlo simulations with an NVIDIA ® graphics processor
NASA Astrophysics Data System (ADS)
Martinsen, Paul; Blaschke, Johannes; Künnemeyer, Rainer; Jordan, Robert
2009-10-01
Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA ® 8800 GT graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectory of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer. Program summaryProgram title: Phoogle-C/Phoogle-G Catalogue identifier: AEEB_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEB_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 51 264 No. of bytes in distributed program, including test data, etc.: 2 238 805 Distribution format: tar.gz Programming language: C++ Computer: Designed for Intel PCs. Phoogle-G requires a NVIDIA graphics card with support for CUDA 1.1 Operating system: Windows XP Has the code been vectorised or parallelized?: Phoogle-G is written for SIMD architectures RAM: 1 GB Classification: 21.1 External routines: Charles Karney Random number library. Microsoft Foundation Class library. NVIDA CUDA library [1]. Nature of problem: The Monte Carlo technique is an effective algorithm for exploring the propagation of light in turbid media. However, accurate results require tracing the path of many photons within the media. The independence of photons naturally lends the Monte Carlo technique to implementation on parallel architectures. Generally, parallel computing can be expensive, but recent advances in consumer grade graphics cards have opened the possibility of high-performance desktop parallel-computing. Solution method: In this pair of programmes we have implemented the Monte Carlo algorithm described by Prahl et al. [2] for photon transport in infinite scattering media to compare the performance of two readily accessible architectures: a standard desktop PC and a consumer grade graphics card from NVIDIA. Restrictions: The graphics card implementation uses single precision floating point numbers for all calculations. Only photon transport from an isotropic point-source is supported. The graphics-card version has no user interface. The simulation parameters must be set in the source code. The desktop version has a simple user interface; however some properties can only be accessed through an ActiveX client (such as Matlab). Additional comments: The random number library used has a LGPL ( http://www.gnu.org/copyleft/lesser.html) licence. Running time: Runtime can range from minutes to months depending on the number of photons simulated and the optical properties of the medium. References:http://www.nvidia.com/object/cuda_home.html. S. Prahl, M. Keijzer, Sl. Jacques, A. Welch, SPIE Institute Series 5 (1989) 102.
Combining high performance simulation, data acquisition, and graphics display computers
NASA Technical Reports Server (NTRS)
Hickman, Robert J.
1989-01-01
Issues involved in the continuing development of an advanced simulation complex are discussed. This approach provides the capability to perform the majority of tests on advanced systems, non-destructively. The controlled test environments can be replicated to examine the response of the systems under test to alternative treatments of the system control design, or test the function and qualification of specific hardware. Field tests verify that the elements simulated in the laboratories are sufficient. The digital computer is hosted by a Digital Equipment Corp. MicroVAX computer with an Aptec Computer Systems Model 24 I/O computer performing the communication function. An Applied Dynamics International AD100 performs the high speed simulation computing and an Evans and Sutherland PS350 performs on-line graphics display. A Scientific Computer Systems SCS40 acts as a high performance FORTRAN program processor to support the complex, by generating numerous large files from programs coded in FORTRAN that are required for the real time processing. Four programming languages are involved in the process, FORTRAN, ADSIM, ADRIO, and STAPLE. FORTRAN is employed on the MicroVAX host to initialize and terminate the simulation runs on the system. The generation of the data files on the SCS40 also is performed with FORTRAN programs. ADSIM and ADIRO are used to program the processing elements of the AD100 and its IOCP processor. STAPLE is used to program the Aptec DIP and DIA processors.
The Application of Logic Programming to Communication Education.
ERIC Educational Resources Information Center
Sanford, David L.
Recommending that communication students be required to learn to use computers not merely as number crunchers, word processors, data bases, and graphics generators, but also as logical inference makers, this paper examines the recently developed technology of logical programing in computer languages. It presents two syllogisms and shows how they…
Consistent improvements in processor speed and computer access have substantially increased the use of computer modeling by experts and non-experts alike. Several new computer modeling packages operating under graphical operating systems (i.e. Microsoft Windows or Macintosh) m...
Towards the formal verification of the requirements and design of a processor interface unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
The formal verification of the design and partial requirements for a Processor Interface Unit (PIU) using the Higher Order Logic (HOL) theorem-proving system is described. The processor interface unit is a single-chip subsystem within a fault-tolerant embedded system under development within the Boeing Defense and Space Group. It provides the opportunity to investigate the specification and verification of a real-world subsystem within a commercially-developed fault-tolerant computer. An overview of the PIU verification effort is given. The actual HOL listing from the verification effort are documented in a companion NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit - HOL Listings' including the general-purpose HOL theories and definitions that support the PIU verification as well as tactics used in the proofs.
Graphics Processors in HEP Low-Level Trigger Systems
NASA Astrophysics Data System (ADS)
Ammendola, Roberto; Biagioni, Andrea; Chiozzi, Stefano; Cotta Ramusino, Angelo; Cretaro, Paolo; Di Lorenzo, Stefano; Fantechi, Riccardo; Fiorini, Massimiliano; Frezza, Ottorino; Lamanna, Gianluca; Lo Cicero, Francesca; Lonardo, Alessandro; Martinelli, Michele; Neri, Ilaria; Paolucci, Pier Stanislao; Pastorelli, Elena; Piandani, Roberto; Pontisso, Luca; Rossetti, Davide; Simula, Francesco; Sozzi, Marco; Vicini, Piero
2016-11-01
Usage of Graphics Processing Units (GPUs) in the so called general-purpose computing is emerging as an effective approach in several fields of science, although so far applications have been employing GPUs typically for offline computations. Taking into account the steady performance increase of GPU architectures in terms of computing power and I/O capacity, the real-time applications of these devices can thrive in high-energy physics data acquisition and trigger systems. We will examine the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on tests performed on the trigger system of the CERN NA62 experiment. To successfully integrate GPUs in such an online environment, latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Furthermore, it is assessed how specific trigger algorithms can be parallelized and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen Large Hadron Collider (LHC) luminosity upgrade where highly selective algorithms will be essential to maintain sustainable trigger rates with very high pileup.
Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids
NASA Astrophysics Data System (ADS)
Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu
2013-01-01
Numerical modeling of anisotropic media is a computationally intensive task since it brings additional complexity to the field problem in such a way that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good in handling floating point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using the NVIDIA CUDA (Compute Unified device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device which consists of 384 CUDA cores clocked at 1645 MHz with a standard desktop pc as the host platform. We compare the results from standard CPU implementation for its accuracy and speed and draw implications for simulation using the GPU paradigm.
Assessment of mammographic film processor performance in a hospital and mobile screening unit.
Murray, J G; Dowsett, D J; Laird, O; Ennis, J T
1992-12-01
In contrast to the majority of mammographic breast screening programmes, film processing at this centre occurs on site in both hospital and mobile trailer units. Initial (1989) quality control (QC) sensitometric tests revealed a large variation in film processor performance in the mobile unit. The clinical significance of these variations was assessed and acceptance limits for processor performance determined. Abnormal mammograms were used as reference material and copied using high definition 35 mm film over a range of exposure settings. The copies were than matched with QC film density variation from the mobile unit. All films were subsequently ranked for spatial and contrast resolution. Optimal values for processing time of 2 min (equivalent to film transit time 3 min and developer time 46 s) and temperature of 36 degrees C were obtained. The widespread anomaly of reporting film transit time as processing time is highlighted. Use of mammogram copies as a means of measuring the influence of film processor variation is advocated. Careful monitoring of the mobile unit film processor performance has produced stable quality comparable with the hospital based unit. The advantages of on site film processing are outlined. The addition of a sensitometric step wedge to all mammography film stock as a means of assessing image quality is recommended.
Minicomputer front end. [Modcomp II/CP as buffer between CDC 6600 and PDP-9 at graphics stations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hudson, J.A.
1976-01-01
Sandia Labs developed an Interactive Graphics System (SIGS) that was established on a CDC 6600 using a communication scheme based on the Control Data Corporation product IGS. As implemented at Sandia, the graphics station consists primarily of a PDP-9 with a Vector General display. A system is being developed which uses a minicomputer (Modcomp II/CP) as the buffer machine for the graphics stations. The original SIGS required a dedicated peripheral processor (PP) on the CDC 6600 to handle the communication with the stations; however, with the Modcomp handling the actual communication protocol, the PP is only assigned as needed tomore » handle data transfer within the CDC 6600 portion of SIGS. The new system will thus support additional graphics stations with less impact on the CDC 6600. This paper discusses the design philosophy of the system, and the hardware and software used to implement it. 1 figure.« less
Methods and apparatus for graphical display and editing of flight plans
NASA Technical Reports Server (NTRS)
Gibbs, Michael J. (Inventor); Adams, Jr., Mike B. (Inventor); Chase, Karl L. (Inventor); Lewis, Daniel E. (Inventor); McCrobie, Daniel E. (Inventor); Omen, Debi Van (Inventor)
2002-01-01
Systems and methods are provided for an integrated graphical user interface which facilitates the display and editing of aircraft flight-plan data. A user (e.g., a pilot) located within the aircraft provides input to a processor through a cursor control device and receives visual feedback via a display produced by a monitor. The display includes various graphical elements associated with the lateral position, vertical position, flight-plan and/or other indicia of the aircraft's operational state as determined from avionics data and/or various data sources. Through use of the cursor control device, the user may modify the flight-plan and/or other such indicia graphically in accordance with feedback provided by the display. In one embodiment, the display includes a lateral view, a vertical profile view, and a hot-map view configured to simplify the display and editing of the aircraft's flight-plan data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S.; Alam, Maksudul
A novel parallel algorithm is presented for generating random scale-free networks using the preferential-attachment model. The algorithm, named cuPPA, is custom-designed for single instruction multiple data (SIMD) style of parallel processing supported by modern processors such as graphical processing units (GPUs). To the best of our knowledge, our algorithm is the first to exploit GPUs, and also the fastest implementation available today, to generate scale free networks using the preferential attachment model. A detailed performance study is presented to understand the scalability and runtime characteristics of the cuPPA algorithm. In one of the best cases, when executed on an NVidiamore » GeForce 1080 GPU, cuPPA generates a scale free network of a billion edges in less than 2 seconds.« less
Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S
2014-03-11
Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations that are occurring on the accelerator and/or the host. For data-transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenAcc, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
NASA Technical Reports Server (NTRS)
Ward, William Douglas (Inventor)
2014-01-01
The different advantageous embodiments provide for identifying gas leakage in a platform. A processor unit identifies a rate of the gas of the substance leaking from a container in a first compartment for a platform. The processor unit also identifies an amount of gas that has leaked from the container at a selected time based on the rate of the gas of the substance leaking from the container and a total time. The processor unit identifies an amount of the gas of the substance present in a number of compartments associated with the first compartment using the amount of gas leaked from the container in the first compartment and a pressure for each compartment in the number of compartments. The processor unit determines whether the amount of gas in at least one of the first compartment and the number of compartments is outside of a desired amount for the gas.
Electrical Prototype Power Processor for the 30-cm Mercury electric propulsion engine
NASA Technical Reports Server (NTRS)
Biess, J. J.; Frye, R. J.
1978-01-01
An Electrical Prototpye Power Processor has been designed to the latest electrical and performance requirements for a flight-type 30-cm ion engine and includes all the necessary power, command, telemetry and control interfaces for a typical electric propulsion subsystem. The power processor was configured into seven separate mechanical modules that would allow subassembly fabrication, test and integration into a complete power processor unit assembly. The conceptual mechanical packaging of the electrical prototype power processor unit demonstrated the relative location of power, high voltage and control electronic components to minimize electrical interactions and to provide adequate thermal control in a vacuum environment. Thermal control was accomplished with a heat pipe simulator attached to the base of the modules.
Designing a Humane Multimedia Interface for the Visually Impaired.
ERIC Educational Resources Information Center
Ghaoui, Claude; Mann, M.; Ng, Eng Huat
2001-01-01
Promotes the provision of interfaces that allow users to access most of the functionality of existing graphical user interfaces (GUI) using speech. Uses the design of a speech control tool that incorporates speech recognition and synthesis into existing packaged software such as Teletext, the Internet, or a word processor. (Contains 22…
Evaluation of an Adaptive Automation Trigger Based on Task Performance, Priority, and Frequency
2013-06-01
with dual Intel ® Xeon ® CPU x5550 processors @ 2.67 GHz each, 12.0 GB RAM, and a 1.5 GB PCIe nVidia Quadro FX 4800 graphics card (Microsoft...Cole Publishing Company . Miller, C. A., & Parasuraman, R. (2007). Designing for flexible interaction between humans and automation: Delegation
Design, Development, and Testing of a Network Frequency Selection Service (NFSS)
1994-02-14
mercial simulation software (Sim++), word processor ( FrameMaker ), editor (Gnu Emacs), software ver- sion control (Revision Control System (RCS)), system...of FrameMaker ".mif" files. When viewed using FrameMaker or a PostScript reader, each page of results appears as two columns by four rows of graphics
A novel VLSI processor architecture for supercomputing arrays
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.
1993-01-01
Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
Monitoring Data-Structure Evolution in Distributed Message-Passing Programs
NASA Technical Reports Server (NTRS)
Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)
1996-01-01
Monitoring the evolution of data structures in parallel and distributed programs, is critical for debugging its semantics and performance. However, the current state-of-art in tracking and presenting data-structure information on parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data-structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data-structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents), on specific nodes, can be determined using traditional debuggers after the data structure evolution leading to the semantic error is observed graphically.
Near-memory data reorganization engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gokhale, Maya; Lloyd, G. Scott
A memory subsystem package is provided that has processing logic for data reorganization within the memory subsystem package. The processing logic is adapted to reorganize data stored within the memory subsystem package. In some embodiments, the memory subsystem package includes memory units, a memory interconnect, and a data reorganization engine ("DRE"). The data reorganization engine includes a stream interconnect and DRE units including a control processor and a load-store unit. The control processor is adapted to execute instructions to control a data reorganization. The load-store unit is adapted to process data move commands received from the control processor via themore » stream interconnect for loading data from a load memory address of a memory unit and storing data to a store memory address of a memory unit.« less
Tactical Operations Analysis Support Facility.
1981-05-01
Punch/Reader 2 DMC-11AR DDCMP Micro Processor 2 DMC-11DA Network Link Line Unit 2 DL-11E Async Serial Line Interface 4 Intel IN-1670 448K Words MOS Memory...86 5.3 VIRTUAL PROCESSORS - VAX-11/750 ........................... 89 5.4 A RELATIONAL DATA MANAGEMENT SYSTEM - ORACLE...Central Processing Unit (CPU) is a 16 bit processor for high-speed, real time applications, and for large multi-user, multi- task, time shared
a Real-Time Computer Music Synthesis System
NASA Astrophysics Data System (ADS)
Lent, Keith Henry
A real time sound synthesis system has been developed at the Computer Music Center of The University of Texas at Austin. This system consists of several stand alone processors that were constructed jointly with White Instruments in Austin. These processors can be programmed as general purpose computers, but are provided with a number of specialized interfaces including: MIDI, 8 bit parallel, high speed serial, 2 channels analog input (18 bit A/Ds, 48kHz sample rate), and 4 channels analog output (18 bit D/As). In addition, a basic music synthesis language (Music56000) has been written in assembly code. On top of this, a symbolic compiler (PatchWork) has been developed to enable algorithms which run in these processors to be created graphically. And finally, a number of efficient time domain numerical models have been developed to enable the construction, simulation, control, and synthesis of many musical acoustics systems in real time on these processors. Specifically, assembly language models for cylindrical and conical horn sections, dissipative losses, tone holes, bells, and a number of linear and nonlinear boundary conditions have been developed.
Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.
Baele, Guy; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A
2017-06-15
Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters that have to be estimated, leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, both in the central processing unit (CPU) and Graphics processing unit processor markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses. We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves by > 14-fold. Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference. guy.baele@kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Simulating Synchronous Processors
1988-06-01
34f Fvtvru m LABORATORY FOR INMASSACHUSETTSFCOMPUTER SCIENCE TECHNOLOGY MIT/LCS/TM-359 SIMULATING SYNCHRONOUS PROCESSORS Jennifer Lundelius Welch...PROJECT TASK WORK UNIT Arlington, VA 22217 ELEMENT NO. NO. NO ACCESSION NO. 11. TITLE Include Security Classification) Simulating Synchronous Processors...necessary and identify by block number) In this paper we show how a distributed system with synchronous processors and asynchro- nous message delays can
NASA Astrophysics Data System (ADS)
Pruhs, Kirk
A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
Augmented Reality Comes to Physics
ERIC Educational Resources Information Center
Buesing, Mark; Cook, Michael
2013-01-01
Augmented reality (AR) is a technology used on computing devices where processor-generated graphics are rendered over real objects to enhance the sensory experience in real time. In other words, what you are really seeing is augmented by the computer. Many AR games already exist for systems such as Kinect and Nintendo 3DS and mobile apps, such as…
Yes! You Can Build a Web Site.
ERIC Educational Resources Information Center
Holzberg, Carol
2001-01-01
With specially formatted templates or simple Web page editors, teachers can lay out text and graphics in a work space resembling the interface of a word processor. Several options are presented to help teachers build Web sites. ree templates include Class Homepage Builder, AppliTools: HomePage, MySchoolOnline.com, and BigChalk.com. Web design…
Bin-Hash Indexing: A Parallel Method for Fast Query Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng
2008-06-27
This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for the emerging commodity of parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPU). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that can not bemore » resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records are able to be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.« less
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo
2012-02-01
We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with individual timestep scheme ( Makino and Aarseth, 1992), and achieved the performance of ˜20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions ( Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using so-called NINJA scheme ( Nitadori et al., 2006a), and achieved ˜90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ˜ 10 5 on massively parallel systems with at most 800 cores with Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs ( Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
Implementing direct, spatially isolated problems on transputer networks
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
Parametric studies were performed on transputer networks of up to 40 processors to determine how to implement and maximize the performance of the solution of problems where no processor-to-processor data transfer is required for the problem solution (spatially isolated). Two types of problems are investigated a computationally intensive problem where the solution required the transmission of 160 bytes of data through the parallel network, and a communication intensive example that required the transmission of 3 Mbytes of data through the network. This data consists of solutions being sent back to the host processor and not intermediate results for another processor to work on. Studies were performed on both integer and floating-point transputers. The latter features an on-chip floating-point math unit and offers approximately an order of magnitude performance increase over the integer transputer on real valued computations. The results indicate that a minimum amount of work is required on each node per communication to achieve high network speedups (efficiencies). The floating-point processor requires approximately an order of magnitude more work per communication than the integer processor because of the floating-point unit's increased computing capacity.
MATCHED FILTER COMPUTATION ON FPGA, CELL, AND GPU
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAKER, ZACHARY K.; GOKHALE, MAYA B.; TRIPP, JUSTIN L.
2007-01-08
The matched filter is an important kernel in the processing of hyperspectral data. The filter enables researchers to sift useful data from instruments that span large frequency bands. In this work, they evaluate the performance of a matched filter algorithm implementation on accelerated co-processor (XD1000), the IBM Cell microprocessor, and the NVIDIA GeForce 6900 GTX GPU graphics card. They provide extensive discussion of the challenges and opportunities afforded by each platform. In particular, they explore the problems of partitioning the filter most efficiently between the host CPU and the co-processor. Using their results, they derive several performance metrics that providemore » the optimal solution for a variety of application situations.« less
Benchmarking NWP Kernels on Multi- and Many-core Processors
NASA Astrophysics Data System (ADS)
Michalakes, J.; Vachharajani, M.
2008-12-01
Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Charpentier, Ronald R.; Klett, T.R.; Obuch, R.C.; Brewton, J.D.
1996-01-01
This CD-ROM contains files in support of the 1995 USGS National assessment of United States oil and gas resources (DDS-30), which was published separately and summarizes the results of a 3-year study of the oil and gas resources of the onshore and state waters of the United States. The study describes about 560 oil and gas plays in the United States; confirmed and hypothetical, conventional and unconventional. A parallel study of the Federal offshore is being conducted by the U.S. Minerals Management Service. This CD-ROM contains files in multiple formats, so that almost any computer user can import them into word processors and spreadsheets. The tabular data include some tables not released in DDS-30. No proprietary data are released on this CD-ROM, but some tables of summary statistics from the proprietary files are provided. The complete text of DDS-30 is also available, as well as many figures. Also included are some of the programs used in the assessment, in source code and with supporting documentation. A companion CD-ROM (DDS-35) includes the map data and the same text data, but none of the tabular data or assessment programs.
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
GPU-Accelerated Molecular Modeling Coming Of Age
Stone, John E.; Hardy, David J.; Ufimtsev, Ivan S.
2010-01-01
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks. PMID:20675161
Laser Speckle Imaging of Cerebral Blood Flow
NASA Astrophysics Data System (ADS)
Luo, Qingming; Jiang, Chao; Li, Pengcheng; Cheng, Haiying; Wang, Zhen; Wang, Zheng; Tuchin, Valery V.
Monitoring the spatio-temporal characteristics of cerebral blood flow (CBF) is crucial for studying the normal and pathophysiologic conditions of brain metabolism. By illuminating the cortex with laser light and imaging the resulting speckle pattern, relative CBF images with tens of microns spatial and millisecond temporal resolution can be obtained. In this chapter, a laser speckle imaging (LSI) method for monitoring dynamic, high-resolution CBF is introduced. To improve the spatial resolution of current LSI, a modified LSI method is proposed. To accelerate the speed of data processing, three LSI data processing frameworks based on graphics processing unit (GPU), digital signal processor (DSP), and field-programmable gate array (FPGA) are also presented. Applications for detecting the changes in local CBF induced by sensory stimulation and thermal stimulation, the influence of a chemical agent on CBF, and the influence of acute hyperglycemia following cortical spreading depression on CBF are given.
GPU-accelerated molecular modeling coming of age.
Stone, John E; Hardy, David J; Ufimtsev, Ivan S; Schulten, Klaus
2010-09-01
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks. (c) 2010 Elsevier Inc. All rights reserved.
Fang, Wai-Chi; Huang, Kuan-Ju; Chou, Chia-Ching; Chang, Jui-Chung; Cauwenberghs, Gert; Jung, Tzyy-Ping
2014-01-01
This is a proposal for an efficient very-large-scale integration (VLSI) design, 16-channel on-line recursive independent component analysis (ORICA) processor ASIC for real-time EEG system, implemented with TSMC 40 nm CMOS technology. ORICA is appropriate to be used in real-time EEG system to separate artifacts because of its highly efficient and real-time process features. The proposed ORICA processor is composed of an ORICA processing unit and a singular value decomposition (SVD) processing unit. Compared with previous work [1], this proposed ORICA processor has enhanced effectiveness and reduced hardware complexity by utilizing a deeper pipeline architecture, shared arithmetic processing unit, and shared registers. The 16-channel random signals which contain 8-channel super-Gaussian and 8-channel sub-Gaussian components are used to analyze the dependence of the source components, and the average correlation coefficient is 0.95452 between the original source signals and extracted ORICA signals. Finally, the proposed ORICA processor ASIC is implemented with TSMC 40 nm CMOS technology, and it consumes 15.72 mW at 100 MHz operating frequency.
1979-12-01
intelligent graphics terminals in real-tim processing S (e) 5-1 to 5-9 MIel ita|ger The application of high-speed processors to propagation e.piriamnts...interface SACLANTCEN CP-25 5-2 M IM M STEIGER: Intelligent graphics terminals The less desirable features of the terminal are listed below. reiatively small...hours. Dismantling of the equipment is normally performed in less than one-half hour and often while waiting to clear customs. Transportation of the
NMF-mGPU: non-negative matrix factorization on multi-GPU systems.
Mejía-Roa, Edgardo; Tabas-Madrid, Daniel; Setoain, Javier; García, Carlos; Tirado, Francisco; Pascual-Montano, Alberto
2015-02-13
In the last few years, the Non-negative Matrix Factorization ( NMF ) technique has gained a great interest among the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessors cluster. In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics-Processing Units ( GPUs ). Driven by the ever-growing demands from the video-games industry, graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, which is not considered by other NMF implementations on GPU. NMF-mGPU is based on CUDA ( Compute Unified Device Architecture ), the NVIDIA's framework for GPU computing. On devices with low memory available, large input matrices are blockwise transferred from the system's main memory to the GPU's memory, and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI ( Message Passing Interface ). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup). Applications of GPUs in Bioinformatics are getting more and more attention due to their outstanding performance when compared to traditional processors. In addition, their relatively low price represents a highly cost-effective alternative to conventional clusters. In life sciences, this results in an excellent opportunity to facilitate the daily work of bioinformaticians that are trying to extract biological meaning out of hundreds of gigabytes of experimental information. NMF-mGPU can be used "out of the box" by researchers with little or no expertise in GPU programming in a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu .
NASA Technical Reports Server (NTRS)
Szatkowski, G. P.
1983-01-01
A computer simulation system has been developed for the Space Shuttle's advanced Centaur liquid fuel booster rocket, in order to conduct systems safety verification and flight operations training. This simulation utility is designed to analyze functional system behavior by integrating control avionics with mechanical and fluid elements, and is able to emulate any system operation, from simple relay logic to complex VLSI components, with wire-by-wire detail. A novel graphics data entry system offers a pseudo-wire wrap data base that can be easily updated. Visual subsystem operations can be selected and displayed in color on a six-monitor graphics processor. System timing and fault verification analyses are conducted by injecting component fault modes and min/max timing delays, and then observing system operation through a red line monitor.
A light hydrocarbon fuel processor producing high-purity hydrogen
NASA Astrophysics Data System (ADS)
Löffler, Daniel G.; Taylor, Kyle; Mason, Dylan
This paper discusses the design process and presents performance data for a dual fuel (natural gas and LPG) fuel processor for PEM fuel cells delivering between 2 and 8 kW electric power in stationary applications. The fuel processor resulted from a series of design compromises made to address different design constraints. First, the product quality was selected; then, the unit operations needed to achieve that product quality were chosen from the pool of available technologies. Next, the specific equipment needed for each unit operation was selected. Finally, the unit operations were thermally integrated to achieve high thermal efficiency. Early in the design process, it was decided that the fuel processor would deliver high-purity hydrogen. Hydrogen can be separated from other gases by pressure-driven processes based on either selective adsorption or permeation. The pressure requirement made steam reforming (SR) the preferred reforming technology because it does not require compression of combustion air; therefore, steam reforming is more efficient in a high-pressure fuel processor than alternative technologies like autothermal reforming (ATR) or partial oxidation (POX), where the combustion occurs at the pressure of the process stream. A low-temperature pre-reformer reactor is needed upstream of a steam reformer to suppress coke formation; yet, low temperatures facilitate the formation of metal sulfides that deactivate the catalyst. For this reason, a desulfurization unit is needed upstream of the pre-reformer. Hydrogen separation was implemented using a palladium alloy membrane. Packed beds were chosen for the pre-reformer and reformer reactors primarily because of their low cost, relatively simple operation and low maintenance. Commercial, off-the-shelf balance of plant (BOP) components (pumps, valves, and heat exchangers) were used to integrate the unit operations. The fuel processor delivers up to 100 slm hydrogen >99.9% pure with <1 ppm CO, <3 ppm CO 2. The thermal efficiency is better than 67% operating at full load. This fuel processor has been integrated with a 5-kW fuel cell producing electricity and hot water.
High Resolution Mass Spectrometry of Polyfluorinated Polyether-Based Formulation.
Dimzon, Ian Ken; Trier, Xenia; Frömel, Tobias; Helmus, Rick; Knepper, Thomas P; de Voogt, Pim
2016-02-01
High resolution mass spectrometry (HRMS) was successfully applied to elucidate the structure of a polyfluorinated polyether (PFPE)-based formulation. The mass spectrum generated from direct injection into the MS was examined by identifying the different repeating units manually and with the aid of an instrument data processor. Highly accurate mass spectral data enabled the calculation of higher-order mass defects. The different plots of MW and the nth-order mass defects (up to n = 3) could aid in assessing the structure of the different repeating units and estimating their absolute and relative number per molecule. The three major repeating units were -C2H4O-, -C2F4O-, and -CF2O-. Tandem MS was used to identify the end groups that appeared to be phosphates, as well as the possible distribution of the repeating units. Reversed-phase HPLC separated of the polymer molecules on the basis of number of nonpolar repeating units. The elucidated structure resembles the structure in the published manufacturer technical data. This analytical approach to the characterization of a PFPE-based formulation can serve as a guide in analyzing not just other PFPE-based formulations but also other fluorinated and non-fluorinated polymers. The information from MS is essential in studying the physico-chemical properties of PFPEs and can help in assessing the risks they pose to the environment and to human health. Graphical Abstract ᅟ.
SuperMemo; XIA LI BA REN (Macintosh Version 1.0).
ERIC Educational Resources Information Center
Wharton, Charlotte; Bourgerie, Dana S.
1994-01-01
Describes "SuperMemo," a memorization tool that uses an automated flashcard scheme that can include sound and graphics in the database of study items. Based on the learner's performance, "SuperMemo" schedules items to appear for review. Xia Li Ba Ren ("common person" in Chinese) is the name of a Chinese word processor that runs with a standard…
Investigation of IGES for CAD/CAE data transfer
NASA Technical Reports Server (NTRS)
Zobrist, George W.
1989-01-01
In a CAD/CAE facility there is always the possibility that one may want to transfer the design graphics database from the native system to a non-native system. This may occur because of dissimilar systems within an organization or a new CAD/CAE system is to be purchased. The Initial Graphics Exchange Specification (IGES) was developed in an attempt to solve this scenario. IGES is a neutral database format into which the CAD/CAE native database format can be translated to and from. Translating the native design database format to IGES requires a pre-processor and transling from IGES to the native database format requires a post-processor. IGES is an artifice to represent CAD/CAE product data in a neutral environment to allow interfacing applications, archive the database, interchange of product data between dissimilar CAD/CAE systems, and other applications. The intent here is to present test data on translating design product data from a CAD/CAE system to itself and to translate data initially prepared in IGES format to various native design formats. This information can be utilized in planning potential procurement and developing a design discipline within the CAD/CAE community.
Zhmurov, A; Dima, R I; Kholodov, Y; Barsegov, V
2010-11-01
Theoretical exploration of fundamental biological processes involving the forced unraveling of multimeric proteins, the sliding motion in protein fibers and the mechanical deformation of biomolecular assemblies under physiological force loads is challenging even for distributed computing systems. Using a C(α)-based coarse-grained self organized polymer (SOP) model, we implemented the Langevin simulations of proteins on graphics processing units (SOP-GPU program). We assessed the computational performance of an end-to-end application of the program, where all the steps of the algorithm are running on a GPU, by profiling the simulation time and memory usage for a number of test systems. The ∼90-fold computational speedup on a GPU, compared with an optimized central processing unit program, enabled us to follow the dynamics in the centisecond timescale, and to obtain the force-extension profiles using experimental pulling speeds (v(f) = 1-10 μm/s) employed in atomic force microscopy and in optical tweezers-based dynamic force spectroscopy. We found that the mechanical molecular response critically depends on the conditions of force application and that the kinetics and pathways for unfolding change drastically even upon a modest 10-fold increase in v(f). This implies that, to resolve accurately the free energy landscape and to relate the results of single-molecule experiments in vitro and in silico, molecular simulations should be carried out under the experimentally relevant force loads. This can be accomplished in reasonable wall-clock time for biomolecules of size as large as 10(5) residues using the SOP-GPU package. © 2010 Wiley-Liss, Inc.
Electric prototype power processor for a 30cm ion thruster
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.; Schoenfeld, A. D.
1977-01-01
An electrical prototype power processor unit was designed, fabricated and tested with a 30 cm mercury ion engine for primary space propulsion. The power processor unit used the thyristor series resonant inverter as the basic power stage for the high power beam and discharge supplies. A transistorized series resonant inverter processed the remaining power for the low power outputs. The power processor included a digital interface unit to process all input commands and internal telemetry signals so that electric propulsion systems could be operated with a central computer system. The electrical prototype unit included design improvement in the power components such as thyristors, transistors, filters and resonant capacitors, and power transformers and inductors in order to reduce component weight, to minimize losses, and to control the component temperature rise. A design analysis for the electrical prototype is also presented on the component weight, losses, part count and reliability estimate. The electrical prototype was tested in a thermal vacuum environment. Integration tests were performed with a 30 cm ion engine and demonstrated operational compatibility. Electromagnetic interference data was also recorded on the design to provide information for spacecraft integration.
Data processing techniques used with MST radars: A review
NASA Technical Reports Server (NTRS)
Rastogi, P. K.
1983-01-01
The data processing methods used in high power radar probing of the middle atmosphere are examined. The radar acts as a spatial filter on the small scale refractivity fluctuations in the medium. The characteristics of the received signals are related to the statistical properties of these fluctuations. A functional outline of the components of a radar system is given. Most computation intensive tasks are carried out by the processor. The processor computes a statistical function of the received signals, simultaneously for a large number of ranges. The slow fading of atmospheric signals is used to reduce the data input rate to the processor by coherent integration. The inherent range resolution of the radar experiments can be improved significant with the use of pseudonoise phase codes to modulate the transmitted pulses and a corresponding decoding operation on the received signals. Commutability of the decoding and coherent integration operations is used to obtain a significant reduction in computations. The limitations of the processors are outlined. At the next level of data reduction, the measured function is parameterized by a few spectral moments that can be related to physical processes in the medium. The problems encountered in estimating the spectral moments in the presence of strong ground clutter, external interference, and noise are discussed. The graphical and statistical analysis of the inferred parameters are outlined. The requirements for special purpose processors for MST radars are discussed.
Communications systems and methods for subsea processors
Gutierrez, Jose; Pereira, Luis
2016-04-26
A subsea processor may be located near the seabed of a drilling site and used to coordinate operations of underwater drilling components. The subsea processor may be enclosed in a single interchangeable unit that fits a receptor on an underwater drilling component, such as a blow-out preventer (BOP). The subsea processor may issue commands to control the BOP and receive measurements from sensors located throughout the BOP. A shared communications bus may interconnect the subsea processor and underwater components and the subsea processor and a surface or onshore network. The shared communications bus may be operated according to a time division multiple access (TDMA) scheme.
Development of a Novel, Two-Processor Architecture for a Small UAV Autopilot System,
2006-07-26
is, and the control laws the user implements to control it. The flight control system board will contain the processor selected for this system...Unit (IMU). The IMU contains solid-state gyros and accelerometers and uses these to determine the attitude of the UAV within the three dimensions of...multiple-UAV swarming for combat support operations. The mission processor board will contain the processor selected to execute the mission
NASA Technical Reports Server (NTRS)
Lund, D.
1998-01-01
This report presents a description of the tests performed, and the test data, for the AI METSAT Signal Processor Assembly P/N 1331670-2, S/N F05. The assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive and Integration Procedure." The objective is to demonstrate functionality of the signal processor prior to instrument integration.
NASA Technical Reports Server (NTRS)
Lund, D.
1998-01-01
This report presents a description of tests performed, and the test data, for the A1 METSAT Signal Processor Assembly PN: 1331679-2, S/N F03. This assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive Test and Integration Procedure." The objective is to demonstrate functionality of the signal processor prior to instrument integration.
Interactive digital signal processor
NASA Technical Reports Server (NTRS)
Mish, W. H.; Wenger, R. M.; Behannon, K. W.; Byrnes, J. B.
1982-01-01
The Interactive Digital Signal Processor (IDSP) is examined. It consists of a set of time series analysis Operators each of which operates on an input file to produce an output file. The operators can be executed in any order that makes sense and recursively, if desired. The operators are the various algorithms used in digital time series analysis work. User written operators can be easily interfaced to the sysatem. The system can be operated both interactively and in batch mode. In IDSP a file can consist of up to n (currently n=8) simultaneous time series. IDSP currently includes over thirty standard operators that range from Fourier transform operations, design and application of digital filters, eigenvalue analysis, to operators that provide graphical output, allow batch operation, editing and display information.
Reproducibility of Mammography Units, Film Processing and Quality Imaging
NASA Astrophysics Data System (ADS)
Gaona, Enrique
2003-09-01
The purpose of this study was to carry out an exploratory survey of the problems of quality control in mammography and processors units as a diagnosis of the current situation of mammography facilities. Measurements of reproducibility, optical density, optical difference and gamma index are included. Breast cancer is the most frequently diagnosed cancer and is the second leading cause of cancer death among women in the Mexican Republic. Mammography is a radiographic examination specially designed for detecting breast pathology. We found that the problems of reproducibility of AEC are smaller than the problems of processors units because almost all processors fall outside of the acceptable variation limits and they can affect the mammography quality image and the dose to breast. Only four mammography units agree with the minimum score established by ACR and FDA for the phantom image.
High performance computing applications in neurobiological research
NASA Technical Reports Server (NTRS)
Ross, Muriel D.; Cheng, Rei; Doshay, David G.; Linton, Samuel W.; Montgomery, Kevin; Parnas, Bruce R.
1994-01-01
The human nervous system is a massively parallel processor of information. The vast numbers of neurons, synapses and circuits is daunting to those seeking to understand the neural basis of consciousness and intellect. Pervading obstacles are lack of knowledge of the detailed, three-dimensional (3-D) organization of even a simple neural system and the paucity of large scale, biologically relevant computer simulations. We use high performance graphics workstations and supercomputers to study the 3-D organization of gravity sensors as a prototype architecture foreshadowing more complex systems. Scaled-down simulations run on a Silicon Graphics workstation and scale-up, three-dimensional versions run on the Cray Y-MP and CM5 supercomputers.
Algorithmic commonalities in the parallel environment
NASA Technical Reports Server (NTRS)
Mcanulty, Michael A.; Wainer, Michael S.
1987-01-01
The ultimate aim of this project was to analyze procedures from substantially different application areas to discover what is either common or peculiar in the process of conversion to the Massively Parallel Processor (MPP). Three areas were identified: molecular dynamic simulation, production systems (rule systems), and various graphics and vision algorithms. To date, only selected graphics procedures have been investigated. They are the most readily available, and produce the most visible results. These include simple polygon patch rendering, raycasting against a constructive solid geometric model, and stochastic or fractal based textured surface algorithms. Only the simplest of conversion strategies, mapping a major loop to the array, has been investigated so far. It is not entirely satisfactory.
The application of NASCAD as a NASTRAN pre- and post-processor
NASA Technical Reports Server (NTRS)
Peltzman, Alan N.
1987-01-01
The NASA Computer Aided Design (NASCAD) graphics package provides an effective way to interactively create, view, and refine analytic data models. NASCAD's macro language, combined with its powerful 3-D geometric data base allows the user important flexibility and speed in constructing his model. This flexibility has the added benefit of enabling the user to keep pace with any new NASTRAN developments. NASCAD allows models to be conveniently viewed and plotted to best advantage in both pre- and post-process phases of development, providing useful visual feedback to the analysis process. NASCAD, used as a graphics compliment to NASTRAN, can play a valuable role in the process of finite element modeling.
NASA Technical Reports Server (NTRS)
Lund, D.
1998-01-01
This report presents a description of the tests performed, and the test data, for the A1 METSAT Signal Processor Assembly PN: 1331679-2, S/N F04. The assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive Test and Integration Procedure." The objective is to demonstrate functionality of the signal processor prior to instrument integration.
Implementation of kernels on the Maestro processor
NASA Astrophysics Data System (ADS)
Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.
Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.
NASA Astrophysics Data System (ADS)
Zhang, Yuli; Han, Jun; Weng, Xinqian; He, Zhongzhu; Zeng, Xiaoyang
This paper presents an Application Specific Instruction-set Processor (ASIP) for the SHA-3 BLAKE algorithm family by instruction set extensions (ISE) from an RISC (reduced instruction set computer) processor. With a design space exploration for this ASIP to increase the performance and reduce the area cost, we accomplish an efficient hardware and software implementation of BLAKE algorithm. The special instructions and their well-matched hardware function unit improve the calculation of the key section of the algorithm, namely G-functions. Also, relaxing the time constraint of the special function unit can decrease its hardware cost, while keeping the high data throughput of the processor. Evaluation results reveal the ASIP achieves 335Mbps and 176Mbps for BLAKE-256 and BLAKE-512. The extra area cost is only 8.06k equivalent gates. The proposed ASIP outperforms several software approaches on various platforms in cycle per byte. In fact, both high throughput and low hardware cost achieved by this programmable processor are comparable to that of ASIC implementations.
Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong
2013-10-01
A processor-implemented method for determining aging of a processing unit in a processor the method comprising: calculating an effective aging profile for the processing unit wherein the effective aging profile quantifies the effects of aging on the processing unit; combining the effective aging profile with process variation data, actual workload data and operating conditions data for the processing unit; and determining aging through an aging sensor of the processing unit using the effective aging profile, the process variation data, the actual workload data, architectural characteristics and redundancy data, and the operating conditions data for the processing unit.
HLYWD: a program for post-processing data files to generate selected plots or time-lapse graphics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Munro, J.K. Jr.
1980-05-01
The program HLYWD is a post-processor of output files generated by large plasma simulation computations or of data files containing a time sequence of plasma diagnostics. It is intended to be used in a production mode for either type of application; i.e., it allows one to generate along with the graphics sequence, segments containing title, credits to those who performed the work, text to describe the graphics, and acknowledgement of funding agency. The current version is designed to generate 3D plots and allows one to select type of display (linear or semi-log scales), choice of normalization of function values formore » display purposes, viewing perspective, and an option to allow continuous rotations of surfaces. This program was developed with the intention of being relatively easy to use, reasonably flexible, and requiring a minimum investment of the user's time. It uses the TV80 library of graphics software and ORDERLIB system software on the CDC 7600 at the National Magnetic Fusion Energy Computing Center at Lawrence Livermore Laboratory in California.« less
Deep searches for broadband extended gravitational-wave emission bursts by heterogeneous computing
NASA Astrophysics Data System (ADS)
van Putten, Maurice H. P. M.
2017-09-01
We present a heterogeneous search algorithm for broadband extended gravitational-wave emission, expected from gamma-ray bursts and energetic core-collapse supernovae. It searches the (f,\\dot{f})-plane for long-duration bursts by inner engines slowly exhausting their energy reservoir by matched filtering on a graphics processor unit (GPU) over a template bank of millions of 1 s duration chirps. Parseval's theorem is used to predict the standard deviation σ of the filter output, taking advantage of the near-Gaussian noise in the LIGO S6 data over 350-2000 Hz. Tails exceeding a multiple of σ are communicated back to a central processing unit. This algorithm attains about 65% efficiency overall, normalized to the fast Fourier transform. At about one million correlations per second over data segments of 16 s duration (N=2^{16} samples), better than real-time analysis is achieved on a cluster of about a dozen GPUs. We demonstrate its application to the capture of high-frequency hardware LIGO injections. This algorithm serves as a starting point for deep all-sky searches in both archive data and real-time analysis in current observational runs.
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories
1989-02-01
frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a
Rapid Prototyping of Application Specific Signal Processors (RASSP)
1992-10-01
as well as government, research and and COMPASS , and how the improved plan academic institutions. CFI believes that effective might fit in with the... Compass ). libraries for COTS parts Tools and standards would be strongly based on - Ease of Use VHDL in its latest form(s). Block 2 would take * Open...EDIF Comrcial Rel:wased * Logic Inc. capture for Proprietary boards graphical language Logic Compass Schematic Proprietary EDIF; Commercial Released
AFTOMS Technology Issues and Alternatives Report
1989-12-01
color , resolu- power requirements, physi- tion; memory , processor speed; cal and weather rugged- IAN interfaces, etc,) f,: these ness. display...Telephone and Telegraph 3 CD-I Compact Disk - Interactive CD-ROM Compact Disk-Read Only Memory CGM Computer Graphics Metafile CNWDI Critical Nuclear...Database Management System RFP Request For Proposal 3 RFS Remote File System ROM Read Only Memory 3 S SA-ALC San Antonio Air Logistics Center 3 SAC
Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors
NASA Astrophysics Data System (ADS)
Khajeh-Saeed, Ali; Poole, Stephen; Blair Perot, J.
2010-06-01
Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the most well known application of sequence matching is the testing of DNA or protein sequences against genome databases. The Smith-Waterman algorithm is a method for precisely characterizing how well two sequences can be aligned and for determining the optimal alignment of those two sequences. Like many applications in computational science, the Smith-Waterman algorithm is constrained by the memory access speed and can be accelerated significantly by using graphics processors (GPUs) as the compute engine. In this work we show that effective use of the GPU requires a novel reformulation of the Smith-Waterman algorithm. The performance of this new version of the algorithm is demonstrated using the SSCA#1 (Bioinformatics) benchmark running on one GPU and on up to four GPUs executing in parallel. The results indicate that for large problems a single GPU is up to 45 times faster than a CPU for this application, and the parallel implementation shows linear speed up on up to 4 GPUs.
Interactive graphical computer-aided design system
NASA Technical Reports Server (NTRS)
Edge, T. M.
1975-01-01
System is used for design, layout, and modification of large-scale-integrated (LSI) metal-oxide semiconductor (MOS) arrays. System is structured around small computer which provides real-time support for graphics storage display unit with keyboard, slave display unit, hard copy unit, and graphics tablet for designer/computer interface.
Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators
NASA Astrophysics Data System (ADS)
Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.
2015-12-01
Seismic wave propagation codes are essential tools to investigate a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients to improve the resolution of tomographic images to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited due to high computational costs. They thus need high-performance computing (HPC) facilities to improving the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphic cards while OpenCL was adopted by additional hardware accelerators, like e.g. AMD graphic cards, ARM-based processors as well as Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both CUDA and OpenCL languages within the source code package. Thus, seismic wave simulations are able now to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
Discovering epistasis in large scale genetic association studies by exploiting graphics cards.
Chen, Gary K; Guo, Yunfei
2013-12-03
Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyzes genome-wide can quickly become intractable due to the fact that even modest size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, require tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
High-performance parallel computing in the classroom using the public goods game as an example
NASA Astrophysics Data System (ADS)
Perc, Matjaž
2017-07-01
The use of computers in statistical physics is common because the sheer number of equations that describe the behaviour of an entire system particle by particle often makes it impossible to solve them exactly. Monte Carlo methods form a particularly important class of numerical methods for solving problems in statistical physics. Although these methods are simple in principle, their proper use requires a good command of statistical mechanics, as well as considerable computational resources. The aim of this paper is to demonstrate how the usage of widely accessible graphics cards on personal computers can elevate the computing power in Monte Carlo simulations by orders of magnitude, thus allowing live classroom demonstration of phenomena that would otherwise be out of reach. As an example, we use the public goods game on a square lattice where two strategies compete for common resources in a social dilemma situation. We show that the second-order phase transition to an absorbing phase in the system belongs to the directed percolation universality class, and we compare the time needed to arrive at this result by means of the main processor and by means of a suitable graphics card. Parallel computing on graphics processing units has been developed actively during the last decade, to the point where today the learning curve for entry is anything but steep for those familiar with programming. The subject is thus ripe for inclusion in graduate and advanced undergraduate curricula, and we hope that this paper will facilitate this process in the realm of physics education. To that end, we provide a documented source code for an easy reproduction of presented results and for further development of Monte Carlo simulations of similar systems.
Discovering epistasis in large scale genetic association studies by exploiting graphics cards
Chen, Gary K.; Guo, Yunfei
2013-01-01
Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyzes genome-wide can quickly become intractable due to the fact that even modest size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, require tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations. PMID:24348518
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.
Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin
2014-10-01
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems
Andrade, G.; Ferreira, R.; Teodoro, George; Rocha, Leonardo; Saltz, Joel H.; Kurc, Tahsin
2015-01-01
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales. PMID:26640423
NASA Technical Reports Server (NTRS)
1998-01-01
This report presents a description of the tests performed, and the test data, for the A2 METSAT Signal Processor Assembly PN: 1331120-2, S/N F03. The assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive Test and Integration Procedure."
NASA Technical Reports Server (NTRS)
1998-01-01
This report presents a description of the tests performed, and the test data, for the A2 METSAT Signal Processor Assembly PN: 1331120-2, S/N F04. The assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive Test and Integration Procedure."
Northeast Parallel Architectures Center (NPAC)
1992-07-01
Computational Techniques: Mapping receptor units to processors , using NEWS communication to model interaction in the inhibitory field Goal of the Research...algorithms for classical problems to take advantage of multiple processors . Experiments in probability that have been too time consuming on serial...machine and achieved speedups of 4 to 5 times with 11 processors . It is believed that a slightly better speedup is achievable. In the case of stuck
Interactive 3D visualization speeds well, reservoir planning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petzet, G.A.
1997-11-24
Texaco Exploration and Production has begun making expeditious analyses and drilling decisions that result from interactive, large screen visualization of seismic and other three dimensional data. A pumpkin shaped room or pod inside a 3,500 sq ft, state-of-the-art facility in Southwest Houston houses a supercomputer and projection equipment Texaco said will help its people sharply reduce 3D seismic project cycle time, boost production from existing fields, and find more reserves. Oil and gas related applications of the visualization center include reservoir engineering, plant walkthrough simulation for facilities/piping design, and new field exploration. The center houses a Silicon Graphics Onyx2 infinitemore » reality supercomputer configured with 8 processors, 3 graphics pipelines, and 6 gigabytes of main memory.« less
OpenCL Implementation of NeuroIsing
NASA Astrophysics Data System (ADS)
Zapart, C. A.
Recent advances in graphics card hardware combined with anintroduction of the OpenCL standard promise to accelerate numerical simulations across diverse scientific disciplines. One such field benefiting from new hardware/software paradigms is econophysics. The paper describes an OpenCL implementation of a selected econophysics model: NeuroIsing, which has been designed to execute in parallel on a vendor-independent graphics card. Originally introduced in the paper [C.~A.~Zapart, ``Econophysics in Financial Time Series Prediction'', PhD thesis, Graduate University for Advanced Studies, Japan (2009)], at first it was implemented on a CELL processor running inside a SONY PS3 games console. The NeuroIsing framework can be applied to predicting and trading foreign exchange as well as stock market index futures.
Gschwind, Michael K [Chappaqua, NY
2011-03-01
Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.
Maximum-Likelihood Estimation With a Contracting-Grid Search Algorithm
Hesterman, Jacob Y.; Caucci, Luca; Kupinski, Matthew A.; Barrett, Harrison H.; Furenlid, Lars R.
2010-01-01
A fast search algorithm capable of operating in multi-dimensional spaces is introduced. As a sample application, we demonstrate its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras. We demonstrate that the algorithm can be parallelized in pipelines, and thereby efficiently implemented in specialized hardware, such as field-programmable gate arrays (FPGAs). A 2D implementation of the algorithm is achieved in Cell/BE processors, resulting in processing speeds above one million events per second, which is a 20× increase in speed over a conventional desktop machine. Graphics processing units (GPUs) are used for a 3D application of the algorithm, resulting in processing speeds of nearly 250,000 events per second which is a 250× increase in speed over a conventional desktop machine. These implementations indicate the viability of the algorithm for use in real-time imaging applications. PMID:20824155
GPU acceleration of particle-in-cell methods
NASA Astrophysics Data System (ADS)
Cowan, Benjamin; Cary, John; Meiser, Dominic
2015-11-01
Graphics processing units (GPUs) have become key components in many supercomputing systems, as they can provide more computations relative to their cost and power consumption than conventional processors. However, to take full advantage of this capability, they require a strict programming model which involves single-instruction multiple-data execution as well as significant constraints on memory accesses. To bring the full power of GPUs to bear on plasma physics problems, we must adapt the computational methods to this new programming model. We have developed a GPU implementation of the particle-in-cell (PIC) method, one of the mainstays of plasma physics simulation. This framework is highly general and enables advanced PIC features such as high order particles and absorbing boundary conditions. The main elements of the PIC loop, including field interpolation and particle deposition, are designed to optimize memory access. We describe the performance of these algorithms and discuss some of the methods used. Work supported by DARPA contract W31P4Q-15-C-0061 (SBIR).
GPU accelerated simulations of three-dimensional flow of power-law fluids in a driven cube
NASA Astrophysics Data System (ADS)
Jin, K.; Vanka, S. P.; Agarwal, R. K.; Thomas, B. G.
2017-01-01
Newtonian fluid flow in two- and three-dimensional cavities with a moving wall has been studied extensively in a number of previous works. However, relatively a fewer number of studies have considered the motion of non-Newtonian fluids such as shear thinning and shear thickening power law fluids. In this paper, we have simulated the three-dimensional, non-Newtonian flow of a power law fluid in a cubic cavity driven by shear from the top wall. We have used an in-house developed fractional step code, implemented on a Graphics Processor Unit. Three Reynolds numbers have been studied with power law index set to 0.5, 1.0 and 1.5. The flow patterns, viscosity distributions and velocity profiles are presented for Reynolds numbers of 100, 400 and 1000. All three Reynolds numbers are found to yield steady state flows. Tabulated values of velocity are given for the nine cases studied, including the Newtonian cases.
Tang, Dawei; Gao, Feng; Jiang, X
2014-08-20
We present a spectral domain low-coherence interferometry (SD-LCI) method that is effective for applications in on-line surface inspection because it can obtain a surface profile in a single shot. It has an advantage over existing spectral interferometry techniques by using cylindrical lenses as the objective lenses in a Michelson interferometric configuration to enable the measurement of long profiles. Combined with a modern high-speed CCD camera, general-purpose graphics processing unit, and multicore processors computing technology, fast measurement can be achieved. By translating the tested sample during the measurement procedure, real-time surface inspection was implemented, which is proved by the large-scale 3D surface measurement in this paper. ZEMAX software is used to simulate the SD-LCI system and analyze the alignment errors. Two step height surfaces were measured, and the captured interferograms were analyzed using a fast Fourier transform algorithm. Both 2D profile results and 3D surface maps closely align with the calibrated specifications given by the manufacturer.
The PALM-3000 high-order adaptive optics system for Palomar Observatory
NASA Astrophysics Data System (ADS)
Bouchez, Antonin H.; Dekany, Richard G.; Angione, John R.; Baranec, Christoph; Britton, Matthew C.; Bui, Khanh; Burruss, Rick S.; Cromer, John L.; Guiwits, Stephen R.; Henning, John R.; Hickey, Jeff; McKenna, Daniel L.; Moore, Anna M.; Roberts, Jennifer E.; Trinh, Thang Q.; Troy, Mitchell; Truong, Tuan N.; Velur, Viswa
2008-07-01
Deployed as a multi-user shared facility on the 5.1 meter Hale Telescope at Palomar Observatory, the PALM-3000 highorder upgrade to the successful Palomar Adaptive Optics System will deliver extreme AO correction in the near-infrared, and diffraction-limited images down to visible wavelengths, using both natural and sodium laser guide stars. Wavefront control will be provided by two deformable mirrors, a 3368 active actuator woofer and 349 active actuator tweeter, controlled at up to 3 kHz using an innovative wavefront processor based on a cluster of 17 graphics processing units. A Shack-Hartmann wavefront sensor with selectable pupil sampling will provide high-order wavefront sensing, while an infrared tip/tilt sensor and visible truth wavefront sensor will provide low-order LGS control. Four back-end instruments are planned at first light: the PHARO near-infrared camera/spectrograph, the SWIFT visible light integral field spectrograph, Project 1640, a near-infrared coronagraphic integral field spectrograph, and 888Cam, a high-resolution visible light imager.
Methods for compressible fluid simulation on GPUs using high-order finite differences
NASA Astrophysics Data System (ADS)
Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer
2017-08-01
We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3 . 6 × speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves the rate of 168 million updates per second.
System Architecture For High Speed Sorting Of Potatoes
NASA Astrophysics Data System (ADS)
Marchant, J. A.; Onyango, C. M.; Street, M. J.
1989-03-01
This paper illustrates an industrial application of vision processing in which potatoes are sorted according to their size and shape at speeds of up to 40 objects per second. The result is a multi-processing approach built around the VME bus. A hardware unit has been designed and constructed to encode the boundary of the potatoes, to reducing the amount of data to be processed. A master 68000 processor is used to control this unit and to handle data transfers along the bus. Boundary data is passed to one of three 68010 slave processors each responsible for a line of potatoes across a conveyor belt. The slave processors calculate attributes such as shape, size and estimated weight of each potato and the master processor uses this data to operate the sorting mechanism. The system has been interfaced with a commercial grading machine and performance trials are now in progress.
SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).
Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J
2012-06-01
To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform a Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images, and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution. The expense, however, is an increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processor Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to performCPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The time required to complete the full algorithm by the CPU and GPU were benchmarked and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to- GPU time was greater than 1.0 for all images, which indicates the GPU is performing the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in computational time associated with a FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability. © 2012 American Association of Physicists in Medicine.
Design of an integrated fuel processor for residential PEMFCs applications
NASA Astrophysics Data System (ADS)
Seo, Yu Taek; Seo, Dong Joo; Jeong, Jin Hyeok; Yoon, Wang Lai
KIER has been developing a novel fuel processing system to provide hydrogen rich gas to residential PEMFCs system. For the effective design of a compact hydrogen production system, each unit process for steam reforming and water gas shift, has a steam generator and internal heat exchangers which are thermally and physically integrated into a single packaged hardware system. The newly designed fuel processor (prototype II) showed a thermal efficiency of 78% as a HHV basis with methane conversion of 89%. The preferential oxidation unit with two staged cascade reactors, reduces, the CO concentration to below 10 ppm without complicated temperature control hardware, which is the prerequisite CO limit for the PEMFC stack. After we achieve the initial performance of the fuel processor, partial load operation was carried out to test the performance and reliability of the fuel processor at various loads. The stability of the fuel processor was also demonstrated for three successive days with a stable composition of product gas and thermal efficiency. The CO concentration remained below 10 ppm during the test period and confirmed the stable performance of the two-stage PrOx reactors.
ERIC Educational Resources Information Center
Smith, Herbert R., Jr.
This monograph describes the role of medical and graphic arts units within the comprehensive communications departments of health science educational institutions. Historical trends and contemporary practices are described. Suggestions are made for: organizational structure; services and activities; staff requirements; budget; facility…
Integrating Commercial Off-The-Shelf (COTS) graphics and extended memory packages with CLIPS
NASA Technical Reports Server (NTRS)
Callegari, Andres C.
1990-01-01
This paper addresses the question of how to mix CLIPS with graphics and how to overcome PC's memory limitations by using the extended memory available in the computer. By adding graphics and extended memory capabilities, CLIPS can be converted into a complete and powerful system development tool, on the other most economical and popular computer platform. New models of PCs have amazing processing capabilities and graphic resolutions that cannot be ignored and should be used to the fullest of their resources. CLIPS is a powerful expert system development tool, but it cannot be complete without the support of a graphics package needed to create user interfaces and general purpose graphics, or without enough memory to handle large knowledge bases. Now, a well known limitation on the PC's is the usage of real memory which limits CLIPS to use only 640 Kb of real memory, but now that problem can be solved by developing a version of CLIPS that uses extended memory. The user has access of up to 16 MB of memory on 80286 based computers and, practically, all the available memory (4 GB) on computers that use the 80386 processor. So if we give CLIPS a self-configuring graphics package that will automatically detect the graphics hardware and pointing device present in the computer, and we add the availability of the extended memory that exists in the computer (with no special hardware needed), the user will be able to create more powerful systems at a fraction of the cost and on the most popular, portable, and economic platform available such as the PC platform.
C-MOS array design techniques: SUMC multiprocessor system study
NASA Technical Reports Server (NTRS)
Clapp, W. A.; Helbig, W. A.; Merriam, A. S.
1972-01-01
The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
NASA Astrophysics Data System (ADS)
Hayakawa, Hitoshi; Ogawa, Makoto; Shibata, Tadashi
2005-04-01
A very large scale integrated circuit (VLSI) architecture for a multiple-instruction-stream multiple-data-stream (MIMD) associative processor has been proposed. The processor employs an architecture that enables seamless switching from associative operations to arithmetic operations. The MIMD element is convertible to a regular central processing unit (CPU) while maintaining its high performance as an associative processor. Therefore, the MIMD associative processor can perform not only on-chip perception, i.e., searching for the vector most similar to an input vector throughout the on-chip cache memory, but also arithmetic and logic operations similar to those in ordinary CPUs, both simultaneously in parallel processing. Three key technologies have been developed to generate the MIMD element: associative-operation-and-arithmetic-operation switchable calculation units, a versatile register control scheme within the MIMD element for flexible operations, and a short instruction set for minimizing the memory size for program storage. Key circuit blocks were designed and fabricated using 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. As a result, the full-featured MIMD element is estimated to be 3 mm2, showing the feasibility of an 8-parallel-MIMD-element associative processor in a single chip of 5 mm× 5 mm.
2012-02-17
to be solved. Disclaimer: Reference herein to any specific commercial company , product, process, or service by trade name, trademark...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Nitadori, Keigo; Okamoto, Takashi
2013-02-01
We have developed a numerical software library for collisionless N-body simulations named "Phantom-GRAPE" which highly accelerates force calculations among particles by use of a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). In our library, not only the Newton's forces, but also central forces with an arbitrary shape f(r), which has a finite cutoff radius rcut (i.e. f(r)=0 at r>rcut), can be quickly computed. In computing such central forces with an arbitrary force shape f(r), we refer to a pre-calculated look-up table. We also present a new scheme to create the look-up table whose binning is optimal to keep good accuracy in computing forces and whose size is small enough to avoid cache misses. Using an Intel Core i7-2600 processor, we measure the performance of our library for both of the Newton's forces and the arbitrarily shaped central forces. In the case of Newton's forces, we achieve 2×109 interactions per second with one processor core (or 75 GFLOPS if we count 38 operations per interaction), which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions, and 2 times than that with the SSE instructions. With four processor cores, we obtain the performance of 8×109 interactions per second (or 300 GFLOPS). In the case of the arbitrarily shaped central forces, we can calculate 1×109 and 4×109 interactions per second with one and four processor cores, respectively. The performance with one processor core is 6 times and 2 times higher than those of the implementations without any use of SIMD instructions and with the SSE instructions. These performances depend only weakly on the number of particles, irrespective of the force shape. It is good contrast with the fact that the performance of force calculations accelerated by graphics processing units (GPUs) depends strongly on the number of particles. Substantially weak dependence of the performance on the number of particles is suitable to collisionless N-body simulations, since these simulations are usually performed with sophisticated N-body solvers such as Tree- and TreePM-methods combined with an individual timestep scheme. We conclude that collisionless N-body simulations accelerated with our library have significant advantage over those accelerated by GPUs, especially on massively parallel environments.
LV software support for supersonic flow analysis
NASA Technical Reports Server (NTRS)
Bell, W. A.; Lepicovsky, J.
1992-01-01
The software for configuring an LV counter processor system has been developed using structured design. The LV system includes up to three counter processors and a rotary encoder. The software for configuring and testing the LV system has been developed, tested, and included in an overall software package for data acquisition, analysis, and reduction. Error handling routines respond to both operator and instrument errors which often arise in the course of measuring complex, high-speed flows. The use of networking capabilities greatly facilitates the software development process by allowing software development and testing from a remote site. In addition, high-speed transfers allow graphics files or commands to provide viewing of the data from a remote site. Further advances in data analysis require corresponding advances in procedures for statistical and time series analysis of nonuniformly sampled data.
LV software support for supersonic flow analysis
NASA Technical Reports Server (NTRS)
Bell, William A.
1992-01-01
The software for configuring a Laser Velocimeter (LV) counter processor system was developed using structured design. The LV system includes up to three counter processors and a rotary encoder. The software for configuring and testing the LV system was developed, tested, and included in an overall software package for data acquisition, analysis, and reduction. Error handling routines respond to both operator and instrument errors which often arise in the course of measuring complex, high-speed flows. The use of networking capabilities greatly facilitates the software development process by allowing software development and testing from a remote site. In addition, high-speed transfers allow graphics files or commands to provide viewing of the data from a remote site. Further advances in data analysis require corresponding advances in procedures for statistical and time series analysis of nonuniformly sampled data.
Parallel architecture for rapid image generation and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nerheim, R.J.
1987-01-01
A multiprocessor architecture inspired by the Disney multiplane camera is proposed. For many applications, this approach produces a natural mapping of processors to objects in a scene. Such a mapping promotes parallelism and reduces the hidden-surface work with minimal interprocessor communication and low-overhead cost. Existing graphics architectures store the final picture as a monolithic entity. The architecture here stores each object's image separately. It assembles the final composite picture from component images only when the video display needs to be refreshed. This organization simplifies the work required to animate moving objects that occlude other objects. In addition, the architecture hasmore » multiple processors that generate the component images in parallel. This further shortens the time needed to create a composite picture. In addition to generating images for animation, the architecture has the ability to decompose images.« less
FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.
Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora
2013-09-01
In this paper, we present a distributed computing system, called DCMARK, aimed at solving partial differential equations at the basis of many investigation fields, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the differential equation system solving into many parallel integration operations to be executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor in order to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to study the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200 cells, Korteweg-de Vries (KdV) equation solver and perform a comparison between simulations conducted on a high performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it allows us to reduce the elaboration time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC Host. An intuitively graphical user interface allows us to change the calculation parameters and plot the results.
Onboard shuttle on-line software requirements system: Prototype
NASA Technical Reports Server (NTRS)
Kolkhorst, Barbara; Ogletree, Barry
1989-01-01
The prototype discussed here was developed as proof of a concept for a system which could support high volumes of requirements documents with integrated text and graphics; the solution proposed here could be extended to other projects whose goal is to place paper documents in an electronic system for viewing and printing purposes. The technical problems (such as conversion of documentation between word processors, management of a variety of graphics file formats, and difficulties involved in scanning integrated text and graphics) would be very similar for other systems of this type. Indeed, technological advances in areas such as scanning hardware and software and display terminals insure that some of the problems encountered here will be solved in the near-term (less than five years). Examples of these solvable problems include automated input of integrated text and graphics, errors in the recognition process, and the loss of image information which results from the digitization process. The solution developed for the Online Software Requirements System is modular and allows hardware and software components to be upgraded or replaced as industry solutions mature. The extensive commercial software content allows the NASA customer to apply resources to solving the problem and maintaining documents.
An application of the MPP to the interactive manipulation of stereo images of digital terrain models
NASA Technical Reports Server (NTRS)
Pol, Sanjay; Mcallister, David; Davis, Edward
1987-01-01
Massively Parallel Processor algorithms were developed for the interactive manipulation of flat shaded digital terrain models defined over grids. The emphasis is on real time manipulation of stereo images. Standard graphics transformations are applied to a 128 x 128 grid of elevations followed by shading and a perspective projection to produce the right eye image. The surface is then rendered using a simple painter's algorithm for hidden surface removal. The left eye image is produced by rotating the surface 6 degs about the viewer's y axis followed by a perspective projection and rendering of the image as described above. The left and right eye images are then presented on a graphics device using standard stereo technology. Performance evaluations and comparisons are presented.
Formal design specification of a Processor Interface Unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1992-01-01
This report describes work to formally specify the requirements and design of a processor interface unit (PIU), a single-chip subsystem providing memory-interface bus-interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. The need for high-quality design assurance in such applications is an undisputed fact, given the disastrous consequences that even a single design flaw can produce. Thus, the further development and application of formal methods to fault-tolerant systems is of critical importance as these systems see increasing use in modern society.
Expedition Seven CDR Malenkenko performs IFM on Condensate Water Processor
2003-07-03
ISS007-E-09229 (3 July 2003) --- Cosmonaut Yuri I. Malenchenko, Expedition 7 mission commander, performs scheduled in-flight maintenance (IFM) on the condensate water processor (SRV-K2M) by removing and replacing its BKO multifiltration/purification column unit, which has reached its service life limit (450 liters min.). The old unit will be discarded on Progress. The IFM took place in the Zvezda Service Module on the International Space Station (ISS). Malenchenko represents Rosaviakosmos.
Expedition Seven CDR Malenkenko performs IFM on Condensate Water Processor
2003-07-03
ISS007-E-09231 (3 July 2003) --- Cosmonaut Yuri I. Malenchenko, Expedition 7 mission commander, performs scheduled in-flight maintenance (IFM) on the condensate water processor (SRV-K2M) by removing and replacing its BKO multifiltration/purification column unit, which has reached its service life limit (450 liters min.). The old unit will be discarded on Progress. The IFM took place in the Zvezda Service Module on the International Space Station (ISS). Malenchenko represents Rosaviakosmos.
Thermal Hotspots in CPU Die and It's Future Architecture
NASA Astrophysics Data System (ADS)
Wang, Jian; Hu, Fu-Yuan
Owing to the increasing core frequency and chip integration and the limited die dimension, the power densities in CPU chip have been increasing fastly. The high temperature on chip resulted by power densities threats the processor's performance and chip's reliability. This paper analyzed the thermal hotspots in die and their properties. A new architecture of function units in die - - hot units distributed architecture is suggested to cope with the problems of high power densities for future processor chip.
DTD Creation for the Software Technology for Adaptable, Reliable Systems (STARS) Program
1990-06-23
developed to store documents in a format peculiar to the program’s design . Editing the document became easy since word processors adjust all spacing and...descriptive markup may be output to a 3 CDRL 1810 January 26, 1990 variety of devices ranging from high quality typography printers through laser printers...provision for non-SGML material, such as graphics , to be inserted in a document. For these reasons the Computer-Aided Acquisition and Logistics Support
A software tool for dataflow graph scheduling
NASA Technical Reports Server (NTRS)
Jones, Robert L., III
1994-01-01
A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on multiple processors. The dataflow paradigm is very useful in exposing the parallelism inherent in algorithms. It provides a graphical and mathematical model which describes a partial ordering of algorithm tasks based on data precedence.
Digital Waveguide Architectures for Virtual Musical Instruments
NASA Astrophysics Data System (ADS)
Smith, Julius O.
Digital sound synthesis has become a standard staple of modern music studios, videogames, personal computers, and hand-held devices. As processing power has increased over the years, sound synthesis implementations have evolved from dedicated chip sets, to single-chip solutions, and ultimately to software implementations within processors used primarily for other tasks (such as for graphics or general purpose computing). With the cost of implementation dropping closer and closer to zero, there is increasing room for higher quality algorithms.
Cabrera, Álvaro Cortés; Gil-Redondo, Rubén; Perona, Almudena; Gago, Federico; Morreale, Antonio
2011-09-01
A graphical user interface (GUI) for our previously published virtual screening (VS) and data management platform VSDMIP (Gil-Redondo et al. J Comput Aided Mol Design, 23:171-184, 2009) that has been developed as a plugin for the popular molecular visualization program PyMOL is presented. In addition, a ligand-based VS module (LBVS) has been implemented that complements the already existing structure-based VS (SBVS) module and can be used in those cases where the receptor's 3D structure is not known or for pre-filtering purposes. This updated version of VSDMIP is placed in the context of similar available software and its LBVS and SBVS capabilities are tested here on a reduced set of the Directory of Useful Decoys database. Comparison of results from both approaches confirms the trend found in previous studies that LBVS outperforms SBVS. We also show that by combining LBVS and SBVS, and using a cluster of ~100 modern processors, it is possible to perform complete VS studies of several million molecules in less than a month. As the main processes in VSDMIP are 100% scalable, more powerful processors and larger clusters would notably decrease this time span. The plugin is distributed under an academic license upon request from the authors. © Springer Science+Business Media B.V. 2011
50 CFR 648.6 - Dealer/processor permits.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 50 Wildlife and Fisheries 12 2013-10-01 2013-10-01 false Dealer/processor permits. 648.6 Section 648.6 Wildlife and Fisheries FISHERY CONSERVATION AND MANAGEMENT, NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION, DEPARTMENT OF COMMERCE FISHERIES OF THE NORTHEASTERN UNITED STATES General Provisions § 648.6...
2017-08-01
access to the GPU for general purpose processing .5 CUDA is designed to work easily with multiple programming languages , including Fortran. CUDA is a...Using Graphics Processing Unit (GPU) Computing by Leelinda P Dawson Approved for public release; distribution unlimited...The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing by Leelinda
Configurable software for satellite graphics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hartzman, P D
An important goal in interactive computer graphics is to provide users with both quick system responses for basic graphics functions and enough computing power for complex calculations. One solution is to have a distributed graphics system in which a minicomputer and a powerful large computer share the work. The most versatile type of distributed system is an intelligent satellite system in which the minicomputer is programmable by the application user and can do most of the work while the large remote machine is used for difficult computations. At New York University, the hardware was configured from available equipment. The levelmore » of system intelligence resulted almost completely from software development. Unlike previous work with intelligent satellites, the resulting system had system control centered in the satellite. It also had the ability to reconfigure software during realtime operation. The design of the system was done at a very high level using set theoretic language. The specification clearly illustrated processor boundaries and interfaces. The high-level specification also produced a compact, machine-independent virtual graphics data structure for picture representation. The software was written in a systems implementation language; thus, only one set of programs was needed for both machines. A user can program both machines in a single language. Tests of the system with an application program indicate that is has very high potential. A major result of this work is the demonstration that a gigantic investment in new hardware is not necessary for computing facilities interested in graphics.« less
NASA Astrophysics Data System (ADS)
Chun, Won-Suk; Napoli, Joshua; Cossairt, Oliver S.; Dorval, Rick K.; Hall, Deirdre M.; Purtell, Thomas J., II; Schooler, James F.; Banker, Yigal; Favalora, Gregg E.
2005-03-01
We present a software and hardware foundation to enable the rapid adoption of 3-D displays. Different 3-D displays - such as multiplanar, multiview, and electroholographic displays - naturally require different rendering methods. The adoption of these displays in the marketplace will be accelerated by a common software framework. The authors designed the SpatialGL API, a new rendering framework that unifies these display methods under one interface. SpatialGL enables complementary visualization assets to coexist through a uniform infrastructure. Also, SpatialGL supports legacy interfaces such as the OpenGL API. The authors" first implementation of SpatialGL uses multiview and multislice rendering algorithms to exploit the performance of modern graphics processing units (GPUs) to enable real-time visualization of 3-D graphics from medical imaging, oil & gas exploration, and homeland security. At the time of writing, SpatialGL runs on COTS workstations (both Windows and Linux) and on Actuality"s high-performance embedded computational engine that couples an NVIDIA GeForce 6800 Ultra GPU, an AMD Athlon 64 processor, and a proprietary, high-speed, programmable volumetric frame buffer that interfaces to a 1024 x 768 x 3 digital projector. Progress is illustrated using an off-the-shelf multiview display, Actuality"s multiplanar Perspecta Spatial 3D System, and an experimental multiview display. The experimental display is a quasi-holographic view-sequential system that generates aerial imagery measuring 30 mm x 25 mm x 25 mm, providing 198 horizontal views.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, E.L.
A novel method for performing real-time acquisition and processing Landsat/EROS data covers all aspects including radiometric and geometric corrections of multispectral scanner or return-beam vidicon inputs, image enhancement, statistical analysis, feature extraction, and classification. Radiometric transformations include bias/gain adjustment, noise suppression, calibration, scan angle compensation, and illumination compensation, including topography and atmospheric effects. Correction or compensation for geometric distortion includes sensor-related distortions, such as centering, skew, size, scan nonlinearity, radial symmetry, and tangential symmetry. Also included are object image-related distortions such as aspect angle (altitude), scale distortion (altitude), terrain relief, and earth curvature. Ephemeral corrections are also applied to compensatemore » for satellite forward movement, earth rotation, altitude variations, satellite vibration, and mirror scan velocity. Image enhancement includes high-pass, low-pass, and Laplacian mask filtering and data restoration for intermittent losses. Resource classification is provided by statistical analysis including histograms, correlational analysis, matrix manipulations, and determination of spectral responses. Feature extraction includes spatial frequency analysis, which is used in parallel discriminant functions in each array processor for rapid determination. The technique uses integrated parallel array processors that decimate the tasks concurrently under supervision of a control processor. The operator-machine interface is optimized for programming ease and graphics image windowing.« less
Real time processor for array speckle interferometry
NASA Astrophysics Data System (ADS)
Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos
1989-02-01
The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
Real time processor for array speckle interferometry
NASA Technical Reports Server (NTRS)
Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos
1989-01-01
The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
NASA Technical Reports Server (NTRS)
Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.
1985-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.
Parallel programming with Easy Java Simulations
NASA Astrophysics Data System (ADS)
Esquembre, F.; Christian, W.; Belloni, M.
2018-01-01
Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.
2013-01-01
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616
The research of laser marking control technology
NASA Astrophysics Data System (ADS)
Zhang, Qiue; Zhang, Rong
2009-08-01
In the area of Laser marking, the general control method is insert control card to computer's mother board, it can not support hot swap, it is difficult to assemble or it. Moreover, the one marking system must to equip one computer. In the system marking, the computer can not to do the other things except to transmit marking digital information. Otherwise it can affect marking precision. Based on traditional control methods existed some problems, introduced marking graphic editing and digital processing by the computer finish, high-speed digital signal processor (DSP) control marking the whole process. The laser marking controller is mainly contain DSP2812, digital memorizer, DAC (digital analog converting) transform unit circuit, USB interface control circuit, man-machine interface circuit, and other logic control circuit. Download the marking information which is processed by computer to U disk, DSP read the information by USB interface on time, then processing it, adopt the DSP inter timer control the marking time sequence, output the scanner control signal by D/A parts. Apply the technology can realize marking offline, thereby reduce the product cost, increase the product efficiency. The system have good effect in actual unit markings, the marking speed is more quickly than PCI control card to 20 percent. It has application value in practicality.
20 CFR 61.4 - Definitions and use of terms.
Code of Federal Regulations, 2012 CFR
2012-04-01
... the United States or any of its allies; (3) The discharge or explosion of munitions intended for use... employees of a manufacturer, processor, or transporter of munitions during the manufacture, processing, or transporting of munitions, or while stored on the premises of the manufacturer, processor, or transporter); (4...
20 CFR 61.4 - Definitions and use of terms.
Code of Federal Regulations, 2014 CFR
2014-04-01
... the United States or any of its allies; (3) The discharge or explosion of munitions intended for use... employees of a manufacturer, processor, or transporter of munitions during the manufacture, processing, or transporting of munitions, or while stored on the premises of the manufacturer, processor, or transporter); (4...
20 CFR 61.4 - Definitions and use of terms.
Code of Federal Regulations, 2013 CFR
2013-04-01
... the United States or any of its allies; (3) The discharge or explosion of munitions intended for use... employees of a manufacturer, processor, or transporter of munitions during the manufacture, processing, or transporting of munitions, or while stored on the premises of the manufacturer, processor, or transporter); (4...
20 CFR 61.4 - Definitions and use of terms.
Code of Federal Regulations, 2011 CFR
2011-04-01
... the United States or any of its allies; (3) The discharge or explosion of munitions intended for use... employees of a manufacturer, processor, or transporter of munitions during the manufacture, processing, or transporting of munitions, or while stored on the premises of the manufacturer, processor, or transporter); (4...
Diesel fuel to dc power: Navy & Marine Corps Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bloomfield, D.P.
1996-12-31
During the past year Analytic Power has tested fuel cell stacks and diesel fuel processors for US Navy and Marine Corps applications. The units are 10 kW demonstration power plants. The USN power plant was built to demonstrate the feasibility of diesel fueled PEM fuel cell power plants for 250 kW and 2.5 MW shipboard power systems. We designed and tested a ten cell, 1 kW USMC substack and fuel processor. The complete 10 kW prototype power plant, which has application to both power and hydrogen generation, is now under construction. The USN and USMC fuel cell stacks have beenmore » tested on both actual and simulated reformate. Analytic Power has accumulated operating experience with autothermal reforming based fuel processors operating on sulfur bearing diesel fuel, jet fuel, propane and natural gas. We have also completed the design and fabrication of an advanced regenerative ATR for the USMC. One of the significant problems with small fuel processors is heat loss which limits its ability to operate with the high steam to carbon ratios required for coke free high efficiency operation. The new USMC unit specifically addresses these heat transfer issues. The advances in the mill programs have been incorporated into Analytic Power`s commercial units which are now under test.« less
NASA Technical Reports Server (NTRS)
Barrett, Eamon B. (Editor); Pearson, James J. (Editor)
1989-01-01
Image understanding concepts and models, image understanding systems and applications, advanced digital processors and software tools, and advanced man-machine interfaces are among the topics discussed. Particular papers are presented on such topics as neural networks for computer vision, object-based segmentation and color recognition in multispectral images, the application of image algebra to image measurement and feature extraction, and the integration of modeling and graphics to create an infrared signal processing test bed.
Investigating the Importance of Stereo Displays for Helicopter Landing Simulation
2016-08-11
visualization. The two instances of X Plane® were implemented using two separate PCs, each incorporating Intel i7 processors and Nvidia Quadro K4200... Nvidia GeForce GTX 680 graphics card was used to administer the stereo acuity and fusion range tests. The tests were displayed on an Asus VG278HE 3D...monitor with 1920x1080 pixels that was compatible with Nvidia 3D Vision2 and that used active shutter glasses. At a 1-m viewing distance, the
Adaptive-optics optical coherence tomography processing using a graphics processing unit.
Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T
2014-01-01
Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities, and moderate price compared to super computers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving the super high resolution technology closer to clinical viability.
Compact propane fuel processor for auxiliary power unit application
NASA Astrophysics Data System (ADS)
Dokupil, M.; Spitta, C.; Mathiak, J.; Beckhaus, P.; Heinzel, A.
With focus on mobile applications a fuel cell auxiliary power unit (APU) using liquefied petroleum gas (LPG) is currently being developed at the Centre for Fuel Cell Technology (Zentrum für BrennstoffzellenTechnik, ZBT gGmbH). The system is consisting of an integrated compact and lightweight fuel processor and a low temperature PEM fuel cell for an electric power output of 300 W. This article is presenting the current status of development of the fuel processor which is designed for a nominal hydrogen output of 1 k Wth,H2 within a load range from 50 to 120%. A modular setup was chosen defining a reformer/burner module and a CO-purification module. Based on the performance specifications, thermodynamic simulations, benchmarking and selection of catalysts the modules have been developed and characterised simultaneously and then assembled to the complete fuel processor. Automated operation results in a cold startup time of about 25 min for nominal load and carbon monoxide output concentrations below 50 ppm for steady state and dynamic operation. Also fast transient response of the fuel processor at load changes with low fluctuations of the reformate gas composition have been achieved. Beside the development of the main reactors the transfer of the fuel processor to an autonomous system is of major concern. Hence, concepts for packaging have been developed resulting in a volume of 7 l and a weight of 3 kg. Further a selection of peripheral components has been tested and evaluated regarding to the substitution of the laboratory equipment.
DFT algorithms for bit-serial GaAs array processor architectures
NASA Technical Reports Server (NTRS)
Mcmillan, Gary B.
1988-01-01
Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
Support for Diagnosis of Custom Computer Hardware
NASA Technical Reports Server (NTRS)
Molock, Dwaine S.
2008-01-01
The Coldfire SDN Diagnostics software is a flexible means of exercising, testing, and debugging custom computer hardware. The software is a set of routines that, collectively, serve as a common software interface through which one can gain access to various parts of the hardware under test and/or cause the hardware to perform various functions. The routines can be used to construct tests to exercise, and verify the operation of, various processors and hardware interfaces. More specifically, the software can be used to gain access to memory, to execute timer delays, to configure interrupts, and configure processor cache, floating-point, and direct-memory-access units. The software is designed to be used on diverse NASA projects, and can be customized for use with different processors and interfaces. The routines are supported, regardless of the architecture of a processor that one seeks to diagnose. The present version of the software is configured for Coldfire processors on the Subsystem Data Node processor boards of the Solar Dynamics Observatory. There is also support for the software with respect to Mongoose V, RAD750, and PPC405 processors or their equivalents.
Multitask neurovision processor with extensive feedback and feedforward connections
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1991-11-01
A multi-task neuro-vision parameter which performs a variety of information processing operations associated with the early stages of biological vision is presented. The network architecture of this neuro-vision processor, called the positive-negative (PN) neural processor, is loosely based on the neural activity fields exhibited by thalamic and cortical nervous tissue layers. The computational operation performed by the processor arises from the strength of the recurrent feedback among the numerous positive and negative neural computing units. By adjusting the feedback connections it is possible to generate diverse dynamic behavior that may be used for short-term visual memory (STVM), spatio-temporal filtering (STF), and pulse frequency modulation (PFM). The information attributes that are to be processes may be regulated by modifying the feedforward connections from the signal space to the neural processor.
A 500 megabyte/second disk array
NASA Technical Reports Server (NTRS)
Ruwart, Thomas M.; Okeefe, Matthew T.
1994-01-01
Applications at the Army High Performance Computing Research Center's (AHPCRC) Graphic and Visualization Laboratory (GVL) at the University of Minnesota require a tremendous amount of I/O bandwidth and this appetite for data is growing. Silicon Graphics workstations are used to perform the post-processing, visualization, and animation of multi-terabyte size datasets produced by scientific simulations performed of AHPCRC supercomputers. The M.A.X. (Maximum Achievable Xfer) was designed to find the maximum achievable I/O performance of the Silicon Graphics CHALLENGE/Onyx-class machines that run these applications. Running a fully configured Onyx machine with 12-150MHz R4400 processors, 512MB of 8-way interleaved memory, 31 fast/wide SCSI-2 channel each with a Ciprico disk array controller we were able to achieve a maximum sustained transfer rate of 509.8 megabytes per second. However, after analyzing the results it became clear that the true maximum transfer rate is somewhat beyond this figure and we will need to do further testing with more disk array controllers in order to find the true maximum.
Grace: A cross-platform micromagnetic simulator on graphics processing units
NASA Astrophysics Data System (ADS)
Zhu, Ru
2015-12-01
A micromagnetic simulator running on graphics processing units (GPUs) is presented. Different from GPU implementations of other research groups which are predominantly running on NVidia's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware platform independent. It runs on GPUs from venders including NVidia, AMD and Intel, and achieves significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paved the way for running large size micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.
Reconfigurable pipelined processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saccardi, R.J.
1989-09-19
This patent describes a reconfigurable pipelined processor for processing data. It comprises: a plurality of memory devices for storing bits of data; a plurality of arithmetic units for performing arithmetic functions with the data; cross bar means for connecting the memory devices with the arithmetic units for transferring data therebetween; at least one counter connected with the cross bar means for providing a source of addresses to the memory devices; at least one variable tick delay device connected with each of the memory devices and arithmetic units; and means for providing control bits to the variable tick delay device formore » variably controlling the input and output operations thereof to selectively delay the memory devices and arithmetic units to align the data for processing in a selected sequence.« less
Fast Automatic Segmentation of White Matter Streamlines Based on a Multi-Subject Bundle Atlas.
Labra, Nicole; Guevara, Pamela; Duclap, Delphine; Houenou, Josselin; Poupon, Cyril; Mangin, Jean-François; Figueroa, Miguel
2017-01-01
This paper presents an algorithm for fast segmentation of white matter bundles from massive dMRI tractography datasets using a multisubject atlas. We use a distance metric to compare streamlines in a subject dataset to labeled centroids in the atlas, and label them using a per-bundle configurable threshold. In order to reduce segmentation time, the algorithm first preprocesses the data using a simplified distance metric to rapidly discard candidate streamlines in multiple stages, while guaranteeing that no false negatives are produced. The smaller set of remaining streamlines is then segmented using the original metric, thus eliminating any false positives from the preprocessing stage. As a result, a single-thread implementation of the algorithm can segment a dataset of almost 9 million streamlines in less than 6 minutes. Moreover, parallel versions of our algorithm for multicore processors and graphics processing units further reduce the segmentation time to less than 22 seconds and to 5 seconds, respectively. This performance enables the use of the algorithm in truly interactive applications for visualization, analysis, and segmentation of large white matter tractography datasets.
The order of three lowest-energy states of the six-electron harmonium at small force constant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strasburger, Krzysztof
2016-06-21
The order of low-energy states of six-electron harmonium is uncertain in the case of strong correlation, which is not a desired situation for the model system being considered for future testing of approximate methods of quantum chemistry. The computational study of these states has been carried out at the frequency parameter ω = 0.01, using the variational method with the basis of symmetry-projected, explicitly correlated Gaussian (ECG) lobe functions. It has revealed that the six-electron harmonium at this confinement strength is an octahedral Wigner molecule, whose order of states is different than in the strong confinement regime and does notmore » agree with the earlier predictions. The results obtained for ω = 0.5 and 10 are consistent with the findings based on the Hund’s rules for the s{sup 2}p{sup 4} electron configuration. Substantial part of the computations has been carried out on the graphical processing units and the efficiency of these devices in calculation of the integrals over ECG functions has been compared with traditional processors.« less
Advanced Software V&V for Civil Aviation and Autonomy
NASA Technical Reports Server (NTRS)
Brat, Guillaume P.
2017-01-01
With the advances in high-computing platform (e.g., advanced graphical processing units or multi-core processors), computationally-intensive software techniques such as the ones used in artificial intelligence or formal methods have provided us with an opportunity to further increase safety in the aviation industry. Some of these techniques have facilitated building safety at design time, like in aircraft engines or software verification and validation, and others can introduce safety benefits during operations as long as we adapt our processes. In this talk, I will present how NASA is taking advantage of these new software techniques to build in safety at design time through advanced software verification and validation, which can be applied earlier and earlier in the design life cycle and thus help also reduce the cost of aviation assurance. I will then show how run-time techniques (such as runtime assurance or data analytics) offer us a chance to catch even more complex problems, even in the face of changing and unpredictable environments. These new techniques will be extremely useful as our aviation systems become more complex and more autonomous.
High-throughput Bayesian Network Learning using Heterogeneous Multicore Computers
Linderman, Michael D.; Athalye, Vivek; Meng, Teresa H.; Asadi, Narges Bani; Bruggner, Robert; Nolan, Garry P.
2017-01-01
Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of Bayesian Networks (BNs) is an NP-hard problem that, even with fast heuristics, is too time consuming for large, clinically important networks (20–50 nodes). In this paper, we present a novel graphics processing unit (GPU)-accelerated implementation of a Monte Carlo Markov Chain-based algorithm for learning BNs that is up to 7.5-fold faster than current general-purpose processor (GPP)-based implementations. The GPU-based implementation is just one of several implementations within the larger application, each optimized for a different input or machine configuration. We describe the methodology we use to build an extensible application, assembled from these variants, that can target a broad range of heterogeneous systems, e.g., GPUs, multicore GPPs. Specifically we show how we use the Merge programming model to efficiently integrate, test and intelligently select among the different potential implementations. PMID:28819655
Design and Evaluation of a Scalable and Reconfigurable Multi-Platform System for Acoustic Imaging
Izquierdo, Alberto; Villacorta, Juan José; del Val Puente, Lara; Suárez, Luis
2016-01-01
This paper proposes a scalable and multi-platform framework for signal acquisition and processing, which allows for the generation of acoustic images using planar arrays of MEMS (Micro-Electro-Mechanical Systems) microphones with low development and deployment costs. Acoustic characterization of MEMS sensors was performed, and the beam pattern of a module, based on an 8 × 8 planar array and of several clusters of modules, was obtained. A flexible framework, formed by an FPGA, an embedded processor, a computer desktop, and a graphic processing unit, was defined. The processing times of the algorithms used to obtain the acoustic images, including signal processing and wideband beamforming via FFT, were evaluated in each subsystem of the framework. Based on this analysis, three frameworks are proposed, defined by the specific subsystems used and the algorithms shared. Finally, a set of acoustic images obtained from sound reflected from a person are presented as a case study in the field of biometric identification. These results reveal the feasibility of the proposed system. PMID:27727174
BlochSolver: A GPU-optimized fast 3D MRI simulator for experimentally compatible pulse sequences
NASA Astrophysics Data System (ADS)
Kose, Ryoichi; Kose, Katsumi
2017-08-01
A magnetic resonance imaging (MRI) simulator, which reproduces MRI experiments using computers, has been developed using two graphic-processor-unit (GPU) boards (GTX 1080). The MRI simulator was developed to run according to pulse sequences used in experiments. Experiments and simulations were performed to demonstrate the usefulness of the MRI simulator for three types of pulse sequences, namely, three-dimensional (3D) gradient-echo, 3D radio-frequency spoiled gradient-echo, and gradient-echo multislice with practical matrix sizes. The results demonstrated that the calculation speed using two GPU boards was typically about 7 TFLOPS and about 14 times faster than the calculation speed using CPUs (two 18-core Xeons). We also found that MR images acquired by experiment could be reproduced using an appropriate number of subvoxels, and that 3D isotropic and two-dimensional multislice imaging experiments for practical matrix sizes could be simulated using the MRI simulator. Therefore, we concluded that such powerful MRI simulators are expected to become an indispensable tool for MRI research and development.
Sharma, Diksha; Badano, Aldo
2013-03-01
hybridMANTIS is a Monte Carlo package for modeling indirect x-ray imagers using columnar geometry based on a hybrid concept that maximizes the utilization of available CPU and graphics processing unit processors in a workstation. The authors compare hybridMANTIS x-ray response simulations to previously published MANTIS and experimental data for four cesium iodide scintillator screens. These screens have a variety of reflective and absorptive surfaces with different thicknesses. The authors analyze hybridMANTIS results in terms of modulation transfer function and calculate the root mean square difference and Swank factors from simulated and experimental results. The comparison suggests that hybridMANTIS better matches the experimental data as compared to MANTIS, especially at high spatial frequencies and for the thicker screens. hybridMANTIS simulations are much faster than MANTIS with speed-ups up to 5260. hybridMANTIS is a useful tool for improved description and optimization of image acquisition stages in medical imaging systems and for modeling the forward problem in iterative reconstruction algorithms.
Large calculation of the flow over a hypersonic vehicle using a GPU
NASA Astrophysics Data System (ADS)
Elsen, Erich; LeGresley, Patrick; Darve, Eric
2008-12-01
Graphics processing units are capable of impressive computing performance up to 518 Gflops peak performance. Various groups have been using these processors for general purpose computing; most efforts have focussed on demonstrating relatively basic calculations, e.g. numerical linear algebra, or physical simulations for visualization purposes with limited accuracy. This paper describes the simulation of a hypersonic vehicle configuration with detailed geometry and accurate boundary conditions using the compressible Euler equations. To the authors' knowledge, this is the most sophisticated calculation of this kind in terms of complexity of the geometry, the physical model, the numerical methods employed, and the accuracy of the solution. The Navier-Stokes Stanford University Solver (NSSUS) was used for this purpose. NSSUS is a multi-block structured code with a provably stable and accurate numerical discretization which uses a vertex-based finite-difference method. A multi-grid scheme is used to accelerate the solution of the system. Based on a comparison of the Intel Core 2 Duo and NVIDIA 8800GTX, speed-ups of over 40× were demonstrated for simple test geometries and 20× for complex geometries.
Space Station Water Processor Mostly Liquid Separator (MLS)
NASA Technical Reports Server (NTRS)
Lanzarone, Anthony
1995-01-01
This report presents the results of the development testing conducted under this contract to the Space Station Water Processor (WP) Mostly Liquid Separator (MLS). The MLS units built and modified during this testing demonstrated acceptable air/water separation results in a variety of water conditions with inlet flow rates ranging from 60 - 960 LB/hr.
Introduction to Parallel Computing
1992-05-01
Instruction Stream, Multiple Data Stream Machines .................... 19 2.4 Networks of M achines...independent memory units and connecting them to the processors by an interconnection network . Many different interconnection schemes have been considered, and...connected to the same processor at the same time. Crossbar switching networks are still too expensive to be practical for connecting large numbers of
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Donnell, T.J.; Olson, A.J.
1981-08-01
GRAMPS, a graphics language interpreter has been developed in FORTRAN 77 to be used in conjunction with an interactive vector display list processor (Evans and Sutherland Multi-Picture-System). Several of the features of the language make it very useful and convenient for real-time scene construction, manipulation and animation. The GRAMPS language syntax allows natural interaction with scene elements as well as easy, interactive assignment of graphics input devices. GRAMPS facilitates the creation, manipulation and copying of complex nested picture structures. The language has a powerful macro feature that enables new graphics commands to be developed and incorporated interactively. Animation may bemore » achieved in GRAMPS by two different, yet mutually compatible means. Picture structures may contain framed data, which consist of a sequence of fixed objects. These structures may be displayed sequentially to give a traditional frame animation effect. In addition, transformation information on picture structures may be saved at any time in the form of new macro commands that will transform these structures from one saved state to another in a specified number of steps, yielding an interpolated transformation animation effect. An overview of the GRAMPS command structure is given and several examples of application of the language to molecular modeling and animation are presented.« less
Design of RISC Processor Using VHDL and Cadence
NASA Astrophysics Data System (ADS)
Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram
The project deals about development of a basic RISC processor. The processor is designed with basic architecture consisting of internal modules like clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit and decoder. This processor is mainly used for simple general purpose like arithmetic operations and which can be further developed for general purpose processor by increasing the size of the instruction register. The processor is designed in VHDL by using Xilinx 8.1i version. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/ Output pins, and to show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
Architecture of security management unit for safe hosting of multiple agents
NASA Astrophysics Data System (ADS)
Gilmont, Tanguy; Legat, Jean-Didier; Quisquater, Jean-Jacques
1999-04-01
In such growing areas as remote applications in large public networks, electronic commerce, digital signature, intellectual property and copyright protection, and even operating system extensibility, the hardware security level offered by existing processors is insufficient. They lack protection mechanisms that prevent the user from tampering critical data owned by those applications. Some devices make exception, but have not enough processing power nor enough memory to stand up to such applications (e.g. smart cards). This paper proposes an architecture of secure processor, in which the classical memory management unit is extended into a new security management unit. It allows ciphered code execution and ciphered data processing. An internal permanent memory can store cipher keys and critical data for several client agents simultaneously. The ordinary supervisor privilege scheme is replaced by a privilege inheritance mechanism that is more suited to operating system extensibility. The result is a secure processor that has hardware support for extensible multitask operating systems, and can be used for both general applications and critical applications needing strong protection. The security management unit and the internal permanent memory can be added to an existing CPU core without loss of performance, and do not require it to be modified.
Quantum Monte Carlo: Faster, More Reliable, And More Accurate
NASA Astrophysics Data System (ADS)
Anderson, Amos Gerald
2010-06-01
The Schrodinger Equation has been available for about 83 years, but today, we still strain to apply it accurately to molecules of interest. The difficulty is not theoretical in nature, but practical, since we're held back by lack of sufficient computing power. Consequently, effort is applied to find acceptable approximations to facilitate real time solutions. In the meantime, computer technology has begun rapidly advancing and changing the way we think about efficient algorithms. For those who can reorganize their formulas to take advantage of these changes and thereby lift some approximations, incredible new opportunities await. Over the last decade, we've seen the emergence of a new kind of computer processor, the graphics card. Designed to accelerate computer games by optimizing quantity instead of quality in processor, they have become of sufficient quality to be useful to some scientists. In this thesis, we explore the first known use of a graphics card to computational chemistry by rewriting our Quantum Monte Carlo software into the requisite "data parallel" formalism. We find that notwithstanding precision considerations, we are able to speed up our software by about a factor of 6. The success of a Quantum Monte Carlo calculation depends on more than just processing power. It also requires the scientist to carefully design the trial wavefunction used to guide simulated electrons. We have studied the use of Generalized Valence Bond wavefunctions to simply, and yet effectively, captured the essential static correlation in atoms and molecules. Furthermore, we have developed significantly improved two particle correlation functions, designed with both flexibility and simplicity considerations, representing an effective and reliable way to add the necessary dynamic correlation. Lastly, we present our method for stabilizing the statistical nature of the calculation, by manipulating configuration weights, thus facilitating efficient and robust calculations. Our combination of Generalized Valence Bond wavefunctions, improved correlation functions, and stabilized weighting techniques for calculations run on graphics cards, represents a new way for using Quantum Monte Carlo to study arbitrarily sized molecules.
More than Mere Motivation: Learning Specific Content through Multimodal Narratives
ERIC Educational Resources Information Center
Brugar, Kristy A.; Roberts, Kathryn L.; Jiménez, Laura M.; Meyer, Carla K.
2018-01-01
This study explores the possibilities for learning content that might accompany the use of an historically accurate graphic novel as part of a language arts instructional unit. During a 6-day unit, 16 sixth grade students engaged in graphic novels in ways that support comprehension, both in the context of a graphic novel text set and a specific…
Power processor for a 30cm ion thruster
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.
1974-01-01
A thermal vacuum power processor for the NASA Lewis 30cm Mercury Ion Engine was designed, fabricated and tested to determine compliance with electrical specifications. The power processor breadboard used the silicon controlled rectifier (SCR) series resonant inverter as the basic power stage to process all the power to an ion engine. The power processor includes a digital interface unit to process all input commands and internal telemetry signals so that operation is compatible with a central computer system. The breadboard was tested in a thermal vacuum environment. Integration tests were performed with the ion engine and demonstrate operational compatibility and reliable operation without any component failures. Electromagnetic interference data were also recorded on the design to provide information on the interaction with total spacecraft.
Programmable DNA-Mediated Multitasking Processor.
Shu, Jian-Jun; Wang, Qi-Wen; Yong, Kian-Yan; Shao, Fangwei; Lee, Kee Jin
2015-04-30
Because of DNA appealing features as perfect material, including minuscule size, defined structural repeat and rigidity, programmable DNA-mediated processing is a promising computing paradigm, which employs DNAs as information storing and processing substrates to tackle the computational problems. The massive parallelism of DNA hybridization exhibits transcendent potential to improve multitasking capabilities and yield a tremendous speed-up over the conventional electronic processors with stepwise signal cascade. As an example of multitasking capability, we present an in vitro programmable DNA-mediated optimal route planning processor as a functional unit embedded in contemporary navigation systems. The novel programmable DNA-mediated processor has several advantages over the existing silicon-mediated methods, such as conducting massive data storage and simultaneous processing via much fewer materials than conventional silicon devices.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Yi, Faliu; Lee, Jieun; Moon, Inkyu
2014-05-01
The reconstruction of multiple depth images with a ray back-propagation algorithm in three-dimensional (3D) computational integral imaging is computationally burdensome. Further, a reconstructed depth image consists of a focus and an off-focus area. Focus areas are 3D points on the surface of an object that are located at the reconstructed depth, while off-focus areas include 3D points in free-space that do not belong to any object surface in 3D space. Generally, without being removed, the presence of an off-focus area would adversely affect the high-level analysis of a 3D object, including its classification, recognition, and tracking. Here, we use a graphics processing unit (GPU) that supports parallel processing with multiple processors to simultaneously reconstruct multiple depth images using a lookup table containing the shifted values along the x and y directions for each elemental image in a given depth range. Moreover, each 3D point on a depth image can be measured by analyzing its statistical variance with its corresponding samples, which are captured by the two-dimensional (2D) elemental images. These statistical variances can be used to classify depth image pixels as either focus or off-focus points. At this stage, the measurement of focus and off-focus points in multiple depth images is also implemented in parallel on a GPU. Our proposed method is conducted based on the assumption that there is no occlusion of the 3D object during the capture stage of the integral imaging process. Experimental results have demonstrated that this method is capable of removing off-focus points in the reconstructed depth image. The results also showed that using a GPU to remove the off-focus points could greatly improve the overall computational speed compared with using a CPU.
Development of small scale cluster computer for numerical analysis
NASA Astrophysics Data System (ADS)
Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.
2017-09-01
In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
Store-operate-coherence-on-value
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Dong; Heidelberger, Philip; Kumar, Sameer
A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of the running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the receivedmore » store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device.« less
Miniature Intelligent Sensor Module
NASA Technical Reports Server (NTRS)
Beech, Russell S.
2007-01-01
An electronic unit denoted the Miniature Intelligent Sensor Module performs sensor-signal-conditioning functions and local processing of sensor data. The unit includes four channels of analog input/output circuitry, a processor, volatile and nonvolatile memory, and two Ethernet communication ports, all housed in a weathertight enclosure. The unit accepts AC or DC power. The analog inputs provide programmable gain, offset, and filtering as well as shunt calibration and auto-zeroing. Analog outputs include sine, square, and triangular waves having programmable frequencies and amplitudes, as well as programmable amplitude DC. One innovative aspect of the design of this unit is the integration of a relatively powerful processor and large amount of memory along with the sensor-signalconditioning circuitry so that sophisticated computer programs can be used to acquire and analyze sensor data and estimate and track the health of the overall sensor-data-acquisition system of which the unit is a part. The unit includes calibration, zeroing, and signalfeedback circuitry to facilitate health monitoring. The processor is also integrated with programmable logic circuitry in such a manner as to simplify and enhance acquisition of data and generation of analog outputs. A notable unique feature of the unit is a cold-junction compensation circuit in the back shell of a sensor connector. This circuit makes it possible to use Ktype thermocouples without compromising a housing seal. Replicas of this unit may prove useful in industrial and manufacturing settings - especially in such large outdoor facilities as refineries. Two features can be expected to simplify installation: the weathertight housings should make it possible to mount the units near sensors, and the Ethernet communication capability of the units should facilitate establishment of communication connections for the units.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnstad, H.
The purpose of this meeting is to discuss the current and future HEP computing support and environments from the perspective of new horizons in accelerator, physics, and computing technologies. Topics of interest to the Meeting include (but are limited to): the forming of the HEPLIB world user group for High Energy Physic computing; mandate, desirables, coordination, organization, funding; user experience, international collaboration; the roles of national labs, universities, and industry; range of software, Monte Carlo, mathematics, physics, interactive analysis, text processors, editors, graphics, data base systems, code management tools; program libraries, frequency of updates, distribution; distributed and interactive computing, datamore » base systems, user interface, UNIX operating systems, networking, compilers, Xlib, X-Graphics; documentation, updates, availability, distribution; code management in large collaborations, keeping track of program versions; and quality assurance, testing, conventions, standards.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnstad, H.
The purpose of this meeting is to discuss the current and future HEP computing support and environments from the perspective of new horizons in accelerator, physics, and computing technologies. Topics of interest to the Meeting include (but are limited to): the forming of the HEPLIB world user group for High Energy Physic computing; mandate, desirables, coordination, organization, funding; user experience, international collaboration; the roles of national labs, universities, and industry; range of software, Monte Carlo, mathematics, physics, interactive analysis, text processors, editors, graphics, data base systems, code management tools; program libraries, frequency of updates, distribution; distributed and interactive computing, datamore » base systems, user interface, UNIX operating systems, networking, compilers, Xlib, X-Graphics; documentation, updates, availability, distribution; code management in large collaborations, keeping track of program versions; and quality assurance, testing, conventions, standards.« less
Preprocessor and postprocessor computer programs for a radial-flow finite-element model
Pucci, A.A.; Pope, D.A.
1987-01-01
Preprocessing and postprocessing computer programs that enhance the utility of the U.S. Geological Survey radial-flow model have been developed. The preprocessor program: (1) generates a triangular finite element mesh from minimal data input, (2) produces graphical displays and tabulations of data for the mesh , and (3) prepares an input data file to use with the radial-flow model. The postprocessor program is a version of the radial-flow model, which was modified to (1) produce graphical output for simulation and field results, (2) generate a statistic for comparing the simulation results with observed data, and (3) allow hydrologic properties to vary in the simulated region. Examples of the use of the processor programs for a hypothetical aquifer test are presented. Instructions for the data files, format instructions, and a listing of the preprocessor and postprocessor source codes are given in the appendixes. (Author 's abstract)
Real-time generation of infrared ocean scene based on GPU
NASA Astrophysics Data System (ADS)
Jiang, Zhaoyi; Wang, Xun; Lin, Yun; Jin, Jianqiu
2007-12-01
Infrared (IR) image synthesis for ocean scene has become more and more important nowadays, especially for remote sensing and military application. Although a number of works present ready-to-use simulations, those techniques cover only a few possible ways of water interacting with the environment. And the detail calculation of ocean temperature is rarely considered by previous investigators. With the advance of programmable features of graphic card, many algorithms previously limited to offline processing have become feasible for real-time usage. In this paper, we propose an efficient algorithm for real-time rendering of infrared ocean scene using the newest features of programmable graphics processors (GPU). It differs from previous works in three aspects: adaptive GPU-based ocean surface tessellation, sophisticated balance equation of thermal balance for ocean surface, and GPU-based rendering for infrared ocean scene. Finally some results of infrared image are shown, which are in good accordance with real images.
TOOKUIL: A case study in user interface development for safety code application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gray, D.L.; Harkins, C.K.; Hoole, J.G.
1997-07-01
Traditionally, there has been a very high learning curve associated with using nuclear power plant (NPP) analysis codes. Even for seasoned plant analysts and engineers, the process of building or modifying an input model for present day NPP analysis codes is tedious, error prone, and time consuming. Current cost constraints and performance demands place an additional burden on today`s safety analysis community. Advances in graphical user interface (GUI) technology have been applied to obtain significant productivity and quality assurance improvements for the Transient Reactor Analysis Code (TRAC) input model development. KAPL Inc. has developed an X Windows-based graphical user interfacemore » named TOOKUIL which supports the design and analysis process, acting as a preprocessor, runtime editor, help system, and post processor for TRAC. This paper summarizes the objectives of the project, the GUI development process and experiences, and the resulting end product, TOOKUIL.« less
A multi-satellite orbit determination problem in a parallel processing environment
NASA Technical Reports Server (NTRS)
Deakyne, M. S.; Anderle, R. J.
1988-01-01
The Engineering Orbit Analysis Unit at GE Valley Forge used an Intel Hypercube Parallel Processor to investigate the performance and gain experience of parallel processors with a multi-satellite orbit determination problem. A general study was selected in which major blocks of computation for the multi-satellite orbit computations were used as units to be assigned to the various processors on the Hypercube. Problems encountered or successes achieved in addressing the orbit determination problem would be more likely to be transferable to other parallel processors. The prime objective was to study the algorithm to allow processing of observations later in time than those employed in the state update. Expertise in ephemeris determination was exploited in addressing these problems and the facility used to bring a realism to the study which would highlight the problems which may not otherwise be anticipated. Secondary objectives were to gain experience of a non-trivial problem in a parallel processor environment, to explore the necessary interplay of serial and parallel sections of the algorithm in terms of timing studies, to explore the granularity (coarse vs. fine grain) to discover the granularity limit above which there would be a risk of starvation where the majority of nodes would be idle or under the limit where the overhead associated with splitting the problem may require more work and communication time than is useful.
Multinode reconfigurable pipeline computer
NASA Technical Reports Server (NTRS)
Nosenchuck, Daniel M. (Inventor); Littman, Michael G. (Inventor)
1989-01-01
A multinode parallel-processing computer is made up of a plurality of innerconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. the nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.
NASA Astrophysics Data System (ADS)
Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos
1990-03-01
The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
GPU-Q-J, a fast method for calculating root mean square deviation (RMSD) after optimal superposition
2011-01-01
Background Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processor units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux where it was 260 to 760 times faster than existing unoptimized CPU methods. Source code is available from the Compbio website http://software.compbio.washington.edu/misc/downloads/st_gpu_fit/ or from the author LHH. Findings The Nutritious Rice for the World Project (NRW) on World Community Grid predicted de novo, the structures of over 62,000 small proteins and protein domains returning a total of 10 billion candidate structures. Clustering ensembles of structures on this scale requires calculation of large similarity matrices consisting of RMSDs between each pair of structures in the set. As a real-world test, we calculated the matrices for 6 different ensembles from NRW. The GPU method was 260 times faster that the fastest existing CPU based method and over 500 times faster than the method that had been previously used. Conclusions GPU-Q-J is a significant advance over previous CPU methods. It relieves a major bottleneck in the clustering of large numbers of structures for NRW. It also has applications in structure comparison methods that involve multiple superposition and RMSD determination steps, particularly when such methods are applied on a proteome and genome wide scale. PMID:21453553
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.
2016-01-01
Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W
2016-01-01
Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
Jeong, Kyeong-Min; Kim, Hee-Seung; Hong, Sung-In; Lee, Sung-Keun; Jo, Na-Young; Kim, Yong-Soo; Lim, Hong-Gi; Park, Jae-Hyeung
2012-10-08
Speed enhancement of integral imaging based incoherent Fourier hologram capture using a graphic processing unit is reported. Integral imaging based method enables exact hologram capture of real-existing three-dimensional objects under regular incoherent illumination. In our implementation, we apply parallel computation scheme using the graphic processing unit, accelerating the processing speed. Using enhanced speed of hologram capture, we also implement a pseudo real-time hologram capture and optical reconstruction system. The overall operation speed is measured to be 1 frame per second.
Vatsavai, Ranga Raju; Graesser, Jordan B.; Bhaduri, Budhendra L.
2016-07-05
A programmable media includes a graphical processing unit in communication with a memory element. The graphical processing unit is configured to detect one or more settlement regions from a high resolution remote sensed image based on the execution of programming code. The graphical processing unit identifies one or more settlements through the execution of the programming code that executes a multi-instance learning algorithm that models portions of the high resolution remote sensed image. The identification is based on spectral bands transmitted by a satellite and on selected designations of the image patches.
Climatological Data Option in My Weather Impacts Decision Aid (MyWIDA) Overview
2017-07-18
rules. It consists of 2 databases, a data service server, a collection of web service, and web applications that show weather impacts on selected...3.1.2 ClimoDB 5 3.2 Data Service 5 3.2.1 Data Requestor 5 3.2.2 Data Decoder 6 3.2.3 Post Processor 6 3.2.4 Job Scheduler 6 3.3 Web Service 6...6.1 Additional Data Option 9 6.2 Impact Overlay Web Service 9 6.3 Graphical User Interface 9 7. References 10 List of Symbols, Abbreviations, and
[Design of an embedded stroke rehabilitation apparatus system based on Linux computer engineering].
Zhuang, Pengfei; Tian, XueLong; Zhu, Lin
2014-04-01
A realizaton project of electrical stimulator aimed at motor dysfunction of stroke is proposed in this paper. Based on neurophysiological biofeedback, this system, using an ARM9 S3C2440 as the core processor, integrates collection and display of surface electromyography (sEMG) signal, as well as neuromuscular electrical stimulation (NMES) into one system. By embedding Linux system, the project is able to use Qt/Embedded as a graphical interface design tool to accomplish the design of stroke rehabilitation apparatus. Experiments showed that this system worked well.
A personal computer-based, multitasking data acquisition system
NASA Technical Reports Server (NTRS)
Bailey, Steven A.
1990-01-01
A multitasking, data acquisition system was written to simultaneously collect meteorological radar and telemetry data from two sources. This system is based on the personal computer architecture. Data is collected via two asynchronous serial ports and is deposited to disk. The system is written in both the C programming language and assembler. It consists of three parts: a multitasking kernel for data collection, a shell with pull down windows as user interface, and a graphics processor for editing data and creating coded messages. An explanation of both system principles and program structure is presented.
SIG: a general-purpose signal processing program. User's manual. Revision 1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lager, D.; Azevedo, S.
1985-05-09
SIG is a general-purpose signal processing, analysis, and display program. Its main purpose is to perform manipulations on time-domain and frequenccy-domain signals. The manual contains a complete description of the SIG program from the user's stand-point. A brief exercise in using SIG is shown. Complete descriptions are given of each command in the SIG core. General information about the SIG structure, command processor, and graphics options are provided. An example usage of SIG for solving a problem is developed, and error message formats are briefly discussed. (LEW)
The recovery and utilization of space suit range-of-motion data
NASA Technical Reports Server (NTRS)
Reinhardt, AL; Walton, James S.
1988-01-01
A technique for recovering data for the range of motion of a subject wearing a space suit is described along with the validation of this technique on an EVA space suit. Digitized data are automatically acquired from video images of the subject; three-dimensional trajectories are recovered from these data, and can be displayed using three-dimensional computer graphics. Target locations are recovered using a unique video processor and close-range photogrammetry. It is concluded that such data can be used in such applications as the animation of anthropometric computer models.
A cost-effective methodology for the design of massively-parallel VLSI functional units
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Sriram, G.; Desouza, J.
1993-01-01
In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
NASA Astrophysics Data System (ADS)
Yang, Mei; Jiao, Fengjun; Li, Shulian; Li, Hengqiang; Chen, Guangwen
2015-08-01
A self-sustained, complete and miniaturized methanol fuel processor has been developed based on modular integration and microreactor technology. The fuel processor is comprised of one methanol oxidative reformer, one methanol combustor and one two-stage CO preferential oxidation unit. Microchannel heat exchanger is employed to recover heat from hot stream, miniaturize system size and thus achieve high energy utilization efficiency. By optimized thermal management and proper operation parameter control, the fuel processor can start up in 10 min at room temperature without external heating. A self-sustained state is achieved with H2 production rate of 0.99 Nm3 h-1 and extremely low CO content below 25 ppm. This amount of H2 is sufficient to supply a 1 kWe proton exchange membrane fuel cell. The corresponding thermal efficiency of whole processor is higher than 86%. The size and weight of the assembled reactors integrated with microchannel heat exchangers are 1.4 L and 5.3 kg, respectively, demonstrating a very compact construction of the fuel processor.
NASA Astrophysics Data System (ADS)
Erez, Mattan; Dally, William J.
Stream processors, like other multi core architectures partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
Visualization of information with an established order
Wong, Pak Chung [Richland, WA; Foote, Harlan P [Richmond, WA; Thomas, James J [Richland, WA; Wong, Kwong-Kwok [Sugar Land, TX
2007-02-13
Among the embodiments of the present invention is a system including one or more processors operable to access data representative of a biopolymer sequence of monomer units. The one or more processors are further operable to establish a pattern corresponding to at least one fractal curve and generate one or more output signals corresponding to a number of image elements each representative of one of the monomer units. Also included is a display device responsive to the one or more output signals to visualize the biopolymer sequence by displaying the image elements in accordance with the pattern.
Compact gasoline fuel processor for passenger vehicle APU
NASA Astrophysics Data System (ADS)
Severin, Christopher; Pischinger, Stefan; Ogrzewalla, Jürgen
Due to the increasing demand for electrical power in today's passenger vehicles, and with the requirements regarding fuel consumption and environmental sustainability tightening, a fuel cell-based auxiliary power unit (APU) becomes a promising alternative to the conventional generation of electrical energy via internal combustion engine, generator and battery. It is obvious that the on-board stored fuel has to be used for the fuel cell system, thus, gasoline or diesel has to be reformed on board. This makes the auxiliary power unit a complex integrated system of stack, air supply, fuel processor, electrics as well as heat and water management. Aside from proving the technical feasibility of such a system, the development has to address three major barriers:start-up time, costs, and size/weight of the systems. In this paper a packaging concept for an auxiliary power unit is presented. The main emphasis is placed on the fuel processor, as good packaging of this large subsystem has the strongest impact on overall size. The fuel processor system consists of an autothermal reformer in combination with water-gas shift and selective oxidation stages, based on adiabatic reactors with inter-cooling. The configuration was realized in a laboratory set-up and experimentally investigated. The results gained from this confirm a general suitability for mobile applications. A start-up time of 30 min was measured, while a potential reduction to 10 min seems feasible. An overall fuel processor efficiency of about 77% was measured. On the basis of the know-how gained by the experimental investigation of the laboratory set-up a packaging concept was developed. Using state-of-the-art catalyst and heat exchanger technology, the volumes of these components are fixed. However, the overall volume is higher mainly due to mixing zones and flow ducts, which do not contribute to the chemical or thermal function of the system. Thus, the concept developed mainly focuses on minimization of those component volumes. Therefore, the packaging utilizes rectangular catalyst bricks and integrates flow ducts into the heat exchangers. A concept is presented with a 25 l fuel processor volume including thermal isolation for a 3 kW el auxiliary power unit. The overall size of the system, i.e. including stack, air supply and auxiliaries can be estimated to 44 l.
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
This technical report contains the HOL listings of the specification of the design and major portions of the requirements for a commercially developed processor interface unit (or PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU specification as it currently exists. Section two of this report contains general-purpose HOL theories that support the PIU specification. These theories include definitions for the hardware components used in the PIU, our implementation of bit words, and our implementation of temporal logic. Section three contains the HOL listings for the PIU design specification. Aside from the PIU internal bus (I-Bus), this specification is complete. Section four contains the HOL listings for a major portion of the PIU requirements specification. Specifically, it contains most of the definition for the PIU behavior associated with memory accesses initiated by the local processor.
IDSP- INTERACTIVE DIGITAL SIGNAL PROCESSOR
NASA Technical Reports Server (NTRS)
Mish, W. H.
1994-01-01
The Interactive Digital Signal Processor, IDSP, consists of a set of time series analysis "operators" based on the various algorithms commonly used for digital signal analysis work. The processing of a digital time series to extract information is usually achieved by the application of a number of fairly standard operations. However, it is often desirable to "experiment" with various operations and combinations of operations to explore their effect on the results. IDSP is designed to provide an interactive and easy-to-use system for this type of digital time series analysis. The IDSP operators can be applied in any sensible order (even recursively), and can be applied to single time series or to simultaneous time series. IDSP is being used extensively to process data obtained from scientific instruments onboard spacecraft. It is also an excellent teaching tool for demonstrating the application of time series operators to artificially-generated signals. IDSP currently includes over 43 standard operators. Processing operators provide for Fourier transformation operations, design and application of digital filters, and Eigenvalue analysis. Additional support operators provide for data editing, display of information, graphical output, and batch operation. User-developed operators can be easily interfaced with the system to provide for expansion and experimentation. Each operator application generates one or more output files from an input file. The processing of a file can involve many operators in a complex application. IDSP maintains historical information as an integral part of each file so that the user can display the operator history of the file at any time during an interactive analysis. IDSP is written in VAX FORTRAN 77 for interactive or batch execution and has been implemented on a DEC VAX-11/780 operating under VMS. The IDSP system generates graphics output for a variety of graphics systems. The program requires the use of Versaplot and Template plotting routines and IMSL Math/Library routines. These software packages are not included in IDSP. The virtual memory requirement for the program is approximately 2.36 MB. The IDSP system was developed in 1982 and was last updated in 1986. Versaplot is a registered trademark of Versatec Inc. Template is a registered trademark of Template Graphics Software Inc. IMSL Math/Library is a registered trademark of IMSL Inc.
Heterogeneous real-time computing in radio astronomy
NASA Astrophysics Data System (ADS)
Ford, John M.; Demorest, Paul; Ransom, Scott
2010-07-01
Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPU's), Field Programmable Gate Array (FPGA) based signal processors, and Graphical Processing Units (GPU's). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGA's tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGA's are coupled to other FPGA's to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPU's, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
NASA Technical Reports Server (NTRS)
Rickard, D. A.; Bodenheimer, R. E.
1976-01-01
Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
The 30-cm ion thruster power processor
NASA Technical Reports Server (NTRS)
Herron, B. G.; Hopper, D. J.
1978-01-01
A power processor unit for powering and controlling the 30 cm Mercury Electron-Bombardment Ion Thruster was designed, fabricated, and tested. The unit uses a unique and highly efficient transistor bridge inverter power stage in its implementation. The system operated from a 200 to 400 V dc input power bus, provides 12 independently controllable and closely regulated dc power outputs, and has an overall power conditioning capacity of 3.5 kW. Protective circuitry was incorporated as an integral part of the design to assure failure-free operation during transient and steady-state load faults. The implemented unit demonstrated an electrical efficiency between 91.5 and 91.9 at its nominal rated load over the 200 to 400 V dc input bus range.
Automated collection and processing of environmental samples
Troyer, Gary L.; McNeece, Susan G.; Brayton, Darryl D.; Panesar, Amardip K.
1997-01-01
For monitoring an environmental parameter such as the level of nuclear radiation, at distributed sites, bar coded sample collectors are deployed and their codes are read using a portable data entry unit that also records the time of deployment. The time and collector identity are cross referenced in memory in the portable unit. Similarly, when later recovering the collector for testing, the code is again read and the time of collection is stored as indexed to the sample collector, or to a further bar code, for example as provided on a container for the sample. The identity of the operator can also be encoded and stored. After deploying and/or recovering the sample collectors, the data is transmitted to a base processor. The samples are tested, preferably using a test unit coupled to the base processor, and again the time is recorded. The base processor computes the level of radiation at the site during exposure of the sample collector, using the detected radiation level of the sample, the delay between recovery and testing, the duration of exposure and the half life of the isotopes collected. In one embodiment, an identity code and a site code are optically read by an image grabber coupled to the portable data entry unit.
Synthetic vision in the cockpit: 3D systems for general aviation
NASA Astrophysics Data System (ADS)
Hansen, Andrew J.; Rybacki, Richard M.; Smith, W. Garth
2001-08-01
Synthetic vision has the potential to improve safety in aviation through better pilot situational awareness and enhanced navigational guidance. The technological advances enabling synthetic vision are GPS based navigation (position and attitude) systems and efficient graphical systems for rendering 3D displays in the cockpit. A benefit for military, commercial, and general aviation platforms alike is the relentless drive to miniaturize computer subsystems. Processors, data storage, graphical and digital signal processing chips, RF circuitry, and bus architectures are at or out-pacing Moore's Law with the transition to mobile computing and embedded systems. The tandem of fundamental GPS navigation services such as the US FAA's Wide Area and Local Area Augmentation Systems (WAAS) and commercially viable mobile rendering systems puts synthetic vision well with the the technological reach of general aviation. Given the appropriate navigational inputs, low cost and power efficient graphics solutions are capable of rendering a pilot's out-the-window view into visual databases with photo-specific imagery and geo-specific elevation and feature content. Looking beyond the single airframe, proposed aviation technologies such as ADS-B would provide a communication channel for bringing traffic information on-board and into the cockpit visually via the 3D display for additional pilot awareness. This paper gives a view of current 3D graphics system capability suitable for general aviation and presents a potential road map following the current trends.
Geometrical model for DBMS: an experimental DBMS using IBM solid modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ali, D.E.D.L.
1985-01-01
This research presents a new model for data base management systems (DBMS). The new model, Geometrical DBMS, is based on using solid modelling technology in designing and implementing DBMS. The Geometrical DBMS is implemented using the IBM solid modelling Geometric Design Processor (GDP). Built basically on computer-graphics concepts, Geometrical DBMS is indeed a unique model. Traditionally, researchers start with one of the existent DBMS models and then put a graphical front end on it. In Geometrical DBMS, the graphical aspect of the model is not an alien concept tailored to the model but is, as a matter of fact, themore » atom around which the model is designed. The main idea in Geometrical DBMS is to allow the user and the system to refer to and manipulate data items as a solid object in 3D space, and representing a record as a group of logically related solid objects. In Geometical DBMS, hierarchical structure is used to present the data relations and the user sees the data as a group of arrays; yet, for the user and the system together, the data structure is a multidimensional tree.« less
Organization of brain tissue - Is the brain a noisy processor.
NASA Technical Reports Server (NTRS)
Adey, W. R.
1972-01-01
This paper presents some thoughts on functional organization in cerebral tissue. 'Spontaneous' wave and unit firing are considered as essential phenomena in the handling of information. Various models are discussed which have been suggested to describe the pseudorandom behavior of brain cells, leading to a view of the brain as an information processor and its role in learning, memory, remembering and forgetting.
Reconfigurable signal processor designs for advanced digital array radar systems
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining
2017-05-01
The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
NASA Astrophysics Data System (ADS)
Son, In-Hyuk; Shin, Woo-Cheol; Lee, Yong-Kul; Lee, Sung-Chul; Ahn, Jin-Gu; Han, Sang-Il; kweon, Ho-Jin; Kim, Ju-Yong; Kim, Moon-Chan; Park, Jun-Yong
A polymer electrolyte membrane fuel cell (PEMFC) system is developed to power a notebook computer. The system consists of a compact methanol-reforming system with a CO preferential oxidation unit, a 16-cell PEMFC stack, and a control unit for the management of the system with a d.c.-d.c. converter. The compact fuel-processor system (260 cm 3) generates about 1.2 L min -1 of reformate, which corresponds to 35 We, with a low CO concentration (<30 ppm, typically 0 ppm), and is thus proven to be capable of being targetted at notebook computers.
NASA Technical Reports Server (NTRS)
Djuth, Frank T.; Elder, John H.; Williams, Kenneth L.
1996-01-01
This research program focused on the construction of several key radio wave diagnostics in support of the HF Active Auroral Ionospheric Research Program (HAARP). Project activities led to the design, development, and fabrication of a variety of hardware units and to the development of several menu-driven software packages for data acquisition and analysis. The principal instrumentation includes an HF (28 MHz) radar system, a VHF (50 MHz) radar system, and a high-speed radar processor consisting of three separable processing units. The processor system supports the HF and VHF radars and is capable of acquiring very detailed data with large incoherent scatter radars. In addition, a tunable HF receiver system having high dynamic range was developed primarily for measurements of stimulated electromagnetic emissions (SEE). A separate processor unit was constructed for the SEE receiver. Finally, a large amount of support instrumentation was developed to accommodate complex field experiments. Overall, the HAARP diagnostics are powerful tools for studying diverse ionospheric modification phenomena. They are also flexible enough to support a host of other missions beyond the scope of HAARP. Many new research programs have been initiated by applying the HAARP diagnostics to studies of natural atmospheric processes.
Optical apparatus for forming correlation spectrometers and optical processors
Butler, Michael A.; Ricco, Antonio J.; Sinclair, Michael B.; Senturia, Stephen D.
1999-01-01
Optical apparatus for forming correlation spectrometers and optical processors. The optical apparatus comprises one or more diffractive optical elements formed on a substrate for receiving light from a source and processing the incident light. The optical apparatus includes an addressing element for alternately addressing each diffractive optical element thereof to produce for one unit of time a first correlation with the incident light, and to produce for a different unit of time a second correlation with the incident light that is different from the first correlation. In preferred embodiments of the invention, the optical apparatus is in the form of a correlation spectrometer; and in other embodiments, the apparatus is in the form of an optical processor. In some embodiments, the optical apparatus comprises a plurality of diffractive optical elements on a common substrate for forming first and second gratings that alternately intercept the incident light for different units of time. In other embodiments, the optical apparatus includes an electrically-programmable diffraction grating that may be alternately switched between a plurality of grating states thereof for processing the incident light. The optical apparatus may be formed, at least in part, by a micromachining process.
Optical apparatus for forming correlation spectrometers and optical processors
Butler, M.A.; Ricco, A.J.; Sinclair, M.B.; Senturia, S.D.
1999-05-18
Optical apparatus is disclosed for forming correlation spectrometers and optical processors. The optical apparatus comprises one or more diffractive optical elements formed on a substrate for receiving light from a source and processing the incident light. The optical apparatus includes an addressing element for alternately addressing each diffractive optical element thereof to produce for one unit of time a first correlation with the incident light, and to produce for a different unit of time a second correlation with the incident light that is different from the first correlation. In preferred embodiments of the invention, the optical apparatus is in the form of a correlation spectrometer; and in other embodiments, the apparatus is in the form of an optical processor. In some embodiments, the optical apparatus comprises a plurality of diffractive optical elements on a common substrate for forming first and second gratings that alternately intercept the incident light for different units of time. In other embodiments, the optical apparatus includes an electrically-programmable diffraction grating that may be alternately switched between a plurality of grating states thereof for processing the incident light. The optical apparatus may be formed, at least in part, by a micromachining process. 24 figs.
Jayashree, B; Rajgopal, S; Hoisington, D; Prasanth, V P; Chandra, S
2008-09-24
Structure, is a widely used software tool to investigate population genetic structure with multi-locus genotyping data. The software uses an iterative algorithm to group individuals into "K" clusters, representing possibly K genetically distinct subpopulations. The serial implementation of this programme is processor-intensive even with small datasets. We describe an implementation of the program within a parallel framework. Speedup was achieved by running different replicates and values of K on each node of the cluster. A web-based user-oriented GUI has been implemented in PHP, through which the user can specify input parameters for the programme. The number of processors to be used can be specified in the background command. A web-based visualization tool "Visualstruct", written in PHP (HTML and Java script embedded), allows for the graphical display of population clusters output from Structure, where each individual may be visualized as a line segment with K colors defining its possible genomic composition with respect to the K genetic sub-populations. The advantage over available programs is in the increased number of individuals that can be visualized. The analyses of real datasets indicate a speedup of up to four, when comparing the speed of execution on clusters of eight processors with the speed of execution on one desktop. The software package is freely available to interested users upon request.
Accuracy of the lattice-Boltzmann method using the Cell processor
NASA Astrophysics Data System (ADS)
Harvey, M. J.; de Fabritiis, G.; Giupponi, G.
2008-11-01
Accelerator processors like the new Cell processor are extending the traditional platforms for scientific computation, allowing orders of magnitude more floating-point operations per second (flops) compared to standard central processing units. However, they currently lack double-precision support and support for some IEEE 754 capabilities. In this work, we develop a lattice-Boltzmann (LB) code to run on the Cell processor and test the accuracy of this lattice method on this platform. We run tests for different flow topologies, boundary conditions, and Reynolds numbers in the range Re=6 350 . In one case, simulation results show a reduced mass and momentum conservation compared to an equivalent double-precision LB implementation. All other cases demonstrate the utility of the Cell processor for fluid dynamics simulations. Benchmarks on two Cell-based platforms are performed, the Sony Playstation3 and the QS20/QS21 IBM blade, obtaining a speed-up factor of 7 and 21, respectively, compared to the original PC version of the code, and a conservative sustained performance of 28 gigaflops per single Cell processor. Our results suggest that choice of IEEE 754 rounding mode is possibly as important as double-precision support for this specific scientific application.
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
This technical report contains the Higher-Order Logic (HOL) listings of the partial verification of the requirements and design for a commercially developed processor interface unit (PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault tolerant computer system. This system, the Fault Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU verification as it currently exists. Section two of this report contains general-purpose HOL theories and definitions that support the PIU verification. These include arithmetic theories dealing with inequalities and associativity, and a collection of tactics used in the PIU proofs. Section three contains the HOL listings for the completed PIU design verification. Section 4 contains the HOL listings for the partial requirements verification of the P-Port.
Target Information Processing: A Joint Decision and Estimation Approach
2012-03-29
ground targets ( track - before - detect ) using computer cluster and graphics processing unit. Estimation and filtering theory is one of the most important...targets ( track - before - detect ) using computer cluster and graphics processing unit. Estimation and filtering theory is one of the most important
High performance, low cost, self-contained, multipurpose PC based ground systems
NASA Technical Reports Server (NTRS)
Forman, Michael; Nickum, William; Troendly, Gregory
1993-01-01
The use of embedded processors greatly enhances the capabilities of personal computers when used for telemetry processing and command control center functions. Parallel architectures based on the use of transputers are shown to be very versatile and reusable, and the synergism between the PC and the embedded processor with transputers results in single unit, low cost workstations of 20 less than MIPS less than or equal to 1000.
Integrating a Natural Language Message Pre-Processor with UIMA
2008-01-01
Carnegie Mellon Language Technologies Institute NL Message Preprocessing with UIMA Copyright © 2008, Carnegie Mellon. All Rights Reserved...Integrating a Natural Language Message Pre-Processor with UIMA Eric Nyberg, Eric Riebling, Richard C. Wang & Robert Frederking Language Technologies Institute...with UIMA 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER
NASA Astrophysics Data System (ADS)
Rodríguez-Sánchez, Rafael; Martínez, José Luis; Cock, Jan De; Fernández-Escribano, Gerardo; Pieters, Bart; Sánchez, José L.; Claver, José M.; de Walle, Rik Van
2013-12-01
The H.264/AVC video coding standard introduces some improved tools in order to increase compression efficiency. Moreover, the multi-view extension of H.264/AVC, called H.264/MVC, adopts many of them. Among the new features, variable block-size motion estimation is one which contributes to high coding efficiency. Furthermore, it defines a different prediction structure that includes hierarchical bidirectional pictures, outperforming traditional Group of Pictures patterns in both scenarios: single-view and multi-view. However, these video coding techniques have high computational complexity. Several techniques have been proposed in the literature over the last few years which are aimed at accelerating the inter prediction process, but there are no works focusing on bidirectional prediction or hierarchical prediction. In this article, with the emergence of many-core processors or accelerators, a step forward is taken towards an implementation of an H.264/AVC and H.264/MVC inter prediction algorithm on a graphics processing unit. The results show a negligible rate distortion drop with a time reduction of up to 98% for the complete H.264/AVC encoder.
Data Acquisition with GPUs: The DAQ for the Muon $g$-$2$ Experiment at Fermilab
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gohn, W.
Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and for a relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be used to effectively write code to parallelize such tasks on Nvidia GPUs, providing a significant upgrade in performance over CPU based acquisition systems. The muonmore » $g$-$2$ experiment at Fermilab is heavily relying on GPUs to process its data. The data acquisition system for this experiment must have the ability to create deadtime-free records from 700 $$\\mu$$s muon spills at a raw data rate 18 GB per second. Data will be collected using 1296 channels of $$\\mu$$TCA-based 800 MSPS, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording of the muon decays during the spill. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharma, Diksha; Badano, Aldo
2013-03-15
Purpose: hybridMANTIS is a Monte Carlo package for modeling indirect x-ray imagers using columnar geometry based on a hybrid concept that maximizes the utilization of available CPU and graphics processing unit processors in a workstation. Methods: The authors compare hybridMANTIS x-ray response simulations to previously published MANTIS and experimental data for four cesium iodide scintillator screens. These screens have a variety of reflective and absorptive surfaces with different thicknesses. The authors analyze hybridMANTIS results in terms of modulation transfer function and calculate the root mean square difference and Swank factors from simulated and experimental results. Results: The comparison suggests thatmore » hybridMANTIS better matches the experimental data as compared to MANTIS, especially at high spatial frequencies and for the thicker screens. hybridMANTIS simulations are much faster than MANTIS with speed-ups up to 5260. Conclusions: hybridMANTIS is a useful tool for improved description and optimization of image acquisition stages in medical imaging systems and for modeling the forward problem in iterative reconstruction algorithms.« less
CUDAICA: GPU Optimization of Infomax-ICA EEG Analysis
Raimondo, Federico; Kamienkowski, Juan E.; Sigman, Mariano; Fernandez Slezak, Diego
2012-01-01
In recent years, Independent Component Analysis (ICA) has become a standard to identify relevant dimensions of the data in neuroscience. ICA is a very reliable method to analyze data but it is, computationally, very costly. The use of ICA for online analysis of the data, used in brain computing interfaces, results are almost completely prohibitive. We show an increase with almost no cost (a rapid video card) of speed of ICA by about 25 fold. The EEG data, which is a repetition of many independent signals in multiple channels, is very suitable for processing using the vector processors included in the graphical units. We profiled the implementation of this algorithm and detected two main types of operations responsible of the processing bottleneck and taking almost 80% of computing time: vector-matrix and matrix-matrix multiplications. By replacing function calls to basic linear algebra functions to the standard CUBLAS routines provided by GPU manufacturers, it does not increase performance due to CUDA kernel launch overhead. Instead, we developed a GPU-based solution that, comparing with the original BLAS and CUBLAS versions, obtains a 25x increase of performance for the ICA calculation. PMID:22811699
Chodkiewicz, Michał L; Migacz, Szymon; Rudnicki, Witold; Makal, Anna; Kalinowski, Jarosław A; Moriarty, Nigel W; Grosse-Kunstleve, Ralf W; Afonine, Pavel V; Adams, Paul D; Dominiak, Paulina Maria
2018-02-01
It has been recently established that the accuracy of structural parameters from X-ray refinement of crystal structures can be improved by using a bank of aspherical pseudoatoms instead of the classical spherical model of atomic form factors. This comes, however, at the cost of increased complexity of the underlying calculations. In order to facilitate the adoption of this more advanced electron density model by the broader community of crystallographers, a new software implementation called DiSCaMB , 'densities in structural chemistry and molecular biology', has been developed. It addresses the challenge of providing for high performance on modern computing architectures. With parallelization options for both multi-core processors and graphics processing units (using CUDA), the library features calculation of X-ray scattering factors and their derivatives with respect to structural parameters, gives access to intermediate steps of the scattering factor calculations (thus allowing for experimentation with modifications of the underlying electron density model), and provides tools for basic structural crystallographic operations. Permissively (MIT) licensed, DiSCaMB is an open-source C++ library that can be embedded in both academic and commercial tools for X-ray structure refinement.
Prefiltering Model for Homology Detection Algorithms on GPU.
Retamosa, Germán; de Pedro, Luis; González, Ivan; Tamames, Javier
2016-01-01
Homology detection has evolved over the time from heavy algorithms based on dynamic programming approaches to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup and find exact matches with the original results. Thus, their acceleration is essential. The aim of this article was to prefilter a sequence database. To make this work, we have implemented a groundbreaking heuristic model based on NVIDIA's graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the sequence database by factors between 50% and 95%, while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as a part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out in the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate the execution times of former homology detection applications, such as National Centre for Biotechnology Information (NCBI), Basic Local Alignment Search Tool for Proteins (BLASTP), up to a factor of 4.
Augmented Reality Comes to Physics
NASA Astrophysics Data System (ADS)
Buesing, Mark; Cook, Michael
2013-04-01
Augmented reality (AR) is a technology used on computing devices where processor-generated graphics are rendered over real objects to enhance the sensory experience in real time. In other words, what you are really seeing is augmented by the computer. Many AR games already exist for systems such as Kinect and Nintendo 3DS and mobile apps, such as Tagwhat and Star Chart (a must for astronomy class). The yellow line marking first downs in a televised football game2 and the enhanced puck that makes televised hockey easier to follow3 both use augmented reality to do the job.
Virtual reality applications to automated rendezvous and capture
NASA Technical Reports Server (NTRS)
Hale, Joseph; Oneil, Daniel
1991-01-01
Virtual Reality (VR) is a rapidly developing Human/Computer Interface (HCI) technology. The evolution of high-speed graphics processors and development of specialized anthropomorphic user interface devices, that more fully involve the human senses, have enabled VR technology. Recently, the maturity of this technology has reached a level where it can be used as a tool in a variety of applications. This paper provides an overview of: VR technology, VR activities at Marshall Space Flight Center (MSFC), applications of VR to Automated Rendezvous and Capture (AR&C), and identifies areas of VR technology that requires further development.
Efficient implementation of constant pH molecular dynamics on modern graphics processors.
Arthur, Evan J; Brooks, Charles L
2016-09-15
The treatment of pH sensitive ionization states for titratable residues in proteins is often omitted from molecular dynamics (MD) simulations. While static charge models can answer many questions regarding protein conformational equilibrium and protein-ligand interactions, pH-sensitive phenomena such as acid-activated chaperones and amyloidogenic protein aggregation are inaccessible to such models. Constant pH molecular dynamics (CPHMD) coupled with the Generalized Born with a Simple sWitching function (GBSW) implicit solvent model provide an accurate framework for simulating pH sensitive processes in biological systems. Although this combination has demonstrated success in predicting pKa values of protein structures, and in exploring dynamics of ionizable side-chains, its speed has been an impediment to routine application. The recent availability of low-cost graphics processing unit (GPU) chipsets with thousands of processing cores, together with the implementation of the accurate GBSW implicit solvent model on those chipsets (Arthur and Brooks, J. Comput. Chem. 2016, 37, 927), provide an opportunity to improve the speed of CPHMD and ionization modeling greatly. Here, we present a first implementation of GPU-enabled CPHMD within the CHARMM-OpenMM simulation package interface. Depending on the system size and nonbonded force cutoff parameters, we find speed increases of between one and three orders of magnitude. Additionally, the algorithm scales better with system size than the CPU-based algorithm, thus allowing for larger systems to be modeled in a cost effective manner. We anticipate that the improved performance of this methodology will open the door for broad-spread application of CPHMD in its modeling pH-mediated biological processes. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Dong, Han; Sharma, Diksha; Badano, Aldo
2014-12-01
Monte Carlo simulations play a vital role in the understanding of the fundamental limitations, design, and optimization of existing and emerging medical imaging systems. Efforts in this area have resulted in the development of a wide variety of open-source software packages. One such package, hybridmantis, uses a novel hybrid concept to model indirect scintillator detectors by balancing the computational load using dual CPU and graphics processing unit (GPU) processors, obtaining computational efficiency with reasonable accuracy. In this work, the authors describe two open-source visualization interfaces, webmantis and visualmantis to facilitate the setup of computational experiments via hybridmantis. The visualization tools visualmantis and webmantis enable the user to control simulation properties through a user interface. In the case of webmantis, control via a web browser allows access through mobile devices such as smartphones or tablets. webmantis acts as a server back-end and communicates with an NVIDIA GPU computing cluster that can support multiuser environments where users can execute different experiments in parallel. The output consists of point response and pulse-height spectrum, and optical transport statistics generated by hybridmantis. The users can download the output images and statistics through a zip file for future reference. In addition, webmantis provides a visualization window that displays a few selected optical photon path as they get transported through the detector columns and allows the user to trace the history of the optical photons. The visualization tools visualmantis and webmantis provide features such as on the fly generation of pulse-height spectra and response functions for microcolumnar x-ray imagers while allowing users to save simulation parameters and results from prior experiments. The graphical interfaces simplify the simulation setup and allow the user to go directly from specifying input parameters to receiving visual feedback for the model predictions.
A Survey of Techniques for Modeling and Improving Reliability of Computing Systems
Mittal, Sparsh; Vetter, Jeffrey S.
2015-04-24
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences and impact of faults in computing systems. This has made `reliability' a first-order design constraint. To address the challenges of reliability, several techniques have been proposed. In this study, we provide a survey of architectural techniques for improving resilience of computing systems. We especially focus on techniques proposed for microarchitectural components, such as processor registers, functional units, cache and main memory etc. In addition, we discuss techniques proposed for non-volatile memory, GPUs and 3D-stacked processors. To underscore the similarities and differences of the techniques, we classify them based onmore » their key characteristics. We also review the metrics proposed to quantify vulnerability of processor structures. Finally, we believe that this survey will help researchers, system-architects and processor designers in gaining insights into the techniques for improving reliability of computing systems.« less
A GaAs vector processor based on parallel RISC microprocessors
NASA Astrophysics Data System (ADS)
Misko, Tim A.; Rasset, Terry L.
A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
A Survey of Techniques for Modeling and Improving Reliability of Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh; Vetter, Jeffrey S.
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences and impact of faults in computing systems. This has made `reliability' a first-order design constraint. To address the challenges of reliability, several techniques have been proposed. In this study, we provide a survey of architectural techniques for improving resilience of computing systems. We especially focus on techniques proposed for microarchitectural components, such as processor registers, functional units, cache and main memory etc. In addition, we discuss techniques proposed for non-volatile memory, GPUs and 3D-stacked processors. To underscore the similarities and differences of the techniques, we classify them based onmore » their key characteristics. We also review the metrics proposed to quantify vulnerability of processor structures. Finally, we believe that this survey will help researchers, system-architects and processor designers in gaining insights into the techniques for improving reliability of computing systems.« less
Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic
2015-03-01
Defense Department’s use. vi THIS PAGE INTENTIONALLY LEFT BLANK vii TABLE OF CONTENTS I. INTRODUCTION...22 D. GENERATING DIGESTS ............................................................................23 1. Reference...the-shelf GPU Graphical Processing Unit GPGPU General -Purpose Graphic Processing Unit HBSS Host-Based Security System HIPS Host Intrusion
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units
USDA-ARS?s Scientific Manuscript database
This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
Graphic Arts: Book Three. The Press and Related Processes.
ERIC Educational Resources Information Center
Farajollahi, Karim; And Others
The third of a three-volume set of instructional materials for a graphic arts course, this manual consists of nine instructional units dealing with presses and related processes. Covered in the units are basic press fundamentals, offset press systems, offset press operating procedures, offset inks and dampening chemistry, preventive maintenance…
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
A hybrid algorithm for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
NASA Technical Reports Server (NTRS)
Cheng, Thomas D.; Angelici, Gary L.; Slye, Robert E.; Ma, Matt
1991-01-01
The USDA presently uses labor-intensive photographic interpretation procedures to delineate large geographical areas into manageable size sampling units for the estimation of domestic crop and livestock production. Computer software to automate the boundary delineation procedure, called the computer-assisted stratification and sampling (CASS) system, was developed using a Hewlett Packard color-graphics workstation. The CASS procedures display Thematic Mapper (TM) satellite digital imagery on a graphics display workstation as the backdrop for the onscreen delineation of sampling units. USGS Digital Line Graph (DLG) data for roads and waterways are displayed over the TM imagery to aid in identifying potential sample unit boundaries. Initial analysis conducted with three Missouri counties indicated that CASS was six times faster than the manual techniques in delineating sampling units.
LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.
Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon
2008-07-28
Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing and generation of graphics in a easy to use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half in current dual core machines and with almost linear performance gains in machines with more cores. LOSITAN makes selection detection feasible to a much wider range of users, even for large population genomic datasets, by both providing an easy to use interface and essential functionality to complete the whole selection detection process.
A software tool for analyzing multichannel cochlear implant signals.
Lai, Wai Kong; Bögli, Hans; Dillier, Norbert
2003-10-01
A useful and convenient means to analyze the radio frequency (RF) signals being sent by a speech processor to a cochlear implant would be to actually capture and display them with appropriate software. This is particularly useful for development or diagnostic purposes. sCILab (Swiss Cochlear Implant Laboratory) is such a PC-based software tool intended for the Nucleus family of Multichannel Cochlear Implants. Its graphical user interface provides a convenient and intuitive means for visualizing and analyzing the signals encoding speech information. Both numerical and graphic displays are available for detailed examination of the captured CI signals, as well as an acoustic simulation of these CI signals. sCILab has been used in the design and verification of new speech coding strategies, and has also been applied as an analytical tool in studies of how different parameter settings of existing speech coding strategies affect speech perception. As a diagnostic tool, it is also useful for troubleshooting problems with the external equipment of the cochlear implant systems.
Plant, Richard R; Turner, Garry
2009-08-01
Since the publication of Plant, Hammond, and Turner (2004), which highlighted a pressing need for researchers to pay more attention to sources of error in computer-based experiments, the landscape has undoubtedly changed, but not necessarily for the better. Readily available hardware has improved in terms of raw speed; multi core processors abound; graphics cards now have hundreds of megabytes of RAM; main memory is measured in gigabytes; drive space is measured in terabytes; ever larger thin film transistor displays capable of single-digit response times, together with newer Digital Light Processing multimedia projectors, enable much greater graphic complexity; and new 64-bit operating systems, such as Microsoft Vista, are now commonplace. However, have millisecond-accurate presentation and response timing improved, and will they ever be available in commodity computers and peripherals? In the present article, we used a Black Box ToolKit to measure the variability in timing characteristics of hardware used commonly in psychological research.
Ward, R. E.; Purves, T.; Feldman, M.; Schiffman, R. M.; Barry, S.; Christner, M.; Kipa, G.; McCarthy, B. D.; Stiphout, R.
1991-01-01
The Care Windows development project demonstrated the feasibility of an approach designed to add the benefits of an event-driven, graphically-oriented user interface to an existing Medical Information Management System (MIMS) without overstepping economic and logistic constraints. The design solution selected for the Care Windows project incorporates three important design features: (1) the effective de-coupling of severs from requesters, permitting the use of an extensive pre-existing library of MIMS servers, (2) the off-loading of program control functions of the requesters to the workstation processor, reducing the load per transaction on central resources and permitting the use of object-oriented development environments available for microcomputers, (3) the selection of a low end, GUI-capable workstation consisting of a PC-compatible personal computer running Microsoft Windows 3.0, and (4) the development of a highly layered, modular workstation application, permitting the development of interchangeable modules to insure portability and adaptability. PMID:1807665
78 FR 27857 - United States Standards for Wheat
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-13
... DEPARTMENT OF AGRICULTURE Grain Inspection, Packers and Stockyards Administration 7 CFR Part 810... INFORMATION: Background The United States Grain Standards Act (USGSA) authorizes the Secretary of Agriculture...-2007 Census of Agriculture-updated), handlers, processors, and merchandisers are the primary users of...
ISS EPS Orbital Replacement Unit Block Diagrams
NASA Technical Reports Server (NTRS)
Schmitz, Gregory V.
2001-01-01
The attached documents are being provided to Switching Power Magazine for information purposes. This magazine is writing a feature article on the International Space Station Electrical Power System, focusing on the switching power processors. These units include the DC-DC Converter Unit (DDCU), the Bi-directional Charge/Discharge Unit (BCDU), and the Sequential Shunt Unit (SSU). These diagrams are high-level schematics/block diagrams depicting the overall functionality of each unit.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dr. Dale M. Snider
2011-02-28
This report gives the result from the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendlymore » environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the ground work for the complete parallelization of Barracuda in Phase-2, with the caveat that implemented computer practices for parallel programming done in Phase-1 gives immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully laying the frame work for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days. Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate) Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics for the Phase-1 the technical partner. See Section Technical Objective and Approach) Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections) Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program) Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections) Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2) With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Patrick
2014-01-31
The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
Design of an intelligent flight instrumentation unit using embedded RTOS
NASA Astrophysics Data System (ADS)
Estrada-Marmolejo, R.; García-Torales, G.; Torres-Ortega, H. H.; Flores, J. L.
2011-09-01
Micro Unmanned Aerial Vehicles (MUAV) must calculate its spatial position to control the flight dynamics, which is done by Inertial Measurement Units (IMUs). MEMS Inertial sensors have made possible to reduce the size and power consumption of such units. Commonly the flight instrumentation operates independently of the main processor. This work presents an instrumentation block design, which reduces size and power consumption of the complete system of a MUAV. This is done by coupling the inertial sensors to the main processor without considering any intermediate level of processing aside. Using Real Time Operating Systems (RTOS) reduces the number of intermediate components, increasing MUAV reliability. One advantage is the possibility to control several different sensors with a single communication bus. This feature of the MEMS sensors makes a smaller and less complex MUAV design possible.
A GPU-Based Wide-Band Radio Spectrometer
NASA Astrophysics Data System (ADS)
Chennamangalam, Jayanth; Scott, Simon; Jones, Glenn; Chen, Hong; Ford, John; Kepley, Amanda; Lorimer, D. R.; Nie, Jun; Prestage, Richard; Roshi, D. Anish; Wagner, Mark; Werthimer, Dan
2014-12-01
The graphics processing unit has become an integral part of astronomical instrumentation, enabling high-performance online data reduction and accelerated online signal processing. In this paper, we describe a wide-band reconfigurable spectrometer built using an off-the-shelf graphics processing unit card. This spectrometer, when configured as a polyphase filter bank, supports a dual-polarisation bandwidth of up to 1.1 GHz (or a single-polarisation bandwidth of up to 2.2 GHz) on the latest generation of graphics processing units. On the other hand, when configured as a direct fast Fourier transform, the spectrometer supports a dual-polarisation bandwidth of up to 1.4 GHz (or a single-polarisation bandwidth of up to 2.8 GHz).
NASA Astrophysics Data System (ADS)
Pillans, Luke; Harmer, Jack; Edwards, Tim; Richardson, Lee
2016-05-01
Geolocation is the process of calculating a target position based on bearing and range relative to the known location of the observer. A high performance thermal imager with integrated geolocation functions is a powerful long range targeting device. Firefly is a software defined camera core incorporating a system-on-a-chip processor running the AndroidTM operating system. The processor has a range of industry standard serial interfaces which were used to interface to peripheral devices including a laser rangefinder and a digital magnetic compass. The core has built in Global Positioning System (GPS) which provides the third variable required for geolocation. The graphical capability of Firefly allowed flexibility in the design of the man-machine interface (MMI), so the finished system can give access to extensive functionality without appearing cumbersome or over-complicated to the user. This paper covers both the hardware and software design of the system, including how the camera core influenced the selection of peripheral hardware, and the MMI design process which incorporated user feedback at various stages.
The INTELSAT VI SSTDMA network diagnostic system
NASA Astrophysics Data System (ADS)
Tamboli, Satish P.; Zhu, Xiaobo; Wilkins, Kim N.; Gupta, Ramesh K.
The system-level design of an expert-system-based, near-real-time diagnostic system for INTELSAT VI satellite-switched time-division multiple access (SSTDMA) network is described. The challenges of INTELSAT VI diagnostics are discussed, along with alternative approaches for network diagnostics and the rationale for choosing a method based on burst unique-word detection. The focal point of the diagnostic system is the diagnostic processor, which resides in the central control and monitoring facility known as the INTELSAT Operations Center TDMA Facility (IOCTF). As real-time information such as burst unique-word detection data, reference terminal status data, and satellite telemetry alarm data are received at the IOCTF, the diagnostic processor continuously monitors the data streams. When a burst status change is detected, a 'snapshot' of the real-time data is forwarded to the expert system. Receipt of the change causes a set of rules to be invoked which associate the traffic pattern with a set of probable causes. A user-friendly interface allows a graphical view of the burst time plan and provides the ability to browse through the knowledge bases.
NASA Technical Reports Server (NTRS)
Pordes, Ruth (Editor)
1989-01-01
Papers on real-time computer applications in nuclear, particle, and plasma physics are presented, covering topics such as expert systems tactics in testing FASTBUS segment interconnect modules, trigger control in a high energy physcis experiment, the FASTBUS read-out system for the Aleph time projection chamber, a multiprocessor data acquisition systems, DAQ software architecture for Aleph, a VME multiprocessor system for plasma control at the JT-60 upgrade, and a multiasking, multisinked, multiprocessor data acquisition front end. Other topics include real-time data reduction using a microVAX processor, a transputer based coprocessor for VEDAS, simulation of a macropipelined multi-CPU event processor for use in FASTBUS, a distributed VME control system for the LISA superconducting Linac, a distributed system for laboratory process automation, and a distributed system for laboratory process automation. Additional topics include a structure macro assembler for the event handler, a data acquisition and control system for Thomson scattering on ATF, remote procedure execution software for distributed systems, and a PC-based graphic display real-time particle beam uniformity.
NASA Astrophysics Data System (ADS)
Fay, James A.; Sonwalkar, Nishikant
1996-05-01
This CD-ROM is designed to accompany James Fay's Introduction to Fluid Mechanics. An enhanced hypermedia version of the textbook, it offers a number of ways to explore the fluid mechanics domain. These include a complete hypertext version of the original book, physical-experiment video clips, excerpts from external references, audio annotations, colored graphics, review questions, and progressive hints for solving problems. Throughout, the authors provide expert guidance in navigating the typed links so that students do not get lost in the learning process. System requirements: Macintosh with 68030 or greater processor and with at least 16 Mb of RAM. Operating System 6.0.4 or later for 680x0 processor and System 7.1.2 or later for Power-PC. CD-ROM drive with 256- color capability. Preferred display 14 inches or above (SuperVGA with 1 megabyte of VRAM). Additional system font software: Computer Modern postscript fonts (CM/PS Screen Fonts, CMBSY10, and CMTT10) and Adobe Type Manager (ATM 3.0 or later). James A. Fay is Professor Emeritus and Senior Lecturer in the Department of Mechanical Engineering at MIT.
Systems, methods, and products for graphically illustrating and controlling a droplet actuator
NASA Technical Reports Server (NTRS)
Brafford, Keith R. (Inventor); Pamula, Vamsee K. (Inventor); Paik, Philip Y. (Inventor); Pollack, Michael G. (Inventor); Sturmer, Ryan A. (Inventor); Smith, Gregory F. (Inventor)
2010-01-01
Systems for controlling a droplet microactuator are provided. According to one embodiment, a system is provided and includes a controller, a droplet microactuator electronically coupled to the controller, and a display device displaying a user interface electronically coupled to the controller, wherein the system is programmed and configured to permit a user to effect a droplet manipulation by interacting with the user interface. According to another embodiment, a system is provided and includes a processor, a display device electronically coupled to the processor, and software loaded and/or stored in a storage device electronically coupled to the controller, a memory device electronically coupled to the controller, and/or the controller and programmed to display an interactive map of a droplet microactuator. According to yet another embodiment, a system is provided and includes a controller, a droplet microactuator electronically coupled to the controller, a display device displaying a user interface electronically coupled to the controller, and software for executing a protocol loaded and/or stored in a storage device electronically coupled to the controller, a memory device electronically coupled to the controller, and/or the controller.
Rapid indirect trajectory optimization on highly parallel computing architectures
NASA Astrophysics Data System (ADS)
Antony, Thomas
Trajectory optimization is a field which can benefit greatly from the advantages offered by parallel computing. The current state-of-the-art in trajectory optimization focuses on the use of direct optimization methods, such as the pseudo-spectral method. These methods are favored due to their ease of implementation and large convergence regions while indirect methods have largely been ignored in the literature in the past decade except for specific applications in astrodynamics. It has been shown that the shortcomings conventionally associated with indirect methods can be overcome by the use of a continuation method in which complex trajectory solutions are obtained by solving a sequence of progressively difficult optimization problems. High performance computing hardware is trending towards more parallel architectures as opposed to powerful single-core processors. Graphics Processing Units (GPU), which were originally developed for 3D graphics rendering have gained popularity in the past decade as high-performance, programmable parallel processors. The Compute Unified Device Architecture (CUDA) framework, a parallel computing architecture and programming model developed by NVIDIA, is one of the most widely used platforms in GPU computing. GPUs have been applied to a wide range of fields that require the solution of complex, computationally demanding problems. A GPU-accelerated indirect trajectory optimization methodology which uses the multiple shooting method and continuation is developed using the CUDA platform. The various algorithmic optimizations used to exploit the parallelism inherent in the indirect shooting method are described. The resulting rapid optimal control framework enables the construction of high quality optimal trajectories that satisfy problem-specific constraints and fully satisfy the necessary conditions of optimality. The benefits of the framework are highlighted by construction of maximum terminal velocity trajectories for a hypothetical long range weapon system. The techniques used to construct an initial guess from an analytic near-ballistic trajectory and the methods used to formulate the necessary conditions of optimality in a manner that is transparent to the designer are discussed. Various hypothetical mission scenarios that enforce different combinations of initial, terminal, interior point and path constraints demonstrate the rapid construction of complex trajectories without requiring any a-priori insight into the structure of the solutions. Trajectory problems of this kind were previously considered impractical to solve using indirect methods. The performance of the GPU-accelerated solver is found to be 2x--4x faster than MATLAB's bvp4c, even while running on GPU hardware that is five years behind the state-of-the-art.
Park, Daejin; Cho, Jeonghun
2014-01-01
A specially designed sensor processor used as a main processor in IoT (internet-of-thing) device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG). Using an event signal processing unit (EPU) as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio-) based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error.
NASA Astrophysics Data System (ADS)
Bradu, Adrian; Kapinchev, Konstantin; Barnes, Fred; Garway-Heath, David F.; Rajendram, Ranjan; Keane, Pearce; Podoleanu, Adrian G.
2015-03-01
Recently, we introduced a novel Optical Coherence Tomography (OCT) method, termed as Master Slave OCT (MS-OCT), specialized for delivering en-face images. This method uses principles of spectral domain interfereometry in two stages. MS-OCT operates like a time domain OCT, selecting only signals from a chosen depth only while scanning the laser beam across the eye. Time domain OCT allows real time production of an en-face image, although relatively slowly. As a major advance, the Master Slave method allows collection of signals from any number of depths, as required by the user. The tremendous advantage in terms of parallel provision of data from numerous depths could not be fully employed by using multi core processors only. The data processing required to generate images at multiple depths simultaneously is not achievable with commodity multicore processors only. We compare here the major improvement in processing and display, brought about by using graphic cards. We demonstrate images obtained with a swept source at 100 kHz (which determines an acquisition time [Ta] for a frame of 200×200 pixels2 of Ta =1.6 s). By the end of the acquired frame being scanned, using our computing capacity, 4 simultaneous en-face images could be created in T = 0.8 s. We demonstrate that by using graphic cards, 32 en-face images can be displayed in Td 0.3 s. Other faster swept source engines can be used with no difference in terms of Td. With 32 images (or more), volumes can be created for 3D display, using en-face images, as opposed to the current technology where volumes are created using cross section OCT images.
Madanecki, Piotr; Bałut, Magdalena; Buckley, Patrick G; Ochocka, J Renata; Bartoszewski, Rafał; Crossman, David K; Messiaen, Ludwine M; Piotrowski, Arkadiusz
2018-01-01
High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp).
Bałut, Magdalena; Buckley, Patrick G.; Ochocka, J. Renata; Bartoszewski, Rafał; Crossman, David K.; Messiaen, Ludwine M.; Piotrowski, Arkadiusz
2018-01-01
High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp). PMID:29432475
Graphic Arts: Book Two. Process Camera, Stripping, and Platemaking.
ERIC Educational Resources Information Center
Farajollahi, Karim; And Others
The second of a three-volume set of instructional materials for a course in graphic arts, this manual consists of 10 instructional units dealing with the process camera, stripping, and platemaking. Covered in the individual units are the process camera and darkroom photography, line photography, half-tone photography, other darkroom techniques,…
Graphic Arts: Book One. Orientation, Composition, and Paste-up.
ERIC Educational Resources Information Center
Farajollahi, Karim; And Others
The first of a three-volume set of instructional materials for a graphic arts course, this manual consists of 13 instructional units. Covered in the units are orientation (career overview, shop safety, shop organization, photo-offset theory, legal restrictions, and applying for a job); principles of copy planning (overview of copy planning and…
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Manavski, Svetlin A; Valle, Giorgio
2008-01-01
Background Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the length of two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphic cards, to develop high performance solutions for sequence alignment. Results In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware. Conclusions The results show that graphic cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the largely adopted heuristic approaches. PMID:18387198
A diesel fuel processor for fuel-cell-based auxiliary power unit applications
NASA Astrophysics Data System (ADS)
Samsun, Remzi Can; Krekel, Daniel; Pasel, Joachim; Prawitz, Matthias; Peters, Ralf; Stolten, Detlef
2017-07-01
Producing a hydrogen-rich gas from diesel fuel enables the efficient generation of electricity in a fuel-cell-based auxiliary power unit. In recent years, significant progress has been achieved in diesel reforming. One issue encountered is the stable operation of water-gas shift reactors with real reformates. A new fuel processor is developed using a commercial shift catalyst. The system is operated using optimized start-up and shut-down strategies. Experiments with diesel and kerosene fuels show slight performance drops in the shift reactor during continuous operation for 100 h. CO concentrations much lower than the target value are achieved during system operation in auxiliary power unit mode at partial loads of up to 60%. The regeneration leads to full recovery of the shift activity. Finally, a new operation strategy is developed whereby the gas hourly space velocity of the shift stages is re-designed. This strategy is validated using different diesel and kerosene fuels, showing a maximum CO concentration of 1.5% at the fuel processor outlet under extreme conditions, which can be tolerated by a high-temperature PEFC. The proposed operation strategy solves the issue of strong performance drop in the shift reactor and makes this technology available for reducing emissions in the transportation sector.
Acker, Jason P; Hansen, Adele L; Yi, Qi-Long; Sondi, Nayana; Cserti-Gazdewich, Christine; Pendergrast, Jacob; Hannach, Barbara
2016-01-01
After introduction of a closed-system cell processor, the effect of this product change on safety, efficacy, and utilization of washed red blood cells (RBCs) was assessed. This study was a pre-/postimplementation observational study. Efficacy data were collected from sequentially transfused washed RBCs received as prophylactic therapy by β-thalassemia patients during a 3-month period before and after implementation of the Haemonetics ACP 215 closed-system processor. Before implementation, an open system (TerumoBCT COBE 2991) was used to wash RBCs. The primary endpoint for efficacy was a change in hemoglobin (Hb) concentration corrected for the duration between transfusions. The primary endpoint for safety was the frequency of adverse transfusion reactions (ATRs) in all washed RBCs provided by Canadian Blood Services to the transfusion service for 12 months before and after implementation. Data were analyzed from more than 300 RBCs transfused to 31 recipients before implementation and 29 recipients after implementation. The number of units transfused per episode reduced significantly after implementation, from a mean of 3.5 units to a mean of 3.1 units (p < 0.005). The corrected change in Hb concentration was not significantly different before and after implementation. ATRs occurred in 0.15% of transfusions both before and after implementation. Safety and efficacy of washed RBCs were not affected after introduction of a closed-system cell processor. The ACP 215 allowed for an extended expiry time, improving inventory management and overall utilization of washed RBCs. Transfusion of fewer RBCs per episode reduced exposure of recipients to allogeneic blood products while maintaining efficacy. © 2015 AABB.
The development of a general purpose ARM-based processing unit for the ATLAS TileCal sROD
NASA Astrophysics Data System (ADS)
Cox, M. A.; Reed, R.; Mellado, B.
2015-01-01
After Phase-II upgrades in 2022, the data output from the LHC ATLAS Tile Calorimeter will increase significantly. ARM processors are common in mobile devices due to their low cost, low energy consumption and high performance. It is proposed that a cost-effective, high data throughput Processing Unit (PU) can be developed by using several consumer ARM processors in a cluster configuration to allow aggregated processing performance and data throughput while maintaining minimal software design difficulty for the end-user. This PU could be used for a variety of high-level functions on the high-throughput raw data such as spectral analysis and histograms to detect possible issues in the detector at a low level. High-throughput I/O interfaces are not typical in consumer ARM System on Chips but high data throughput capabilities are feasible via the novel use of PCI-Express as the I/O interface to the ARM processors. An overview of the PU is given and the results for performance and throughput testing of four different ARM Cortex System on Chips are presented.
Goldfinger, D; Medici, M A; Hsi, R; McPherson, J; Connelly, M
1983-01-01
Clinical studies have suggested that granulocyte transfusions may be of value in the treatment of septic neonatal patients who present with severe granulocytopenia. We have developed a protocol for the preparation of granulocyte concentrates from freshly collected units of whole blood, using an automated blood cell processor. The red cells were washed with saline. Then, the buffy coats were collected from the washed red cells and studied for their suitability as granulocyte concentrates for neonatal transfusion. The mean number of granulocytes per concentrate was 1.6 X 10(9) in a mean volume of 25 ml. Studies of granulocyte function, including viability, random mobility, chemotaxis, phagocytosis and nitro-blue tetrazolium reduction, demonstrated that the granulocytes were functionally unimpaired following preparation of the concentrates. These studies suggest that concentrates of functional granulocytes, suitable for transfusion to neonatal patients, can be prepared from fresh units of whole blood, using a cell processor. This procedure is more cost-effective than leukapheresis and allows for delivery of granulocytes for transfusion in a more timely fashion.
Processor Units Reduce Satellite Construction Costs
NASA Technical Reports Server (NTRS)
2014-01-01
As part of the effort to build the Fast Affordable Science and Technology Satellite (FASTSAT), Marshall Space Flight Center developed a low-cost telemetry unit which is used to facilitate communication between a satellite and its receiving station. Huntsville, Alabama-based Orbital Telemetry Inc. has licensed the NASA technology and is offering to install the cost-cutting units on commercial satellites.
Graphics Technology Study. Volume 1. State of Graphics Technology
1986-12-01
reaction of special heat sensitive paper when exposed to the heated elements of a thermal print head. Copy quality was poor due to characteristics...Vendors are now attempting to offer smaller units aimed at applications such as typography , graphic arts, CAD, and office automation. The key element in
A Conformance Test Suite for Arden Syntax Compilers and Interpreters.
Wolf, Klaus-Hendrik; Klimek, Mike
2016-01-01
The Arden Syntax for Medical Logic Modules is a standardized and well-established programming language to represent medical knowledge. To test the compliance level of existing compilers and interpreters no public test suite exists. This paper presents the research to transform the specification into a set of unit tests, represented in JUnit. It further reports on the utilization of the test suite testing four different Arden Syntax processors. The presented and compared results reveal the status conformance of the tested processors. How test driven development of Arden Syntax processors can help increasing the compliance with the standard is described with two examples. In the end some considerations how an open source test suite can improve the development and distribution of the Arden Syntax are presented.
2013-08-22
4 cores, where the code may simultaneously run on the multiple cores or the graphics processing unit (or GPU – to be more specific on an NVIDIA ...allowed to get accurate crack shapes. DISCLAIMER Reference herein to any specific commercial company , product, process, or service by trade name
GR712RC- Dual-Core Processor- Product Status
NASA Astrophysics Data System (ADS)
Sturesson, Fredrik; Habinc, Sandi; Gaisler, Jiri
2012-08-01
The GR712RC System-on-Chip (SoC) is a dual core LEON3FT system suitable for advanced high reliability space avionics. Fault tolerance features from Aeroflex Gaisler’s GRLIB IP library and an implementation using Ramon Chips RadSafe cell library enables superior radiation hardness.The GR712RC device has been designed to provide high processing power by including two LEON3FT 32- bit SPARC V8 processors, each with its own high- performance IEEE754 compliant floating-point-unit and SPARC reference memory management unit.This high processing power is combined with a large number of serial interfaces, ranging from high-speed links for data transfers to low-speed control buses for commanding and status acquisition.
A high-speed, large-capacity, 'jukebox' optical disk system
NASA Technical Reports Server (NTRS)
Ammon, G. J.; Calabria, J. A.; Thomas, D. T.
1985-01-01
Two optical disk 'jukebox' mass storage systems which provide access to any data in a store of 10 to the 13th bits (1250G bytes) within six seconds have been developed. The optical disk jukebox system is divided into two units, including a hardware/software controller and a disk drive. The controller provides flexibility and adaptability, through a ROM-based microcode-driven data processor and a ROM-based software-driven control processor. The cartridge storage module contains 125 optical disks housed in protective cartridges. Attention is given to a conceptual view of the disk drive unit, the NASA optical disk system, the NASA database management system configuration, the NASA optical disk system interface, and an open systems interconnect reference model.
FRAME (Force Review Automation Environment): MATLAB-based AFM data processor.
Partola, Kostyantyn R; Lykotrafitis, George
2016-05-03
Data processing of force-displacement curves generated by atomic force microscopes (AFMs) for elastic moduli and unbinding event measurements is very time consuming and susceptible to user error or bias. There is an evident need for consistent, dependable, and easy-to-use AFM data processing software. We have developed an open-source software application, the force review automation environment (or FRAME), that provides users with an intuitive graphical user interface, automating data processing, and tools for expediting manual processing. We did not observe a significant difference between manually processed and automatically processed results from the same data sets. Copyright © 2016 Elsevier Ltd. All rights reserved.
Experiences with hypercube operating system instrumentation
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Rudolph, David C.
1989-01-01
The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.
Navy Budget: Potential Reductions for Research, Development, Test, and Evaluation
1990-11-01
available for use in future Navy programs, including the MK-50 tor- pedo and Vertical Launch Antisubmarine Rocket. A total of $49.9 million of fiscal...346 Travel 03 07 + 04 Support 224 225 + 01 Total Requested $122.61 $122.61 -0- In addition, the Navy plans to acquire six Acoustic Video Processor...units at $2.4 million in fiscal year 1991. The Acoustic Video Processor pro- gram is experiencing development problems, and the full-scale develop- ment
The Effects of High-Altitude Electromagnetic Pulse (HEMP) on Telecommunications Assets
1988-06-01
common to a whole class of switches. 5ESS switch software controls the operating system, call processing, and system administration andgmaintenance...LEVEL (ky/rn)3 (a). Mean Fraction of Preset Calls Dropped Due to Induced Transients3 1.0 W -o35kVhM (36 EVENTS) 5-40 kV/M (13 EVENTS) IAUTOMATIC ...eel PERIPHRAL UNIT BUS,IMNA The entire 4ESS system is controlled by the 1A processor. The processor monitors and controls the operation of the
Replication of Space-Shuttle Computers in FPGAs and ASICs
NASA Technical Reports Server (NTRS)
Ferguson, Roscoe C.
2008-01-01
A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Development of compact fuel processor for 2 kW class residential PEMFCs
NASA Astrophysics Data System (ADS)
Seo, Yu Taek; Seo, Dong Joo; Jeong, Jin Hyeok; Yoon, Wang Lai
Korea Institute of Energy Research (KIER) has been developing a novel fuel processing system to provide hydrogen rich gas to residential polymer electrolyte membrane fuel cells (PEMFCs) cogeneration system. For the effective design of a compact hydrogen production system, the unit processes of steam reforming, high and low temperature water gas shift, steam generator and internal heat exchangers are thermally and physically integrated into a packaged hardware system. Several prototypes are under development and the prototype I fuel processor showed thermal efficiency of 73% as a HHV basis with methane conversion of 81%. Recently tested prototype II has been shown the improved performance of thermal efficiency of 76% with methane conversion of 83%. In both prototypes, two-stage PrOx reactors reduce CO concentration less than 10 ppm, which is the prerequisite CO limit condition of product gas for the PEMFCs stack. After confirming the initial performance of prototype I fuel processor, it is coupled with PEMFC single cell to test the durability and demonstrated that the fuel processor is operated for 3 days successfully without any failure of fuel cell voltage. Prototype II fuel processor also showed stable performance during the durability test.
Video image processor on the Spacelab 2 Solar Optical Universal Polarimeter /SL2 SOUP/
NASA Technical Reports Server (NTRS)
Lindgren, R. W.; Tarbell, T. D.
1981-01-01
The SOUP instrument is designed to obtain diffraction-limited digital images of the sun with high photometric accuracy. The Video Processor originated from the requirement to provide onboard real-time image processing, both to reduce the telemetry rate and to provide meaningful video displays of scientific data to the payload crew. This original concept has evolved into a versatile digital processing system with a multitude of other uses in the SOUP program. The central element in the Video Processor design is a 16-bit central processing unit based on 2900 family bipolar bit-slice devices. All arithmetic, logical and I/O operations are under control of microprograms, stored in programmable read-only memory and initiated by commands from the LSI-11. Several functions of the Video Processor are described, including interface to the High Rate Multiplexer downlink, cosmetic and scientific data processing, scan conversion for crew displays, focus and exposure testing, and use as ground support equipment.
Teaching Graphics in Technical Communication Classes.
ERIC Educational Resources Information Center
Spurgeon, Kristene C.
Perhaps because the United States is undergoing a video revolution, perhaps because of its increasing sales of goods to non-English speaking markets where graphics can help explain the products, perhaps because of the decreasing communication skills of the work force, graphic aids are becoming more and more widely used and more and more important.…
NASA Technical Reports Server (NTRS)
Bartram, Peter N.
1989-01-01
The current Life Sciences Laboratory Equipment (LSLE) microcomputer for life sciences experiment data acquisition is now obsolete. Among the weaknesses of the current microcomputer are small memory size, relatively slow analog data sampling rates, and the lack of a bulk data storage device. While life science investigators normally prefer data to be transmitted to Earth as it is taken, this is not always possible. No down-link exists for experiments performed in the Shuttle middeck region. One important aspect of a replacement microcomputer is provision for in-flight storage of experimental data. The Write Once, Read Many (WORM) optical disk was studied because of its high storage density, data integrity, and the availability of a space-qualified unit. In keeping with the goals for a replacement microcomputer based upon commercially available components and standard interfaces, the system studied includes a Small Computer System Interface (SCSI) for interfacing the WORM drive. The system itself is designed around the STD bus, using readily available boards. Configurations examined were: (1) master processor board and slave processor board with the SCSI interface; (2) master processor with SCSI interface; (3) master processor with SCSI and Direct Memory Access (DMA); (4) master processor controlling a separate STD bus SCSI board; and (5) master processor controlling a separate STD bus SCSI board with DMA.
2014-01-01
A specially designed sensor processor used as a main processor in IoT (internet-of-thing) device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG). Using an event signal processing unit (EPU) as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio-) based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error. PMID:25580458
Multi-Threaded Algorithms for GPGPU in the ATLAS High Level Trigger
NASA Astrophysics Data System (ADS)
Conde Muíño, P.; ATLAS Collaboration
2017-10-01
General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with Level-1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPU. The High Level Trigger reduces the trigger rate from the 100 kHz Level-1 acceptance rate to 1.5 kHz for recording, requiring an average per-event processing time of ∼ 250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant challenge that will increase significantly with future LHC upgrades. During the LHC data taking period starting in 2021, luminosity will reach up to three times the original design value. Luminosity will increase further to 7.5 times the design value in 2026 following LHC and ATLAS upgrades. Corresponding improvements in the speed of the reconstruction code will be needed to provide the required trigger selection power within affordable computing resources. Key factors determining the potential benefit of including GPGPU as part of the HLT processor farm are: the relative speed of the CPU and GPGPU algorithm implementations; the relative execution times of the GPGPU algorithms and serial code remaining on the CPU; the number of GPGPU required, and the relative financial cost of the selected GPGPU. We give a brief overview of the algorithms implemented and present new measurements that compare the performance of various configurations exploiting GPGPU cards.
High-Performance High-Order Simulation of Wave and Plasma Phenomena
NASA Astrophysics Data System (ADS)
Klockner, Andreas
This thesis presents results aiming to enhance and broaden the applicability of the discontinuous Galerkin ("DG") method in a variety of ways. DG was chosen as a foundation for this work because it yields high-order finite element discretizations with very favorable numerical properties for the treatment of hyperbolic conservation laws. In a first part, I examine progress that can be made on implementation aspects of DG. In adapting the method to mass-market massively parallel computation hardware in the form of graphics processors ("GPUs"), I obtain an increase in computation performance per unit of cost by more than an order of magnitude over conventional processor architectures. Key to this advance is a recipe that adapts DG to a variety of hardware through automated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU and greater than 3 TFlops/s on a 16-GPU cluster in simulations of electromagnetics problems in three dimensions. I further briefly discuss the solver infrastructure that makes this possible. In the second part of the thesis, I introduce a number of new numerical methods whose motivation is partly rooted in the opportunity created by GPU-DG: First, I construct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity, helps stabilize DG computations in gas dynamics and a number of other fields. Second, I describe my pursuit of a method that allows the simulation of rarefied plasmas using a DG discretization of the electromagnetic field. Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, with a focus on applicability to DG discretizations of time-dependent problems.
Graphics processing unit-assisted lossless decompression
Loughry, Thomas A.
2016-04-12
Systems and methods for decompressing compressed data that has been compressed by way of a lossless compression algorithm are described herein. In a general embodiment, a graphics processing unit (GPU) is programmed to receive compressed data packets and decompress such packets in parallel. The compressed data packets are compressed representations of an image, and the lossless compression algorithm is a Rice compression algorithm.
Reactions to Graphic Health Warnings in the United States
ERIC Educational Resources Information Center
Nonnemaker, James M.; Choiniere, Conrad J.; Farrelly, Matthew C.; Kamyab, Kian; Davis, Kevin C.
2015-01-01
This study reports consumer reactions to the graphic health warnings selected by the Food and Drug Administration to be placed on cigarette packs in the United States. We recruited three sets of respondents for an experimental study from a national opt-in e-mail list sample: (i) current smokers aged 25 or older, (ii) young adult smokers aged 18-24…
First Year Engineering Graphics Curricula in Major Engineering Colleges.
ERIC Educational Resources Information Center
Meyers, Frederick D.
2000-01-01
Investigates the commonalities and differences of graphics programs among nine universities in the United States by analyzing the course structure and reviewing attendance and course syllabi. (Author/YDS)
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors
NASA Astrophysics Data System (ADS)
Sourouri, Mohammed; Birger Raknes, Espen
2017-04-01
In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also require explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but require the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrary to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE based application to reduce the time-to-solution time for the following 3D model sizes in grid points: 1283, 2563 and 5123. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, by using the MCDRAM as cache for problem sizes that are smaller than 16 GB further performance improvements were unlocked. Depending on the problem size, our overall results indicate that the KNL based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
Performance analysis of parallel gravitational N-body codes on large GPU clusters
NASA Astrophysics Data System (ADS)
Huang, Si-Yi; Spurzem, Rainer; Berczik, Peter
2016-01-01
We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large Graphics Processing Unit (GPU) clusters, both of which are pioneers in their own fields as well as on certain mutual scales - NBODY6++ and Bonsai. We carry out benchmarks of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200 - 300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of the two codes, finding that in the same cases Bonsai adopts larger time steps as well as larger relative energy errors than NBODY6++, typically ranging from 10 - 50 times larger, depending on the chosen parameters of the codes. Although the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scales, thus allowing the user to choose either one by fine-tuning parameters accordingly.
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second. PMID:24891848
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System.
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second.
NASA Astrophysics Data System (ADS)
Bailes, M.; Jameson, A.; Flynn, C.; Bateman, T.; Barr, E. D.; Bhandari, S.; Bunton, J. D.; Caleb, M.; Campbell-Wilson, D.; Farah, W.; Gaensler, B.; Green, A. J.; Hunstead, R. W.; Jankowski, F.; Keane, E. F.; Krishnan, V. Venkatraman; Murphy, Tara; O'Neill, M.; Osłowski, S.; Parthasarathy, A.; Ravi, V.; Rosado, P.; Temby, D.
2017-10-01
The Molonglo Observatory Synthesis Telescope (MOST) is an 18000 m2 radio telescope located 40 km from Canberra, Australia. Its operating band (820-851 MHz) is partly allocated to telecommunications, making radio astronomy challenging. We describe how the deployment of new digital receivers, Field Programmable Gate Array-based filterbanks, and server-class computers equipped with 43 Graphics Processing Units, has transformed the telescope into a versatile new instrument (UTMOST) for studying the radio sky on millisecond timescales. UTMOST has 10 times the bandwidth and double the field of view compared to the MOST, and voltage record and playback capability has facilitated rapid implementaton of many new observing modes, most of which operate commensally. UTMOST can simultaneously excise interference, make maps, coherently dedisperse pulsars, and perform real-time searches of coherent fan-beams for dispersed single pulses. UTMOST operates as a robotic facility, deciding how to efficiently target pulsars and how long to stay on source via real-time pulsar folding, while searching for single pulse events. Regular timing of over 300 pulsars has yielded seven pulsar glitches and three Fast Radio Bursts during commissioning. UTMOST demonstrates that if sufficient signal processing is applied to voltage streams, innovative science remains possible even in hostile radio frequency environments.
Shape reconstruction of irregular bodies with multiple complementary data sources
NASA Astrophysics Data System (ADS)
Kaasalainen, M.; Viikinkoski, M.
2012-07-01
We discuss inversion methods for shape reconstruction with complementary data sources. The current main sources are photometry, adaptive optics or other images, occultation timings, and interferometry, and the procedure can readily be extended to include range-Doppler radar and thermal infrared data as well. We introduce the octantoid, a generally applicable shape support that can be automatically used for surface types encountered in planetary research, including strongly nonconvex or non-starlike shapes. We present models of Kleopatra and Hermione from multimodal data as examples of this approach. An important concept in this approach is the optimal weighting of the various data modes. We define the maximum compatibility estimate, a multimodal generalization of the maximum likelihood estimate, for this purpose. We also present a specific version of the procedure for asteroid flyby missions, with which one can reconstruct the complete shape of the target by using the flyby-based map of a part of the surface together with other available data. Finally, we show that the relative volume error of a shape solution is usually approximately equal to the relative shape error rather than its multiple. Our algorithms are trivially parallelizable, so running the code on a CUDA-enabled graphics processing unit is some two orders of magnitude faster than the usual single-processor mode.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with anmore » approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.« less
Fast parallel tandem mass spectral library searching using GPU hardware acceleration.
Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K; Martin, Daniel B
2011-06-03
Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
A Blended Approach to Reading and Writing Graphic Stories
ERIC Educational Resources Information Center
Brown, Sally
2013-01-01
This article documents the experiences of a diverse group of second grade students during a nine week unit of study focused on graphic stories. The project begins as the class is immersed in reading graphic stories designed for young readers. Images, written text, and dialog are utilized to scaffold reading comprehension and to practice fluency.…
ERIC Educational Resources Information Center
Masuchika, Glenn; Boldt, Gail
2010-01-01
American graphic novels are increasingly recognized as high-quality literature and an interesting genre for academic study. Graphic novels of Japan, called manga, have established a strong foothold in American culture. This preliminary survey of 44 United States university libraries demonstrates that Japanese manga in translation are consistently…
Space Tug Avionics Definition Study. Volume 5: Cost and Programmatics
NASA Technical Reports Server (NTRS)
1975-01-01
The baseline avionics system features a central digital computer that integrates the functions of all the space tug subsystems by means of a redundant digital data bus. The central computer consists of dual central processor units, dual input/output processors, and a fault tolerant memory, utilizing internal redundancy and error checking. Three electronically steerable phased arrays provide downlink transmission from any tug attitude directly to ground or via TDRS. Six laser gyros and six accelerometers in a dodecahedron configuration make up the inertial measurement unit. Both a scanning laser radar and a TV system, employing strobe lamps, are required as acquisition and docking sensors. Primary dc power at a nominal 28 volts is supplied from dual lightweight, thermally integrated fuel cells which operate from propellant grade reactants out of the main tanks.
A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations
NASA Astrophysics Data System (ADS)
Smith, Luke; Liang, Qiuhua
2015-04-01
Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed on a 2m grid within a few hours. In the context of a rapid pluvial flood event in Newcastle upon Tyne during 2012, the technique allows simulation of inundation for a 31km2 of the city centre in less than an hour on a 2m grid; however, further grid refinement is required to fully capture important smaller flow pathways. Good agreement between the model and observed inundation is achieved for a variety of dam failure, slow fluvial inundation, rapid pluvial inundation, and defence breach scenarios in the UK.
Code of Federal Regulations, 2010 CFR
2010-07-01
...) Symbol means any design or graphic used by the United States Mint or the Treasury Department to represent themselves or their products. A design or graphic may include (1) A trademark, designation of origin, or mark...
Hyper-Spectral Synthesis of Active OB Stars Using GLaDoS
NASA Astrophysics Data System (ADS)
Hill, N. R.; Townsend, R. H. D.
2016-11-01
In recent years there has been considerable interest in using graphics processing units (GPUs) to perform scientific computations that have traditionally been handled by central processing units (CPUs). However, there is one area where the scientific potential of GPUs has been overlooked - computer graphics, the task they were originally designed for. Here we introduce GLaDoS, a hyper-spectral code which leverages the graphics capabilities of GPUs to synthesize spatially and spectrally resolved images of complex stellar systems. We demonstrate how GLaDoS can be applied to calculate observables for various classes of stars including systems with inhomogenous surface temperatures and contact binaries.
The analysis of optical-electro collimated light tube measurement system
NASA Astrophysics Data System (ADS)
Li, Zhenhui; Jiang, Tao; Cao, Guohua; Wang, Yanfei
2005-12-01
A new type of collimated light tube (CLT) is mentioned in this paper. The analysis and structure of CLT are described detail. The reticle and discrimination board are replaced by a optical-electro graphics generator, or DLP-Digital Light Processor. DLP gives all kinds of graphics controlled by computer, the lighting surface lies on the focus of the CLT. The rays of light pass through the CLT, and the tested products, the image of aim is received by variant focus objective CCD camera, the image can be processed by computer, then, some basic optical parameters will be obtained, such as optical aberration, image slope, etc. At the same time, motorized translation stage carry the DLP moving to simulate the limited distance. The grating ruler records the displacement of the DLP. The key technique is optical-electro auto-focus, the best imaging quality can be gotten by moving 6-D motorized positioning stage. Some principal questions can be solved in this device, for example, the aim generating, the structure of receiving system and optical matching.
Programmable personality interface for the dynamic infrared scene generator (IRSG2)
NASA Astrophysics Data System (ADS)
Buford, James A., Jr.; Mobley, Scott B.; Mayhall, Anthony J.; Braselton, William J.
1998-07-01
As scene generator platforms begin to rely specifically on commercial off-the-shelf (COTS) hardware and software components, the need for high speed programmable personality interfaces (PPIs) are required for interfacing to Infrared (IR) flight computer/processors and complex IR projectors in the hardware-in-the-loop (HWIL) simulation facilities. Recent technological advances and innovative applications of established technologies are beginning to allow development of cost effective PPIs to interface to COTS scene generators. At the U.S. Army Aviation and Missile Command (AMCOM) Missile Research, Development, and Engineering Center (MRDEC) researchers have developed such a PPI to reside between the AMCOM MRDEC IR Scene Generator (IRSG) and either a missile flight computer or the dynamic Laser Diode Array Projector (LDAP). AMCOM MRDEC has developed several PPIs for the first and second generation IRSGs (IRSG1 and IRSG2), which are based on Silicon Graphics Incorporated (SGI) Onyx and Onyx2 computers with Reality Engine 2 (RE2) and Infinite Reality (IR/IR2) graphics engines. This paper provides an overview of PPIs designed, integrated, tested, and verified at AMCOM MRDEC, specifically the IRSG2's PPI.
User's guide to the Reliability Estimation System Testbed (REST)
NASA Technical Reports Server (NTRS)
Nicol, David M.; Palumbo, Daniel L.; Rifkin, Adam
1992-01-01
The Reliability Estimation System Testbed is an X-window based reliability modeling tool that was created to explore the use of the Reliability Modeling Language (RML). RML was defined to support several reliability analysis techniques including modularization, graphical representation, Failure Mode Effects Simulation (FMES), and parallel processing. These techniques are most useful in modeling large systems. Using modularization, an analyst can create reliability models for individual system components. The modules can be tested separately and then combined to compute the total system reliability. Because a one-to-one relationship can be established between system components and the reliability modules, a graphical user interface may be used to describe the system model. RML was designed to permit message passing between modules. This feature enables reliability modeling based on a run time simulation of the system wide effects of a component's failure modes. The use of failure modes effects simulation enhances the analyst's ability to correctly express system behavior when using the modularization approach to reliability modeling. To alleviate the computation bottleneck often found in large reliability models, REST was designed to take advantage of parallel processing on hypercube processors.
76 FR 31575 - United States Standards for Grades of Frozen Onions
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-01
... processors of frozen onions in the United States. The petition provided information on style, sample size... change in the style designations for minced style, and a correction to the text. The members agreed with the proposed section concerning requirements for Styles, Type I, Whole onions and Type II, Pearl...
Utility of coupling nonlinear optimization methods with numerical modeling software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murphy, M.J.
1996-08-05
Results of using GLO (Global Local Optimizer), a general purpose nonlinear optimization software package for investigating multi-parameter problems in science and engineering is discussed. The package consists of the modular optimization control system (GLO), a graphical user interface (GLO-GUI), a pre-processor (GLO-PUT), a post-processor (GLO-GET), and nonlinear optimization software modules, GLOBAL & LOCAL. GLO is designed for controlling and easy coupling to any scientific software application. GLO runs the optimization module and scientific software application in an iterative loop. At each iteration, the optimization module defines new values for the set of parameters being optimized. GLO-PUT inserts the new parametermore » values into the input file of the scientific application. GLO runs the application with the new parameter values. GLO-GET determines the value of the objective function by extracting the results of the analysis and comparing to the desired result. GLO continues to run the scientific application over and over until it finds the ``best`` set of parameters by minimizing (or maximizing) the objective function. An example problem showing the optimization of material model is presented (Taylor cylinder impact test).« less
Element Load Data Processor (ELDAP) Users Manual
NASA Technical Reports Server (NTRS)
Ramsey, John K., Jr.; Ramsey, John K., Sr.
2015-01-01
Often, the shear and tensile forces and moments are extracted from finite element analyses to be used in off-line calculations for evaluating the integrity of structural connections involving bolts, rivets, and welds. Usually the maximum forces and moments are desired for use in the calculations. In situations where there are numerous structural connections of interest for numerous load cases, the effort in finding the true maximum force and/or moment combinations among all fasteners and welds and load cases becomes difficult. The Element Load Data Processor (ELDAP) software described herein makes this effort manageable. This software eliminates the possibility of overlooking the worst-case forces and moments that could result in erroneous positive margins of safety and/or selecting inconsistent combinations of forces and moments resulting in false negative margins of safety. In addition to forces and moments, any scalar quantity output in a PATRAN report file may be evaluated with this software. This software was originally written to fill an urgent need during the structural analysis of the Ares I-X Interstage segment. As such, this software was coded in a straightforward manner with no effort made to optimize or minimize code or to develop a graphical user interface.
DYCAST: A finite element program for the crash analysis of structures
NASA Technical Reports Server (NTRS)
Pifko, A. B.; Winter, R.; Ogilvie, P.
1987-01-01
DYCAST is a nonlinear structural dynamic finite element computer code developed for crash simulation. The element library contains stringers, beams, membrane skin triangles, plate bending triangles and spring elements. Changing stiffnesses in the structure are accounted for by plasticity and very large deflections. Material nonlinearities are accommodated by one of three options: elastic-perfectly plastic, elastic-linear hardening plastic, or elastic-nonlinear hardening plastic of the Ramberg-Osgood type. Geometric nonlinearities are handled in an updated Lagrangian formulation by reforming the structure into its deformed shape after small time increments while accumulating deformations, strains, and forces. The nonlinearities due to combined loadings are maintained, and stiffness variation due to structural failures are computed. Numerical time integrators available are fixed-step central difference, modified Adams, Newmark-beta, and Wilson-theta. The last three have a variable time step capability, which is controlled internally by a solution convergence error measure. Other features include: multiple time-load history tables to subject the structure to time dependent loading; gravity loading; initial pitch, roll, yaw, and translation of the structural model with respect to the global system; a bandwidth optimizer as a pre-processor; and deformed plots and graphics as post-processors.
NASA Astrophysics Data System (ADS)
Schaaf, Kjeld; Overeem, Ruud
2004-06-01
Moore’s law is best exploited by using consumer market hardware. In particular, the gaming industry pushes the limit of processor performance thus reducing the cost per raw flop even faster than Moore’s law predicts. Next to the cost benefits of Common-Of-The-Shelf (COTS) processing resources, there is a rapidly growing experience pool in cluster based processing. The typical Beowulf cluster of PC’s supercomputers are well known. Multiple examples exists of specialised cluster computers based on more advanced server nodes or even gaming stations. All these cluster machines build upon the same knowledge about cluster software management, scheduling, middleware libraries and mathematical libraries. In this study, we have integrated COTS processing resources and cluster nodes into a very high performance processing platform suitable for streaming data applications, in particular to implement a correlator. The required processing power for the correlator in modern radio telescopes is in the range of the larger supercomputers, which motivates the usage of supercomputer technology. Raw processing power is provided by graphical processors and is combined with an Infiniband host bus adapter with integrated data stream handling logic. With this processing platform a scalable correlator can be built with continuously growing processing power at consumer market prices.
Measurement of fault latency in a digital avionic mini processor, part 2
NASA Technical Reports Server (NTRS)
Mcgough, J.; Swern, F.
1983-01-01
The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are described. Several earlier programs were reprogrammed, expanding the instruction set to capitalize on the full power of the BDX-930 computer. As a final demonstration of fault coverage an extensive, 3-axis, high performance flght control computation was added. The stages in the development of a CPU self-test program emphasizing the relationship between fault coverage, speed, and quantity of instructions were demonstrated.
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1991-01-01
The hardware and the software architecture of the TurboLAN Intelligent Network Adapter Card (TINAC) are described. A high level as well as detailed treatment of the workings of various components of the TINAC are presented. The TINAC is divided into the following four major functional units: (1) the network access unit (NAU); (2) the buffer management unit; (3) the host interface unit; and (4) the node processor unit.
ERIC Educational Resources Information Center
Zebehazy, Kim T.; Wilton, Adam P.
2014-01-01
Introduction: This study investigated the perceptions and practices of teachers of students with visual impairments in Canada and the United States regarding graphics (both tactile and print) that are used by students with visual impairments. Questions focused on quality, importance, and instruction in the use of graphics. Methods: An electronic…
Fuel processors for fuel cell APU applications
NASA Astrophysics Data System (ADS)
Aicher, T.; Lenz, B.; Gschnell, F.; Groos, U.; Federici, F.; Caprile, L.; Parodi, L.
The conversion of liquid hydrocarbons to a hydrogen rich product gas is a central process step in fuel processors for auxiliary power units (APUs) for vehicles of all kinds. The selection of the reforming process depends on the fuel and the type of the fuel cell. For vehicle power trains, liquid hydrocarbons like gasoline, kerosene, and diesel are utilized and, therefore, they will also be the fuel for the respective APU systems. The fuel cells commonly envisioned for mobile APU applications are molten carbonate fuel cells (MCFC), solid oxide fuel cells (SOFC), and proton exchange membrane fuel cells (PEMFC). Since high-temperature fuel cells, e.g. MCFCs or SOFCs, can be supplied with a feed gas that contains carbon monoxide (CO) their fuel processor does not require reactors for CO reduction and removal. For PEMFCs on the other hand, CO concentrations in the feed gas must not exceed 50 ppm, better 20 ppm, which requires additional reactors downstream of the reforming reactor. This paper gives an overview of the current state of the fuel processor development for APU applications and APU system developments. Furthermore, it will present the latest developments at Fraunhofer ISE regarding fuel processors for high-temperature fuel cell APU systems on board of ships and aircrafts.
Dazert, Stefan; Thomas, Jan Peter; Büchner, Andreas; Müller, Joachim; Hempel, John Martin; Löwenheim, Hubert; Mlynski, Robert
2017-03-01
The RONDO is a single-unit cochlear implant audio processor, which omits the need for a behind-the-ear (BTE) audio processor. The primary aim was to compare speech perception results in quiet and in noise with the RONDO and the OPUS 2, a BTE audio processor. Secondary aims were to determine subjects' self-assessed levels of sound quality and gather subjective feedback on RONDO use. All speech perception tests were performed with the RONDO and the OPUS 2 behind-the-ear audio processor at 3 test intervals. Subjects were required to use the RONDO between test intervals. Subjects were tested at upgrade from the OPUS 2 to the RONDO and at 1 and 6 months after upgrade. Speech perception was determined using the Freiburg Monosyllables in quiet test and the Oldenburg Sentence Test (OLSA) in noise. Subjective perception was determined using the Hearing Implant Sound Quality Index (HISQUI 19 ), and a RONDO device-specific questionnaire. 50 subjects participated in the study. Neither speech perception scores nor self-perceived sound quality scores were significantly different at any interval between the RONDO and the OPUS 2. Subjects reported high levels of satisfaction with the RONDO. The RONDO provides comparable speech perception to the OPUS 2 while providing users with high levels of satisfaction and comfort without increasing health risk. The RONDO is a suitable and safe alternative to traditional BTE audio processors.
Direct match data flow machine apparatus and process for data driven computing
Davidson, G.S.; Grafe, V.G.
1997-08-12
A data flow computer and method of computing are disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing
Davidson, G.S.; Grafe, V.G.
1988-07-22
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information from an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ''fire'' signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing
Davidson, George S.; Grafe, Victor G.
1995-01-01
A data flow computer which of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow machine apparatus and process for data driven computing
Davidson, George S.; Grafe, Victor Gerald
1997-01-01
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow memory for data driven computing
Davidson, George S.; Grafe, Victor Gerald
1997-01-01
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow memory for data driven computing
Davidson, G.S.; Grafe, V.G.
1997-10-07
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Wargaming and interactive color graphics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bly, S.; Buzzell, C.; Smith, G.
1980-08-04
JANUS is a two-sided interactive color graphic simulation in which human commanders can direct their forces, each trying to accomplish their mission. This competitive synthetic battlefield is used to explore the range of human ingenuity under conditions of incomplete information about enemy strength and deployment. Each player can react to new situations by planning new unit movements, using conventional and nuclear weapons, or modifying unit objectives. Conventional direct fire among tanks, infantry fighting vehicles, helicopters, and other units is automated subject to constraints of target acquisition, reload rate, range, suppression, etc. Artillery and missile indirect fire systems deliver conventional munitions,more » smoke, and nuclear weapons. Players use reconnaissance units, helicopters, or fixed wing aircraft to search for enemy unit locations. Counter-battery radars acquire enemy artillery. The JANUS simulation at LLL has demonstrated the value of the computer as a sophisticated blackboard. A small dedicated minicomputer is adequate for detailed calculations, and may be preferable to sharing a more powerful machine. Real-time color interactive graphics are essential to allow realistic command decision inputs. Competitive human-versus-human synthetic experiences are intense and well-remembered. 2 figures.« less
Kiefer, Gundolf; Lehmann, Helko; Weese, Jürgen
2006-04-01
Maximum intensity projections (MIPs) are an important visualization technique for angiographic data sets. Efficient data inspection requires frame rates of at least five frames per second at preserved image quality. Despite the advances in computer technology, this task remains a challenge. On the one hand, the sizes of computed tomography and magnetic resonance images are increasing rapidly. On the other hand, rendering algorithms do not automatically benefit from the advances in processor technology, especially for large data sets. This is due to the faster evolving processing power and the slower evolving memory access speed, which is bridged by hierarchical cache memory architectures. In this paper, we investigate memory access optimization methods and use them for generating MIPs on general-purpose central processing units (CPUs) and graphics processing units (GPUs), respectively. These methods can work on any level of the memory hierarchy, and we show that properly combined methods can optimize memory access on multiple levels of the hierarchy at the same time. We present performance measurements to compare different algorithm variants and illustrate the influence of the respective techniques. On current hardware, the efficient handling of the memory hierarchy for CPUs improves the rendering performance by a factor of 3 to 4. On GPUs, we observed that the effect is even larger, especially for large data sets. The methods can easily be adjusted to different hardware specifics, although their impact can vary considerably. They can also be used for other rendering techniques than MIPs, and their use for more general image processing task could be investigated in the future.
Job-mix modeling and system analysis of an aerospace multiprocessor.
NASA Technical Reports Server (NTRS)
Mallach, E. G.
1972-01-01
An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.
Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar
2017-03-01
This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using maximum likelihood approach. The D-Phylo while misusing the seeking capacity of k -means keeps away from its real constraint of getting stuck at privately conserved motifs. The authors have tested the behaviour of D-Phylo on Amazon Linux Amazon Machine Image(Hardware Virtual Machine)i2.4xlarge, six central processing unit, 122 GiB memory, 8 × 800 Solid-state drive Elastic Block Store volume, high network performance up to 15 processors for several real-life datasets. Distributing the clusters evenly on all the processors provides us the capacity to accomplish a near direct speed if there should arise an occurrence of huge number of processors.
Performance Models for Split-execution Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humble, Travis S; McCaskey, Alex; Schrock, Jonathan
Split-execution computing leverages the capabilities of multiple computational models to solve problems, but splitting program execution across different computational models incurs costs associated with the translation between domains. We analyze the performance of a split-execution computing system developed from conventional and quantum processing units (QPUs) by using behavioral models that track resource usage. We focus on asymmetric processing models built using conventional CPUs and a family of special-purpose QPUs that employ quantum computing principles. Our performance models account for the translation of a classical optimization problem into the physical representation required by the quantum processor while also accounting for hardwaremore » limitations and conventional processor speed and memory. We conclude that the bottleneck in this split-execution computing system lies at the quantum-classical interface and that the primary time cost is independent of quantum processor behavior.« less
Recall Performance for Content-Addressable Memory Using Adiabatic Quantum Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Imam, Neena; Humble, Travis S.; McCaskey, Alex
A content-addressable memory (CAM) stores key-value associations such that the key is recalled by providing its associated value. While CAM recall is traditionally performed using recurrent neural network models, we show how to solve this problem using adiabatic quantum optimization. Our approach maps the recurrent neural network to a commercially available quantum processing unit by taking advantage of the common underlying Ising spin model. We then assess the accuracy of the quantum processor to store key-value associations by quantifying recall performance against an ensemble of problem sets. We observe that different learning rules from the neural network community influence recallmore » accuracy but performance appears to be limited by potential noise in the processor. The strong connection established between quantum processors and neural network problems supports the growing intersection of these two ideas.« less