graphic processing unit: Topics by Science.gov

Sample records for graphic processing unit

Adaptive-optics optical coherence tomography processing using a graphics processing unit.

PubMed

Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T

2014-01-01

Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities, and moderate price compared to super computers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving the super high resolution technology closer to clinical viability.
The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing

DTIC Science & Technology

2017-08-01

access to the GPU for general purpose processing .5 CUDA is designed to work easily with multiple programming languages , including Fortran. CUDA is a...Using Graphics Processing Unit (GPU) Computing by Leelinda P Dawson Approved for public release; distribution unlimited...The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing by Leelinda
Acceleration of integral imaging based incoherent Fourier hologram capture using graphic processing unit.

PubMed

Jeong, Kyeong-Min; Kim, Hee-Seung; Hong, Sung-In; Lee, Sung-Keun; Jo, Na-Young; Kim, Yong-Soo; Lim, Hong-Gi; Park, Jae-Hyeung

2012-10-08

Speed enhancement of integral imaging based incoherent Fourier hologram capture using a graphic processing unit is reported. Integral imaging based method enables exact hologram capture of real-existing three-dimensional objects under regular incoherent illumination. In our implementation, we apply parallel computation scheme using the graphic processing unit, accelerating the processing speed. Using enhanced speed of hologram capture, we also implement a pseudo real-time hologram capture and optical reconstruction system. The overall operation speed is measured to be 1 frame per second.
Grace: A cross-platform micromagnetic simulator on graphics processing units

NASA Astrophysics Data System (ADS)

Zhu, Ru

2015-12-01

A micromagnetic simulator running on graphics processing units (GPUs) is presented. Different from GPU implementations of other research groups which are predominantly running on NVidia's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware platform independent. It runs on GPUs from venders including NVidia, AMD and Intel, and achieves significant performance boost as compared to previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paved the way for running large size micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.
Model for mapping settlements

DOEpatents

Vatsavai, Ranga Raju; Graesser, Jordan B.; Bhaduri, Budhendra L.

2016-07-05

A programmable media includes a graphical processing unit in communication with a memory element. The graphical processing unit is configured to detect one or more settlement regions from a high resolution remote sensed image based on the execution of programming code. The graphical processing unit identifies one or more settlements through the execution of the programming code that executes a multi-instance learning algorithm that models portions of the high resolution remote sensed image. The identification is based on spectral bands transmitted by a satellite and on selected designations of the image patches.
Target Information Processing: A Joint Decision and Estimation Approach

DTIC Science & Technology

2012-03-29

ground targets ( track - before - detect ) using computer cluster and graphics processing unit. Estimation and filtering theory is one of the most important...targets ( track - before - detect ) using computer cluster and graphics processing unit. Estimation and filtering theory is one of the most important
Software Graphics Processing Unit (sGPU) for Deep Space Applications

NASA Technical Reports Server (NTRS)

McCabe, Mary; Salazar, George; Steele, Glen

2015-01-01

A graphics processing capability will be required for deep space missions and must include a range of applications, from safety-critical vehicle health status to telemedicine for crew health. However, preliminary radiation testing of commercial graphics processing cards suggest they cannot operate in the deep space radiation environment. Investigation into an Software Graphics Processing Unit (sGPU)comprised of commercial-equivalent radiation hardened/tolerant single board computers, field programmable gate arrays, and safety-critical display software shows promising results. Preliminary performance of approximately 30 frames per second (FPS) has been achieved. Use of multi-core processors may provide a significant increase in performance.
Graphic Arts: Book Two. Process Camera, Stripping, and Platemaking.

ERIC Educational Resources Information Center

Farajollahi, Karim; And Others

The second of a three-volume set of instructional materials for a course in graphic arts, this manual consists of 10 instructional units dealing with the process camera, stripping, and platemaking. Covered in the individual units are the process camera and darkroom photography, line photography, half-tone photography, other darkroom techniques,…
A GPU-Based Wide-Band Radio Spectrometer

NASA Astrophysics Data System (ADS)

Chennamangalam, Jayanth; Scott, Simon; Jones, Glenn; Chen, Hong; Ford, John; Kepley, Amanda; Lorimer, D. R.; Nie, Jun; Prestage, Richard; Roshi, D. Anish; Wagner, Mark; Werthimer, Dan

2014-12-01

The graphics processing unit has become an integral part of astronomical instrumentation, enabling high-performance online data reduction and accelerated online signal processing. In this paper, we describe a wide-band reconfigurable spectrometer built using an off-the-shelf graphics processing unit card. This spectrometer, when configured as a polyphase filter bank, supports a dual-polarisation bandwidth of up to 1.1 GHz (or a single-polarisation bandwidth of up to 2.2 GHz) on the latest generation of graphics processing units. On the other hand, when configured as a direct fast Fourier transform, the spectrometer supports a dual-polarisation bandwidth of up to 1.4 GHz (or a single-polarisation bandwidth of up to 2.8 GHz).
Finite Element Optimization for Nondestructive Evaluation on a Graphics Processing Unit for Ground Vehicle Hull Inspection

DTIC Science & Technology

2013-08-22

4 cores, where the code may simultaneously run on the multiple cores or the graphics processing unit (or GPU – to be more specific on an NVIDIA ...allowed to get accurate crack shapes. DISCLAIMER Reference herein to any specific commercial company , product, process, or service by trade name
Performance Testing of GPU-Based Approximate Matching Algorithm on Network Traffic

DTIC Science & Technology

2015-03-01

Defense Department’s use. vi THIS PAGE INTENTIONALLY LEFT BLANK vii TABLE OF CONTENTS I. INTRODUCTION...22 D. GENERATING DIGESTS ............................................................................23 1. Reference...the-shelf GPU Graphical Processing Unit GPGPU General -Purpose Graphic Processing Unit HBSS Host-Based Security System HIPS Host Intrusion
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

USDA-ARS?s Scientific Manuscript database

This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
Graphic Arts: Book Three. The Press and Related Processes.

ERIC Educational Resources Information Center

Farajollahi, Karim; And Others

The third of a three-volume set of instructional materials for a graphic arts course, this manual consists of nine instructional units dealing with presses and related processes. Covered in the units are basic press fundamentals, offset press systems, offset press operating procedures, offset inks and dampening chemistry, preventive maintenance…
Weather information network including graphical display

NASA Technical Reports Server (NTRS)

Leger, Daniel R. (Inventor); Burdon, David (Inventor); Son, Robert S. (Inventor); Martin, Kevin D. (Inventor); Harrison, John (Inventor); Hughes, Keith R. (Inventor)

2006-01-01

An apparatus for providing weather information onboard an aircraft includes a processor unit and a graphical user interface. The processor unit processes weather information after it is received onboard the aircraft from a ground-based source, and the graphical user interface provides a graphical presentation of the weather information to a user onboard the aircraft. Preferably, the graphical user interface includes one or more user-selectable options for graphically displaying at least one of convection information, turbulence information, icing information, weather satellite information, SIGMET information, significant weather prognosis information, and winds aloft information.
Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures

PubMed Central

Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.

2012-01-01

We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient’s skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures. PMID:24027616
Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures.

PubMed

Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R

2012-02-23

We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.
Orthorectification by Using Gpgpu Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Graphics processing unit-assisted lossless decompression

DOEpatents

Loughry, Thomas A.

2016-04-12

Systems and methods for decompressing compressed data that has been compressed by way of a lossless compression algorithm are described herein. In a general embodiment, a graphics processing unit (GPU) is programmed to receive compressed data packets and decompress such packets in parallel. The compressed data packets are compressed representations of an image, and the lossless compression algorithm is a Rice compression algorithm.
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; Russell, Samuel S.

2012-01-01

Objective Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion is allowing for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU which yield significant increases in performance. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area were GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.
Evaluating Mobile Graphics Processing Units (GPUs) for Real-Time Resource Constrained Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meredith, J; Conger, J; Liu, Y

2005-11-11

Modern graphics processing units (GPUs) can provide tremendous performance boosts for some applications beyond what a single CPU can accomplish, and their performance is growing at a rate faster than CPUs as well. Mobile GPUs available for laptops have the small form factor and low power requirements suitable for use in embedded processing. We evaluated several desktop and mobile GPUs and CPUs on traditional and non-traditional graphics tasks, as well as on the most time consuming pieces of a full hyperspectral imaging application. Accuracy remained high despite small differences in arithmetic operations like rounding. Performance improvements are summarized here relativemore » to a desktop Pentium 4 CPU.« less

Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects

NASA Astrophysics Data System (ADS)

Chung, Shin Kee; Wen, Linqing; Blair, David; Cannon, Kipp; Datta, Amitava

2010-07-01

We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.
Note: Quasi-real-time analysis of dynamic near field scattering data using a graphics processing unit

NASA Astrophysics Data System (ADS)

Cerchiari, G.; Croccolo, F.; Cardinaux, F.; Scheffold, F.

2012-10-01

We present an implementation of the analysis of dynamic near field scattering (NFS) data using a graphics processing unit. We introduce an optimized data management scheme thereby limiting the number of operations required. Overall, we reduce the processing time from hours to minutes, for typical experimental conditions. Previously the limiting step in such experiments, the processing time is now comparable to the data acquisition time. Our approach is applicable to various dynamic NFS methods, including shadowgraph, Schlieren and differential dynamic microscopy.
Accelerating Molecular Dynamic Simulation on Graphics Processing Units

PubMed Central

Friedrichs, Mark S.; Eastman, Peter; Vaidyanathan, Vishal; Houston, Mike; Legrand, Scott; Beberg, Adam L.; Ensign, Daniel L.; Bruns, Christopher M.; Pande, Vijay S.

2009-01-01

We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core. PMID:19191337
Hyper-Spectral Synthesis of Active OB Stars Using GLaDoS

NASA Astrophysics Data System (ADS)

Hill, N. R.; Townsend, R. H. D.

2016-11-01

In recent years there has been considerable interest in using graphics processing units (GPUs) to perform scientific computations that have traditionally been handled by central processing units (CPUs). However, there is one area where the scientific potential of GPUs has been overlooked - computer graphics, the task they were originally designed for. Here we introduce GLaDoS, a hyper-spectral code which leverages the graphics capabilities of GPUs to synthesize spatially and spectrally resolved images of complex stellar systems. We demonstrate how GLaDoS can be applied to calculate observables for various classes of stars including systems with inhomogenous surface temperatures and contact binaries.
Graphic Arts: Process Camera, Stripping, and Platemaking. Third Edition.

ERIC Educational Resources Information Center

Crummett, Dan

This document contains teacher and student materials for a course in graphic arts concentrating on camera work, stripping, and plate making in the printing process. Eight units of instruction cover the following topics: (1) the process camera and darkroom equipment; (2) line photography; (3) halftone photography; (4) other darkroom techniques; (5)…
Efficient particle-in-cell simulation of auroral plasma phenomena using a CUDA enabled graphics processing unit

NASA Astrophysics Data System (ADS)

Sewell, Stephen

This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphic Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general purpose graphic processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphic processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

NASA Astrophysics Data System (ADS)

Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

2016-05-01

This study introduces a practical approach to develop real-time signal processing chain for general phased array radar on NVIDIA GPUs(Graphical Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBlas and cuFFT, which are adopted from open source libraries and optimized for the NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance benchmarked in computation time with various input data cube sizes are compared across GPUs and CPUs. Through the analysis, it will be demonstrated that GPGPUs (General Purpose GPU) real-time processing of the array radar data is possible with relatively low-cost commercial GPUs.
Acceleration of GPU-based Krylov solvers via data transfer reduction

DOE PAGES

Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...

2015-04-08

Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphicsmore » processing units, and in particular the Biconjugate Gradient Stabilized solver that significant improvement can be achieved by reformulating the method to reduce data-communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector, are crucial for the subsequent development of high-performance graphics processing units accelerated Krylov subspace iterative methods.« less
General purpose molecular dynamics simulations fully implemented on graphics processing units

NASA Astrophysics Data System (ADS)

Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.

2008-05-01

Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of fast 30 processor core distributed memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters and discuss implications for the future.
Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer

NASA Technical Reports Server (NTRS)

Godoy, William F.; Liu, Xu

2011-01-01

General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension in radiative solvers (e.g., Monte Carlo, discrete ordinates) at a minimal learning curve.
Porting of the transfer-matrix method for multilayer thin-film computations on graphics processing units

NASA Astrophysics Data System (ADS)

Limmer, Steffen; Fey, Dietmar

2013-07-01

Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. It turned out that significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected. Therefore we compare different GPUs among themselves and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up and the runtime behavior depending on the input data is examined.
People detection method using graphics processing units for a mobile robot with an omnidirectional camera

NASA Astrophysics Data System (ADS)

Kang, Sungil; Roh, Annah; Nam, Bodam; Hong, Hyunki

2011-12-01

This paper presents a novel vision system for people detection using an omnidirectional camera mounted on a mobile robot. In order to determine regions of interest (ROI), we compute a dense optical flow map using graphics processing units, which enable us to examine compliance with the ego-motion of the robot in a dynamic environment. Shape-based classification algorithms are employed to sort ROIs into human beings and nonhumans. The experimental results show that the proposed system detects people more precisely than previous methods.
Graphics processing unit based computation for NDE applications

NASA Astrophysics Data System (ADS)

Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.

2012-05-01

Advances in parallel processing in recent years are helping to improve the cost of numerical simulation. Breakthroughs in Graphical Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to NDE community, namely heat diffusion and elastic wave propagation. The implementations are for two-dimensions. Performance improvement of the GPU implementation against serial CPU implementation is then discussed.
Graphic Communications--Commercial Photography. Ohio's Competency Analysis Profile.

ERIC Educational Resources Information Center

Ohio State Univ., Columbus. Vocational Instructional Materials Lab.

This Ohio Competency Analysis Profile (OCAP), derived from a modified Developing a Curriculum (DACUM) process, is a current comprehensive and verified employer competency program list for graphic communications--commercial photography. Each unit (with or without subunits) contains competencies and competency builders that identify the…
GPU-computing in econophysics and statistical physics

NASA Astrophysics Data System (ADS)

Preis, T.

2011-03-01

A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

PubMed

Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

2013-12-01

Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Fast analytical scatter estimation using graphics processing units.

PubMed

Ingleby, Harry; Lippuner, Jonas; Rickey, Daniel W; Li, Yue; Elbakri, Idris

2015-01-01

To develop a fast patient-specific analytical estimator of first-order Compton and Rayleigh scatter in cone-beam computed tomography, implemented using graphics processing units. The authors developed an analytical estimator for first-order Compton and Rayleigh scatter in a cone-beam computed tomography geometry. The estimator was coded using NVIDIA's CUDA environment for execution on an NVIDIA graphics processing unit. Performance of the analytical estimator was validated by comparison with high-count Monte Carlo simulations for two different numerical phantoms. Monoenergetic analytical simulations were compared with monoenergetic and polyenergetic Monte Carlo simulations. Analytical and Monte Carlo scatter estimates were compared both qualitatively, from visual inspection of images and profiles, and quantitatively, using a scaled root-mean-square difference metric. Reconstruction of simulated cone-beam projection data of an anthropomorphic breast phantom illustrated the potential of this method as a component of a scatter correction algorithm. The monoenergetic analytical and Monte Carlo scatter estimates showed very good agreement. The monoenergetic analytical estimates showed good agreement for Compton single scatter and reasonable agreement for Rayleigh single scatter when compared with polyenergetic Monte Carlo estimates. For a voxelized phantom with dimensions 128 × 128 × 128 voxels and a detector with 256 × 256 pixels, the analytical estimator required 669 seconds for a single projection, using a single NVIDIA 9800 GX2 video card. Accounting for first order scatter in cone-beam image reconstruction improves the contrast to noise ratio of the reconstructed images. The analytical scatter estimator, implemented using graphics processing units, provides rapid and accurate estimates of single scatter and with further acceleration and a method to account for multiple scatter may be useful for practical scatter correction schemes.
General Purpose Graphics Processing Unit Based High-Rate Rice Decompression and Reed-Solomon Decoding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loughry, Thomas A.

As the volume of data acquired by space-based sensors increases, mission data compression/decompression and forward error correction code processing performance must likewise scale. This competency development effort was explored using the General Purpose Graphics Processing Unit (GPGPU) to accomplish high-rate Rice Decompression and high-rate Reed-Solomon (RS) decoding at the satellite mission ground station. Each algorithm was implemented and benchmarked on a single GPGPU. Distributed processing across one to four GPGPUs was also investigated. The results show that the GPGPU has considerable potential for performing satellite communication Data Signal Processing, with three times or better performance improvements and up to tenmore » times reduction in cost over custom hardware, at least in the case of Rice Decompression and Reed-Solomon Decoding.« less
Graphics processing unit accelerated intensity-based optical coherence tomography angiography using differential frames with real-time motion correction.

PubMed

Watanabe, Yuuki; Takahashi, Yuhei; Numazawa, Hiroshi

2014-02-01

We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel-shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.
Impact of memory bottleneck on the performance of graphics processing units

NASA Astrophysics Data System (ADS)

Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong

2015-12-01

Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resource in the GPU efficiently is a challenging problem, since the GPU architecture is totally different to the traditional CPU architecture. To solve this problem, many studies have focused on the techniques for improving the system performance using GPUs. In this work, we analyze the GPU performance varying GPU parameters such as the number of cores and clock frequency. According to our simulations, the GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and clock frequency increase, respectively. However, the performance is saturated when memory bottleneck problems incur due to huge data requests to the memory. The performance of GPUs can be improved as the memory bottleneck is reduced by changing GPU parameters dynamically.

Hyperspectral processing in graphical processing units

NASA Astrophysics Data System (ADS)

Winter, Michael E.; Winter, Edwin M.

2011-06-01

With the advent of the commercial 3D video card in the mid 1990s, we have seen an order of magnitude performance increase with each generation of new video cards. While these cards were designed primarily for visualization and video games, it became apparent after a short while that they could be used for scientific purposes. These Graphical Processing Units (GPUs) are rapidly being incorporated into data processing tasks usually reserved for general purpose computers. It has been found that many image processing problems scale well to modern GPU systems. We have implemented four popular hyperspectral processing algorithms (N-FINDR, linear unmixing, Principal Components, and the RX anomaly detection algorithm). These algorithms show an across the board speedup of at least a factor of 10, with some special cases showing extreme speedups of a hundred times or more.
DspaceOgre 3D Graphics Visualization Tool

NASA Technical Reports Server (NTRS)

Jain, Abhinandan; Myin, Steven; Pomerantz, Marc I.

2011-01-01

This general-purpose 3D graphics visualization C++ tool is designed for visualization of simulation and analysis data for articulated mechanisms. Examples of such systems are vehicles, robotic arms, biomechanics models, and biomolecular structures. DspaceOgre builds upon the open-source Ogre3D graphics visualization library. It provides additional classes to support the management of complex scenes involving multiple viewpoints and different scene groups, and can be used as a remote graphics server. This software provides improved support for adding programs at the graphics processing unit (GPU) level for improved performance. It also improves upon the messaging interface it exposes for use as a visualization server.
Porting a Hall MHD Code to a Graphic Processing Unit

NASA Technical Reports Server (NTRS)

Dorelli, John C.

2011-01-01

We present our experience porting a Hall MHD code to a Graphics Processing Unit (GPU). The code is a 2nd order accurate MUSCL-Hancock scheme which makes use of an HLL Riemann solver to compute numerical fluxes and second-order finite differences to compute the Hall contribution to the electric field. The divergence of the magnetic field is controlled with Dedner?s hyperbolic divergence cleaning method. Preliminary benchmark tests indicate a speedup (relative to a single Nehalem core) of 58x for a double precision calculation. We discuss scaling issues which arise when distributing work across multiple GPUs in a CPU-GPU cluster.
Graphics Processing Units for HEP trigger systems

NASA Astrophysics Data System (ADS)

Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.; Neri, I.; Paolucci, P. S.; Piandani, R.; Pontisso, L.; Rescigno, M.; Simula, F.; Sozzi, M.; Vicini, P.

2016-07-01

General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerator in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We will discuss the use of online parallel computing on GPU for synchronous low level trigger, focusing on CERN NA62 experiment trigger system. The use of GPU in higher level trigger system is also briefly considered.
Employing OpenCL to Accelerate Ab Initio Calculations on Graphics Processing Units.

PubMed

Kussmann, Jörg; Ochsenfeld, Christian

2017-06-13

We present an extension of our graphics processing units (GPU)-accelerated quantum chemistry package to employ OpenCL compute kernels, which can be executed on a wide range of computing devices like CPUs, Intel Xeon Phi, and AMD GPUs. Here, we focus on the use of AMD GPUs and discuss differences as compared to CUDA-based calculations on NVIDIA GPUs. First illustrative timings are presented for hybrid density functional theory calculations using serial as well as parallel compute environments. The results show that AMD GPUs are as fast or faster than comparable NVIDIA GPUs and provide a viable alternative for quantum chemical applications.
Exploiting graphics processing units for computational biology and bioinformatics.

PubMed

Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

2010-09-01

Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of generalpurpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code

NASA Astrophysics Data System (ADS)

Payne, J.; McCune, D.; Prater, R.

2010-11-01

NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. Recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost effective and high performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.
Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

PubMed

Wilson, J Adam; Williams, Justin C

2009-01-01

The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
Graphic Arts: Orientation, Composition, and Paste-Up. Third Edition.

ERIC Educational Resources Information Center

Crummett, Dan

This document contains teacher and student materials for a course in graphic arts. Ten units of instruction cover the following topics: (1) orientation; (2) shop safety; (3) shop organization; (4) printing processes; (5) paper; (6) typography; (7) typesetting; (8) design principles; (9) paste-up principles and procedures; and (10) proof procedures…
A real-time GNSS-R system based on software-defined radio and graphics processing units

NASA Astrophysics Data System (ADS)

Hobiger, Thomas; Amagai, Jun; Aida, Masanori; Narita, Hideki

2012-04-01

Reflected signals of the Global Navigation Satellite System (GNSS) from the sea or land surface can be utilized to deduce and monitor physical and geophysical parameters of the reflecting area. Unlike most other remote sensing techniques, GNSS-Reflectometry (GNSS-R) operates as a passive radar that takes advantage from the increasing number of navigation satellites that broadcast their L-band signals. Thereby, most of the GNSS-R receiver architectures are based on dedicated hardware solutions. Software-defined radio (SDR) technology has advanced in the recent years and enabled signal processing in real-time, which makes it an ideal candidate for the realization of a flexible GNSS-R system. Additionally, modern commodity graphic cards, which offer massive parallel computing performances, allow to handle the whole signal processing chain without interfering with the PC's CPU. Thus, this paper describes a GNSS-R system which has been developed on the principles of software-defined radio supported by General Purpose Graphics Processing Units (GPGPUs), and presents results from initial field tests which confirm the anticipated capability of the system.
Accelerating image recognition on mobile devices using GPGPU

NASA Astrophysics Data System (ADS)

Bordallo López, Miguel; Nykänen, Henri; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku

2011-01-01

The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. The use of Graphic Processing Units for computing is very well suited for parallel processing and the addition of programmable stages and high precision arithmetic provide for opportunities to implement energy-efficient complete algorithms. At the moment the first mobile graphics accelerators with programmable pipelines are available, enabling the GPGPU implementation of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture features and boosting. The solution is based on the Local Binary Pattern (LBP) features and makes use of the GPU on the pre-processing and feature extraction phase. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
Efficiency of the energy transfer in the FMO complex using hierarchical equations on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Kramer, Tobias; Kreisbeck, Christoph; Rodriguez, Mirta; Hein, Birgit

2011-03-01

We study the efficiency of the energy transfer in the Fenna-Matthews-Olson complex solving the non-Markovian hierarchical equations (HE) proposed by Ishizaki and Fleming in 2009, which include properly the reorganization process. We compare it to the Markovian approach and find that the Markovian dynamics overestimates the thermalization rate, yielding higher efficiencies than the HE. Using the high-performance of graphics processing units (GPU) we cover a large range of reorganization energies and temperatures and find that initial quantum beatings are important for the energy distribution, but of limited influence to the efficiency. Our efficient GPU implementation of the HE allows us to calculate nonlinear spectra of the FMO complex. References see www.quantumdynamics.de
Optimized Laplacian image sharpening algorithm based on graphic processing unit

NASA Astrophysics Data System (ADS)

Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah

2014-12-01

In classical Laplacian image sharpening, all pixels are processed one by one, which leads to large amount of computation. Traditional Laplacian sharpening processed on CPU is considerably time-consuming especially for those large pictures. In this paper, we propose a parallel implementation of Laplacian sharpening based on Compute Unified Device Architecture (CUDA), which is a computing platform of Graphic Processing Units (GPU), and analyze the impact of picture size on performance and the relationship between the processing time of between data transfer time and parallel computing time. Further, according to different features of different memory, an improved scheme of our method is developed, which exploits shared memory in GPU instead of global memory and further increases the efficiency. Experimental results prove that two novel algorithms outperform traditional consequentially method based on OpenCV in the aspect of computing speed.
Near-realtime simulations of biolelectric activity in small mammalian hearts using graphical processing units

PubMed Central

Vigmond, Edward J.; Boyle, Patrick M.; Leon, L. Joshua; Plank, Gernot

2014-01-01

Simulations of cardiac bioelectric phenomena remain a significant challenge despite continual advancements in computational machinery. Spanning large temporal and spatial ranges demands millions of nodes to accurately depict geometry, and a comparable number of timesteps to capture dynamics. This study explores a new hardware computing paradigm, the graphics processing unit (GPU), to accelerate cardiac models, and analyzes results in the context of simulating a small mammalian heart in real time. The ODEs associated with membrane ionic flow were computed on traditional CPU and compared to GPU performance, for one to four parallel processing units. The scalability of solving the PDE responsible for tissue coupling was examined on a cluster using up to 128 cores. Results indicate that the GPU implementation was between 9 and 17 times faster than the CPU implementation and scaled similarly. Solving the PDE was still 160 times slower than real time. PMID:19964295
Real-time digital holographic microscopy using the graphic processing unit.

PubMed

Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi

2008-08-04

Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which allows use as a highly parallel processor. The computational speed of the Fresnel diffraction using the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from holograms whose size is 512 x 512 grids in 24 frames per second.
GPU acceleration of Runge Kutta-Fehlberg and its comparison with Dormand-Prince method

NASA Astrophysics Data System (ADS)

Seen, Wo Mei; Gobithaasan, R. U.; Miura, Kenjiro T.

2014-07-01

There is a significant reduction of processing time and speedup of performance in computer graphics with the emergence of Graphic Processing Units (GPUs). GPUs have been developed to surpass Central Processing Unit (CPU) in terms of performance and processing speed. This evolution has opened up a new area in computing and researches where highly parallel GPU has been used for non-graphical algorithms. Physical or phenomenal simulations and modelling can be accelerated through General Purpose Graphic Processing Units (GPGPU) and Compute Unified Device Architecture (CUDA) implementations. These phenomena can be represented with mathematical models in the form of Ordinary Differential Equations (ODEs) which encompasses the gist of change rate between independent and dependent variables. ODEs are numerically integrated over time in order to simulate these behaviours. The classical Runge-Kutta (RK) scheme is the common method used to numerically solve ODEs. The Runge Kutta Fehlberg (RKF) scheme has been specially developed to provide an estimate of the principal local truncation error at each step, known as embedding estimate technique. This paper delves into the implementation of RKF scheme for GPU devices and compares its result with Dorman Prince method. A pseudo code is developed to show the implementation in detail. Hence, practitioners will be able to understand the data allocation in GPU, formation of RKF kernels and the flow of data to/from GPU-CPU upon RKF kernel evaluation. The pseudo code is then written in C Language and two ODE models are executed to show the achievable speedup as compared to CPU implementation. The accuracy and efficiency of the proposed implementation method is discussed in the final section of this paper.
Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units.

PubMed

Rath, N; Kato, S; Levesque, J P; Mauel, M E; Navratil, G A; Peng, Q

2014-04-01

Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.
Apparatus and method for implementing power saving techniques when processing floating point values

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Young Moon; Park, Sang Phill

An apparatus and method are described for reducing power when reading and writing graphics data. For example, one embodiment of an apparatus comprises: a graphics processor unit (GPU) to process graphics data including floating point data; a set of registers, at least one of the registers of the set partitioned to store the floating point data; and encode/decode logic to reduce a number of binary 1 values being read from the at least one register by causing a specified set of bit positions within the floating point data to be read out as 0s rather than 1s.
Installation to Production of a Large-Scale General Purpose Graphics Processing Unit (GPGPU) Cluster at the U.S. Army Research Laboratory: Thufir

DTIC Science & Technology

2014-09-01

semiempirical and ray-optical models. For example, the semiempirical COST-Walfisch- Ikegami model (3) estimates the received power predominantly on the...Books: Philadelphia, PA, 1965. 2. Rick, T .; Mathur, R. Fast Edge-Diffraction-Based Radio Wave Propagation Model for Graphics Hardware. Proceedings of
General purpose graphics-processing-unit implementation of cosmological domain wall network evolution.

PubMed

Correia, J R C C C; Martins, C J A P

2017-10-01

Topological defects unavoidably form at symmetry breaking phase transitions in the early universe. To probe the parameter space of theoretical models and set tighter experimental constraints (exploiting the recent advances in astrophysical observations), one requires more and more demanding simulations, and therefore more hardware resources and computation time. Improving the speed and efficiency of existing codes is essential. Here we present a general purpose graphics-processing-unit implementation of the canonical Press-Ryden-Spergel algorithm for the evolution of cosmological domain wall networks. This is ported to the Open Computing Language standard, and as a consequence significant speedups are achieved both in two-dimensional (2D) and 3D simulations.

Accelerated numerical processing of electronically recorded holograms with reduced speckle noise.

PubMed

Trujillo, Carlos; Garcia-Sucerquia, Jorge

2013-09-01

The numerical reconstruction of digitally recorded holograms suffers from speckle noise. An accelerated method that uses general-purpose computing in graphics processing units to reduce that noise is shown. The proposed methodology utilizes parallelized algorithms to record, reconstruct, and superimpose multiple uncorrelated holograms of a static scene. For the best tradeoff between reduction of the speckle noise and processing time, the method records, reconstructs, and superimposes six holograms of 1024 × 1024 pixels in 68 ms; for this case, the methodology reduces the speckle noise by 58% compared with that exhibited by a single hologram. The fully parallelized method running on a commodity graphics processing unit is one order of magnitude faster than the same technique implemented on a regular CPU using its multithreading capabilities. Experimental results are shown to validate the proposal.
Statistical iterative reconstruction for streak artefact reduction when using multidetector CT to image the dento-alveolar structures.

PubMed

Dong, J; Hayakawa, Y; Kober, C

2014-01-01

When metallic prosthetic appliances and dental fillings exist in the oral cavity, the appearance of metal-induced streak artefacts is not avoidable in CT images. The aim of this study was to develop a method for artefact reduction using the statistical reconstruction on multidetector row CT images. Adjacent CT images often depict similar anatomical structures. Therefore, reconstructed images with weak artefacts were attempted using projection data of an artefact-free image in a neighbouring thin slice. Images with moderate and strong artefacts were continuously processed in sequence by successive iterative restoration where the projection data was generated from the adjacent reconstructed slice. First, the basic maximum likelihood-expectation maximization algorithm was applied. Next, the ordered subset-expectation maximization algorithm was examined. Alternatively, a small region of interest setting was designated. Finally, the general purpose graphic processing unit machine was applied in both situations. The algorithms reduced the metal-induced streak artefacts on multidetector row CT images when the sequential processing method was applied. The ordered subset-expectation maximization and small region of interest reduced the processing duration without apparent detriments. A general-purpose graphic processing unit realized the high performance. A statistical reconstruction method was applied for the streak artefact reduction. The alternative algorithms applied were effective. Both software and hardware tools, such as ordered subset-expectation maximization, small region of interest and general-purpose graphic processing unit achieved fast artefact correction.
GPU Acceleration of DSP for Communication Receivers.

PubMed

Gunther, Jake; Gunther, Hyrum; Moon, Todd

2017-09-01

Graphics processing unit (GPU) implementations of signal processing algorithms can outperform CPU-based implementations. This paper describes the GPU implementation of several algorithms encountered in a wide range of high-data rate communication receivers including filters, multirate filters, numerically controlled oscillators, and multi-stage digital down converters. These structures are tested by processing the 20 MHz wide FM radio band (88-108 MHz). Two receiver structures are explored: a single channel receiver and a filter bank channelizer. Both run in real time on NVIDIA GeForce GTX 1080 graphics card.
General purpose graphic processing unit implementation of adaptive pulse compression algorithms

NASA Astrophysics Data System (ADS)

Cai, Jingxiao; Zhang, Yan

2017-07-01

This study introduces a practical approach to implement real-time signal processing algorithms for general surveillance radar based on NVIDIA graphical processing units (GPUs). The pulse compression algorithms are implemented using compute unified device architecture (CUDA) libraries such as CUDA basic linear algebra subroutines and CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for the NVIDIA GPUs. For more advanced, adaptive processing algorithms such as adaptive pulse compression, customized kernel optimization is needed and investigated. A statistical optimization approach is developed for this purpose without needing much knowledge of the physical configurations of the kernels. It was found that the kernel optimization approach can significantly improve the performance. Benchmark performance is compared with the CPU performance in terms of processing accelerations. The proposed implementation framework can be used in various radar systems including ground-based phased array radar, airborne sense and avoid radar, and aerospace surveillance radar.
Parallel Computer System for 3D Visualization Stereo on GPU

NASA Astrophysics Data System (ADS)

Al-Oraiqat, Anas M.; Zori, Sergii A.

2018-03-01

This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.
GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models

PubMed Central

Mukherjee, Chiranjit; Rodriguez, Abel

2016-01-01

Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogenous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated data examples in which we compare our stochastic search with a Markov chain Monte Carlo algorithm in moderate dimensional data examples. These experiments show that our stochastic search largely outperforms the Markov chain Monte Carlo algorithm in terms of computing-times and in terms of the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset in which Markov chain Monte Carlo algorithms are too slow to be practically useful. PMID:28626348
GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models.

PubMed

Mukherjee, Chiranjit; Rodriguez, Abel

2016-01-01

Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogenous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated data examples in which we compare our stochastic search with a Markov chain Monte Carlo algorithm in moderate dimensional data examples. These experiments show that our stochastic search largely outperforms the Markov chain Monte Carlo algorithm in terms of computing-times and in terms of the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset in which Markov chain Monte Carlo algorithms are too slow to be practically useful.
Quantum optimal control with automatic differentiation using graphics processors

NASA Astrophysics Data System (ADS)

Leung, Nelson; Abdelhafez, Mohamed; Chakram, Srivatsan; Naik, Ravi; Groszkowski, Peter; Koch, Jens; Schuster, David

We implement quantum optimal control based on automatic differentiation and harness the acceleration afforded by graphics processing units (GPUs). Automatic differentiation allows us to specify advanced optimization criteria and incorporate them into the optimization process with ease. We will describe efficient techniques to optimally control weakly anharmonic systems that are commonly encountered in circuit QED, including coupled superconducting transmon qubits and multi-cavity circuit QED systems. These systems allow for a rich variety of control schemes that quantum optimal control is well suited to explore.
Line-by-line spectroscopic simulations on graphics processing units

NASA Astrophysics Data System (ADS)

Collange, Sylvain; Daumas, Marc; Defour, David

2008-01-01

We report here on software that performs line-by-line spectroscopic simulations on gases. Elaborate models (such as narrow band and correlated-K) are accurate and efficient for bands where various components are not simultaneously and significantly active. Line-by-line is probably the most accurate model in the infrared for blends of gases that contain high proportions of H 2O and CO 2 as this was the case for our prototype simulation. Our implementation on graphics processing units sustains a speedup close to 330 on computation-intensive tasks and 12 on memory intensive tasks compared to implementations on one core of high-end processors. This speedup is due to data parallelism, efficient memory access for specific patterns and some dedicated hardware operators only available in graphics processing units. It is obtained leaving most of processor resources available and it would scale linearly with the number of graphics processing units in parallel machines. Line-by-line simulation coupled with simulation of fluid dynamics was long believed to be economically intractable but our work shows that it could be done with some affordable additional resources compared to what is necessary to perform simulations on fluid dynamics alone. Program summaryProgram title: GPU4RE Catalogue identifier: ADZY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADZY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 62 776 No. of bytes in distributed program, including test data, etc.: 1 513 247 Distribution format: tar.gz Programming language: C++ Computer: x86 PC Operating system: Linux, Microsoft Windows. Compilation requires either gcc/g++ under Linux or Visual C++ 2003/2005 and Cygwin under Windows. It has been tested using gcc 4.1.2 under Ubuntu Linux 7.04 and using Visual C++ 2005 with Cygwin 1.5.24 under Windows XP. RAM: 1 gigabyte Classification: 21.2 External routines: OpenGL ( http://www.opengl.org) Nature of problem: Simulating radiative transfer on high-temperature high-pressure gases. Solution method: Line-by-line Monte-Carlo ray-tracing. Unusual features: Parallel computations are moved to the GPU. Additional comments: nVidia GeForce 7000 or ATI Radeon X1000 series graphics processing unit is required. Running time: A few minutes.
Image reproduction with interactive graphics

NASA Technical Reports Server (NTRS)

Buckner, J. D.; Council, H. W.; Edwards, T. R.

1974-01-01

Software application or development in optical image digital data processing requires a fast, good quality, yet inexpensive hard copy of processed images. To achieve this, a Cambo camera with an f 2.8/150-mm Xenotar lens in a Copal shutter having a Graflok back for 4 x 5 Polaroid type 57 pack-film has been interfaced to an existing Adage, AGT-30/Electro-Mechanical Research, EMR 6050 graphic computer system. Time-lapse photography in conjunction with a log to linear voltage transformation has resulted in an interactive system capable of producing a hard copy in 54 sec. The interactive aspect of the system lies in a Tektronix 4002 graphic computer terminal and its associated hard copy unit.
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

NASA Astrophysics Data System (ADS)

Griffiths, M. K.; Fedun, V.; Erdélyi, R.

2015-03-01

Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Exploiting current-generation graphics hardware for synthetic-scene generation

NASA Astrophysics Data System (ADS)

Tanner, Michael A.; Keen, Wayne A.

2010-04-01

Increasing seeker frame rate and pixel count, as well as the demand for higher levels of scene fidelity, have driven scene generation software for hardware-in-the-loop (HWIL) and software-in-the-loop (SWIL) testing to higher levels of parallelization. Because modern PC graphics cards provide multiple computational cores (240 shader cores for a current NVIDIA Corporation GeForce and Quadro cards), implementation of phenomenology codes on graphics processing units (GPUs) offers significant potential for simultaneous enhancement of simulation frame rate and fidelity. To take advantage of this potential requires algorithm implementation that is structured to minimize data transfers between the central processing unit (CPU) and the GPU. In this paper, preliminary methodologies developed at the Kinetic Hardware In-The-Loop Simulator (KHILS) will be presented. Included in this paper will be various language tradeoffs between conventional shader programming, Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), including performance trades and possible pathways for future tool development.
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL

PubMed Central

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationship between a set of biological species. The scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods have the computational performance issue. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from extremely large set of sequences. The experimental results present that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively. PMID:29051701
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL.

PubMed

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationship between a set of biological species. The scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods have the computational performance issue. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from extremely large set of sequences. The experimental results present that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively.
Gravitational tree-code on graphics processing units: implementation in CUDA

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-05-01

We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way we achieve a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. The code has a convenient user interface and is freely available for use. http://castle.strw.leidenuniv.nl/software/octgrav.html
Speedup for quantum optimal control from automatic differentiation based on graphics processing units

NASA Astrophysics Data System (ADS)

Leung, Nelson; Abdelhafez, Mohamed; Koch, Jens; Schuster, David

2017-04-01

We implement a quantum optimal control algorithm based on automatic differentiation and harness the acceleration afforded by graphics processing units (GPUs). Automatic differentiation allows us to specify advanced optimization criteria and incorporate them in the optimization process with ease. We show that the use of GPUs can speedup calculations by more than an order of magnitude. Our strategy facilitates efficient numerical simulations on affordable desktop computers and exploration of a host of optimization constraints and system parameters relevant to real-life experiments. We demonstrate optimization of quantum evolution based on fine-grained evaluation of performance at each intermediate time step, thus enabling more intricate control on the evolution path, suppression of departures from the truncated model subspace, as well as minimization of the physical time needed to perform high-fidelity state preparation and unitary gates.
Small Interactive Image Processing System (SMIPS) system description

NASA Technical Reports Server (NTRS)

Moik, J. G.

1973-01-01

The Small Interactive Image Processing System (SMIPS) operates under control of the IBM-OS/MVT operating system and uses an IBM-2250 model 1 display unit as interactive graphic device. The input language in the form of character strings or attentions from keys and light pen is interpreted and causes processing of built-in image processing functions as well as execution of a variable number of application programs kept on a private disk file. A description of design considerations is given and characteristics, structure and logic flow of SMIPS are summarized. Data management and graphic programming techniques used for the interactive manipulation and display of digital pictures are also discussed.
Inversion of Heavy Current Electroheat Problems on a Graphics Processing Unit (GPU)

DTIC Science & Technology

2013-11-10

Unit (GPU) Victor U. Karthik 1 , Sivamayam Sivasuthan 1, Arunasalam Rahunanthan 2 , Ravi S. Thyagarajan 3 and Paramsothy Jayakumar 3 , Lalita...Victor Karthik; Sivamayam Sivasuthan; Arunasalam Rahunanthan; Ravi Thyagarajan; Paramsothy Jayakumar 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK
Accelerating Malware Detection via a Graphics Processing Unit

DTIC Science & Technology

2010-09-01

Processing Unit . . . . . . . . . . . . . . . . . . 4 PE Portable Executable . . . . . . . . . . . . . . . . . . . . . 4 COFF Common Object File Format...operating systems for the future [Szo05]. The PE format is an updated version of the common object file format ( COFF ) [Mic06]. Microsoft released a new...NAs02]. These alerts can be costly in terms of time and resources for individuals and organizations to investigate each misidentified file [YWL07] [Vak10
Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines

DTIC Science & Technology

2014-11-01

architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core ( MIC ) architecture all...evaluate the feasibility and applicability of a new architecture just released to the market . Researchers are often unsure how available resources will...architectures. Having a suite of programs running on different architectures, such as GPUs, MICs , and CPUs, adds complexity and technical challenges

A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PubMed Central

Ho, ThienLuan; Oh, Seung-Rohk

2017-01-01

Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPUs warp share data using warp-shuffle operation instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experiment results for real DNA packages revealed that the performance of the proposed algorithm and its implementation archived up to 122.64 and 1.53 times compared to that of sequential algorithm on CPU and previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

NASA Astrophysics Data System (ADS)

Hause, Benjamin; Parker, Scott

2012-10-01

We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core I7 gaming PC with a NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC will be presented. We will also discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.
On the effective implementation of a boundary element code on graphics processing units unsing an out-of-core LU algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, Ed F; Nintcheu Fata, Sylvain

2012-01-01

A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from \\url{http://www.intetec.org}, has been adapted to run on an Nvidia Tesla general purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than available GPU memory. The code achieved over eight times speedup in matrix assembly and about 56~Gflops/sec in the LU factorization using only 512~Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance ofmore » the GPU code.« less
Mendel-GPU: haplotyping and genotype imputation on graphics processing units

PubMed Central

Chen, Gary K.; Wang, Kai; Stram, Alex H.; Sobel, Eric M.; Lange, Kenneth

2012-01-01

Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22954633
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; McDougal, Matthew; Russell, Sam

2013-01-01

Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques.
GPU MrBayes V3.1: MrBayes on Graphics Processing Units for Protein Sequence Data.

PubMed

Pang, Shuai; Stones, Rebecca J; Ren, Ming-Ming; Liu, Xiao-Guang; Wang, Gang; Xia, Hong-ju; Wu, Hao-Yang; Liu, Yang; Xie, Qiang

2015-09-01

We present a modified GPU (graphics processing unit) version of MrBayes, called ta(MC)(3) (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein data sets. Our main contributions are 1) utilizing 64-bit variables, thereby enabling ta(MC)(3) to process larger data sets than MrBayes; and 2) to use Kahan summation to improve accuracy, convergence rates, and consequently runtime. Versus the current fastest software, we achieve a speedup of up to around 2.5 (and up to around 90 vs. serial MrBayes), and more on multi-GPU hardware. GPU MrBayes V3.1 is available from http://sourceforge.net/projects/mrbayes-gpu/. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Real-time liquid-crystal atmosphere turbulence simulator with graphic processing unit.

PubMed

Hu, Lifa; Xuan, Li; Li, Dayu; Cao, Zhaoliang; Mu, Quanquan; Liu, Yonggang; Peng, Zenghui; Lu, Xinghai

2009-04-27

To generate time-evolving atmosphere turbulence in real time, a phase-generating method for our liquid-crystal (LC) atmosphere turbulence simulator (ATS) is derived based on the Fourier series (FS) method. A real matrix expression for generating turbulence phases is given and calculated with a graphic processing unit (GPU), the GeForce 8800 Ultra. A liquid crystal on silicon (LCOS) with 256x256 pixels is used as the turbulence simulator. The total time to generate a turbulence phase is about 7.8 ms for calculation and readout with the GPU. A parallel processing method of calculating and sending a picture to the LCOS is used to improve the simulating speed of our LC ATS. Therefore, the real-time turbulence phase-generation frequency of our LC ATS is up to 128 Hz. To our knowledge, it is the highest speed used to generate a turbulence phase in real time.
Failure detection in high-performance clusters and computers using chaotic map computations

DOEpatents

Rao, Nageswara S.

2015-09-01

A programmable media includes a processing unit capable of independent operation in a machine that is capable of executing 10.sup.18 floating point operations per second. The processing unit is in communication with a memory element and an interconnect that couples computing nodes. The programmable media includes a logical unit configured to execute arithmetic functions, comparative functions, and/or logical functions. The processing unit is configured to detect computing component failures, memory element failures and/or interconnect failures by executing programming threads that generate one or more chaotic map trajectories. The central processing unit or graphical processing unit is configured to detect a computing component failure, memory element failure and/or an interconnect failure through an automated comparison of signal trajectories generated by the chaotic maps.
Span graphics display utilities handbook, first edition

NASA Technical Reports Server (NTRS)

Gallagher, D. L.; Green, J. L.; Newman, R.

1985-01-01

The Space Physics Analysis Network (SPAN) is a computer network connecting scientific institutions throughout the United States. This network provides an avenue for timely, correlative research between investigators, in a multidisciplinary approach to space physics studies. An objective in the development of SPAN is to make available direct and simplified procedures that scientists can use, without specialized training, to exchange information over the network. Information exchanges include raw and processes data, analysis programs, correspondence, documents, and graphite images. This handbook details procedures that can be used to exchange graphic images over SPAN. The intent is to periodically update this handbook to reflect the constantly changing facilities available on SPAN. The utilities described within reflect an earnest attempt to provide useful descriptions of working utilities that can be used to transfer graphic images across the network. Whether graphic images are representative of satellite servations or theoretical modeling and whether graphics images are of device dependent or independent type, the SPAN graphics display utilities handbook will be the users guide to graphic image exchange.
Interactive graphical computer-aided design system

NASA Technical Reports Server (NTRS)

Edge, T. M.

1975-01-01

System is used for design, layout, and modification of large-scale-integrated (LSI) metal-oxide semiconductor (MOS) arrays. System is structured around small computer which provides real-time support for graphics storage display unit with keyboard, slave display unit, hard copy unit, and graphics tablet for designer/computer interface.
Creating Interactive Graphical Overlays in the Advanced Weather Interactive Processing System (AWIPS) Using Shapefiles and DGM Files

NASA Technical Reports Server (NTRS)

Barrett, Joe H., III; Lafosse, Richard; Hood, Doris; Hoeth, Brian

2007-01-01

Graphical overlays can be created in real-time in the Advanced Weather Interactive Processing System (AWIPS) using shapefiles or DARE Graphics Metafile (DGM) files. This presentation describes how to create graphical overlays on-the-fly for AWIPS, by using two examples of AWIPS applications that were created by the Applied Meteorology Unit (AMU). The first example is the Anvil Threat Corridor Forecast Tool, which produces a shapefile that depicts a graphical threat corridor of the forecast movement of thunderstorm anvil clouds, based on the observed or forecast upper-level winds. This tool is used by the Spaceflight Meteorology Group (SMG) and 45th Weather Squadron (45 WS) to analyze the threat of natural or space vehicle-triggered lightning over a location. The second example is a launch and landing trajectory tool that produces a DGM file that plots the ground track of space vehicles during launch or landing. The trajectory tool can be used by SMG and the 45 WS forecasters to analyze weather radar imagery along a launch or landing trajectory. Advantages of both file types will be listed.
A Real-Time High Performance Computation Architecture for Multiple Moving Target Tracking Based on Wide-Area Motion Imagery via Cloud and Graphic Processing Units

PubMed Central

Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik

2017-01-01

This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions. PMID:28208684
A Real-Time High Performance Computation Architecture for Multiple Moving Target Tracking Based on Wide-Area Motion Imagery via Cloud and Graphic Processing Units.

PubMed

Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik

2017-02-12

This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions.
Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

PubMed

Badal, Andreu; Badano, Aldo

2009-11-01

It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDATM programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
Graphics processing units in bioinformatics, computational biology and systems biology.

PubMed

Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

2017-09-01

Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining an increasing attention by the scientific community, as they can considerably reduce the running time required by standard CPU-based software, and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools here reviewed is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.
Matrix decomposition graphics processing unit solver for Poisson image editing

NASA Astrophysics Data System (ADS)

Lei, Zhao; Wei, Li

2012-10-01

In recent years, gradient-domain methods have been widely discussed in the image processing field, including seamless cloning and image stitching. These algorithms are commonly carried out by solving a large sparse linear system: the Poisson equation. However, solving the Poisson equation is a computational and memory intensive task which makes it not suitable for real-time image editing. A new matrix decomposition graphics processing unit (GPU) solver (MDGS) is proposed to settle the problem. A matrix decomposition method is used to distribute the work among GPU threads, so that MDGS will take full advantage of the computing power of current GPUs. Additionally, MDGS is a hybrid solver (combines both the direct and iterative techniques) and has two-level architecture. These enable MDGS to generate identical solutions with those of the common Poisson methods and achieve high convergence rate in most cases. This approach is advantageous in terms of parallelizability, enabling real-time image processing, low memory-taken and extensive applications.
Mission: Define Computer Literacy. The Illinois-Wisconsin ISACS Computer Coordinators' Committee on Computer Literacy Report (May 1985).

ERIC Educational Resources Information Center

Computing Teacher, 1985

1985-01-01

Defines computer literacy and describes a computer literacy course which stresses ethics, hardware, and disk operating systems throughout. Core units on keyboarding, word processing, graphics, database management, problem solving, algorithmic thinking, and programing are outlined, together with additional units on spreadsheets, simulations,…
Medical and Graphic Arts Unit; a Guide to Organizing a Medical and Graphic Arts Unit in Health Science Educational Institutions. Monograph 2.

ERIC Educational Resources Information Center

Smith, Herbert R., Jr.

This monograph describes the role of medical and graphic arts units within the comprehensive communications departments of health science educational institutions. Historical trends and contemporary practices are described. Suggestions are made for: organizational structure; services and activities; staff requirements; budget; facility…
Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units.

PubMed

Watanabe, Yuuki; Maeno, Seiya; Aoshima, Kenji; Hasegawa, Haruyuki; Koseki, Hitoshi

2010-09-01

The real-time display of full-range, 2048?axial pixelx1024?lateral pixel, Fourier-domain optical-coherence tomography (FD-OCT) images is demonstrated. The required speed was achieved by using dual graphic processing units (GPUs) with many stream processors to realize highly parallel processing. We used a zero-filling technique, including a forward Fourier transform, a zero padding to increase the axial data-array size to 8192, an inverse-Fourier transform back to the spectral domain, a linear interpolation from wavelength to wavenumber, a lateral Hilbert transform to obtain the complex spectrum, a Fourier transform to obtain the axial profiles, and a log scaling. The data-transfer time of the frame grabber was 15.73?ms, and the processing time, which includes the data transfer between the GPU memory and the host computer, was 14.75?ms, for a total time shorter than the 36.70?ms frame-interval time using a line-scan CCD camera operated at 27.9?kHz. That is, our OCT system achieved a processed-image display rate of 27.23 frames/s.
High-Speed Particle-in-Cell Simulation Parallelized with Graphic Processing Units for Low Temperature Plasmas for Material Processing

NASA Astrophysics Data System (ADS)

Hur, Min Young; Verboncoeur, John; Lee, Hae June

2014-10-01

Particle-in-cell (PIC) simulations have high fidelity in the plasma device requiring transient kinetic modeling compared with fluid simulations. It uses less approximation on the plasma kinetics but requires many particles and grids to observe the semantic results. It means that the simulation spends lots of simulation time in proportion to the number of particles. Therefore, PIC simulation needs high performance computing. In this research, a graphic processing unit (GPU) is adopted for high performance computing of PIC simulation for low temperature discharge plasmas. GPUs have many-core processors and high memory bandwidth compared with a central processing unit (CPU). NVIDIA GeForce GPUs were used for the test with hundreds of cores which show cost-effective performance. PIC code algorithm is divided into two modules which are a field solver and a particle mover. The particle mover module is divided into four routines which are named move, boundary, Monte Carlo collision (MCC), and deposit. Overall, the GPU code solves particle motions as well as electrostatic potential in two-dimensional geometry almost 30 times faster than a single CPU code. This work was supported by the Korea Institute of Science Technology Information.

Analytical gradients for tensor hyper-contracted MP2 and SOS-MP2 on graphical processing units

DOE PAGES

Song, Chenchen; Martinez, Todd J.

2017-08-29

Analytic energy gradients for tensor hyper-contraction (THC) are derived and implemented for second-order Møller-Plesset perturbation theory (MP2), with and without the scaled-opposite-spin (SOS)-MP2 approximation. By exploiting the THC factorization, the formal scaling of MP2 and SOS-MP2 gradient calculations with respect to system size is reduced to quartic and cubic, respectively. An efficient implementation has been developed that utilizes both graphics processing units and sparse tensor techniques exploiting spatial sparsity of the atomic orbitals. THC-MP2 has been applied to both geometry optimization and ab initio molecular dynamics (AIMD) simulations. Furthermore, the resulting energy conservation in micro-canonical AIMD demonstrates that the implementationmore » provides accurate nuclear gradients with respect to the THC-MP2 potential energy surfaces.« less
Fast Shepard interpolation on graphics processing units: potential energy surfaces and dynamics for H + CH4 → H2 + CH3.

PubMed

Welsch, Ralph; Manthe, Uwe

2013-04-28

A strategy for the fast evaluation of Shepard interpolated potential energy surfaces (PESs) utilizing graphics processing units (GPUs) is presented. Speed ups of several orders of magnitude are gained for the title reaction on the ZFWCZ PES [Y. Zhou, B. Fu, C. Wang, M. A. Collins, and D. H. Zhang, J. Chem. Phys. 134, 064323 (2011)]. Thermal rate constants are calculated employing the quantum transition state concept and the multi-layer multi-configurational time-dependent Hartree approach. Results for the ZFWCZ PES are compared to rate constants obtained for other ab initio PESs and problems are discussed. A revised PES is presented. Thermal rate constants obtained for the revised PES indicate that an accurate description of the anharmonicity around the transition state is crucial.
Analytical gradients for tensor hyper-contracted MP2 and SOS-MP2 on graphical processing units

NASA Astrophysics Data System (ADS)

Song, Chenchen; Martínez, Todd J.

2017-10-01

Analytic energy gradients for tensor hyper-contraction (THC) are derived and implemented for second-order Møller-Plesset perturbation theory (MP2), with and without the scaled-opposite-spin (SOS)-MP2 approximation. By exploiting the THC factorization, the formal scaling of MP2 and SOS-MP2 gradient calculations with respect to system size is reduced to quartic and cubic, respectively. An efficient implementation has been developed that utilizes both graphics processing units and sparse tensor techniques exploiting spatial sparsity of the atomic orbitals. THC-MP2 has been applied to both geometry optimization and ab initio molecular dynamics (AIMD) simulations. The resulting energy conservation in micro-canonical AIMD demonstrates that the implementation provides accurate nuclear gradients with respect to the THC-MP2 potential energy surfaces.
Analytical gradients for tensor hyper-contracted MP2 and SOS-MP2 on graphical processing units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Song, Chenchen; Martinez, Todd J.

Analytic energy gradients for tensor hyper-contraction (THC) are derived and implemented for second-order Møller-Plesset perturbation theory (MP2), with and without the scaled-opposite-spin (SOS)-MP2 approximation. By exploiting the THC factorization, the formal scaling of MP2 and SOS-MP2 gradient calculations with respect to system size is reduced to quartic and cubic, respectively. An efficient implementation has been developed that utilizes both graphics processing units and sparse tensor techniques exploiting spatial sparsity of the atomic orbitals. THC-MP2 has been applied to both geometry optimization and ab initio molecular dynamics (AIMD) simulations. Furthermore, the resulting energy conservation in micro-canonical AIMD demonstrates that the implementationmore » provides accurate nuclear gradients with respect to the THC-MP2 potential energy surfaces.« less
Area-delay trade-offs of texture decompressors for a graphics processing unit

NASA Astrophysics Data System (ADS)

Novoa Súñer, Emilio; Ituero, Pablo; López-Vallejo, Marisa

2011-05-01

Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on implementation details of the hardware architecture that is behind GPUs. For instance, the way texture is handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work addresses a comparative study on the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of decompressors and thus determine suitability of algorithms for systems with limited hardware resources.
Real-Space Density Functional Theory on Graphical Processing Units: Computational Approach and Comparison to Gaussian Basis Set Methods.

PubMed

Andrade, Xavier; Aspuru-Guzik, Alán

2013-10-08

We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation GPUs from AMD and Nvidia, which show that our scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops for a single GPU, representing a significant speed-up when compared to the CPU version of the code. Moreover, for some systems, our implementation can outperform a GPU Gaussian basis set code, showing that the real-space approach is a competitive alternative for DFT simulations on GPUs.
User Guide for HUFPrint, A Tabulation and Visualization Utility for the Hydrogeologic-Unit Flow (HUF) Package of MODFLOW

USGS Publications Warehouse

Banta, Edward R.; Provost, Alden M.

2008-01-01

This report documents HUFPrint, a computer program that extracts and displays information about model structure and hydraulic properties from the input data for a model built using the Hydrogeologic-Unit Flow (HUF) Package of the U.S. Geological Survey's MODFLOW program for modeling ground-water flow. HUFPrint reads the HUF Package and other MODFLOW input files, processes the data by hydrogeologic unit and by model layer, and generates text and graphics files useful for visualizing the data or for further processing. For hydrogeologic units, HUFPrint outputs such hydraulic properties as horizontal hydraulic conductivity along rows, horizontal hydraulic conductivity along columns, horizontal anisotropy, vertical hydraulic conductivity or anisotropy, specific storage, specific yield, and hydraulic-conductivity depth-dependence coefficient. For model layers, HUFPrint outputs such effective hydraulic properties as horizontal hydraulic conductivity along rows, horizontal hydraulic conductivity along columns, horizontal anisotropy, specific storage, primary direction of anisotropy, and vertical conductance. Text files tabulating hydraulic properties by hydrogeologic unit, by model layer, or in a specified vertical section may be generated. Graphics showing two-dimensional cross sections and one-dimensional vertical sections at specified locations also may be generated. HUFPrint reads input files designed for MODFLOW-2000 or MODFLOW-2005.
Visualizing Astronomical Data with Blender

NASA Astrophysics Data System (ADS)

Kent, Brian R.

2014-01-01

We present methods for using the 3D graphics program Blender in the visualization of astronomical data. The software's forte for animating 3D data lends itself well to use in astronomy. The Blender graphical user interface and Python scripting capabilities can be utilized in the generation of models for data cubes, catalogs, simulations, and surface maps. We review methods for data import, 2D and 3D voxel texture applications, animations, camera movement, and composite renders. Rendering times can be improved by using graphic processing units (GPUs). A number of examples are shown using the software features most applicable to various kinds of data paradigms in astronomy.
GPU-BSM: A GPU-Based Tool to Map Bisulfite-Treated Reads

PubMed Central

Manconi, Andrea; Orro, Alessandro; Manca, Emanuele; Armano, Giuliano; Milanesi, Luciano

2014-01-01

Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils while 5-methylcytosines remain nonreactive. During PCR amplification 5-methylcytosines are amplified as cytosine, whereas uracils and thymines as thymine. To detect the methylation levels, reads treated with the bisulfite must be aligned against a reference genome. Mapping these reads to a reference genome represents a significant computational challenge mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this computational challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units. Graphics Processing Units are hardware accelerators that are increasingly being used successfully to accelerate general-purpose scientific applications. GPU-BSM is a tool able to map bisulfite-treated reads from whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and to estimate methylation levels, with the goal of detecting methylation. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of unique mapped reads. PMID:24842718
Real-time blood flow visualization using the graphics processing unit

NASA Astrophysics Data System (ADS)

Yang, Owen; Cuccia, David; Choi, Bernard

2011-01-01

Laser speckle imaging (LSI) is a technique in which coherent light incident on a surface produces a reflected speckle pattern that is related to the underlying movement of optical scatterers, such as red blood cells, indicating blood flow. Image-processing algorithms can be applied to produce speckle flow index (SFI) maps of relative blood flow. We present a novel algorithm that employs the NVIDIA Compute Unified Device Architecture (CUDA) platform to perform laser speckle image processing on the graphics processing unit. Software written in C was integrated with CUDA and integrated into a LabVIEW Virtual Instrument (VI) that is interfaced with a monochrome CCD camera able to acquire high-resolution raw speckle images at nearly 10 fps. With the CUDA code integrated into the LabVIEW VI, the processing and display of SFI images were performed also at ~10 fps. We present three video examples depicting real-time flow imaging during a reactive hyperemia maneuver, with fluid flow through an in vitro phantom, and a demonstration of real-time LSI during laser surgery of a port wine stain birthmark.
Real-time blood flow visualization using the graphics processing unit

PubMed Central

Yang, Owen; Cuccia, David; Choi, Bernard

2011-01-01

Laser speckle imaging (LSI) is a technique in which coherent light incident on a surface produces a reflected speckle pattern that is related to the underlying movement of optical scatterers, such as red blood cells, indicating blood flow. Image-processing algorithms can be applied to produce speckle flow index (SFI) maps of relative blood flow. We present a novel algorithm that employs the NVIDIA Compute Unified Device Architecture (CUDA) platform to perform laser speckle image processing on the graphics processing unit. Software written in C was integrated with CUDA and integrated into a LabVIEW Virtual Instrument (VI) that is interfaced with a monochrome CCD camera able to acquire high-resolution raw speckle images at nearly 10 fps. With the CUDA code integrated into the LabVIEW VI, the processing and display of SFI images were performed also at ∼10 fps. We present three video examples depicting real-time flow imaging during a reactive hyperemia maneuver, with fluid flow through an in vitro phantom, and a demonstration of real-time LSI during laser surgery of a port wine stain birthmark. PMID:21280915
Real-time speckle variance swept-source optical coherence tomography using a graphics processing unit.

PubMed

Lee, Kenneth K C; Mariampillai, Adrian; Yu, Joe X Z; Cadotte, David W; Wilson, Brian C; Standish, Beau A; Yang, Victor X D

2012-07-01

Advances in swept source laser technology continues to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit that has been optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real-time prior to SV calculations in order to reduce decorrelation from stationary structures induced by the bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario of human fingernail root imaging where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second.
Evolution of Embedded Processing for Wide Area Surveillance

DTIC Science & Technology

2014-01-01

future vision . 15. SUBJECT TERMS Embedded processing; high performance computing; general-purpose graphical processing units (GPGPUs) 16. SECURITY...recon- naissance (ISR) mission capabilities. The capabilities these advancements are achieving include the ability to provide persistent all...fighters to support and positively affect their mission . Significant improvements in high-performance computing (HPC) technology make it possible to
Interactive boundary delineation of agricultural lands using graphics workstations

NASA Technical Reports Server (NTRS)

Cheng, Thomas D.; Angelici, Gary L.; Slye, Robert E.; Ma, Matt

1992-01-01

A review is presented of the computer-assisted stratification and sampling (CASS) system developed to delineate the boundaries of sample units for survey procedures. CASS stratifies the sampling units by land-cover and land-use type, employing image-processing software and hardware. This procedure generates coverage areas and the boundaries of stratified sampling units that are utilized for subsequent sampling procedures from which agricultural statistics are developed.
GPU applications for data processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vladymyrov, Mykhailo, E-mail: mykhailo.vladymyrov@cern.ch; Aleksandrov, Andrey; INFN sezione di Napoli, I-80125 Napoli

2015-12-31

Modern experiments that use nuclear photoemulsion imply fast and efficient data acquisition from the emulsion can be performed. The new approaches in developing scanning systems require real-time processing of large amount of data. Methods that use Graphical Processing Unit (GPU) computing power for emulsion data processing are presented here. It is shown how the GPU-accelerated emulsion processing helped us to rise the scanning speed by factor of nine.
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; McDougal, Matthew; Russell, Sam

2012-01-01

Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often great, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.
Particle-in-cell simulations with charge-conserving current deposition on graphic processing units

NASA Astrophysics Data System (ADS)

Ren, Chuang; Kong, Xianglong; Huang, Michael; Decyk, Viktor; Mori, Warren

2011-10-01

Recently using CUDA, we have developed an electromagnetic Particle-in-Cell (PIC) code with charge-conserving current deposition for Nvidia graphic processing units (GPU's) (Kong et al., Journal of Computational Physics 230, 1676 (2011). On a Tesla M2050 (Fermi) card, the GPU PIC code can achieve a one-particle-step process time of 1.2 - 3.2 ns in 2D and 2.3 - 7.2 ns in 3D, depending on plasma temperatures. In this talk we will discuss novel algorithms for GPU-PIC including charge-conserving current deposition scheme with few branching and parallel particle sorting. These algorithms have made efficient use of the GPU shared memory. We will also discuss how to replace the computation kernels of existing parallel CPU codes while keeping their parallel structures. This work was supported by U.S. Department of Energy under Grant Nos. DE-FG02-06ER54879 and DE-FC02-04ER54789 and by NSF under Grant Nos. PHY-0903797 and CCF-0747324.
GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

PubMed Central

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a “fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

PubMed

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
Computer simulations and real-time control of ELT AO systems using graphical processing units

NASA Astrophysics Data System (ADS)

Wang, Lianqi; Ellerbroek, Brent

2012-07-01

The adaptive optics (AO) simulations at the Thirty Meter Telescope (TMT) have been carried out using the efficient, C based multi-threaded adaptive optics simulator (MAOS, http://github.com/lianqiw/maos). By porting time-critical parts of MAOS to graphical processing units (GPU) using NVIDIA CUDA technology, we achieved a 10 fold speed up for each GTX 580 GPU used compared to a modern quad core CPU. Each time step of full scale end to end simulation for the TMT narrow field infrared AO system (NFIRAOS) takes only 0.11 second in a desktop with two GTX 580s. We also demonstrate that the TMT minimum variance reconstructor can be assembled in matrix vector multiply (MVM) format in 8 seconds with 8 GTX 580 GPUs, meeting the TMT requirement for updating the reconstructor. Analysis show that it is also possible to apply the MVM using 8 GTX 580s within the required latency.

Particle-In-Cell simulations of high pressure plasmas using graphics processing units

NASA Astrophysics Data System (ADS)

Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter

2009-10-01

Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Particularly plasmas at very low gas pressures are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming -- certain stability conditions has to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. The simulations take up to very much time to run on standard computers and require the help of computer clusters or super computers. Recent advances in the field of graphics processing units (GPUs) provides every personal computer with a highly parallel multi processor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.

PubMed

Adamo, Mark E; Gerber, Scott A

2016-09-07

MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
Nanoscale multireference quantum chemistry: full configuration interaction on graphical processing units.

PubMed

Fales, B Scott; Levine, Benjamin G

2015-10-13

Methods based on a full configuration interaction (FCI) expansion in an active space of orbitals are widely used for modeling chemical phenomena such as bond breaking, multiply excited states, and conical intersections in small-to-medium-sized molecules, but these phenomena occur in systems of all sizes. To scale such calculations up to the nanoscale, we have developed an implementation of FCI in which electron repulsion integral transformation and several of the more expensive steps in σ vector formation are performed on graphical processing unit (GPU) hardware. When applied to a 1.7 × 1.4 × 1.4 nm silicon nanoparticle (Si72H64) described with the polarized, all-electron 6-31G** basis set, our implementation can solve for the ground state of the 16-active-electron/16-active-orbital CASCI Hamiltonian (more than 100,000,000 configurations) in 39 min on a single NVidia K40 GPU.
Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

NASA Astrophysics Data System (ADS)

Ramirez, Andres; Rahnemoonfar, Maryam

2017-04-01

A hyperspectral image provides multidimensional figure rich in data consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such image with linear and non-linear algorithms will result in high computational time. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyzing a hyperspectral image through the usage of parallel hardware and a parallel programming model, which will be simpler to handle compared to other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy results and timing results between the Hadoop and GPU system and tested it against the following test cases: the CPU and GPU test case, a CPU test case and a test case where no dimensional reduction was applied.
Accelerating sino-atrium computer simulations with graphic processing units.

PubMed

Zhang, Hong; Xiao, Zheng; Lin, Shien-fong

2015-01-01

Sino-atrial node cells (SANCs) play a significant role in rhythmic firing. To investigate their role in arrhythmia and interactions with the atrium, computer simulations based on cellular dynamic mathematical models are generally used. However, the large-scale computation usually makes research difficult, given the limited computational power of Central Processing Units (CPUs). In this paper, an accelerating approach with Graphic Processing Units (GPUs) is proposed in a simulation consisting of the SAN tissue and the adjoining atrium. By using the operator splitting method, the computational task was made parallel. Three parallelization strategies were then put forward. The strategy with the shortest running time was further optimized by considering block size, data transfer and partition. The results showed that for a simulation with 500 SANCs and 30 atrial cells, the execution time taken by the non-optimized program decreased 62% with respect to a serial program running on CPU. The execution time decreased by 80% after the program was optimized. The larger the tissue was, the more significant the acceleration became. The results demonstrated the effectiveness of the proposed GPU-accelerating methods and their promising applications in more complicated biological simulations.
Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Badal, Andreu; Badano, Aldo

Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-raymore » imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.« less
Magnetohydrodynamics with GAMER

NASA Astrophysics Data System (ADS)

Zhang, Ui-Han; Schive, Hsi-Yu; Chiueh, Tzihong

2018-06-01

GAMER, a parallel Graphic-processing-unit-accelerated Adaptive-MEsh-Refinement (AMR) hydrodynamic code, has been extended to support magnetohydrodynamics (MHD) with both the corner-transport-upwind and MUSCL-Hancock schemes and the constraint transport technique. The divergent preserving operator for AMR has been applied to reinforce the divergence-free constraint on the magnetic field. GAMER-MHD has fully exploited the concurrent executions between the graphic process unit (GPU) MHD solver and other central processing unit computation pertinent to AMR. We perform various standard tests to demonstrate that GAMER-MHD is both second-order accurate and robust, producing results as accurate as those given by high-resolution uniform-grid runs. We also explore a new 3D MHD test, where the magnetic field assumes the Arnold–Beltrami–Childress configuration, temporarily becomes turbulent with current sheets, and finally settles to a lowest-energy equilibrium state. This 3D problem is adopted for the performance test of GAMER-MHD. The single-GPU performance reaches 1.2 × 108 and 5.5 × 107 cell updates per second for the single- and double-precision calculations, respectively, on Tesla P100. We also demonstrate a parallel efficiency of ∼70% for both weak and strong scaling using 1024 XK nodes on the Blue Waters supercomputers.
Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

NASA Astrophysics Data System (ADS)

Grzeszczuk, A.; Kowalski, S.

2015-04-01

Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia for increase speed of graphics by usage of parallel mode for processes calculation. The success of this solution has opened technology General-Purpose Graphic Processor Units (GPGPUs) for applications not coupled with graphics. The GPGPUs system can be applying as effective tool for reducing huge number of data for pulse shape analysis measures, by on-line recalculation or by very quick system of compression. The simplified structure of CUDA system and model of programming based on example Nvidia GForce GTX580 card are presented by our poster contribution in stand-alone version and as ROOT application.
Designing and Implementing an OVERFLOW Reader for ParaView and Comparing Performance Between Central Processing Units and Graphical Processing Units

NASA Technical Reports Server (NTRS)

Chawner, David M.; Gomez, Ray J.

2010-01-01

In the Applied Aerosciences and CFD branch at Johnson Space Center, computational simulations are run that face many challenges. Two of which are the ability to customize software for specialized needs and the need to run simulations as fast as possible. There are many different tools that are used for running these simulations and each one has its own pros and cons. Once these simulations are run, there needs to be software capable of visualizing the results in an appealing manner. Some of this software is called open source, meaning that anyone can edit the source code to make modifications and distribute it to all other users in a future release. This is very useful, especially in this branch where many different tools are being used. File readers can be written to load any file format into a program, to ease the bridging from one tool to another. Programming such a reader requires knowledge of the file format that is being read as well as the equations necessary to obtain the derived values after loading. When running these CFD simulations, extremely large files are being loaded and having values being calculated. These simulations usually take a few hours to complete, even on the fastest machines. Graphics processing units (GPUs) are usually used to load the graphics for computers; however, in recent years, GPUs are being used for more generic applications because of the speed of these processors. Applications run on GPUs have been known to run up to forty times faster than they would on normal central processing units (CPUs). If these CFD programs are extended to run on GPUs, the amount of time they would require to complete would be much less. This would allow more simulations to be run in the same amount of time and possibly perform more complex computations.
Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units.

PubMed

Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley

2011-05-01

Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
Heterogeneity in homogeneous nucleation from billion-atom molecular dynamics simulation of solidification of pure metal.

PubMed

Shibuta, Yasushi; Sakane, Shinji; Miyoshi, Eisuke; Okita, Shin; Takaki, Tomohiro; Ohno, Munekazu

2017-04-05

Can completely homogeneous nucleation occur? Large scale molecular dynamics simulations performed on a graphics-processing-unit rich supercomputer can shed light on this long-standing issue. Here, a billion-atom molecular dynamics simulation of homogeneous nucleation from an undercooled iron melt reveals that some satellite-like small grains surrounding previously formed large grains exist in the middle of the nucleation process, which are not distributed uniformly. At the same time, grains with a twin boundary are formed by heterogeneous nucleation from the surface of the previously formed grains. The local heterogeneity in the distribution of grains is caused by the local accumulation of the icosahedral structure in the undercooled melt near the previously formed grains. This insight is mainly attributable to the multi-graphics processing unit parallel computation combined with the rapid progress in high-performance computational environments.Nucleation is a fundamental physical process, however it is a long-standing issue whether completely homogeneous nucleation can occur. Here the authors reveal, via a billion-atom molecular dynamics simulation, that local heterogeneity exists during homogeneous nucleation in an undercooled iron melt.
Faster, Better, Cheaper: A Decade of PC Progress.

ERIC Educational Resources Information Center

Crawford, Walt

1997-01-01

Reviews the development of personal computers and how computer components have changed in price and value. Highlights include disk drives; keyboards; displays; memory; color graphics; modems; CPU (central processing unit); storage; direct mail vendors; and future possibilities. (LRW)
A Large Scale, High Resolution Agent-Based Insurgency Model

DTIC Science & Technology

2013-09-30

CUDA) is NVIDIA Corporation’s software development model for General Purpose Programming on Graphics Processing Units (GPGPU) ( NVIDIA Corporation ...Conference. Argonne National Laboratory, Argonne, IL, October, 2005. NVIDIA Corporation . NVIDIA CUDA Programming Guide 2.0 [Online]. NVIDIA Corporation
Processing-in-Memory Enabled Graphics Processors for 3D Rendering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xie, Chenhao; Song, Shuaiwen; Wang, Jing

2017-02-06

The performance of 3D rendering of Graphics Processing Unit that convents 3D vector stream into 2D frame with 3D image effects significantly impact users’ gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPUmore » for efficient 3D rendering.« less
Framework Programmable Platform for the Advanced Software Development Workstation (FPP/ASDW). Demonstration framework document. Volume 2: Framework process description

NASA Technical Reports Server (NTRS)

Mayer, Richard J.; Blinn, Thomas M.; Dewitte, Paula S.; Crump, John W.; Ackley, Keith A.

1992-01-01

In the second volume of the Demonstration Framework Document, the graphical representation of the demonstration framework is given. This second document was created to facilitate the reading and comprehension of the demonstration framework. It is designed to be viewed in parallel with Section 4.2 of the first volume to help give a picture of the relationships between the UOB's (Unit of Behavior) of the model. The model is quite large and the design team felt that this form of presentation would make it easier for the reader to get a feel for the processes described in this document. The IDEF3 (Process Description Capture Method) diagrams of the processes of an Information System Development are presented. Volume 1 describes the processes and the agents involved with each process, while this volume graphically shows the precedence relationships among the processes.
Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

PubMed Central

Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

2014-01-01

Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

PubMed

Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

2014-07-01

Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
An Overview of U.S. Trends in Educational Software Design.

ERIC Educational Resources Information Center

Colvin, Linda B.

1989-01-01

Describes trends in educational software design in the United States for elementary and secondary education. Highlights include user-friendly software; learner control; interfacing the computer with other media, including television, telecommunications networks, and optical disk technology; microworlds; graphics; word processing; database…
Ultraviolet Communication for Medical Applications

DTIC Science & Technology

2015-06-01

In the previous Phase I effort, Directed Energy Inc.’s (DEI) parent company Imaging Systems Technology (IST) demonstrated feasibility of several key...accurately model high path loss. Custom photon scatter code was rewritten for parallel execution on a graphics processing unit (GPU). The NVidia CUDA
Real-time simulation of a spiking neural network model of the basal ganglia circuitry using general purpose computing on graphics processing units.

PubMed

Igarashi, Jun; Shouno, Osamu; Fukai, Tomoki; Tsujino, Hiroshi

2011-11-01

Real-time simulation of a biologically realistic spiking neural network is necessary for evaluation of its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs that arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic neurons. In order to address these problems, mainly the latter, we chose to use general purpose computing on graphics processing units (GPGPUs) for simulation of such a neural network, taking advantage of the powerful computational capability of a graphics processing unit (GPU). As a target for real-time simulation, we used a model of the basal ganglia that has been developed according to electrophysiological and anatomical knowledge. The model consists of heterogeneous populations of 370 spiking model neurons, including computationally heavy conductance-based models, connected by 11,002 synapses. Simulation of the model has not yet been performed in real-time using a general computing server. By parallelization of the model on the NVIDIA Geforce GTX 280 GPU in data-parallel and task-parallel fashion, faster-than-real-time simulation was robustly realized with only one-third of the GPU's total computational resources. Furthermore, we used the GPU's full computational resources to perform faster-than-real-time simulation of three instances of the basal ganglia model; these instances consisted of 1100 neurons and 33,006 synapses and were synchronized at each calculation step. Finally, we developed software for simultaneous visualization of faster-than-real-time simulation output. These results suggest the potential power of GPGPU techniques in real-time simulation of realistic neural networks. Copyright © 2011 Elsevier Ltd. All rights reserved.

Hybrid parallel computing architecture for multiview phase shifting

NASA Astrophysics Data System (ADS)

Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

2014-11-01

The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
Practical Implementation of Prestack Kirchhoff Time Migration on a General Purpose Graphics Processing Unit

NASA Astrophysics Data System (ADS)

Liu, Guofeng; Li, Chun

2016-08-01

In this study, we present a practical implementation of prestack Kirchhoff time migration (PSTM) on a general purpose graphic processing unit. First, we consider the three main optimizations of the PSTM GPU code, i.e., designing a configuration based on a reasonable execution, using the texture memory for velocity interpolation, and the application of an intrinsic function in device code. This approach can achieve a speedup of nearly 45 times on a NVIDIA GTX 680 GPU compared with CPU code when a larger imaging space is used, where the PSTM output is a common reflection point that is gathered as I[ nx][ ny][ nh][ nt] in matrix format. However, this method requires more memory space so the limited imaging space cannot fully exploit the GPU sources. To overcome this problem, we designed a PSTM scheme with multi-GPUs for imaging different seismic data on different GPUs using an offset value. This process can achieve the peak speedup of GPU PSTM code and it greatly increases the efficiency of the calculations, but without changing the imaging result.
Accelerated Adaptive MGS Phase Retrieval

NASA Technical Reports Server (NTRS)

Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang

2011-01-01

The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
GPU Based Software Correlators - Perspectives for VLBI2010

NASA Technical Reports Server (NTRS)

Hobiger, Thomas; Kimura, Moritaka; Takefuji, Kazuhiro; Oyama, Tomoaki; Koyama, Yasuhiro; Kondo, Tetsuro; Gotoh, Tadahiro; Amagai, Jun

2010-01-01

Caused by historical separation and driven by the requirements of the PC gaming industry, Graphics Processing Units (GPUs) have evolved to massive parallel processing systems which entered the area of non-graphic related applications. Although a single processing core on the GPU is much slower and provides less functionality than its counterpart on the CPU, the huge number of these small processing entities outperforms the classical processors when the application can be parallelized. Thus, in recent years various radio astronomical projects have started to make use of this technology either to realize the correlator on this platform or to establish the post-processing pipeline with GPUs. Therefore, the feasibility of GPUs as a choice for a VLBI correlator is being investigated, including pros and cons of this technology. Additionally, a GPU based software correlator will be reviewed with respect to energy consumption/GFlop/sec and cost/GFlop/sec.
Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control, Acquisition, Processing, and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF).

PubMed

Vasan, S N Swetadri; Ionita, Ciprian N; Titus, A H; Cartwright, A N; Bednarek, D R; Rudin, S

2012-02-23

We present the image processing upgrades implemented on a Graphics Processing Unit (GPU) in the Control, Acquisition, Processing, and Image Display System (CAPIDS) for the custom Micro-Angiographic Fluoroscope (MAF) detector. Most of the image processing currently implemented in the CAPIDS system is pixel independent; that is, the operation on each pixel is the same and the operation on one does not depend upon the result from the operation on the other, allowing the entire image to be processed in parallel. GPU hardware was developed for this kind of massive parallel processing implementation. Thus for an algorithm which has a high amount of parallelism, a GPU implementation is much faster than a CPU implementation. The image processing algorithm upgrades implemented on the CAPIDS system include flat field correction, temporal filtering, image subtraction, roadmap mask generation and display window and leveling. A comparison between the previous and the upgraded version of CAPIDS has been presented, to demonstrate how the improvement is achieved. By performing the image processing on a GPU, significant improvements (with respect to timing or frame rate) have been achieved, including stable operation of the system at 30 fps during a fluoroscopy run, a DSA run, a roadmap procedure and automatic image windowing and leveling during each frame.
Real-time track-less Cherenkov ring fitting trigger system based on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Cotta Ramusino, A.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Gianoli, A.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

2017-12-01

The parallel computing power of commercial Graphics Processing Units (GPUs) is exploited to perform real-time ring fitting at the lowest trigger level using information coming from the Ring Imaging Cherenkov (RICH) detector of the NA62 experiment at CERN. To this purpose, direct GPU communication with a custom FPGA-based board has been used to reduce the data transmission latency. The GPU-based trigger system is currently integrated in the experimental setup of the RICH detector of the NA62 experiment, in order to reconstruct ring-shaped hit patterns. The ring-fitting algorithm running on GPU is fed with raw RICH data only, with no information coming from other detectors, and is able to provide more complex trigger primitives with respect to the simple photodetector hit multiplicity, resulting in a higher selection efficiency. The performance of the system for multi-ring Cherenkov online reconstruction obtained during the NA62 physics run is presented.
Graphics-processing-unit-accelerated finite-difference time-domain simulation of the interaction between ultrashort laser pulses and metal nanoparticles

NASA Astrophysics Data System (ADS)

Nikolskiy, V. P.; Stegailov, V. V.

2018-01-01

Metal nanoparticles (NPs) serve as important tools for many modern technologies. However, the proper microscopic models of the interaction between ultrashort laser pulses and metal NPs are currently not very well developed in many cases. One part of the problem is the description of the warm dense matter that is formed in NPs after intense irradiation. Another part of the problem is the description of the electromagnetic waves around NPs. Description of wave propagation requires the solution of Maxwell’s equations and the finite-difference time-domain (FDTD) method is the classic approach for solving them. There are many commercial and free implementations of FDTD, including the open source software that supports graphics processing unit (GPU) acceleration. In this report we present the results on the FDTD calculations for different cases of the interaction between ultrashort laser pulses and metal nanoparticles. Following our previous results, we analyze the efficiency of the GPU acceleration of the FDTD algorithm.
High-Throughput Characterization of Porous Materials Using Graphics Processing Units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jihan; Martin, Richard L.; Rübel, Oliver

We have developed a high-throughput graphics processing units (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CHmore » $$_{4}$$ and CO$$_{2}$$) and material's framework atoms. Using a parallel flood fill CPU algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than ones considered in earlier studies. For structures selected from such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple grand canonical Monte Carlo simulations concurrently within the GPU.« less
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units.

PubMed

Maurer, S A; Kussmann, J; Ochsenfeld, C

2014-08-07

We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows to replace the rate-determining contraction step with a modified J-engine algorithm, that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.
Real time emotion aware applications: a case study employing emotion evocative pictures and neuro-physiological sensing enhanced by Graphic Processor Units.

PubMed

Konstantinidis, Evdokimos I; Frantzidis, Christos A; Pappas, Costas; Bamidis, Panagiotis D

2012-07-01

In this paper the feasibility of adopting Graphic Processor Units towards real-time emotion aware computing is investigated for boosting the time consuming computations employed in such applications. The proposed methodology was employed in analysis of encephalographic and electrodermal data gathered when participants passively viewed emotional evocative stimuli. The GPU effectiveness when processing electroencephalographic and electrodermal recordings is demonstrated by comparing the execution time of chaos/complexity analysis through nonlinear dynamics (multi-channel correlation dimension/D2) and signal processing algorithms (computation of skin conductance level/SCL) into various popular programming environments. Apart from the beneficial role of parallel programming, the adoption of special design techniques regarding memory management may further enhance the time minimization which approximates a factor of 30 in comparison with ANSI C language (single-core sequential execution). Therefore, the use of GPU parallel capabilities offers a reliable and robust solution for real-time sensing the user's affective state. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
High performance hybrid functional Petri net simulations of biological pathway models on CUDA.

PubMed

Chalkidis, Georgios; Nagasaki, Masao; Miyano, Satoru

2011-01-01

Hybrid functional Petri nets are a wide-spread tool for representing and simulating biological models. Due to their potential of providing virtual drug testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advancements in biology and medicine lead to exponentially increasing simulation times, thus raising the demand for performance accelerations by efficient and inexpensive parallel computation solutions. Recent developments in the field of general-purpose computation on graphics processing units (GPGPU) enabled the scientific community to port a variety of compute intensive algorithms onto the graphics processing unit (GPU). This work presents the first scheme for mapping biological hybrid functional Petri net models, which can handle both discrete and continuous entities, onto compute unified device architecture (CUDA) enabled GPUs. GPU accelerated simulations are observed to run up to 18 times faster than sequential implementations. Simulating the cell boundary formation by Delta-Notch signaling on a CUDA enabled GPU results in a speedup of approximately 7x for a model containing 1,600 cells.
Efficient Acceleration of the Pair-HMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units.

PubMed

Ren, Shanshan; Bertels, Koen; Al-Ars, Zaid

2018-01-01

GATK HaplotypeCaller (HC) is a popular variant caller, which is widely used to identify variants in complex genomes. However, due to its high variants detection accuracy, it suffers from long execution time. In GATK HC, the pair-HMMs forward algorithm accounts for a large percentage of the total execution time. This article proposes to accelerate the pair-HMMs forward algorithm on graphics processing units (GPUs) to improve the performance of GATK HC. This article presents several GPU-based implementations of the pair-HMMs forward algorithm. It also analyzes the performance bottlenecks of the implementations on an NVIDIA Tesla K40 card with various data sets. Based on these results and the characteristics of GATK HC, we are able to identify the GPU-based implementations with the highest performance for the various analyzed data sets. Experimental results show that the GPU-based implementations of the pair-HMMs forward algorithm achieve a speedup of up to 5.47× over existing GPU-based implementations.
Multidimensional upwind hydrodynamics on unstructured meshes using graphics processing units - I. Two-dimensional uniform meshes

NASA Astrophysics Data System (ADS)

Paardekooper, S.-J.

2017-08-01

We present a new method for numerical hydrodynamics which uses a multidimensional generalization of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a consequence its ability to recognize multidimensional stationary states that are not hydrostatic. A second novelty is the focus on graphics processing units (GPUs). By tailoring the algorithms specifically to GPUs, we are able to get speedups of 100-250 compared to a desktop machine. We compare the multidimensional upwind scheme to a traditional, dimensionally split implementation of the Roe solver on several test problems, and we find that the new method significantly outperforms the Roe solver in almost all cases. This comes with increased computational costs per time-step, which makes the new method approximately a factor of 2 slower than a dimensionally split scheme acting on a structured grid.
More than Mere Motivation: Learning Specific Content through Multimodal Narratives

ERIC Educational Resources Information Center

Brugar, Kristy A.; Roberts, Kathryn L.; Jiménez, Laura M.; Meyer, Carla K.

2018-01-01

This study explores the possibilities for learning content that might accompany the use of an historically accurate graphic novel as part of a language arts instructional unit. During a 6-day unit, 16 sixth grade students engaged in graphic novels in ways that support comprehension, both in the context of a graphic novel text set and a specific…
Real-time colouring and filtering with graphics shaders

NASA Astrophysics Data System (ADS)

Vohl, D.; Fluke, C. J.; Barnes, D. G.; Hassan, A. H.

2017-11-01

Despite the popularity of the Graphics Processing Unit (GPU) for general purpose computing, one should not forget about the practicality of the GPU for fast scientific visualization. As astronomers have increasing access to three-dimensional (3D) data from instruments and facilities like integral field units and radio interferometers, visualization techniques such as volume rendering offer means to quickly explore spectral cubes as a whole. As most 3D visualization techniques have been developed in fields of research like medical imaging and fluid dynamics, many transfer functions are not optimal for astronomical data. We demonstrate how transfer functions and graphics shaders can be exploited to provide new astronomy-specific explorative colouring methods. We present 12 shaders, including four novel transfer functions specifically designed to produce intuitive and informative 3D visualizations of spectral cube data. We compare their utility to classic colour mapping. The remaining shaders highlight how common computation like filtering, smoothing and line ratio algorithms can be integrated as part of the graphics pipeline. We discuss how this can be achieved by utilizing the parallelism of modern GPUs along with a shading language, letting astronomers apply these new techniques at interactive frame rates. All shaders investigated in this work are included in the open source software shwirl (Vohl 2017).
Using Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

NASA Astrophysics Data System (ADS)

O'Connor, A. S.; Justice, B.; Harris, A. T.

2013-12-01

Graphics Processing Units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. These are general-purpose parallel processors with support for a variety of programming interfaces, including industry standard languages such as C. GPU implementations of algorithms that are well suited for parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant improvements in speeds for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Components Analsyis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x - 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA based GPU processing frame work for accelerating ENVI and IDL processes that can best take advantage of parallelization. Testing Exelis VIS has performed shows that orthorectification can take as long as two hours with a WorldView1 35,0000 x 35,000 pixel image. With GPU orthorecification, the same orthorectification process takes three minutes. By speeding up image processing, imagery can successfully be used by first responders, scientists making rapid discoveries with near real time data, and provides an operational component to data centers needing to quickly process and disseminate data.
A fast CT reconstruction scheme for a general multi-core PC.

PubMed

Zeng, Kai; Bai, Erwei; Wang, Ge

2007-01-01

Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA) that we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible. The graphics processing unit (GPU) in a current graphic card can only reconstruct images in a reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compilier. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved by several folds using the latest quad-core processors.
A Fast CT Reconstruction Scheme for a General Multi-Core PC

PubMed Central

Zeng, Kai; Bai, Erwei; Wang, Ge

2007-01-01

Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA) that we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible. The graphics processing unit (GPU) in a current graphic card can only reconstruct images in a reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compilier. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved by several folds using the latest quad-core processors. PMID:18256731
Development and Evaluation of Sterographic Display for Lung Cancer Screening

DTIC Science & Technology

2008-12-01

burden. Application of GPUs – With the evolution of commodity graphics processing units (GPUs) for accelerating games on personal computers, over the...units, which are designed for rendering computer games , are readily available and can be programmed to perform the kinds of real-time calculations...575-581, 1994. 12. Anderson CM, Saloner D, Tsuruda JS, Shapeero LG, Lee RE. "Artifacts in maximun-intensity-projection display of MR angiograms
Graphics Processor Units (GPUs)

NASA Technical Reports Server (NTRS)

Wyrwas, Edward J.

2017-01-01

This presentation will include information about Graphics Processor Units (GPUs) technology, NASA Electronic Parts and Packaging (NEPP) tasks, The test setup, test parameter considerations, lessons learned, collaborations, a roadmap, NEPP partners, results to date, and future plans.

BarraCUDA - a fast short read sequence aligner using graphics processing units

PubMed Central

2012-01-01

Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
Geometric Detection Algorithms for Cavities on Protein Surfaces in Molecular Graphics: A Survey

PubMed Central

Simões, Tiago; Lopes, Daniel; Dias, Sérgio; Fernandes, Francisco; Pereira, João; Jorge, Joaquim; Bajaj, Chandrajit; Gomes, Abel

2017-01-01

Detecting and analyzing protein cavities provides significant information about active sites for biological processes (e.g., protein-protein or protein-ligand binding) in molecular graphics and modeling. Using the three-dimensional structure of a given protein (i.e., atom types and their locations in 3D) as retrieved from a PDB (Protein Data Bank) file, it is now computationally viable to determine a description of these cavities. Such cavities correspond to pockets, clefts, invaginations, voids, tunnels, channels, and grooves on the surface of a given protein. In this work, we survey the literature on protein cavity computation and classify algorithmic approaches into three categories: evolution-based, energy-based, and geometry-based. Our survey focuses on geometric algorithms, whose taxonomy is extended to include not only sphere-, grid-, and tessellation-based methods, but also surface-based, hybrid geometric, consensus, and time-varying methods. Finally, we detail those techniques that have been customized for GPU (Graphics Processing Unit) computing. PMID:29520122
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

NASA Astrophysics Data System (ADS)

Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

2017-06-01

Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
Overview of implementation of DARPA GPU program in SAIC

NASA Astrophysics Data System (ADS)

Braunreiter, Dennis; Furtek, Jeremy; Chen, Hai-Wen; Healy, Dennis

2008-04-01

This paper reviews the implementation of DARPA MTO STAP-BOY program for both Phase I and II conducted at Science Applications International Corporation (SAIC). The STAP-BOY program conducts fast covariance factorization and tuning techniques for space-time adaptive process (STAP) Algorithm Implementation on Graphics Processor unit (GPU) Architectures for Embedded Systems. The first part of our presentation on the DARPA STAP-BOY program will focus on GPU implementation and algorithm innovations for a prototype radar STAP algorithm. The STAP algorithm will be implemented on the GPU, using stream programming (from companies such as PeakStream, ATI Technologies' CTM, and NVIDIA) and traditional graphics APIs. This algorithm will include fast range adaptive STAP weight updates and beamforming applications, each of which has been modified to exploit the parallel nature of graphics architectures.
High-performance dynamic quantum clustering on graphics processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wittek, Peter, E-mail: peterwittek@acm.org

2013-01-15

Clustering methods in machine learning may benefit from borrowing metaphors from physics. Dynamic quantum clustering associates a Gaussian wave packet with the multidimensional data points and regards them as eigenfunctions of the Schroedinger equation. The clustering structure emerges by letting the system evolve and the visual nature of the algorithm has been shown to be useful in a range of applications. Furthermore, the method only uses matrix operations, which readily lend themselves to parallelization. In this paper, we develop an implementation on graphics hardware and investigate how this approach can accelerate the computations. We achieve a speedup of up tomore » two magnitudes over a multicore CPU implementation, which proves that quantum-like methods and acceleration by graphics processing units have a great relevance to machine learning.« less
A space systems perspective of graphics simulation integration

NASA Technical Reports Server (NTRS)

Brown, R.; Gott, C.; Sabionski, G.; Bochsler, D.

1987-01-01

Creation of an interactive display environment can expose issues in system design and operation not apparent from nongraphics development approaches. Large amounts of information can be presented in a short period of time. Processes can be simulated and observed before committing resources. In addition, changes in the economics of computing have enabled broader graphics usage beyond traditional engineering and design into integrated telerobotics and Artificial Intelligence (AI) applications. The highly integrated nature of space operations often tend to rely upon visually intensive man-machine communication to ensure success. Graphics simulation activities at the Mission Planning and Analysis Division (MPAD) of NASA's Johnson Space Center are focusing on the evaluation of a wide variety of graphical analysis within the context of present and future space operations. Several telerobotics and AI applications studies utilizing graphical simulation are described. The presentation includes portions of videotape illustrating technology developments involving: (1) coordinated manned maneuvering unit and remote manipulator system operations, (2) a helmet mounted display system, and (3) an automated rendezous application utilizing expert system and voice input/output technology.
Advantages of GPU technology in DFT calculations of intercalated graphene

NASA Astrophysics Data System (ADS)

Pešić, J.; Gajić, R.

2014-09-01

Over the past few years, the expansion of general-purpose graphic-processing unit (GPGPU) technology has had a great impact on computational science. GPGPU is the utilization of a graphics-processing unit (GPU) to perform calculations in applications usually handled by the central processing unit (CPU). Use of GPGPUs as a way to increase computational power in the material sciences has significantly decreased computational costs in already highly demanding calculations. A level of the acceleration and parallelization depends on the problem itself. Some problems can benefit from GPU acceleration and parallelization, such as the finite-difference time-domain algorithm (FTDT) and density-functional theory (DFT), while others cannot take advantage of these modern technologies. A number of GPU-supported applications had emerged in the past several years (www.nvidia.com/object/gpu-applications.html). Quantum Espresso (QE) is reported as an integrated suite of open source computer codes for electronic-structure calculations and materials modeling at the nano-scale. It is based on DFT, the use of a plane-waves basis and a pseudopotential approach. Since the QE 5.0 version, it has been implemented as a plug-in component for standard QE packages that allows exploiting the capabilities of Nvidia GPU graphic cards (www.qe-forge.org/gf/proj). In this study, we have examined the impact of the usage of GPU acceleration and parallelization on the numerical performance of DFT calculations. Graphene has been attracting attention worldwide and has already shown some remarkable properties. We have studied an intercalated graphene, using the QE package PHonon, which employs GPU. The term ‘intercalation’ refers to a process whereby foreign adatoms are inserted onto a graphene lattice. In addition, by intercalating different atoms between graphene layers, it is possible to tune their physical properties. Our experiments have shown there are benefits from using GPUs, and we reached an acceleration of several times compared to standard CPU calculations.
[Automatic tracing of conversion scales from conventional units to the SI system of units].

PubMed

Besozzi, M; Bianchi, P; Agrifoglio, L

1988-01-01

American medical journals, as the Journal of the American Medical Association (JAMA), and the American Journal of Clinical Pathology (AJCP), the Journal of the American Society of Clinical Pathologists (ASCP), are shifting to selected SI (Système International d'Unités) units for reporting measurements. Further discussion by the AMA, the ASCP and other organizations is required before consensus in the US medical community can be reached as to the extent of and time frame for conversion to SI for reporting clinical laboratory measurements: however this decision will certainly greatly speed up the process of conversion in European countries too. Transition to SI units will require the use of different reference ranges, and there will be a potential for serious misinterpretation of laboratory data unless well-planned educational programs are instituted before the change. A simple program written in Microsoft Basic for automatically tracing on one's personal computer (PC) monitor a dual scale, in the conventional and in the SI system of units, is presented here. The program may be easily implemented and run on every PC operating under MS-DOS, equipped with a CGA or an AT&T6300 graphic card: through the operating system the scales may also be printed on a dot-matrix graphic printer. We believe that this, and other tools of this kind, will be useful in the thorough educational process of those reading the reports, and will be an important factor in the success of conversion to SI reporting.
Use of general purpose graphics processing units with MODFLOW

USGS Publications Warehouse

Hughes, Joseph D.; White, Jeremy T.

2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the central processing unit CPU and GPCPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
Programming the Navier-Stokes computer: An abstract machine model and a visual editor

NASA Technical Reports Server (NTRS)

Middleton, David; Crockett, Tom; Tomboulian, Sherry

1988-01-01

The Navier-Stokes computer is a parallel computer designed to solve Computational Fluid Dynamics problems. Each processor contains several floating point units which can be configured under program control to implement a vector pipeline with several inputs and outputs. Since the development of an effective compiler for this computer appears to be very difficult, machine level programming seems necessary and support tools for this process have been studied. These support tools are organized into a graphical program editor. A programming process is described by which appropriate computations may be efficiently implemented on the Navier-Stokes computer. The graphical editor would support this programming process, verifying various programmer choices for correctness and deducing values such as pipeline delays and network configurations. Step by step details are provided and demonstrated with two example programs.
Real-time acquisition and display of flow contrast using speckle variance optical coherence tomography in a graphics processing unit.

PubMed

Xu, Jing; Wong, Kevin; Jian, Yifan; Sarunic, Marinko V

2014-02-01

In this report, we describe a graphics processing unit (GPU)-accelerated processing platform for real-time acquisition and display of flow contrast images with Fourier domain optical coherence tomography (FDOCT) in mouse and human eyes in vivo. Motion contrast from blood flow is processed using the speckle variance OCT (svOCT) technique, which relies on the acquisition of multiple B-scan frames at the same location and tracking the change of the speckle pattern. Real-time mouse and human retinal imaging using two different custom-built OCT systems with processing and display performed on GPU are presented with an in-depth analysis of performance metrics. The display output included structural OCT data, en face projections of the intensity data, and the svOCT en face projections of retinal microvasculature; these results compare projections with and without speckle variance in the different retinal layers to reveal significant contrast improvements. As a demonstration, videos of real-time svOCT for in vivo human and mouse retinal imaging are included in our results. The capability of performing real-time svOCT imaging of the retinal vasculature may be a useful tool in a clinical environment for monitoring disease-related pathological changes in the microcirculation such as diabetic retinopathy.
The PLATO IV Communications System.

ERIC Educational Resources Information Center

Sherwood, Bruce Arne; Stifle, Jack

The PLATO IV computer-based educational system contains its own communications hardware and software for operating plasma-panel graphics terminals. Key echoing is performed by the central processing unit: every key pressed at a terminal passes through the entire system before anything appears on the terminal's screen. Each terminal is guaranteed…
Fast direct fourier reconstruction of radial and PROPELLER MRI data using the chirp transform algorithm on graphics hardware.

PubMed

Feng, Yanqiu; Song, Yanli; Wang, Cong; Xin, Xuegang; Feng, Qianjin; Chen, Wufan

2013-10-01

To develop and test a new algorithm for fast direct Fourier transform (DrFT) reconstruction of MR data on non-Cartesian trajectories composed of lines with equally spaced points. The DrFT, which is normally used as a reference in evaluating the accuracy of other reconstruction methods, can reconstruct images directly from non-Cartesian MR data without interpolation. However, DrFT reconstruction involves substantially intensive computation, which makes the DrFT impractical for clinical routine applications. In this article, the Chirp transform algorithm was introduced to accelerate the DrFT reconstruction of radial and Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction (PROPELLER) MRI data located on the trajectories that are composed of lines with equally spaced points. The performance of the proposed Chirp transform algorithm-DrFT algorithm was evaluated by using simulation and in vivo MRI data. After implementing the algorithm on a graphics processing unit, the proposed Chirp transform algorithm-DrFT algorithm achieved an acceleration of approximately one order of magnitude, and the speed-up factor was further increased to approximately three orders of magnitude compared with the traditional single-thread DrFT reconstruction. Implementation the Chirp transform algorithm-DrFT algorithm on the graphics processing unit can efficiently calculate the DrFT reconstruction of the radial and PROPELLER MRI data. Copyright © 2012 Wiley Periodicals, Inc.
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

PubMed

Dematté, Lorenzo

2012-01-01

Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation could be time consuming, especially if we want to capture the systems behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver the promise done by systems biology to be able to understand a system as whole, we need to scale up the size of models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely diffused algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computational demanding steps (computation of diffusion, unimolecular, and bimolecular reaction, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real time, high quality graphics output
Distributed GPU Computing in GIScience

NASA Astrophysics Data System (ADS)

Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.

2013-12-01

Geoscientists strived to discover potential principles and patterns hidden inside ever-growing Big Data for scientific discoveries. To better achieve this objective, more capable computing resources are required to process, analyze and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges caused by increasing amount of datasets from different domains, such as social media, earth observation, environmental sensing (Li et al., 2013). Meanwhile CPU-based computing resources structured as cluster or supercomputer is costly. In the past several years with GPU-based technology matured in both the capability and performance, GPU-based computing has emerged as a new computing paradigm. Compare to traditional computing microprocessor, the modern GPU, as a compelling alternative microprocessor, has outstanding high parallel processing capability with cost-effectiveness and efficiency(Owens et al., 2008), although it is initially designed for graphical rendering in visualization pipe. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within distributed environment. Within this framework, 1) for each single computer, computing resources of both GPU-based and CPU-based can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build up a virtual super computer to support CPU-based and GPU-based computing in distributed computing environment; 3) GPUs, as a specific graphic targeted device, are used to greatly improve the rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Key words: Geovisualization, GIScience, Spatiotemporal Studies Reference : 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. Visualization and Computer Graphics, IEEE Transactions on, 9(3), 378-394. 2. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D Environmental Data Using Many-core Graphics Processing Units (GPUs) and Multi-core Central Processing Units (CPUs). Computers & Geosciences, 59(9), 78-89. 3. Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879-899.
SIMD Optimization of Linear Expressions for Programmable Graphics Hardware

PubMed Central

Bajaj, Chandrajit; Ihm, Insung; Min, Jungki; Oh, Jinsang

2009-01-01

The increased programmability of graphics hardware allows efficient graphical processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how to fully exploit the SIMD computing capacities offered by modern graphics processors. Linear expressions in the form of ȳ = Ax̄ + b̄, where A is a matrix, and x̄, ȳ and b̄ are vectors, constitute one of the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that enables efficient shader codes to be generated for evaluating linear expressions. It is shown that performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that the presented technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications, including integrating differential equations and solving a sparse linear system of equations using iterative methods. PMID:19946569
Simulation in Metallurgical Processing: Recent Developments and Future Perspectives

NASA Astrophysics Data System (ADS)

Ludwig, Andreas; Wu, Menghuai; Kharicha, Abdellah

2016-08-01

This article briefly addresses the most important topics concerning numerical simulation of metallurgical processes, namely, multiphase issues (particle and bubble motion and flotation/sedimentation of equiaxed crystals during solidification), multiphysics issues (electromagnetic stirring, electro-slag remelting, Cu-electro-refining, fluid-structure interaction, and mushy zone deformation), process simulations on graphical processing units, integrated computational materials engineering, and automatic optimization via simulation. The present state-of-the-art as well as requirements for future developments are presented and briefly discussed.
Mapping the Information Trace in Local Field Potentials by a Computational Method of Two-Dimensional Time-Shifting Synchronization Likelihood Based on Graphic Processing Unit Acceleration.

PubMed

Zhao, Zi-Fang; Li, Xue-Zhu; Wan, You

2017-12-01

The local field potential (LFP) is a signal reflecting the electrical activity of neurons surrounding the electrode tip. Synchronization between LFP signals provides important details about how neural networks are organized. Synchronization between two distant brain regions is hard to detect using linear synchronization algorithms like correlation and coherence. Synchronization likelihood (SL) is a non-linear synchronization-detecting algorithm widely used in studies of neural signals from two distant brain areas. One drawback of non-linear algorithms is the heavy computational burden. In the present study, we proposed a graphic processing unit (GPU)-accelerated implementation of an SL algorithm with optional 2-dimensional time-shifting. We tested the algorithm with both artificial data and raw LFP data. The results showed that this method revealed detailed information from original data with the synchronization values of two temporal axes, delay time and onset time, and thus can be used to reconstruct the temporal structure of a neural network. Our results suggest that this GPU-accelerated method can be extended to other algorithms for processing time-series signals (like EEG and fMRI) using similar recording techniques.
GPU computing in medical physics: a review.

PubMed

Pratx, Guillem; Xing, Lei

2011-05-01

The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
Real time 3D structural and Doppler OCT imaging on graphics processing units

NASA Astrophysics Data System (ADS)

Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

2013-03-01

In this report the application of graphics processing unit (GPU) programming for real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging with implementation of Doppler algorithms for visualization of the flows in capillary vessels is presented. Generally, the time of the data processing of the FdOCT data on the main processor of the computer (CPU) constitute a main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Lately developed GPUs, which offers a very high computational power, give a solution to this problem. Taking advantages of them for massively parallel data processing, allow for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allow for the whole processing with visualization of 2D data consisting of 2000 A-scans generated from 2048 pixels spectra with frame rate about 120 fps. The 3D imaging in the same mode of the volume data build of 220 × 100 A-scans is performed at a rate of about 8 frames per second. In this paper a software architecture, organization of the threads and optimization applied is shown. For illustration the screen shots recorded during real time imaging of the phantom (homogeneous water solution of Intralipid in glass capillary) and the human eye in-vivo is presented.

High-throughput sequence alignment using Graphics Processing Units

PubMed Central

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

2007-01-01

Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.

PubMed

Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray

2017-07-11

Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have emerged as widely used in modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers for the ever-improving graphics processing units (GPU) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved given that the single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
Helping Students Bridge Inferences in Science Texts Using Graphic Organizers

ERIC Educational Resources Information Center

Roman, Diego; Jones, Francesca; Basaraba, Deni; Hironaka, Stephanie

2016-01-01

The difficulties that students face when reading science texts go beyond understanding vocabulary and syntactic structures. Comprehension of science texts requires students to infer how these texts function as a unit to communicate scientific meaning. To help students in this process, science texts sometimes employ logical connectives (e.g.,…
Numerical Integration with Graphical Processing Unit for QKD Simulation

DTIC Science & Technology

2014-03-27

Windows system application programming interface (API) timer. The problem sizes studied produce speedups greater than 60x on the NVIDIA Tesla C2075...13 2.3.3 CUDA API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3.4 CUDA and NVIDIA GPU Hardware...Theoretical Floating-Point Operations per Second for Intel CPUs and NVIDIA GPUs [3
Utilizing Graphics Processing Units for Network Anomaly Detection

DTIC Science & Technology

2012-09-13

pages 1266–1271, 2003. [Nis12a] Ste↵en Nissen. Fann datatypes - activation function enum, 2012. http://leenissen.dk/fann/html/files/fann data-h.html# fann...activationfunc enum. Last accessed: 7 Aug 2012. [Nis12b] Ste↵en Nissen. Fann datatypes - train enum, 2012. http://leenissen.dk/fann/html/files/fann
Systems and methods for interactive virtual reality process control and simulation

DOEpatents

Daniel, Jr., William E.; Whitney, Michael A.

2001-01-01

A system for visualizing, controlling and managing information includes a data analysis unit for interpreting and classifying raw data using analytical techniques. A data flow coordination unit routes data from its source to other components within the system. A data preparation unit handles the graphical preparation of the data and a data rendering unit presents the data in a three-dimensional interactive environment where the user can observe, interact with, and interpret the data. A user can view the information on various levels, from a high overall process level view, to a view illustrating linkage between variables, to view the hard data itself, or to view results of an analysis of the data. The system allows a user to monitor a physical process in real-time and further allows the user to manage and control the information in a manner not previously possible.
permGPU: Using graphics processing units in RNA microarray association studies.

PubMed

Shterev, Ivo D; Jung, Sin-Ho; George, Stephen L; Owzar, Kouros

2010-06-16

Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. We have developed a CUDA based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maurer, S. A.; Kussmann, J.; Ochsenfeld, C., E-mail: Christian.Ochsenfeld@cup.uni-muenchen.de

2014-08-07

We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N{sup 5}) to O(N{sup 3}) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows tomore » replace the rate-determining contraction step with a modified J-engine algorithm, that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.« less
Accelerating 3D Hall MHD Magnetosphere Simulations with Graphics Processing Units

NASA Astrophysics Data System (ADS)

Bard, C.; Dorelli, J.

2017-12-01

The resolution required to simulate planetary magnetospheres with Hall magnetohydrodynamics result in program sizes approaching several hundred million grid cells. These would take years to run on a single computational core and require hundreds or thousands of computational cores to complete in a reasonable time. However, this requires access to the largest supercomputers. Graphics processing units (GPUs) provide a viable alternative: one GPU can do the work of roughly 100 cores, bringing Hall MHD simulations of Ganymede within reach of modest GPU clusters ( 8 GPUs). We report our progress in developing a GPU-accelerated, three-dimensional Hall magnetohydrodynamic code and present Hall MHD simulation results for both Ganymede (run on 8 GPUs) and Mercury (56 GPUs). We benchmark our Ganymede simulation with previous results for the Galileo G8 flyby, namely that adding the Hall term to ideal MHD simulations changes the global convection pattern within the magnetosphere. Additionally, we present new results for the G1 flyby as well as initial results from Hall MHD simulations of Mercury and compare them with the corresponding ideal MHD runs.
Real-time electroholography using a multiple-graphics processing unit cluster system with a single spatial light modulator and the InfiniBand network

NASA Astrophysics Data System (ADS)

Niwase, Hiroaki; Takada, Naoki; Araki, Hiromitsu; Maeda, Yuki; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

2016-09-01

Parallel calculations of large-pixel-count computer-generated holograms (CGHs) are suitable for multiple-graphics processing unit (multi-GPU) cluster systems. However, it is not easy for a multi-GPU cluster system to accomplish fast CGH calculations when CGH transfers between PCs are required. In these cases, the CGH transfer between the PCs becomes a bottleneck. Usually, this problem occurs only in multi-GPU cluster systems with a single spatial light modulator. To overcome this problem, we propose a simple method using the InfiniBand network. The computational speed of the proposed method using 13 GPUs (NVIDIA GeForce GTX TITAN X) was more than 3000 times faster than that of a CPU (Intel Core i7 4770) when the number of three-dimensional (3-D) object points exceeded 20,480. In practice, we achieved ˜40 tera floating point operations per second (TFLOPS) when the number of 3-D object points exceeded 40,960. Our proposed method was able to reconstruct a real-time movie of a 3-D object comprising 95,949 points.
Three-directional motion-compensation mask-based novel look-up table on graphics processing units for video-rate generation of digital holographic videos of three-dimensional scenes.

PubMed

Kwon, Min-Woo; Kim, Seung-Cheol; Kim, Eun-Soo

2016-01-20

A three-directional motion-compensation mask-based novel look-up table method is proposed and implemented on graphics processing units (GPUs) for video-rate generation of digital holographic videos of three-dimensional (3D) scenes. Since the proposed method is designed to be well matched with the software and memory structures of GPUs, the number of compute-unified-device-architecture kernel function calls can be significantly reduced. This results in a great increase of the computational speed of the proposed method, allowing video-rate generation of the computer-generated hologram (CGH) patterns of 3D scenes. Experimental results reveal that the proposed method can generate 39.8 frames of Fresnel CGH patterns with 1920×1080 pixels per second for the test 3D video scenario with 12,088 object points on dual GPU boards of NVIDIA GTX TITANs, and they confirm the feasibility of the proposed method in the practical application fields of electroholographic 3D displays.
Acceleration of High Angular Momentum Electron Repulsion Integrals and Integral Derivatives on Graphics Processing Units.

PubMed

Miao, Yipu; Merz, Kenneth M

2015-04-14

We present an efficient implementation of ab initio self-consistent field (SCF) energy and gradient calculations that run on Compute Unified Device Architecture (CUDA) enabled graphical processing units (GPUs) using recurrence relations. We first discuss the machine-generated code that calculates the electron-repulsion integrals (ERIs) for different ERI types. Next we describe the porting of the SCF gradient calculation to GPUs, which results in an acceleration of the computation of the first-order derivative of the ERIs. However, only s, p, and d ERIs and s and p derivatives could be executed simultaneously on GPUs using the current version of CUDA and generation of NVidia GPUs using a previously described algorithm [Miao and Merz J. Chem. Theory Comput. 2013, 9, 965-976.]. Hence, we developed an algorithm to compute f type ERIs and d type ERI derivatives on GPUs. Our benchmarks shows the performance GPU enable ERI and ERI derivative computation yielded speedups of 10-18 times relative to traditional CPU execution. An accuracy analysis using double-precision calculations demonstrates that the overall accuracy is satisfactory for most applications.
Three-dimensional computer visualization of forensic pathology data.

PubMed

March, Jack; Schofield, Damian; Evison, Martin; Woodford, Noel

2004-03-01

Despite a decade of use in US courtrooms, it is only recently that forensic computer animations have become an increasingly important form of communication in legal spheres within the United Kingdom. Aims Research at the University of Nottingham has been influential in the critical investigation of forensic computer graphics reconstruction methodologies and techniques and in raising the profile of this novel form of data visualization within the United Kingdom. The case study presented demonstrates research undertaken by Aims Research and the Department of Forensic Pathology at the University of Sheffield, which aims to apply, evaluate, and develop novel 3-dimensional computer graphics (CG) visualization and virtual reality (VR) techniques in the presentation and investigation of forensic information concerning the human body. The inclusion of such visualizations within other CG or VR environments may ultimately provide the potential for alternative exploratory directions, processes, and results within forensic pathology investigations.
GPU Accelerated Ultrasonic Tomography Using Propagation and Back Propagation Method

DTIC Science & Technology

2015-09-28

the medical imaging field using GPUs has been done for many years. In [1], Copeland et al. used 2D images , obtained by X - ray projections, to...Index Terms— Medical Imaging , Ultrasonic Tomography, GPU, CUDA, Parallel Computing I. INTRODUCTION GRAPHIC Processing Units (GPUs) are computation... Imaging Algorithm The process of reconstructing images from ultrasonic infor- mation starts with the following acoustical wave equation: ∂2 ∂t2 u ( x
A survey of GPU-based medical image computing techniques

PubMed Central

Shi, Lin; Liu, Wen; Zhang, Heye; Xie, Yongming

2012-01-01

Medical imaging currently plays a crucial role throughout the entire clinical applications from medical scientific research to diagnostics and treatment planning. However, medical imaging procedures are often computationally demanding due to the large three-dimensional (3D) medical datasets to process in practical clinical applications. With the rapidly enhancing performances of graphics processors, improved programming support, and excellent price-to-performance ratio, the graphics processing unit (GPU) has emerged as a competitive parallel computing platform for computationally expensive and demanding tasks in a wide range of medical image applications. The major purpose of this survey is to provide a comprehensive reference source for the starters or researchers involved in GPU-based medical image processing. Within this survey, the continuous advancement of GPU computing is reviewed and the existing traditional applications in three areas of medical image processing, namely, segmentation, registration and visualization, are surveyed. The potential advantages and associated challenges of current GPU-based medical imaging are also discussed to inspire future applications in medicine. PMID:23256080
Compiling and editing agricultural strata boundaries with remotely sensed imagery and map attribute data using graphics workstations

NASA Technical Reports Server (NTRS)

Cheng, Thomas D.; Angelici, Gary L.; Slye, Robert E.; Ma, Matt

1991-01-01

The USDA presently uses labor-intensive photographic interpretation procedures to delineate large geographical areas into manageable size sampling units for the estimation of domestic crop and livestock production. Computer software to automate the boundary delineation procedure, called the computer-assisted stratification and sampling (CASS) system, was developed using a Hewlett Packard color-graphics workstation. The CASS procedures display Thematic Mapper (TM) satellite digital imagery on a graphics display workstation as the backdrop for the onscreen delineation of sampling units. USGS Digital Line Graph (DLG) data for roads and waterways are displayed over the TM imagery to aid in identifying potential sample unit boundaries. Initial analysis conducted with three Missouri counties indicated that CASS was six times faster than the manual techniques in delineating sampling units.
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

NASA Astrophysics Data System (ADS)

Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

2018-01-01

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Micromagnetics on high-performance workstation and mobile computational platforms

NASA Astrophysics Data System (ADS)

Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.

2015-05-01

The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing unit, Nvidia desktop graphics processing units, and Nvidia Jetson TK1 Platform. FastMag finite element method-based micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.
Using VirtualGL/TurboVNC Software on the Peregrine System |

Science.gov Websites

High-Performance Computing | NREL VirtualGL/TurboVNC Software on the Peregrine System Using , allowing users to access and share large-memory visualization nodes with high-end graphics processing units may be better than just using X11 forwarding when connecting from a remote site with low bandwidth and
Solving Navier-Stokes' equation using Castillo-Grone's mimetic difference operators on GPUs

NASA Astrophysics Data System (ADS)

Abouali, Mohammad; Castillo, Jose

2012-11-01

This paper discusses the performance and the accuracy of Castillo-Grone's (CG) mimetic difference operator in solving the Navier-Stokes' equation in order to simulate oceanic and atmospheric flows. The implementation is further adapted to harness the power of the many computing cores available on the Graphics Processing Units (GPUs) and the speedup is discussed.

Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

NASA Astrophysics Data System (ADS)

Kemal, Jonathan Yashar

For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute notes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Zeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than running than a single-CPU simulation.
Color graphics, interactive processing, and the supercomputer

NASA Technical Reports Server (NTRS)

Smith-Taylor, Rudeen

1987-01-01

The development of a common graphics environment for the NASA Langley Research Center user community and the integration of a supercomputer into this environment is examined. The initial computer hardware, the software graphics packages, and their configurations are described. The addition of improved computer graphics capability to the supercomputer, and the utilization of the graphic software and hardware are discussed. Consideration is given to the interactive processing system which supports the computer in an interactive debugging, processing, and graphics environment.
Graphic Arts: Book One. Orientation, Composition, and Paste-up.

ERIC Educational Resources Information Center

Farajollahi, Karim; And Others

The first of a three-volume set of instructional materials for a graphic arts course, this manual consists of 13 instructional units. Covered in the units are orientation (career overview, shop safety, shop organization, photo-offset theory, legal restrictions, and applying for a job); principles of copy planning (overview of copy planning and…
Forward and adjoint spectral-element simulations of seismic wave propagation using hardware accelerators

NASA Astrophysics Data System (ADS)

Peter, Daniel; Videau, Brice; Pouget, Kevin; Komatitsch, Dimitri

2015-04-01

Improving the resolution of tomographic images is crucial to answer important questions on the nature of Earth's subsurface structure and internal processes. Seismic tomography is the most prominent approach where seismic signals from ground-motion records are used to infer physical properties of internal structures such as compressional- and shear-wave speeds, anisotropy and attenuation. Recent advances in regional- and global-scale seismic inversions move towards full-waveform inversions which require accurate simulations of seismic wave propagation in complex 3D media, providing access to the full 3D seismic wavefields. However, these numerical simulations are computationally very expensive and need high-performance computing (HPC) facilities for further improving the current state of knowledge. During recent years, many-core architectures such as graphics processing units (GPUs) have been added to available large HPC systems. Such GPU-accelerated computing together with advances in multi-core central processing units (CPUs) can greatly accelerate scientific applications. There are mainly two possible choices of language support for GPU cards, the CUDA programming environment and OpenCL language standard. CUDA software development targets NVIDIA graphic cards while OpenCL was adopted mainly by AMD graphic cards. In order to employ such hardware accelerators for seismic wave propagation simulations, we incorporated a code generation tool BOAST into an existing spectral-element code package SPECFEM3D_GLOBE. This allows us to use meta-programming of computational kernels and generate optimized source code for both CUDA and OpenCL languages, running simulations on either CUDA or OpenCL hardware accelerators. We show here applications of forward and adjoint seismic wave propagation on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms

PubMed Central

Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein

2017-01-01

Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphi graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to the recent researches, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of real-time image processing. Edge detection is an early stage in most of the image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts’ Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth- first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2–100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms. PMID:28487831
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms.

PubMed

Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein

2017-01-01

Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphi graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to the recent researches, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of real-time image processing. Edge detection is an early stage in most of the image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts' Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth- first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2-100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms.
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-10-01

Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s is achieved. It takes about a second to compute forces on a million particles with an opening angle of heta approx 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
GRay: A Massively Parallel GPU-based Code for Ray Tracing in Relativistic Spacetimes

NASA Astrophysics Data System (ADS)

Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

2013-11-01

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.
Spatial resolution recovery utilizing multi-ray tracing and graphic processing unit in PET image reconstruction.

PubMed

Liang, Yicheng; Peng, Hao

2015-02-07

Depth-of-interaction (DOI) poses a major challenge for a PET system to achieve uniform spatial resolution across the field-of-view, particularly for small animal and organ-dedicated PET systems. In this work, we implemented an analytical method to model system matrix for resolution recovery, which was then incorporated in PET image reconstruction on a graphical processing unit platform, due to its parallel processing capacity. The method utilizes the concepts of virtual DOI layers and multi-ray tracing to calculate the coincidence detection response function for a given line-of-response. The accuracy of the proposed method was validated for a small-bore PET insert to be used for simultaneous PET/MR breast imaging. In addition, the performance comparisons were studied among the following three cases: 1) no physical DOI and no resolution modeling; 2) two physical DOI layers and no resolution modeling; and 3) no physical DOI design but with a different number of virtual DOI layers. The image quality was quantitatively evaluated in terms of spatial resolution (full-width-half-maximum and position offset), contrast recovery coefficient and noise. The results indicate that the proposed method has the potential to be used as an alternative to other physical DOI designs and achieve comparable imaging performances, while reducing detector/system design cost and complexity.
Fast ray-tracing of human eye optics on Graphics Processing Units.

PubMed

Wei, Qi; Patkar, Saket; Pai, Dinesh K

2014-05-01

We present a new technique for simulating retinal image formation by tracing a large number of rays from objects in three dimensions as they pass through the optic apparatus of the eye to objects. Simulating human optics is useful for understanding basic questions of vision science and for studying vision defects and their corrections. Because of the complexity of computing such simulations accurately, most previous efforts used simplified analytical models of the normal eye. This makes them less effective in modeling vision disorders associated with abnormal shapes of the ocular structures which are hard to be precisely represented by analytical surfaces. We have developed a computer simulator that can simulate ocular structures of arbitrary shapes, for instance represented by polygon meshes. Topographic and geometric measurements of the cornea, lens, and retina from keratometer or medical imaging data can be integrated for individualized examination. We utilize parallel processing using modern Graphics Processing Units (GPUs) to efficiently compute retinal images by tracing millions of rays. A stable retinal image can be generated within minutes. We simulated depth-of-field, accommodation, chromatic aberrations, as well as astigmatism and correction. We also show application of the technique in patient specific vision correction by incorporating geometric models of the orbit reconstructed from clinical medical images. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sauter, Nicholas K., E-mail: nksauter@lbl.gov; Hattne, Johan; Grosse-Kunstleve, Ralf W.

The Computational Crystallography Toolbox (cctbx) is a flexible software platform that has been used to develop high-throughput crystal-screening tools for both synchrotron sources and X-ray free-electron lasers. Plans for data-processing and visualization applications are discussed, and the benefits and limitations of using graphics-processing units are evaluated. Current pixel-array detectors produce diffraction images at extreme data rates (of up to 2 TB h{sup −1}) that make severe demands on computational resources. New multiprocessing frameworks are required to achieve rapid data analysis, as it is important to be able to inspect the data quickly in order to guide the experiment in realmore » time. By utilizing readily available web-serving tools that interact with the Python scripting language, it was possible to implement a high-throughput Bragg-spot analyzer (cctbx.spotfinder) that is presently in use at numerous synchrotron-radiation beamlines. Similarly, Python interoperability enabled the production of a new data-reduction package (cctbx.xfel) for serial femtosecond crystallography experiments at the Linac Coherent Light Source (LCLS). Future data-reduction efforts will need to focus on specialized problems such as the treatment of diffraction spots on interleaved lattices arising from multi-crystal specimens. In these challenging cases, accurate modeling of close-lying Bragg spots could benefit from the high-performance computing capabilities of graphics-processing units.« less
GPU acceleration for digitally reconstructed radiographs using bindless texture objects and CUDA/OpenGL interoperability.

PubMed

Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I

2015-01-01

This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of the current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048(2) and 4096(2) at interactive and semi interactive frame-rates using an NVIDIA GeForce 970 GTX device.
Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

PubMed Central

Zhu, Xiang; Zhang, Dianwen

2013-01-01

We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
Graphics Technology Study. Volume 1. State of Graphics Technology

DTIC Science & Technology

1986-12-01

reaction of special heat sensitive paper when exposed to the heated elements of a thermal print head. Copy quality was poor due to characteristics...Vendors are now attempting to offer smaller units aimed at applications such as typography , graphic arts, CAD, and office automation. The key element in
A software package for interactive motor unit potential classification using fuzzy k-NN classifier.

PubMed

Rasheed, Sarbast; Stashuk, Daniel; Kamel, Mohamed

2008-01-01

We present an interactive software package for implementing the supervised classification task during electromyographic (EMG) signal decomposition process using a fuzzy k-NN classifier and utilizing the MATLAB high-level programming language and its interactive environment. The method employs an assertion-based classification that takes into account a combination of motor unit potential (MUP) shapes and two modes of use of motor unit firing pattern information: the passive and the active modes. The developed package consists of several graphical user interfaces used to detect individual MUP waveforms from a raw EMG signal, extract relevant features, and classify the MUPs into motor unit potential trains (MUPTs) using assertion-based classifiers.
Parallel approach in RDF query processing

NASA Astrophysics Data System (ADS)

Vajgl, Marek; Parenica, Jan

2017-07-01

Parallel approach is nowadays a very cheap solution to increase computational power due to possibility of usage of multithreaded computational units. This hardware became typical part of nowadays personal computers or notebooks and is widely spread. This contribution deals with experiments how evaluation of computational complex algorithm of the inference over RDF data can be parallelized over graphical cards to decrease computational time.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bouaichaoui, Youcef; Berrahal, Abderezak; Halbaoui, Khaled

This paper describes the design of data acquisition system (DAQ) that is connected to a PC and development of a feedback control system that maintains the coolant temperature of the process at a desired set point using a digital controller system based on the graphical programming language. The paper will provide details about the data acquisition unit, shows the implementation of the controller, and present test results. (authors)
Lattice Boltzmann Methods for Fluid Structure Interaction

DTIC Science & Technology

2012-09-01

MONTEREY, CALIFORNIA DISSERTATION LATTICE BOLTZMANN METHODS FOR FLUID STRUCTURE INTERACTION by Stuart R. Blair September 2012 Dissertation Supervisor...200 words) The use of lattice Boltzmann methods (LBM) for fluid flow and its coupling with finite element method (FEM) structural models for fluid... structure interaction (FSI) is investigated. A body of high performance LBM software that exploits graphic processing unit (GPU) and multiprocessor
An Optimized Multicolor Point-Implicit Solver for Unstructured Grid Applications on Graphics Processing Units

NASA Technical Reports Server (NTRS)

Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana

2016-01-01

In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructuredgrid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large tightly-coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

PubMed

Komarov, Ivan; D'Souza, Roshan M

2012-01-01

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.

Efficient molecular dynamics simulations with many-body potentials on graphics processing units

NASA Astrophysics Data System (ADS)

Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari

2017-09-01

Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which could result in write conflict between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread and is free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).
Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

NASA Astrophysics Data System (ADS)

Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

2017-07-01

Utilising Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables significant reduction of computation time at a moderate cost, by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using GPU for Mie scattering inverse problem solving (up to 800-fold speed-up). Here we report the development of two subroutines utilising GPU at data preprocessing stages for the inversion procedure: (i) A subroutine, based on ray tracing, for finding spherical aberration correction function. (ii) A subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in PikeReader application, which we make available on GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ∼ 400 -fold speed-up of calculations at data preprocessing stages using CUDA codes running on GPU in comparison to single thread MATLAB-only code running on CPU.
Space Object Collision Probability via Monte Carlo on the Graphics Processing Unit

NASA Astrophysics Data System (ADS)

Vittaldev, Vivek; Russell, Ryan P.

2017-09-01

Fast and accurate collision probability computations are essential for protecting space assets. Monte Carlo (MC) simulation is the most accurate but computationally intensive method. A Graphics Processing Unit (GPU) is used to parallelize the computation and reduce the overall runtime. Using MC techniques to compute the collision probability is common in literature as the benchmark. An optimized implementation on the GPU, however, is a challenging problem and is the main focus of the current work. The MC simulation takes samples from the uncertainty distributions of the Resident Space Objects (RSOs) at any time during a time window of interest and outputs the separations at closest approach. Therefore, any uncertainty propagation method may be used and the collision probability is automatically computed as a function of RSO collision radii. Integration using a fixed time step and a quartic interpolation after every Runge Kutta step ensures that no close approaches are missed. Two orders of magnitude speedups over a serial CPU implementation are shown, and speedups improve moderately with higher fidelity dynamics. The tool makes the MC approach tractable on a single workstation, and can be used as a final product, or for verifying surrogate and analytical collision probability methods.
GPU-based acceleration of computations in nonlinear finite element deformation analysis.

PubMed

Mafi, Ramin; Sirouspour, Shahin

2014-03-01

The physics of deformation for biological soft-tissue is best described by nonlinear continuum mechanics-based models, which then can be discretized by the FEM for a numerical solution. However, computational complexity of such models have limited their use in applications requiring real-time or fast response. In this work, we propose a graphic processing unit-based implementation of the FEM using implicit time integration for dynamic nonlinear deformation analysis. This is the most general formulation of the deformation analysis. It is valid for large deformations and strains and can account for material nonlinearities. The data-parallel nature and the intense arithmetic computations of nonlinear FEM equations make it particularly suitable for implementation on a parallel computing platform such as graphic processing unit. In this work, we present and compare two different designs based on the matrix-free and conventional preconditioned conjugate gradients algorithms for solving the FEM equations arising in deformation analysis. The speedup achieved with the proposed parallel implementations of the algorithms will be instrumental in the development of advanced surgical simulators and medical image registration methods involving soft-tissue deformation. Copyright © 2013 John Wiley & Sons, Ltd.
Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

PubMed Central

Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

2010-01-01

Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792
Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?

PubMed

Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend

2011-10-11

In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well posed for implementation in parallel on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

NASA Astrophysics Data System (ADS)

Hause, Benjamin; Parker, Scott; Chen, Yang

2013-10-01

We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and Fortran CUDA. Mixed implementation of both Open-ACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third generation Core I7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speed-ups comparable or better than that of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.
Monte Carlo-based fluorescence molecular tomography reconstruction method accelerated by a cluster of graphic processing units.

PubMed

Quan, Guotao; Gong, Hui; Deng, Yong; Fu, Jianwei; Luo, Qingming

2011-02-01

High-speed fluorescence molecular tomography (FMT) reconstruction for 3-D heterogeneous media is still one of the most challenging problems in diffusive optical fluorescence imaging. In this paper, we propose a fast FMT reconstruction method that is based on Monte Carlo (MC) simulation and accelerated by a cluster of graphics processing units (GPUs). Based on the Message Passing Interface standard, we modified the MC code for fast FMT reconstruction, and different Green's functions representing the flux distribution in media are calculated simultaneously by different GPUs in the cluster. A load-balancing method was also developed to increase the computational efficiency. By applying the Fréchet derivative, a Jacobian matrix is formed to reconstruct the distribution of the fluorochromes using the calculated Green's functions. Phantom experiments have shown that only 10 min are required to get reconstruction results with a cluster of 6 GPUs, rather than 6 h with a cluster of multiple dual opteron CPU nodes. Because of the advantages of high accuracy and suitability for 3-D heterogeneity media with refractive-index-unmatched boundaries from the MC simulation, the GPU cluster-accelerated method provides a reliable approach to high-speed reconstruction for FMT imaging.
Kinematic modelling of disc galaxies using graphics processing units

NASA Astrophysics Data System (ADS)

Bekiaris, G.; Glazebrook, K.; Fluke, C. J.; Abraham, R.

2016-01-01

With large-scale integral field spectroscopy (IFS) surveys of thousands of galaxies currently under-way or planned, the astronomical community is in need of methods, techniques and tools that will allow the analysis of huge amounts of data. We focus on the kinematic modelling of disc galaxies and investigate the potential use of massively parallel architectures, such as the graphics processing unit (GPU), as an accelerator for the computationally expensive model-fitting procedure. We review the algorithms involved in model-fitting and evaluate their suitability for GPU implementation. We employ different optimization techniques, including the Levenberg-Marquardt and nested sampling algorithms, but also a naive brute-force approach based on nested grids. We find that the GPU can accelerate the model-fitting procedure up to a factor of ˜100 when compared to a single-threaded CPU, and up to a factor of ˜10 when compared to a multithreaded dual CPU configuration. Our method's accuracy, precision and robustness are assessed by successfully recovering the kinematic properties of simulated data, and also by verifying the kinematic modelling results of galaxies from the GHASP and DYNAMO surveys as found in the literature. The resulting GBKFIT code is available for download from: http://supercomputing.swin.edu.au/gbkfit.
Multilevel Summation of Electrostatic Potentials Using Graphics Processing Units*

PubMed Central

Hardy, David J.; Stone, John E.; Schulten, Klaus

2009-01-01

Physical and engineering practicalities involved in microprocessor design have resulted in flat performance growth for traditional single-core microprocessors. The urgent need for continuing increases in the performance of scientific applications requires the use of many-core processors and accelerators such as graphics processing units (GPUs). This paper discusses GPU acceleration of the multilevel summation method for computing electrostatic potentials and forces for a system of charged atoms, which is a problem of paramount importance in biomolecular modeling applications. We present and test a new GPU algorithm for the long-range part of the potentials that computes a cutoff pair potential between lattice points, essentially convolving a fixed 3-D lattice of “weights” over all sub-cubes of a much larger lattice. The implementation exploits the different memory subsystems provided on the GPU to stream optimally sized data sets through the multiprocessors. We demonstrate for the full multilevel summation calculation speedups of up to 26 using a single GPU and 46 using multiple GPUs, enabling the computation of a high-resolution map of the electrostatic potential for a system of 1.5 million atoms in under 12 seconds. PMID:20161132
Parallel design of JPEG-LS encoder on graphics processing units

NASA Astrophysics Data System (ADS)

Duan, Hao; Fang, Yong; Huang, Bormin

2012-01-01

With recent technical advances in graphic processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on a NVIDIA GPU using the computer unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units

NASA Astrophysics Data System (ADS)

Corral, Roque; Gisbert, Fernando; Pueblas, Jesus

2017-02-01

The implementation of an edge-based three-dimensional Reynolds Average Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processor unit (CPU) have been used to enhance code scalability. The code is written using a mixture of C++ and OpenCL, to allow the execution of the source code on GPUs. The Message Passage Interface (MPI) library is used to allow the parallel execution of the solver on multiple GPUs. A comparative study of the solver parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded due to the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with an equivalent computational power.
Ice-sheet modelling accelerated by graphics cards

NASA Astrophysics Data System (ADS)

Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek

2014-11-01

Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
Rapid Parallel Calculation of shell Element Based On GPU

NASA Astrophysics Data System (ADS)

Wanga, Jian Hua; Lia, Guang Yao; Lib, Sheng; Li, Guang Yao

2010-06-01

Long computing time bottlenecked the application of finite element. In this paper, an effective method to speed up the FEM calculation by using the existing modern graphic processing unit and programmable colored rendering tool was put forward, which devised the representation of unit information in accordance with the features of GPU, converted all the unit calculation into film rendering process, solved the simulation work of all the unit calculation of the internal force, and overcame the shortcomings of lowly parallel level appeared ever before when it run in a single computer. Studies shown that this method could improve efficiency and shorten calculating hours greatly. The results of emulation calculation about the elasticity problem of large number cells in the sheet metal proved that using the GPU parallel simulation calculation was faster than using the CPU's. It is useful and efficient to solve the project problems in this way.
Development of High-speed Visualization System of Hypocenter Data Using CUDA-based GPU computing

NASA Astrophysics Data System (ADS)

Kumagai, T.; Okubo, K.; Uchida, N.; Matsuzawa, T.; Kawada, N.; Takeuchi, N.

2014-12-01

After the Great East Japan Earthquake on March 11, 2011, intelligent visualization of seismic information is becoming important to understand the earthquake phenomena. On the other hand, to date, the quantity of seismic data becomes enormous as a progress of high accuracy observation network; we need to treat many parameters (e.g., positional information, origin time, magnitude, etc.) to efficiently display the seismic information. Therefore, high-speed processing of data and image information is necessary to handle enormous amounts of seismic data. Recently, GPU (Graphic Processing Unit) is used as an acceleration tool for data processing and calculation in various study fields. This movement is called GPGPU (General Purpose computing on GPUs). In the last few years the performance of GPU keeps on improving rapidly. GPU computing gives us the high-performance computing environment at a lower cost than before. Moreover, use of GPU has an advantage of visualization of processed data, because GPU is originally architecture for graphics processing. In the GPU computing, the processed data is always stored in the video memory. Therefore, we can directly write drawing information to the VRAM on the video card by combining CUDA and the graphics API. In this study, we employ CUDA and OpenGL and/or DirectX to realize full-GPU implementation. This method makes it possible to write drawing information to the VRAM on the video card without PCIe bus data transfer: It enables the high-speed processing of seismic data. The present study examines the GPU computing-based high-speed visualization and the feasibility for high-speed visualization system of hypocenter data.
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

PubMed

Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

2012-01-01

We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Progress in a novel architecture for high performance processing

NASA Astrophysics Data System (ADS)

Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

2018-04-01

The high performance processing (HPP) is an innovative architecture which targets on high performance computing with excellent power efficiency and computing performance. It is suitable for data intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores which is the first generation of HPP cores has been taped out successfully under Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvement in architecture. The chip with 32 HPP cores is being developed under TSMC 16 nm field effect transistor (FFC) technology process and is planed to use commercially. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
The Research and Test of Fast Radio Burst Real-time Search Algorithm Based on GPU Acceleration

NASA Astrophysics Data System (ADS)

Wang, J.; Chen, M. Z.; Pei, X.; Wang, Z. Q.

2017-03-01

In order to satisfy the research needs of Nanshan 25 m radio telescope of Xinjiang Astronomical Observatory (XAO) and study the key technology of the planned QiTai radio Telescope (QTT), the receiver group of XAO studied the GPU (Graphics Processing Unit) based real-time FRB searching algorithm which developed from the original FRB searching algorithm based on CPU (Central Processing Unit), and built the FRB real-time searching system. The comparison of the GPU system and the CPU system shows that: on the basis of ensuring the accuracy of the search, the speed of the GPU accelerated algorithm is improved by 35-45 times compared with the CPU algorithm.
GPU Accelerated Prognostics

NASA Technical Reports Server (NTRS)

Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley

2017-01-01

Prognostic methods enable operators and maintainers to predict the future performance for critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU and a case study of GPU-accelerated battery prognostics with computational performance results.
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores.

PubMed

Chikkagoudar, Satish; Wang, Kai; Li, Mingyao

2011-05-26

Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.

GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

PubMed Central

2011-01-01

Background Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/. PMID:21615923
Apparatus and method for imaging metallic objects using an array of giant magnetoresistive sensors

DOEpatents

Chaiken, Alison

2000-01-01

A portable, low-power, metallic object detector and method for providing an image of a detected metallic object. In one embodiment, the present portable low-power metallic object detector an array of giant magnetoresistive (GMR) sensors. The array of GMR sensors is adapted for detecting the presence of and compiling image data of a metallic object. In the embodiment, the array of GMR sensors is arranged in a checkerboard configuration such that axes of sensitivity of alternate GMR sensors are orthogonally oriented. An electronics portion is coupled to the array of GMR sensors. The electronics portion is adapted to receive and process the image data of the metallic object compiled by the array of GMR sensors. The embodiment also includes a display unit which is coupled to the electronics portion. The display unit is adapted to display a graphical representation of the metallic object detected by the array of GMR sensors. In so doing, a graphical representation of the detected metallic object is provided.
Teaching Graphics in Technical Communication Classes.

ERIC Educational Resources Information Center

Spurgeon, Kristene C.

Perhaps because the United States is undergoing a video revolution, perhaps because of its increasing sales of goods to non-English speaking markets where graphics can help explain the products, perhaps because of the decreasing communication skills of the work force, graphic aids are becoming more and more widely used and more and more important.…
Reactions to Graphic Health Warnings in the United States

ERIC Educational Resources Information Center

Nonnemaker, James M.; Choiniere, Conrad J.; Farrelly, Matthew C.; Kamyab, Kian; Davis, Kevin C.

2015-01-01

This study reports consumer reactions to the graphic health warnings selected by the Food and Drug Administration to be placed on cigarette packs in the United States. We recruited three sets of respondents for an experimental study from a national opt-in e-mail list sample: (i) current smokers aged 25 or older, (ii) young adult smokers aged 18-24…
Digital image processing using parallel computing based on CUDA technology

NASA Astrophysics Data System (ADS)

Skirnevskiy, I. P.; Pustovit, A. V.; Abdrashitova, M. O.

2017-01-01

This article describes expediency of using a graphics processing unit (GPU) in big data processing in the context of digital images processing. It provides a short description of a parallel computing technology and its usage in different areas, definition of the image noise and a brief overview of some noise removal algorithms. It also describes some basic requirements that should be met by certain noise removal algorithm in the projection to computer tomography. It provides comparison of the performance with and without using GPU as well as with different percentage of using CPU and GPU.
A Theoretical Analysis of Learning with Graphics--Implications for Computer Graphics Design.

ERIC Educational Resources Information Center

ChanLin, Lih-Juan

This paper reviews the literature pertinent to learning with graphics. The dual coding theory provides explanation about how graphics are stored and precessed in semantic memory. The level of processing theory suggests how graphics can be employed in learning to encourage deeper processing. In addition to dual coding theory and level of processing…
First Year Engineering Graphics Curricula in Major Engineering Colleges.

ERIC Educational Resources Information Center

Meyers, Frederick D.

2000-01-01

Investigates the commonalities and differences of graphics programs among nine universities in the United States by analyzing the course structure and reviewing attendance and course syllabi. (Author/YDS)
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmalz, Mark S

2011-07-24

Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
Stochastic DT-MRI connectivity mapping on the GPU.

PubMed

McGraw, Tim; Nadar, Mariappan

2007-01-01

We present a method for stochastic fiber tract mapping from diffusion tensor MRI (DT-MRI) implemented on graphics hardware. From the simulated fibers we compute a connectivity map that gives an indication of the probability that two points in the dataset are connected by a neuronal fiber path. A Bayesian formulation of the fiber model is given and it is shown that the inversion method can be used to construct plausible connectivity. An implementation of this fiber model on the graphics processing unit (GPU) is presented. Since the fiber paths can be stochastically generated independently of one another, the algorithm is highly parallelizable. This allows us to exploit the data-parallel nature of the GPU fragment processors. We also present a framework for the connectivity computation on the GPU. Our implementation allows the user to interactively select regions of interest and observe the evolving connectivity results during computation. Results are presented from the stochastic generation of over 250,000 fiber steps per iteration at interactive frame rates on consumer-grade graphics hardware.
High speed finite element simulations on the graphics card

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huthwaite, P.; Lowe, M. J. S.

A software package is developed to perform explicit time domain finite element simulations of ultrasonic propagation on the graphical processing unit, using Nvidia’s CUDA. Of critical importance for this problem is the arrangement of nodes in memory, allowing data to be loaded efficiently and minimising communication between the independently executed blocks of threads. The initial stage of memory arrangement is partitioning the mesh; both a well established ‘greedy’ partitioner and a new, more efficient ‘aligned’ partitioner are investigated. A method is then developed to efficiently arrange the memory within each partition. The technique is compared to a commercial CPU equivalent,more » demonstrating an overall speedup of at least 100 for a non-destructive testing weld model.« less
Software Accelerates Computing Time for Complex Math

NASA Technical Reports Server (NTRS)

2014-01-01

Ames Research Center awarded Newark, Delaware-based EM Photonics Inc. SBIR funding to utilize graphic processing unit (GPU) technology- traditionally used for computer video games-to develop high-computing software called CULA. The software gives users the ability to run complex algorithms on personal computers with greater speed. As a result of the NASA collaboration, the number of employees at the company has increased 10 percent.
Microlensing observations rapid search for exoplanets: MORSE code for GPUs

NASA Astrophysics Data System (ADS)

McDougall, Alistair; Albrow, Michael D.

2016-02-01

The rapid analysis of ongoing gravitational microlensing events has been integral to the successful detection and characterization of cool planets orbiting low-mass stars in the Galaxy. In this paper, we present an implementation of search and fit techniques on graphical processing unit (GPU) hardware. The method allows for the rapid identification of candidate planetary microlensing events and their subsequent follow-up for detailed characterization.
Recursive search method for the image elements of functionally defined surfaces

NASA Astrophysics Data System (ADS)

Vyatkin, S. I.

2017-05-01

This paper touches upon the synthesis of high-quality images in real time and the technique for specifying three-dimensional objects on the basis of perturbation functions. The recursive search method for the image elements of functionally defined objects with the use of graphics processing units is proposed. The advantages of such an approach over the frame-buffer visualization method are shown.
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

PubMed Central

Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.

2011-01-01

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Intellectual system of identification of Arabic graphics

NASA Astrophysics Data System (ADS)

Abdoullayeva, Gulchin G.; Aliyev, Telman A.; Gurbanova, Nazakat G.

2001-08-01

The studies made by using the domain of graphic images allowed creating facilities of the artificial intelligence for letters, letter combinations etc. for various graphics and prints. The work proposes a system of recognition and identification of symbols of the Arabic graphics, which has its own specificity as compared to Latin and Cyrillic ones. The starting stage of the recognition and the identification is coding with further entry of information into a computer. Here the problem of entry is one of the essentials. For entry of a large volume of information in the unit of time a scanner is usually employed. Along with the scanner the authors suggest their elaboration of technical facilities for effective input and coding of the information. For refinement of symbols not identified from the scanner mostly for a small bulk of information the developed coding devices are used directly in the process of writing. The functional design of the software is elaborated on the basis of the heuristic model of the creative activity of a researcher and experts in the description and estimation of states of the weakly formalizable systems on the strength of the methods of identification and of selection of geometric features.
Laser color recording unit

NASA Astrophysics Data System (ADS)

Jung, E.

1984-05-01

A color recording unit was designed for output and control of digitized picture data within computer controlled reproduction and picture processing systems. In order to get a color proof picture of high quality similar to a color print, together with reduced time and material consumption, a photographic color film material was exposed pixelwise by modulated laser beams of three wavelengths for red, green and blue light. Components of different manufacturers for lasers, acousto-optic modulators and polygon mirrors were tested, also different recording methods as (continuous tone mode or screened mode and with a drum or flatbed recording principle). Besides the application for the graphic arts - the proof recorder CPR 403 with continuous tone color recording with a drum scanner - such a color hardcopy peripheral unit with large picture formats and high resolution can be used in medicine, communication, and satellite picture processing.
A Blended Approach to Reading and Writing Graphic Stories

ERIC Educational Resources Information Center

Brown, Sally

2013-01-01

This article documents the experiences of a diverse group of second grade students during a nine week unit of study focused on graphic stories. The project begins as the class is immersed in reading graphic stories designed for young readers. Images, written text, and dialog are utilized to scaffold reading comprehension and to practice fluency.…
Japanese Manga in Translation and American Graphic Novels: A Preliminary Examination of the Collections in 44 Academic Libraries

ERIC Educational Resources Information Center

Masuchika, Glenn; Boldt, Gail

2010-01-01

American graphic novels are increasingly recognized as high-quality literature and an interesting genre for academic study. Graphic novels of Japan, called manga, have established a strong foothold in American culture. This preliminary survey of 44 United States university libraries demonstrates that Japanese manga in translation are consistently…
Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash.

PubMed

Pelletier, Mathew G

2008-02-08

One of the main hurdles standing in the way of optimal cleaning of cotton lint isthe lack of sensing systems that can react fast enough to provide the control system withreal-time information as to the level of trash contamination of the cotton lint. This researchexamines the use of programmable graphic processing units (GPU) as an alternative to thePC's traditional use of the central processing unit (CPU). The use of the GPU, as analternative computation platform, allowed for the machine vision system to gain asignificant improvement in processing time. By improving the processing time, thisresearch seeks to address the lack of availability of rapid trash sensing systems and thusalleviate a situation in which the current systems view the cotton lint either well before, orafter, the cotton is cleaned. This extended lag/lead time that is currently imposed on thecotton trash cleaning control systems, is what is responsible for system operators utilizing avery large dead-band safety buffer in order to ensure that the cotton lint is not undercleaned.Unfortunately, the utilization of a large dead-band buffer results in the majority ofthe cotton lint being over-cleaned which in turn causes lint fiber-damage as well assignificant losses of the valuable lint due to the excessive use of cleaning machinery. Thisresearch estimates that upwards of a 30% reduction in lint loss could be gained through theuse of a tightly coupled trash sensor to the cleaning machinery control systems. Thisresearch seeks to improve processing times through the development of a new algorithm forcotton trash sensing that allows for implementation on a highly parallel architecture.Additionally, by moving the new parallel algorithm onto an alternative computing platform,the graphic processing unit "GPU", for processing of the cotton trash images, a speed up ofover 6.5 times, over optimized code running on the PC's central processing unit "CPU", wasgained. The new parallel algorithm operating on the GPU was able to process a 1024x1024image in less than 17ms. At this improved speed, the image processing system's performance should now be sufficient to provide a system that would be capable of realtimefeed-back control that is in tight cooperation with the cleaning equipment.
Graphics Processing Unit (GPU) Acceleration of the Goddard Earth Observing System Atmospheric Model

NASA Technical Reports Server (NTRS)

Putnam, Williama

2011-01-01

The Goddard Earth Observing System 5 (GEOS-5) is the atmospheric model used by the Global Modeling and Assimilation Office (GMAO) for a variety of applications, from long-term climate prediction at relatively coarse resolution, to data assimilation and numerical weather prediction, to very high-resolution cloud-resolving simulations. GEOS-5 is being ported to a graphics processing unit (GPU) cluster at the NASA Center for Climate Simulation (NCCS). By utilizing GPU co-processor technology, we expect to increase the throughput of GEOS-5 by at least an order of magnitude, and accelerate the process of scientific exploration across all scales of global modeling, including: The large-scale, high-end application of non-hydrostatic, global, cloud-resolving modeling at 10- to I-kilometer (km) global resolutions Intermediate-resolution seasonal climate and weather prediction at 50- to 25-km on small clusters of GPUs Long-range, coarse-resolution climate modeling, enabled on a small box of GPUs for the individual researcher After being ported to the GPU cluster, the primary physics components and the dynamical core of GEOS-5 have demonstrated a potential speedup of 15-40 times over conventional processor cores. Performance improvements of this magnitude reduce the required scalability of 1-km, global, cloud-resolving models from an unfathomable 6 million cores to an attainable 200,000 GPU-enabled cores.

Accelerated rescaling of single Monte Carlo simulation runs with the Graphics Processing Unit (GPU).

PubMed

Yang, Owen; Choi, Bernard

2013-01-01

To interpret fiber-based and camera-based measurements of remitted light from biological tissues, researchers typically use analytical models, such as the diffusion approximation to light transport theory, or stochastic models, such as Monte Carlo modeling. To achieve rapid (ideally real-time) measurement of tissue optical properties, especially in clinical situations, there is a critical need to accelerate Monte Carlo simulation runs. In this manuscript, we report on our approach using the Graphics Processing Unit (GPU) to accelerate rescaling of single Monte Carlo runs to calculate rapidly diffuse reflectance values for different sets of tissue optical properties. We selected MATLAB to enable non-specialists in C and CUDA-based programming to use the generated open-source code. We developed a software package with four abstraction layers. To calculate a set of diffuse reflectance values from a simulated tissue with homogeneous optical properties, our rescaling GPU-based approach achieves a reduction in computation time of several orders of magnitude as compared to other GPU-based approaches. Specifically, our GPU-based approach generated a diffuse reflectance value in 0.08ms. The transfer time from CPU to GPU memory currently is a limiting factor with GPU-based calculations. However, for calculation of multiple diffuse reflectance values, our GPU-based approach still can lead to processing that is ~3400 times faster than other GPU-based approaches.
GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparingmore » theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.« less
A REVIEW OF GRAPHICAL REPRESENTATIONS USED IN THE DIETARY GUIDELINES OF SELECTED COUNTRIES IN THE AMERICAS, EUROPE AND ASIA.

PubMed

Altamirano Martínez, Martha Betzaida; Cordero Muñoz, Aida Yanet; Macedo Ojeda, Gabriela; Márquez Sandoval, Yolanda Fabiola; Vizmanos, Barbara

2015-09-01

Food-Based Dietary Guidelines (FBDGs) are an initiative by the United Nations Food and Agriculture Organization (FAO) and the World Health Organization (WHO) designed to help countries establish their own nutrition education principles. Such principles should be expressed through clear and specific messages that provide guidance and promote good health among populations. Many of these guidelines contain graphical representations (GRs) as visual aids for dietary guidance. to analyze the characteristics of GRs used in various countries on four continents to identify international trends in these graphical messages and assess their usefulness as educational tools for their target populations. a review of GRs used in the FBDGs of countries in the Americas, Europe and Asia for which data were available in Spanish or English. the models most used are the food circle and pyramid. The GRs (n = 37) depict the following recommendations: food groups (37), physical activity (21), water intake (17), low salt intake (7), family meals (1) and relaxation (1). In addition, 10 quantitative recommendations were detected. The GRs of Greece and the United States do not show images of food. The aspects considered in the GRs vary by the regions, cultures and epidemiological characteristics of each country. A tendency to use the food circle and to include lifestyle recommendations in illustrations was observed in the United States, Spain and Mexico. Quantitative recommendations may help to clarify information provided during the educational process. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
31 CFR 92.12 - Definitions.

Code of Federal Regulations, 2010 CFR

2010-07-01

...) Symbol means any design or graphic used by the United States Mint or the Treasury Department to represent themselves or their products. A design or graphic may include (1) A trademark, designation of origin, or mark...
Quality, Importance, and Instruction: The Perspectives of Teachers of Students with Visual Impairments on Graphics Use by Students

ERIC Educational Resources Information Center

Zebehazy, Kim T.; Wilton, Adam P.

2014-01-01

Introduction: This study investigated the perceptions and practices of teachers of students with visual impairments in Canada and the United States regarding graphics (both tactile and print) that are used by students with visual impairments. Questions focused on quality, importance, and instruction in the use of graphics. Methods: An electronic…
Simulating spin models on GPU

NASA Astrophysics Data System (ADS)

Weigel, Martin

2011-09-01

Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that of current CPUs by large factors, results from the relative simplicity of the GPU architectures as compared to CPUs, combined with a large number of parallel processing units on a single chip. To benefit from this setup for general computing purposes, the problems at hand need to be prepared in a way to profit from the inherent parallelism and hierarchical structure of memory accesses. In this contribution I discuss the performance potential for simulating spin models, such as the Ising model, on GPU as compared to conventional simulations on CPU.
GPU-accelerated computational tool for studying the effectiveness of asteroid disruption techniques

NASA Astrophysics Data System (ADS)

Zimmerman, Ben J.; Wie, Bong

2016-10-01

This paper presents the development of a new Graphics Processing Unit (GPU) accelerated computational tool for asteroid disruption techniques. Numerical simulations are completed using the high-order spectral difference (SD) method. Due to the compact nature of the SD method, it is well suited for implementation with the GPU architecture, hence solutions are generated at orders of magnitude faster than the Central Processing Unit (CPU) counterpart. A multiphase model integrated with the SD method is introduced, and several asteroid disruption simulations are conducted, including kinetic-energy impactors, multi-kinetic energy impactor systems, and nuclear options. Results illustrate the benefits of using multi-kinetic energy impactor systems when compared to a single impactor system. In addition, the effectiveness of nuclear options is observed.
Wargaming and interactive color graphics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bly, S.; Buzzell, C.; Smith, G.

1980-08-04

JANUS is a two-sided interactive color graphic simulation in which human commanders can direct their forces, each trying to accomplish their mission. This competitive synthetic battlefield is used to explore the range of human ingenuity under conditions of incomplete information about enemy strength and deployment. Each player can react to new situations by planning new unit movements, using conventional and nuclear weapons, or modifying unit objectives. Conventional direct fire among tanks, infantry fighting vehicles, helicopters, and other units is automated subject to constraints of target acquisition, reload rate, range, suppression, etc. Artillery and missile indirect fire systems deliver conventional munitions,more » smoke, and nuclear weapons. Players use reconnaissance units, helicopters, or fixed wing aircraft to search for enemy unit locations. Counter-battery radars acquire enemy artillery. The JANUS simulation at LLL has demonstrated the value of the computer as a sophisticated blackboard. A small dedicated minicomputer is adequate for detailed calculations, and may be preferable to sharing a more powerful machine. Real-time color interactive graphics are essential to allow realistic command decision inputs. Competitive human-versus-human synthetic experiences are intense and well-remembered. 2 figures.« less
NURE aerial gamma ray and magnetic detail survey of portions of northeast Washington. Final report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

1981-11-01

The Northeast Washington Survey was performed under the United States Department of Energy's National Uranium Resource Evaluation (NURE) Program, which is designed to provide radioelement distribution information to assist in assessing the uraniferous material potential of the United States. The radiometric and ancilliary data were digitally recorded and processed. The results are presented in the form of stacked profiles, contour maps, flight path maps, statistical tables and frequency distribution histograms. These graphical outputs are presented at a scale of 1:62,500 and are contained in the individual Volume 2 reports.
Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces

DOE PAGES

Eghtesad, Adnan; Germaschewski, Kai; Beyerlein, Irene J.; ...

2017-10-14

We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation takes advantage of the portable OpenACC standard directive pragmas along with Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library called CUFFT to execute the FFT computations within the PFDD formulation on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OPENACC-CUDA inter-operability, in which calls to CUDA functions are replaced with the OPENACC data regions for a host central processingmore » unit (CPU) and device (GPU). A comprehensive benchmark study has been conducted, which compares a number of FFT routines, the Numerical Recipes FFT (FOURN), Fastest Fourier Transform in the West (FFTW), and the CUFFT. The last one exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified using the analytical solutions for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu–Ni) interface. It is demonstrated that the ACCPFDD-CUFFT implementation on a single TESLA K80 GPU offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the 22-multicore Intel Xeon CPU E5-2699 v4 @ 2.20 GHz version of the code.« less
Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eghtesad, Adnan; Germaschewski, Kai; Beyerlein, Irene J.

We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation takes advantage of the portable OpenACC standard directive pragmas along with Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library called CUFFT to execute the FFT computations within the PFDD formulation on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OPENACC-CUDA inter-operability, in which calls to CUDA functions are replaced with the OPENACC data regions for a host central processingmore » unit (CPU) and device (GPU). A comprehensive benchmark study has been conducted, which compares a number of FFT routines, the Numerical Recipes FFT (FOURN), Fastest Fourier Transform in the West (FFTW), and the CUFFT. The last one exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified using the analytical solutions for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu–Ni) interface. It is demonstrated that the ACCPFDD-CUFFT implementation on a single TESLA K80 GPU offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the 22-multicore Intel Xeon CPU E5-2699 v4 @ 2.20 GHz version of the code.« less
Student Thinking Processes While Constructing Graphic Representations of Textbook Content: What Insights Do Think-Alouds Provide?

ERIC Educational Resources Information Center

Scott, D. Beth; Dreher, Mariam Jean

2016-01-01

This study examined the thinking processes students engage in while constructing graphic representations of textbook content. Twenty-eight students who either used graphic representations in a routine manner during social studies instruction or learned to construct graphic representations based on the rhetorical patterns used to organize textbook…
The effect of using graphic organizers in the teaching of standard biology

NASA Astrophysics Data System (ADS)

Pepper, Wade Louis, Jr.

This study was conducted to determine if the use of graphic organizers in the teaching of standard biology would increase student achievement, involvement and quality of activities. The subjects were 10th grade standard biology students in a large southern inner city high school. The study was conducted over a six-week period in an instructional setting using action research as the investigative format. After calculation of the homogeneity between classes, random selection was used to determine the graphic organizer class and the control class. The graphic organizer class was taught unit material through a variety of instructional methods along with the use of teacher generated graphic organizers. The control class was taught the same unit material using the same instructional methods, but without the use of graphic organizers. Data for the study were gathered from in-class written assignments, teacher-generated tests and text-generated tests, and rubric scores of an out-of-class written assignment and project. Also, data were gathered from student reactions, comments, observations and a teacher's research journal. Results were analyzed using descriptive statistics and qualitative interpretation. By comparing statistical results, it was determined that the use of graphic organizers did not make a statistically significant difference in the understanding of biological concepts and retention of factual information. Furthermore, the use of graphic organizers did not make a significant difference in motivating students to fulfill all class assignments with quality efforts and products. However, based upon student reactions and comments along with observations by the researcher, graphic organizers were viewed by the students as a favorable and helpful instructional tool. In lieu of statistical results, student gains from instructional activities using graphic organizers were positive and merit the continuation of their use as an instructional tool.
Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization.

PubMed

Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L; Wang, Xueding; Liu, Xiaojun

2013-08-01

Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state of the art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.
Accelerated Molecular Dynamics Simulations with the AMOEBA Polarizable Force Field on Graphics Processing Units

PubMed Central

2013-01-01

The accelerated molecular dynamics (aMD) method has recently been shown to enhance the sampling of biomolecules in molecular dynamics (MD) simulations, often by several orders of magnitude. Here, we describe an implementation of the aMD method for the OpenMM application layer that takes full advantage of graphics processing units (GPUs) computing. The aMD method is shown to work in combination with the AMOEBA polarizable force field (AMOEBA-aMD), allowing the simulation of long time-scale events with a polarizable force field. Benchmarks are provided to show that the AMOEBA-aMD method is efficiently implemented and produces accurate results in its standard parametrization. For the BPTI protein, we demonstrate that the protein structure described with AMOEBA remains stable even on the extended time scales accessed at high levels of accelerations. For the DNA repair metalloenzyme endonuclease IV, we show that the use of the AMOEBA force field is a significant improvement over fixed charged models for describing the enzyme active-site. The new AMOEBA-aMD method is publicly available (http://wiki.simtk.org/openmm/VirtualRepository) and promises to be interesting for studying complex systems that can benefit from both the use of a polarizable force field and enhanced sampling. PMID:24634618
A GPU-based incompressible Navier-Stokes solver on moving overset grids

NASA Astrophysics Data System (ADS)

Chandar, Dominic D. J.; Sitaraman, Jayanarayanan; Mavriplis, Dimitri J.

2013-07-01

In pursuit of obtaining high fidelity solutions to the fluid flow equations in a short span of time, graphics processing units (GPUs) which were originally intended for gaming applications are currently being used to accelerate computational fluid dynamics (CFD) codes. With a high peak throughput of about 1 TFLOPS on a PC, GPUs seem to be favourable for many high-resolution computations. One such computation that involves a lot of number crunching is computing time accurate flow solutions past moving bodies. The aim of the present paper is thus to discuss the development of a flow solver on unstructured and overset grids and its implementation on GPUs. In its present form, the flow solver solves the incompressible fluid flow equations on unstructured/hybrid/overset grids using a fully implicit projection method. The resulting discretised equations are solved using a matrix-free Krylov solver using several GPU kernels such as gradient, Laplacian and reduction. Some of the simple arithmetic vector calculations are implemented using the CU++: An Object Oriented Framework for Computational Fluid Dynamics Applications using Graphics Processing Units, Journal of Supercomputing, 2013, doi:10.1007/s11227-013-0985-9 approach where GPU kernels are automatically generated at compile time. Results are presented for two- and three-dimensional computations on static and moving grids.
Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.

PubMed

Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V

2010-06-01

Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Tunable, mixed-resolution modeling using library-based Monte Carlo and graphics processing units

PubMed Central

Mamonov, Artem B.; Lettieri, Steven; Ding, Ying; Sarver, Jessica L.; Palli, Rohith; Cunningham, Timothy F.; Saxena, Sunil; Zuckerman, Daniel M.

2012-01-01

Building on our recently introduced library-based Monte Carlo (LBMC) approach, we describe a flexible protocol for mixed coarse-grained (CG)/all-atom (AA) simulation of proteins and ligands. In the present implementation of LBMC, protein side chain configurations are pre-calculated and stored in libraries, while bonded interactions along the backbone are treated explicitly. Because the AA side chain coordinates are maintained at minimal run-time cost, arbitrary sites and interaction terms can be turned on to create mixed-resolution models. For example, an AA region of interest such as a binding site can be coupled to a CG model for the rest of the protein. We have additionally developed a hybrid implementation of the generalized Born/surface area (GBSA) implicit solvent model suitable for mixed-resolution models, which in turn was ported to a graphics processing unit (GPU) for faster calculation. The new software was applied to study two systems: (i) the behavior of spin labels on the B1 domain of protein G (GB1) and (ii) docking of randomly initialized estradiol configurations to the ligand binding domain of the estrogen receptor (ERα). The performance of the GPU version of the code was also benchmarked in a number of additional systems. PMID:23162384
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

PubMed

Han, Bing; Taha, Tarek M

2010-04-01

There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
Accelerating large-scale protein structure alignments with graphics processing units

PubMed Central

2012-01-01

Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132

Multi-Dimensional, Mesoscopic Monte Carlo Simulations of Inhomogeneous Reaction-Drift-Diffusion Systems on Graphics-Processing Units

PubMed Central

Vigelius, Matthias; Meyer, Bernd

2012-01-01

For many biological applications, a macroscopic (deterministic) treatment of reaction-drift-diffusion systems is insufficient. Instead, one has to properly handle the stochastic nature of the problem and generate true sample paths of the underlying probability distribution. Unfortunately, stochastic algorithms are computationally expensive and, in most cases, the large number of participating particles renders the relevant parameter regimes inaccessible. In an attempt to address this problem we present a genuine stochastic, multi-dimensional algorithm that solves the inhomogeneous, non-linear, drift-diffusion problem on a mesoscopic level. Our method improves on existing implementations in being multi-dimensional and handling inhomogeneous drift and diffusion. The algorithm is well suited for an implementation on data-parallel hardware architectures such as general-purpose graphics processing units (GPUs). We integrate the method into an operator-splitting approach that decouples chemical reactions from the spatial evolution. We demonstrate the validity and applicability of our algorithm with a comprehensive suite of standard test problems that also serve to quantify the numerical accuracy of the method. We provide a freely available, fully functional GPU implementation. Integration into Inchman, a user-friendly web service, that allows researchers to perform parallel simulations of reaction-drift-diffusion systems on GPU clusters is underway. PMID:22506001
Real-time image processing for non-contact monitoring of dynamic displacements using smartphone technologies

NASA Astrophysics Data System (ADS)

Min, Jae-Hong; Gelo, Nikolas J.; Jo, Hongki

2016-04-01

The newly developed smartphone application, named RINO, in this study allows measuring absolute dynamic displacements and processing them in real time using state-of-the-art smartphone technologies, such as high-performance graphics processing unit (GPU), in addition to already powerful CPU and memories, embedded high-speed/ resolution camera, and open-source computer vision libraries. A carefully designed color-patterned target and user-adjustable crop filter enable accurate and fast image processing, allowing up to 240fps for complete displacement calculation and real-time display. The performances of the developed smartphone application are experimentally validated, showing comparable accuracy with those of conventional laser displacement sensor.
Graphic Design in Libraries: A Conceptual Process

ERIC Educational Resources Information Center

Ruiz, Miguel

2014-01-01

Providing successful library services requires efficient and effective communication with users; therefore, it is important that content creators who develop visual materials understand key components of design and, specifically, develop a holistic graphic design process. Graphic design, as a form of visual communication, is the process of…
GPU real-time processing in NA62 trigger system

NASA Astrophysics Data System (ADS)

Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

2017-01-01

A commercial Graphics Processing Unit (GPU) is used to build a fast Level 0 (L0) trigger system tested parasitically with the TDAQ (Trigger and Data Acquisition systems) of the NA62 experiment at CERN. In particular, the parallel computing power of the GPU is exploited to perform real-time fitting in the Ring Imaging CHerenkov (RICH) detector. Direct GPU communication using a FPGA-based board has been used to reduce the data transmission latency. The performance of the system for multi-ring reconstrunction obtained during the NA62 physics run will be presented.
Optimization of Selected Remote Sensing Algorithms for Embedded NVIDIA Kepler GPU Architecture

NASA Technical Reports Server (NTRS)

Riha, Lubomir; Le Moigne, Jacqueline; El-Ghazawi, Tarek

2015-01-01

This paper evaluates the potential of embedded Graphic Processing Units in the Nvidias Tegra K1 for onboard processing. The performance is compared to a general purpose multi-core CPU and full fledge GPU accelerator. This study uses two algorithms: Wavelet Spectral Dimension Reduction of Hyperspectral Imagery and Automated Cloud-Cover Assessment (ACCA) Algorithm. Tegra K1 achieved 51 for ACCA algorithm and 20 for the dimension reduction algorithm, as compared to the performance of the high-end 8-core server Intel Xeon CPU with 13.5 times higher power consumption.
Effect of Message Format and Content on Attitude Accessibility Regarding Sexually Transmitted Infections.

PubMed

Jain, Parul; Hoffman, Eric; Beam, Michael; Xu, Shan Susan

2017-11-01

Sexually transmitted infections (STIs) are widespread in the United States among people ages 15-24 years and cost almost $16 billion yearly. It is therefore important to understand message design strategies that could help reduce these numbers. Guided by exemplification theory and the extended parallel process model (EPPM), this study examines the influence of message format and the presence versus absence of a graphic image on recipients' accessibility of STI attitudes regarding safe sex. Results of the experiment indicate a significant effect from testimonial messages on increased attitude accessibility regarding STIs compared to statistical messages. Results also indicate a conditional indirect effect of testimonial messages on STI attitude accessibility, though threat is greater when a graphic image is included. Implications and directions for future research are discussed.
Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

PubMed

Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

2012-11-13

The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and full GPU implementation of the SP2 algorithm exceed those of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicate that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence of system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Setiani, Tia Dwi, E-mail: tiadwisetiani@gmail.com; Suprijadi; Nuclear Physics and Biophysics Reaserch Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132

Monte Carlo (MC) is one of the powerful techniques for simulation in x-ray imaging. MC method can simulate the radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One of the codes based on MC algorithm that are widely used for radiographic images simulation is MC-GPU, a codes developed by Andrea Basal. This study was aimed to investigate the time computation of x-ray imaging simulation in GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters to the quality of radiographic imagesmore » and the comparison of image quality resulted from simulation in the GPU and CPU are evaluated in this paper. The simulations were run in CPU which was simulated in serial condition, and in two GPU with 384 cores and 2304 cores. In simulation using GPU, each cores calculates one photon, so, a large number of photon were calculated simultaneously. Results show that the time simulations on GPU were significantly accelerated compared to CPU. The simulations on the 2304 core of GPU were performed about 64 -114 times faster than on CPU, while the simulation on the 384 core of GPU were performed about 20 – 31 times faster than in a single core of CPU. Another result shows that optimum quality of images from the simulation was gained at the history start from 10{sup 8} and the energy from 60 Kev to 90 Kev. Analyzed by statistical approach, the quality of GPU and CPU images are relatively the same.« less
Multiprocessor graphics computation and display using transputers

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1988-01-01

A package of two-dimensional graphics routines was developed to run on a transputer-based parallel processing system. These routines were designed to enable applications programmers to easily generate and display results from the transputer network in a graphic format. The graphics procedures were designed for the lowest possible network communication overhead for increased performance. The routines were designed for ease of use and to present an intuitive approach to generating graphics on the transputer parallel processing system.
Commercial Off-The-Shelf (COTS) Graphics Processing Board (GPB) Radiation Test Evaluation Report

NASA Technical Reports Server (NTRS)

Salazar, George A.; Steele, Glen F.

2013-01-01

Large round trip communications latency for deep space missions will require more onboard computational capabilities to enable the space vehicle to undertake many tasks that have traditionally been ground-based, mission control responsibilities. As a result, visual display graphics will be required to provide simpler vehicle situational awareness through graphical representations, as well as provide capabilities never before done in a space mission, such as augmented reality for in-flight maintenance or Telepresence activities. These capabilities will require graphics processors and associated support electronic components for high computational graphics processing. In an effort to understand the performance of commercial graphics card electronics operating in the expected radiation environment, a preliminary test was performed on five commercial offthe- shelf (COTS) graphics cards. This paper discusses the preliminary evaluation test results of five COTS graphics processing cards tested to the International Space Station (ISS) low earth orbit radiation environment. Three of the five graphics cards were tested to a total dose of 6000 rads (Si). The test articles, test configuration, preliminary results, and recommendations are discussed.
GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA

NASA Astrophysics Data System (ADS)

Spiechowicz, J.; Kostur, M.; Machura, L.

2015-06-01

This work presents an updated and extended guide on methods of a proper acceleration of the Monte Carlo integration of stochastic differential equations with the commonly available NVIDIA Graphics Processing Units using the CUDA programming environment. We outline the general aspects of the scientific computing on graphics cards and demonstrate them with two models of a well known phenomenon of the noise induced transport of Brownian motors in periodic structures. As a source of fluctuations in the considered systems we selected the three most commonly occurring noises: the Gaussian white noise, the white Poissonian noise and the dichotomous process also known as a random telegraph signal. The detailed discussion on various aspects of the applied numerical schemes is also presented. The measured speedup can be of the astonishing order of about 3000 when compared to a typical CPU. This number significantly expands the range of problems solvable by use of stochastic simulations, allowing even an interactive research in some cases.
AMICAL: An aid for architectural synthesis and exploration of control circuits

NASA Astrophysics Data System (ADS)

Park, Inhag

AMICAL is an architectural synthesis system for control flow dominated circuits. A behavioral finite state machine specification, where the scheduling and register allocation were performed, is presented. An abstract architecture specification that may feed existing silicon compilers acting at the logic and register transfer levels is described. AMICAL consists of five main functions allowing automatic, interactive and manual synthesis, as well as the combination of these methods. These functions are a synthesizer, a graphics editor, a verifier, an evaluator, and a documentor. Automatic synthesis is achieved by algorithms that allocate both functional units, stored in an expandable user defined library, and connections. AMICAL also allows the designer to interrupt the synthesis process at any stage and make interactive modifications via a specially designed graphics editor. The user's modifications are verified and evaluated to ensure that no design rules are broken and that any imposed constraints are still met. A documentor provides the designer with status and feedback reports from the synthesis process.
FPGA Implementation of the Coupled Filtering Method and the Affine Warping Method.

PubMed

Zhang, Chen; Liang, Tianzhu; Mok, Philip K T; Yu, Weichuan

2017-07-01

In ultrasound image analysis, the speckle tracking methods are widely applied to study the elasticity of body tissue. However, "feature-motion decorrelation" still remains as a challenge for the speckle tracking methods. Recently, a coupled filtering method and an affine warping method were proposed to accurately estimate strain values, when the tissue deformation is large. The major drawback of these methods is the high computational complexity. Even the graphics processing unit (GPU)-based program requires a long time to finish the analysis. In this paper, we propose field-programmable gate array (FPGA)-based implementations of both methods for further acceleration. The capability of FPGAs on handling different image processing components in these methods is discussed. A fast and memory-saving image warping approach is proposed. The algorithms are reformulated to build a highly efficient pipeline on FPGA. The final implementations on a Xilinx Virtex-7 FPGA are at least 13 times faster than the GPU implementation on the NVIDIA graphic card (GeForce GTX 580).
Development of a Low Cost Graphics Terminal.

ERIC Educational Resources Information Center

Lehr, Ted

1985-01-01

Describes modifications made to expand the capabilities of a display unit (Lear Siegler ADM-3A) to include medium resolution graphics. The modifying circuitry is detailed along with software subroutined written in Z-80 machine language for controlling the video display. (JN)
Accelerating epistasis analysis in human genetics with consumer graphics hardware.

PubMed

Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H

2009-07-24

Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
A Graphics Processing Unit Accelerated Motion Correction Algorithm and Modular System for Real-time fMRI

PubMed Central

Scheinost, Dustin; Hampson, Michelle; Qiu, Maolin; Bhawnani, Jitendra; Constable, R. Todd; Papademetris, Xenophon

2013-01-01

Real-time functional magnetic resonance imaging (rt-fMRI) has recently gained interest as a possible means to facilitate the learning of certain behaviors. However, rt-fMRI is limited by processing speed and available software, and continued development is needed for rt-fMRI to progress further and become feasible for clinical use. In this work, we present an open-source rt-fMRI system for biofeedback powered by a novel Graphics Processing Unit (GPU) accelerated motion correction strategy as part of the BioImage Suite project (www.bioimagesuite.org). Our system contributes to the development of rt-fMRI by presenting a motion correction algorithm that provides an estimate of motion with essentially no processing delay as well as a modular rt-fMRI system design. Using empirical data from rt-fMRI scans, we assessed the quality of motion correction in this new system. The present algorithm performed comparably to standard (non real-time) offline methods and outperformed other real-time methods based on zero order interpolation of motion parameters. The modular approach to the rt-fMRI system allows the system to be flexible to the experiment and feedback design, a valuable feature for many applications. We illustrate the flexibility of the system by describing several of our ongoing studies. Our hope is that continuing development of open-source rt-fMRI algorithms and software will make this new technology more accessible and adaptable, and will thereby accelerate its application in the clinical and cognitive neurosciences. PMID:23319241
Shorebird Migration Patterns in Response to Climate Change: A Modeling Approach

NASA Technical Reports Server (NTRS)

Smith, James A.

2010-01-01

The availability of satellite remote sensing observations at multiple spatial and temporal scales, coupled with advances in climate modeling and information technologies offer new opportunities for the application of mechanistic models to predict how continental scale bird migration patterns may change in response to environmental change. In earlier studies, we explored the phenotypic plasticity of a migratory population of Pectoral sandpipers by simulating the movement patterns of an ensemble of 10,000 individual birds in response to changes in stopover locations as an indicator of the impacts of wetland loss and inter-annual variability on the fitness of migratory shorebirds. We used an individual based, biophysical migration model, driven by remotely sensed land surface data, climate data, and biological field data. Mean stop-over durations and stop-over frequency with latitude predicted from our model for nominal cases were consistent with results reported in the literature and available field data. In this study, we take advantage of new computing capabilities enabled by recent GP-GPU computing paradigms and commodity hardware (general purchase computing on graphics processing units). Several aspects of our individual based (agent modeling) approach lend themselves well to GP-GPU computing. We have been able to allocate compute-intensive tasks to the graphics processing units, and now simulate ensembles of 400,000 birds at varying spatial resolutions along the central North American flyway. We are incorporating additional, species specific, mechanistic processes to better reflect the processes underlying bird phenotypic plasticity responses to different climate change scenarios in the central U.S.
A graphics processing unit accelerated motion correction algorithm and modular system for real-time fMRI.

PubMed

Scheinost, Dustin; Hampson, Michelle; Qiu, Maolin; Bhawnani, Jitendra; Constable, R Todd; Papademetris, Xenophon

2013-07-01

Real-time functional magnetic resonance imaging (rt-fMRI) has recently gained interest as a possible means to facilitate the learning of certain behaviors. However, rt-fMRI is limited by processing speed and available software, and continued development is needed for rt-fMRI to progress further and become feasible for clinical use. In this work, we present an open-source rt-fMRI system for biofeedback powered by a novel Graphics Processing Unit (GPU) accelerated motion correction strategy as part of the BioImage Suite project ( www.bioimagesuite.org ). Our system contributes to the development of rt-fMRI by presenting a motion correction algorithm that provides an estimate of motion with essentially no processing delay as well as a modular rt-fMRI system design. Using empirical data from rt-fMRI scans, we assessed the quality of motion correction in this new system. The present algorithm performed comparably to standard (non real-time) offline methods and outperformed other real-time methods based on zero order interpolation of motion parameters. The modular approach to the rt-fMRI system allows the system to be flexible to the experiment and feedback design, a valuable feature for many applications. We illustrate the flexibility of the system by describing several of our ongoing studies. Our hope is that continuing development of open-source rt-fMRI algorithms and software will make this new technology more accessible and adaptable, and will thereby accelerate its application in the clinical and cognitive neurosciences.
Performance evaluation for volumetric segmentation of multiple sclerosis lesions using MATLAB and computing engine in the graphical processing unit (GPU)

NASA Astrophysics Data System (ADS)

Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.

2010-03-01

Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To solve the issue of inconsistency and user-dependency in manual lesion measurement of MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized algorithm of imaging processing that is used in CAD development, MS CAD integration and evaluation in clinical workflow is technically challenging due to the requirement of high computation rates and memory bandwidth in the recursive nature of the algorithm. In this paper, we present the development and evaluation of using a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to rapidly integrate in an electronic patient record or any disease-centric health care system.
Retinal angiography with real-time speckle variance optical coherence tomography.

PubMed

Xu, Jing; Han, Sherry; Balaratnasingam, Chandrakumar; Mammo, Zaid; Wong, Kevin S K; Lee, Sieun; Cua, Michelle; Young, Mei; Kirker, Andrew; Albiani, David; Forooghian, Farzin; Mackenzie, Paul; Merkur, Andrew; Yu, Dao-Yi; Sarunic, Marinko V

2015-10-01

This report describes a novel, non-invasive and label-free optical imaging technique, speckle variance optical coherence tomography (svOCT), for visualising blood flow within human retinal capillary networks. This imaging system uses a custom-built swept source OCT system operating at a line rate of 100 kHz. Real-time processing and visualisation is implemented on a consumer grade graphics processing unit. To investigate the quality of microvascular detail acquired with this device we compared images of human capillary networks acquired with svOCT and fluorescein angiography. We found that the density of capillary microvasculature acquired with this svOCT device was visibly greater than fluorescein angiography. We also found that this svOCT device had the capacity to generate en face images of distinct capillary networks that are morphologically comparable with previously published histological studies. Finally, we found that this svOCT device has the ability to non-invasively illustrate the common manifestations of diabetic retinopathy and retinal vascular occlusion. The results of this study suggest that graphics processing unit accelerated svOCT has the potential to non-invasively provide useful quantitative information about human retinal capillary networks. Therefore svOCT may have clinical and research applications for the management of retinal microvascular diseases, which are a major cause of visual morbidity worldwide. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units

PubMed Central

Kam-Thong, Tony; Czamara, Darina; Tsuda, Koji; Borgwardt, Karsten; Lewis, Cathryn M; Erhardt-Lehmann, Angelika; Hemmer, Bernhard; Rieckmann, Peter; Daake, Markus; Weber, Frank; Wolf, Christiane; Ziegler, Andreas; Pütz, Benno; Holsboer, Florian; Schölkopf, Bernhard; Müller-Myhsok, Bertram

2011-01-01

Detection of epistatic interaction between loci has been postulated to provide a more in-depth understanding of the complex biological and biochemical pathways underlying human diseases. Studying the interaction between two loci is the natural progression following traditional and well-established single locus analysis. However, the added costs and time duration required for the computation involved have thus far deterred researchers from pursuing a genome-wide analysis of epistasis. In this paper, we propose a method allowing such analysis to be conducted very rapidly. The method, dubbed EPIBLASTER, is applicable to case–control studies and consists of a two-step process in which the difference in Pearson's correlation coefficients is computed between controls and cases across all possible SNP pairs as an indication of significant interaction warranting further analysis. For the subset of interactions deemed potentially significant, a second-stage analysis is performed using the likelihood ratio test from the logistic regression to obtain the P-value for the estimated coefficients of the individual effects and the interaction term. The algorithm is implemented using the parallel computational capability of commercially available graphical processing units to greatly reduce the computation time involved. In the current setup and example data sets (211 cases, 222 controls, 299468 SNPs; and 601 cases, 825 controls, 291095 SNPs), this coefficient evaluation stage can be completed in roughly 1 day. Our method allows for exhaustive and rapid detection of significant SNP pair interactions without imposing significant marginal effects of the single loci involved in the pair. PMID:21150885
Evaluating virtual hosted desktops for graphics-intensive astronomy

NASA Astrophysics Data System (ADS)

Meade, B. F.; Fluke, C. J.

2018-04-01

Visualisation of data is critical to understanding astronomical phenomena. Today, many instruments produce datasets that are too big to be downloaded to a local computer, yet many of the visualisation tools used by astronomers are deployed only on desktop computers. Cloud computing is increasingly used to provide a computation and simulation platform in astronomy, but it also offers great potential as a visualisation platform. Virtual hosted desktops, with graphics processing unit (GPU) acceleration, allow interactive, graphics-intensive desktop applications to operate co-located with astronomy datasets stored in remote data centres. By combining benchmarking and user experience testing, with a cohort of 20 astronomers, we investigate the viability of replacing physical desktop computers with virtual hosted desktops. In our work, we compare two Apple MacBook computers (one old and one new, representing hardware and opposite ends of the useful lifetime) with two virtual hosted desktops: one commercial (Amazon Web Services) and one in a private research cloud (the Australian NeCTAR Research Cloud). For two-dimensional image-based tasks and graphics-intensive three-dimensional operations - typical of astronomy visualisation workflows - we found that benchmarks do not necessarily provide the best indication of performance. When compared to typical laptop computers, virtual hosted desktops can provide a better user experience, even with lower performing graphics cards. We also found that virtual hosted desktops are equally simple to use, provide greater flexibility in choice of configuration, and may actually be a more cost-effective option for typical usage profiles.
A Relational Reasoning Approach to Text-Graphic Processing

ERIC Educational Resources Information Center

Danielson, Robert W.; Sinatra, Gale M.

2017-01-01

We propose that research on text-graphic processing could be strengthened by the inclusion of relational reasoning perspectives. We briefly outline four aspects of relational reasoning: "analogies," "anomalies," "antinomies", and "antitheses". Next, we illustrate how text-graphic researchers have been…
Printing--Graphic Arts--Graphic Communications

ERIC Educational Resources Information Center

Hauenstein, A. Dean

1975-01-01

Recently, "graphic arts" has shifted from printing skills to a conceptual approach of production processes. "Graphic communications" must embrace the total system of communication through graphic media, to serve broad career education purposes; students taught concepts and principles can be flexible and adaptive. The author…
Case Study: Audio-Guided Learning, with Computer Graphics.

ERIC Educational Resources Information Center

Koumi, Jack; Daniels, Judith

1994-01-01

Describes teaching packages which involve the use of audiotape recordings with personal computers in Open University (United Kingdom) mathematics courses. Topics addressed include software development; computer graphics; pedagogic principles for distance education; feedback, including course evaluations and student surveys; and future plans.…
Measuring Cognitive Load in Test Items: Static Graphics versus Animated Graphics

ERIC Educational Resources Information Center

Dindar, M.; Kabakçi Yurdakul, I.; Inan Dönmez, F.

2015-01-01

The majority of multimedia learning studies focus on the use of graphics in learning process but very few of them examine the role of graphics in testing students' knowledge. This study investigates the use of static graphics versus animated graphics in a computer-based English achievement test from a cognitive load theory perspective. Three…
GPU COMPUTING FOR PARTICLE TRACKING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

2011-03-25

This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less
The effect of a graphical interpretation of a statistic trend indicator (Trigg's Tracking Variable) on the detection of simulated changes.

PubMed

Kennedy, R R; Merry, A F

2011-09-01

Anaesthesia involves processing large amounts of information over time. One task of the anaesthetist is to detect substantive changes in physiological variables promptly and reliably. It has been previously demonstrated that a graphical trend display of historical data leads to more rapid detection of such changes. We examined the effect of a graphical indication of the magnitude of Trigg's Tracking Variable, a simple statistically based trend detection algorithm, on the accuracy and latency of the detection of changes in a micro-simulation. Ten anaesthetists each viewed 20 simulations with four variables displayed as the current value with a simple graphical trend display. Values for these variables were generated by a computer model, and updated every second; after a period of stability a change occurred to a new random value at least 10 units from baseline. In 50% of the simulations an indication of the rate of change was given by a five level graphical representation of the value of Trigg's Tracking Variable. Participants were asked to indicate when they thought a change was occurring. Changes were detected 10.9% faster with the trend indicator present (mean 13.1 [SD 3.1] cycles vs 14.6 [SD 3.4] cycles, 95% confidence interval 0.4 to 2.5 cycles, P = 0.013. There was no difference in accuracy of detection (median with trend detection 97% [interquartile range 95 to 100%], without trend detection 100% [98 to 100%]), P = 0.8. We conclude that simple statistical trend detection may speed detection of changes during routine anaesthesia, even when a graphical trend display is present.
Process and representation in graphical displays

NASA Technical Reports Server (NTRS)

Gillan, Douglas J.; Lewis, Robert; Rudisill, Marianne

1990-01-01

How people comprehend graphics is examined. Graphical comprehension involves the cognitive representation of information from a graphic display and the processing strategies that people apply to answer questions about graphics. Research on representation has examined both the features present in a graphic display and the cognitive representation of the graphic. The key features include the physical components of a graph, the relation between the figure and its axes, and the information in the graph. Tests of people's memory for graphs indicate that both the physical and informational aspect of a graph are important in the cognitive representation of a graph. However, the physical (or perceptual) features overshadow the information to a large degree. Processing strategies also involve a perception-information distinction. In order to answer simple questions (e.g., determining the value of a variable, comparing several variables, and determining the mean of a set of variables), people switch between two information processing strategies: (1) an arithmetic, look-up strategy in which they use a graph much like a table, looking up values and performing arithmetic calculations; and (2) a perceptual strategy in which they use the spatial characteristics of the graph to make comparisons and estimations. The user's choice of strategies depends on the task and the characteristics of the graph. A theory of graphic comprehension is presented.
An evaluation of flight path management automation in transport category aircraft

NASA Technical Reports Server (NTRS)

Chandra, D.; Bussolari, S. R.

1991-01-01

A desk-top simulation of a Boeing 757/767 Electronic Flight Instrumentation System (EFIS) and Control Display Unit (CDU) was used in an experiment to compare three modes of communication for the clearance amendment process: standard voice procedures, a textual delivery method, and a graphical delivery method. Eight qualified Boeing 757/767 pilots served as subjects. Each flew nine landing scenarios with three amendments given in each scenario. Both acceptable and unacceptable clearance amendments were presented in order to assess situational awareness. Times for comprehension and execution of the amendment were recorded along with workload ratings, responses to unacceptable amendments, and subjective impressions. The graphical mode was found to be superior in terms of the time measures and subjective ratings. No difference was found between the modes in the ability to detect unacceptable clearances.
Modeling cation/anion-water interactions in functional aluminosilicate structures.

PubMed

Richards, A J; Barnes, P; Collins, D R; Christodoulos, F; Clark, S M

1995-02-01

A need for the computer simulation of hydration/dehydration processes in functional aluminosilicate structures has been noted. Full and realistic simulations of these systems can be somewhat ambitious and require the aid of interactive computer graphics to identify key structural/chemical units, both in the devising of suitable water-ion simulation potentials and in the analysis of hydrogen-bonding schemes in the subsequent simulation studies. In this article, the former is demonstrated by the assembling of a range of essential water-ion potentials. These span the range of formal charges from +4e to -2e, and are evaluated in the context of three types of structure: a porous zeolite, calcium silicate cement, and layered clay. As an example of the latter, the computer graphics output from Monte Carlo computer simulation studies of hydration/dehydration in calcium-zeolite A is presented.
Computer generated hologram from point cloud using graphics processor.

PubMed

Chen, Rick H-Y; Wilkinson, Timothy D

2009-12-20

Computer generated holography is an extremely demanding and complex task when it comes to providing realistic reconstructions with full parallax, occlusion, and shadowing. We present an algorithm designed for data-parallel computing on modern graphics processing units to alleviate the computational burden. We apply Gaussian interpolation to create a continuous surface representation from discrete input object points. The algorithm maintains a potential occluder list for each individual hologram plane sample to keep the number of visibility tests to a minimum. We experimented with two approximations that simplify and accelerate occlusion computation. It is observed that letting several neighboring hologram plane samples share visibility information on object points leads to significantly faster computation without causing noticeable artifacts in the reconstructed images. Computing a reduced sample set via nonuniform sampling is also found to be an effective acceleration technique.
Holographic video at 40 frames per second for 4-million object points.

PubMed

Tsang, Peter; Cheung, W-K; Poon, T-C; Zhou, C

2011-08-01

We propose a fast method for generating digital Fresnel holograms based on an interpolated wavefront-recording plane (IWRP) approach. Our method can be divided into two stages. First, a small, virtual IWRP is derived in a computational-free manner. Second, the IWRP is expanded into a Fresnel hologram with a pair of fast Fourier transform processes, which are realized with the graphic processing unit (GPU). We demonstrate state-of-the-art experimental results, capable of generating a 2048 x 2048 Fresnel hologram of around 4 × 10(6) object points at a rate of over 40 frames per second.
Ultrafast electron diffraction pattern simulations using GPU technology. Applications to lattice vibrations.

PubMed

Eggeman, A S; London, A; Midgley, P A

2013-11-01

Graphical processing units (GPUs) offer a cost-effective and powerful means to enhance the processing power of computers. Here we show how GPUs can greatly increase the speed of electron diffraction pattern simulations by the implementation of a novel method to generate the phase grating used in multislice calculations. The increase in speed is especially apparent when using large supercell arrays and we illustrate the benefits of fast encoding the transmission function representing the atomic potentials through the simulation of thermal diffuse scattering in silicon brought about by specific vibrational modes. © 2013 Elsevier B.V. All rights reserved.
Development of a Chemically Reacting Flow Solver on the Graphic Processing Units

DTIC Science & Technology

2011-05-10

been implemented on the GPU by Schive et al. (2010). The outcome of their work is the GAMER code for astrophysical simulation. Thibault and...Euler equations at each cell. For simplification, consider the Euler equations in one dimension with no source terms; the discretized form of the...is known to be more diffusive than the other fluxes due to the large bound of the numerical signal velocities: b+, b-. 3.4 Time Marching Methods
Aspects of Industrial Water Treatment.

DTIC Science & Technology

1978-02-01

Biochemical oxygen demand, Nitrate (as I) Cyanide, total 5- d (DOD 5 ) Nitr$te (a N) Cyanide amenable to i Chealc€l oxygen deuma! (COD)) chlorination...for Industrial ProcessWaters * o s .o. . o , ,,. . . , s o , , , , 26 Is CON~TfNTS (Cont’dI TAMLE (Cont’ d ) 10 Limiting Concentration Ranges...the United States are graphically presented in Figures I and 2. D . Z C O’ M ITUMMS OJT WATM U 1. Untreated feedwater can cause numerous problems in
Multiscale Numerical Methods for Non-Equilibrium Plasma

DTIC Science & Technology

2015-08-01

current paper reports on the implementation of a numerical solver on the Graphic Processing Units (GPUs) to model reactive gas mixtures with detailed...Governing equations The flow ismodeled as amixture of gas specieswhile neglecting viscous effects. The chemical reactions taken place between the gas ...components are to be modeled in great detail. The set of the Euler equations for a reactive gas mixture can be written as: ∂Q ∂t + ∇ · F̄ = Ω̇ (1) where Q
Accelerating NBODY6 with graphics processing units

NASA Astrophysics Data System (ADS)

Nitadori, Keigo; Aarseth, Sverre J.

2012-07-01

We describe the use of graphics processing units (GPUs) for speeding up the code NBODY6 which is widely used for direct N-body simulations. Over the years, the N2 nature of the direct force calculation has proved a barrier for extending the particle number. Following an early introduction of force polynomials and individual time steps, the calculation cost was first reduced by the introduction of a neighbour scheme. After a decade of GRAPE computers which speeded up the force calculation further, we are now in the era of GPUs where relatively small hardware systems are highly cost effective. A significant gain in efficiency is achieved by employing the GPU to obtain the so-called regular force which typically involves some 99 per cent of the particles, while the remaining local forces are evaluated on the host. However, the latter operation is performed up to 20 times more frequently and may still account for a significant cost. This effort is reduced by parallel SSE/AVX procedures where each interaction term is calculated using mainly single precision. We also discuss further strategies connected with coordinate and velocity prediction required by the integration scheme. This leaves hard binaries and multiple close encounters which are treated by several regularization methods. The present NBODY6-GPU code is well balanced for simulations in the particle range 104-2 × 105 for a dual-GPU system attached to a standard PC.
Atomic orbital-based SOS-MP2 with tensor hypercontraction. II. Local tensor hypercontraction

NASA Astrophysics Data System (ADS)

Song, Chenchen; Martínez, Todd J.

2017-01-01

In the first paper of the series [Paper I, C. Song and T. J. Martinez, J. Chem. Phys. 144, 174111 (2016)], we showed how tensor-hypercontracted (THC) SOS-MP2 could be accelerated by exploiting sparsity in the atomic orbitals and using graphical processing units (GPUs). This reduced the formal scaling of the SOS-MP2 energy calculation to cubic with respect to system size. The computational bottleneck then becomes the THC metric matrix inversion, which scales cubically with a large prefactor. In this work, the local THC approximation is proposed to reduce the computational cost of inverting the THC metric matrix to linear scaling with respect to molecular size. By doing so, we have removed the primary bottleneck to THC-SOS-MP2 calculations on large molecules with O(1000) atoms. The errors introduced by the local THC approximation are less than 0.6 kcal/mol for molecules with up to 200 atoms and 3300 basis functions. Together with the graphical processing unit techniques and locality-exploiting approaches introduced in previous work, the scaled opposite spin MP2 (SOS-MP2) calculations exhibit O(N2.5) scaling in practice up to 10 000 basis functions. The new algorithms make it feasible to carry out SOS-MP2 calculations on small proteins like ubiquitin (1231 atoms/10 294 atomic basis functions) on a single node in less than a day.
Atomic orbital-based SOS-MP2 with tensor hypercontraction. II. Local tensor hypercontraction.

PubMed

Song, Chenchen; Martínez, Todd J

2017-01-21

In the first paper of the series [Paper I, C. Song and T. J. Martinez, J. Chem. Phys. 144, 174111 (2016)], we showed how tensor-hypercontracted (THC) SOS-MP2 could be accelerated by exploiting sparsity in the atomic orbitals and using graphical processing units (GPUs). This reduced the formal scaling of the SOS-MP2 energy calculation to cubic with respect to system size. The computational bottleneck then becomes the THC metric matrix inversion, which scales cubically with a large prefactor. In this work, the local THC approximation is proposed to reduce the computational cost of inverting the THC metric matrix to linear scaling with respect to molecular size. By doing so, we have removed the primary bottleneck to THC-SOS-MP2 calculations on large molecules with O(1000) atoms. The errors introduced by the local THC approximation are less than 0.6 kcal/mol for molecules with up to 200 atoms and 3300 basis functions. Together with the graphical processing unit techniques and locality-exploiting approaches introduced in previous work, the scaled opposite spin MP2 (SOS-MP2) calculations exhibit O(N 2.5 ) scaling in practice up to 10 000 basis functions. The new algorithms make it feasible to carry out SOS-MP2 calculations on small proteins like ubiquitin (1231 atoms/10 294 atomic basis functions) on a single node in less than a day.

Efficient gaussian density formulation of volume and surface areas of macromolecules on graphical processing units.

PubMed

Zhang, Baofeng; Kilburg, Denise; Eastman, Peter; Pande, Vijay S; Gallicchio, Emilio

2017-04-15

We present an algorithm to efficiently compute accurate volumes and surface areas of macromolecules on graphical processing unit (GPU) devices using an analytic model which represents atomic volumes by continuous Gaussian densities. The volume of the molecule is expressed by means of the inclusion-exclusion formula, which is based on the summation of overlap integrals among multiple atomic densities. The surface area of the molecule is obtained by differentiation of the molecular volume with respect to atomic radii. The many-body nature of the model makes a port to GPU devices challenging. To our knowledge, this is the first reported full implementation of this model on GPU hardware. To accomplish this, we have used recursive strategies to construct the tree of overlaps and to accumulate volumes and their gradients on the tree data structures so as to minimize memory contention. The algorithm is used in the formulation of a surface area-based non-polar implicit solvent model implemented as an open source plug-in (named GaussVol) for the popular OpenMM library for molecular mechanics modeling. GaussVol is 50 to 100 times faster than our best optimized implementation for the CPUs, achieving speeds in excess of 100 ns/day with 1 fs time-step for protein-sized systems on commodity GPUs. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Leveling the Playing Field: Graphical Aids on Mathematics Tests

ERIC Educational Resources Information Center

Jiménez, Albert M.; Nixon, Casey B.; Zepeda, Sally J.

2017-01-01

This research suggests that structural accommodation can be implemented during the construction phase of standardized mathematics examinations. Data from a racially diverse district in the United States are used to compare student performance on questions with and without graphical aids. Findings suggest that mathematics questions possessing…
Interactive brain shift compensation using GPU based programming

NASA Astrophysics Data System (ADS)

van der Steen, Sander; Noordmans, Herke Jan; Verdaasdonk, Rudolf

2009-02-01

Processing large images files or real-time video streams requires intense computational power. Driven by the gaming industry, the processing power of graphic process units (GPUs) has increased significantly. With the pixel shader model 4.0 the GPU can be used for image processing 10x faster than the CPU. Dedicated software was developed to deform 3D MR and CT image sets for real-time brain shift correction during navigated neurosurgery using landmarks or cortical surface traces defined by the navigation pointer. Feedback was given using orthogonal slices and an interactively raytraced 3D brain image. GPU based programming enables real-time processing of high definition image datasets and various applications can be developed in medicine, optics and image sciences.
Process and representation in graphical displays

NASA Technical Reports Server (NTRS)

Gillan, Douglas J.; Lewis, Robert; Rudisill, Marianne

1993-01-01

Our initial model of graphic comprehension has focused on statistical graphs. Like other models of human-computer interaction, models of graphical comprehension can be used by human-computer interface designers and developers to create interfaces that present information in an efficient and usable manner. Our investigation of graph comprehension addresses two primary questions: how do people represent the information contained in a data graph?; and how do they process information from the graph? The topics of focus for graphic representation concern the features into which people decompose a graph and the representations of the graph in memory. The issue of processing can be further analyzed as two questions: what overall processing strategies do people use?; and what are the specific processing skills required?
Integration of rocket turbine design and analysis through computer graphics

NASA Technical Reports Server (NTRS)

Hsu, Wayne; Boynton, Jim

1988-01-01

An interactive approach with engineering computer graphics is used to integrate the design and analysis processes of a rocket engine turbine into a progressive and iterative design procedure. The processes are interconnected through pre- and postprocessors. The graphics are used to generate the blade profiles, their stacking, finite element generation, and analysis presentation through color graphics. Steps of the design process discussed include pitch-line design, axisymmetric hub-to-tip meridional design, and quasi-three-dimensional analysis. The viscous two- and three-dimensional analysis codes are executed after acceptable designs are achieved and estimates of initial losses are confirmed.
Multiple Learning Strategies Project. Graphics. EMI.

ERIC Educational Resources Information Center

Steinberg, Alan; And Others

This instructional package, designed for educable mentally impaired students, focuses on the vocational area of graphics. Contained in this document are nine learning modules organized into a finishing and bindery unit. Maintenance of a Challenge power cutter, operation of a hand electric stapler, and packaging with kraft paper are examples of…
Development of digital reconstructed radiography software at new treatment facility for carbon-ion beam scanning of National Institute of Radiological Sciences.

PubMed

Mori, Shinichiro; Inaniwa, Taku; Kumagai, Motoki; Kuwae, Tsunekazu; Matsuzaki, Yuka; Furukawa, Takuji; Shirai, Toshiyuki; Noda, Koji

2012-06-01

To increase the accuracy of carbon ion beam scanning therapy, we have developed a graphical user interface-based digitally-reconstructed radiograph (DRR) software system for use in routine clinical practice at our center. The DRR software is used in particular scenarios in the new treatment facility to achieve the same level of geometrical accuracy at the treatment as at the imaging session. DRR calculation is implemented simply as the summation of CT image voxel values along the X-ray projection ray. Since we implemented graphics processing unit-based computation, the DRR images are calculated with a speed sufficient for the particular clinical practice requirements. Since high spatial resolution flat panel detector (FPD) images should be registered to the reference DRR images in patient setup process in any scenarios, the DRR images also needs higher spatial resolution close to that of FPD images. To overcome the limitation of the CT spatial resolution imposed by the CT voxel size, we applied image processing to improve the calculated DRR spatial resolution. The DRR software introduced here enabled patient positioning with sufficient accuracy for the implementation of carbon-ion beam scanning therapy at our center.
Nanoparticles in Constanta-North Wastewater Treatment Plant

NASA Astrophysics Data System (ADS)

Panaitescu, I. M.; Panaitescu, Fanel-Viorel L.; Panaitescu, Ileana-Irina F. V.

2015-02-01

In this paper we describe the route of the nanoparticles in the WWTP and demonstrate how to use the simulation flow sensitivity analysis within STOATTM program to evaluate the effect of variation of the constant, "k" in the equation v= kCh settling on fixed concentration of nanoparticles in sewage water from a primary tank of physical-biological stage. Wastewater treatment facilities are designed to remove conventional pollutants from sanitary waste. Major processes of treatment includes: a) physical treatment-remove suspended large solids by settling or sedimentation and eliminate floating greases; b) biological treatment-degradation or consumption of the dissolved organic matter using the means of cultivated in activated sludge or the trickling filters; c) chemical treatment-remove other matters by the means of chemical addition or destroying pathogenic organisms through disinfection; d) advanced treatment- removing specific constituents using processes such as activated carbon, membrane separation, or ion exchange. Particular treatment processes are: a) sedimentation; b) coagulation and flocculation; c) activated sludge; d) sand filters; e) membrane separation; f) disinfection. Methods are: 1) using the STOATTM program with input and output data for primary tank and parameters of wastewater. 2) generating a data file for influent using a sinusoidal model and we accepted defaults STOATTM data. 3) After this, getting spreadsheet data for various characteristics of wastewater for 48 hours:flow, temperature, pH, volatile fatty acids, soluble BOD, COD inert soluble particulate BOD, COD inert particles, volatile solids, volatile solids, ammonia, nitrate and soluble organic nitrogen. Findings and Results:1.Graphics after 48 hour;. 2.Graphics for parameters - flow,temperature, pH/units hours; 3.Graphics of nanoparticles; 4. Graphics of others volatile and non-volatile solids; 5. Timeseries data and summary statistics. Biodegradation of nanoparticles is the breakdown of organic molecules that may cause changes in the physical structure or the surface characteristic of the material.
Graphic Design Is Not a Medium.

ERIC Educational Resources Information Center

Gruber, John Edward, Jr.

2001-01-01

Discusses graphic design and reviews its development from analog processes to a digital tool with the use of computers. Topics include graphical user interfaces; the need for visual communication concepts; transmedia as opposed to repurposing; and graphic design instruction in higher education. (LRW)
Analysis of Interactive Graphics Display Equipment for an Automated Photo Interpretation System.

DTIC Science & Technology

1982-06-01

System provides the hardware and software for a range of graphics processor tasks. The IMAGE System employs the RSX- II M real - time operating . system in...One hard copy unit serves up to four work stations. The executive program of the IMAGE system is the DEC RSX- 11 M real - time operating system . In...picture controller. The PDP 11/34 executes programs concurrently under the RSX- I IM real - time operating system . Each graphics program consists of a
Sop-GPU: accelerating biomolecular simulations in the centisecond timescale using graphics processors.

PubMed

Zhmurov, A; Dima, R I; Kholodov, Y; Barsegov, V

2010-11-01

Theoretical exploration of fundamental biological processes involving the forced unraveling of multimeric proteins, the sliding motion in protein fibers and the mechanical deformation of biomolecular assemblies under physiological force loads is challenging even for distributed computing systems. Using a C(α)-based coarse-grained self organized polymer (SOP) model, we implemented the Langevin simulations of proteins on graphics processing units (SOP-GPU program). We assessed the computational performance of an end-to-end application of the program, where all the steps of the algorithm are running on a GPU, by profiling the simulation time and memory usage for a number of test systems. The ∼90-fold computational speedup on a GPU, compared with an optimized central processing unit program, enabled us to follow the dynamics in the centisecond timescale, and to obtain the force-extension profiles using experimental pulling speeds (v(f) = 1-10 μm/s) employed in atomic force microscopy and in optical tweezers-based dynamic force spectroscopy. We found that the mechanical molecular response critically depends on the conditions of force application and that the kinetics and pathways for unfolding change drastically even upon a modest 10-fold increase in v(f). This implies that, to resolve accurately the free energy landscape and to relate the results of single-molecule experiments in vitro and in silico, molecular simulations should be carried out under the experimentally relevant force loads. This can be accomplished in reasonable wall-clock time for biomolecules of size as large as 10(5) residues using the SOP-GPU package. © 2010 Wiley-Liss, Inc.
Fast generation of computer-generated hologram by graphics processing unit

NASA Astrophysics Data System (ADS)

Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi

2009-02-01

A cylindrical hologram is well known to be viewable in 360 deg. This hologram depends high pixel resolution.Therefore, Computer-Generated Cylindrical Hologram (CGCH) requires huge calculation amount.In our previous research, we used look-up table method for fast calculation with Intel Pentium4 2.8 GHz.It took 480 hours to calculate high resolution CGCH (504,000 x 63,000 pixels and the average number of object points are 27,000).To improve quality of CGCH reconstructed image, fringe pattern requires higher spatial frequency and resolution.Therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of CGCH (912,000 x 108,000 pixels), we employ Graphics Processing Unit (GPU).It took 4,406 hours to calculate high resolution CGCH on Xeon 3.4 GHz.Since GPU has many streaming processors and a parallel processing structure, GPU works as the high performance parallel processor.In addition, GPU gives max performance to 2 dimensional data and streaming data.Recently, GPU can be utilized for the general purpose (GPGPU).For example, NVIDIA's GeForce7 series became a programmable processor with Cg programming language.Next GeForce8 series have CUDA as software development kit made by NVIDIA.Theoretically, calculation ability of GPU is announced as 500 GFLOPS. From the experimental result, we have achieved that 47 times faster calculation compared with our previous work which used CPU.Therefore, CGCH can be generated in 95 hours.So, total time is 110 hours to calculate and print the CGCH.
Graphics processing unit (GPU) real-time infrared scene generation

NASA Astrophysics Data System (ADS)

Christie, Chad L.; Gouthas, Efthimios (Themie); Williams, Owen M.

2007-04-01

VIRSuite, the GPU-based suite of software tools developed at DSTO for real-time infrared scene generation, is described. The tools include the painting of scene objects with radiometrically-associated colours, translucent object generation, polar plot validation and versatile scene generation. Special features include radiometric scaling within the GPU and the presence of zoom anti-aliasing at the core of VIRSuite. Extension of the zoom anti-aliasing construct to cover target embedding and the treatment of translucent objects is described.
Anvil Forecast Tool in the Advanced Weather Interactive Processing System, Phase II

NASA Technical Reports Server (NTRS)

Barrett, Joe H., III

2008-01-01

Meteorologists from the 45th Weather Squadron (45 WS) and Spaceflight Meteorology Group have identified anvil forecasting as one of their most challenging tasks when predicting the probability of violations of the Lightning Launch Commit Criteria and Space Light Rules. As a result, the Applied Meteorology Unit (AMU) created a graphical overlay tool for the Meteorological Interactive Data Display Systems (MIDDS) to indicate the threat of thunderstorm anvil clouds, using either observed or model forecast winds as input.
Computer applications in scientific balloon quality control

NASA Astrophysics Data System (ADS)

Seely, Loren G.; Smith, Michael S.

Seal defects and seal tensile strength are primary determinants of product quality in scientific balloon manufacturing; they therefore require a unit of quality measure. The availability of inexpensive and powerful data-processing tools can serve as the basis of a quality-trends-discerning analysis of products. The results of one such analysis are presently given in graphic form for use on the production floor. Software descriptions and their sample outputs are presented, together with a summary of overall and long-term effects of these methods on product quality.
Surface features of central North America: a synoptic view from computer graphics

USGS Publications Warehouse

Pike, R.J.

1991-01-01

A digital shaded-relief image of the 48 contiguous United States shows the details of large- and small-scale landforms, including several linear trends. The features faithfully reflect tectonism, continental glaciation, fluvial activity, volcanism, and other surface-shaping events and processes. The new map not only depicts topography accurately and in its true complexity, but does so in one synoptic view that provides a regional context for geologic analysis unobscured by clouds, culture, vegetation, or artistic constraints. -Author
Lattice QCD simulations using the OpenACC platform

NASA Astrophysics Data System (ADS)

Majumdar, Pushan

2016-10-01

In this article we will explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive based programming model for GPUs which avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high level languages with OpenMP like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on the Fermi and Kepler GPUs.
Reprocessing of multi-channel seismic-reflection data collected in the Beaufort Sea

USGS Publications Warehouse

Agena, W.F.; Lee, Myung W.; Hart, P.E.

2000-01-01

Contained on this set of two CD-ROMs are stacked and migrated multi-channel seismic-reflection data for 65 lines recorded in the Beaufort Sea by the United States Geological Survey in 1977. All data were reprocessed by the USGS using updated processing methods resulting in improved interpretability. Each of the two CD-ROMs contains the following files: 1) 65 files containing the digital seismic data in standard, SEG-Y format; 2) 1 file containing navigation data for the 65 lines in standard SEG-P1 format; 3) an ASCII text file with cross-reference information for relating the sequential trace numbers on each line to cdp numbers and shotpoint numbers; 4) 2 small scale graphic images (stacked and migrated) of a segment of line 722 in Adobe Acrobat (R) PDF format; 5) a graphic image of the location map, generated from the navigation file; 6) PlotSeis, an MS-DOS Application that allows PC users to interactively view the SEG-Y files; 7) a PlotSeis documentation file; and 8) an explanation of the processing used to create the final seismic sections (this document).
Global magnetohydrodynamic simulations on multiple GPUs

NASA Astrophysics Data System (ADS)

Wong, Un-Hong; Wong, Hon-Cheng; Ma, Yonghui

2014-01-01

Global magnetohydrodynamic (MHD) models play the major role in investigating the solar wind-magnetosphere interaction. However, the huge computation requirement in global MHD simulations is also the main problem that needs to be solved. With the recent development of modern graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), it is possible to perform global MHD simulations in a more efficient manner. In this paper, we present a global magnetohydrodynamic (MHD) simulator on multiple GPUs using CUDA 4.0 with GPUDirect 2.0. Our implementation is based on the modified leapfrog scheme, which is a combination of the leapfrog scheme and the two-step Lax-Wendroff scheme. GPUDirect 2.0 is used in our implementation to drive multiple GPUs. All data transferring and kernel processing are managed with CUDA 4.0 API instead of using MPI or OpenMP. Performance measurements are made on a multi-GPU system with eight NVIDIA Tesla M2050 (Fermi architecture) graphics cards. These measurements show that our multi-GPU implementation achieves a peak performance of 97.36 GFLOPS in double precision.
Multichannel Networked Phasemeter Readout and Analysis

NASA Technical Reports Server (NTRS)

Edmonds, Karina

2008-01-01

Netmeter software reads a data stream from up to 250 networked phasemeters, synchronizes the data, saves the reduced data to disk (after applying a low-pass filter), and provides a Web server interface for remote control. Unlike older phasemeter software that requires a special, real-time operating system, this program can run on any general-purpose computer. It needs about five percent of the CPU (central processing unit) to process 20 channels because it adds built-in data logging and network-based GUIs (graphical user interfaces) that are implemented in Scalable Vector Graphics (SVG). Netmeter runs on Linux and Windows. It displays the instantaneous displacements measured by several phasemeters at a user-selectable rate, up to 1 kHz. The program monitors the measure and reference channel frequencies. For ease of use, levels of status in Netmeter are color coded: green for normal operation, yellow for network errors, and red for optical misalignment problems. Netmeter includes user-selectable filters up to 4 k samples, and user-selectable averaging windows (after filtering). Before filtering, the program saves raw data to disk using a burst-write technique.

Accelerating atomistic calculations of quantum energy eigenstates on graphic cards

NASA Astrophysics Data System (ADS)

Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.

2014-10-01

Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and reduces at most machine-to-device data transfers. Furthermore parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
Advanced Architectures for Astrophysical Supercomputing

NASA Astrophysics Data System (ADS)

Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.

2010-12-01

Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation - performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.
A Survey Of Techniques for Managing and Leveraging Caches in GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

2014-09-01

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as unique architecture of GPU, rise of CPU–GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges ofmore » cache management in GPUs. The aim of this paper is to provide the readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.« less
An Advanced NSSS Integrity Monitoring System for Shin-Kori Nuclear Units 3 and 4

NASA Astrophysics Data System (ADS)

Oh, Yang Gyun; Galin, Scott R.; Lee, Sang Jeong

2010-12-01

The advanced design features of NSSS (Nuclear Steam Supply System) Integrity Monitoring System for Shin-Kori Nuclear Units 3 and 4 are summarized herein. During the overall system design and detailed component design processes, many design improvements have been made for the system. The major design changes are: 1) the application of a common software platform for all subsystems, 2) the implementation of remote access, control and monitoring capabilities, and 3) the equipment redesign and rearrangement that has simplified the system architecture. Changes give an effect on cabinet size, number of cables, cyber-security, graphic user interfaces, and interfaces with other monitoring systems. The system installation and operation for Shin-Kori Nuclear Units 3 and 4 will be more convenient than those for previous Korean nuclear units in view of its remote control capability, automated test functions, improved user interface functions, and much less cabling.
The Effects of Interactive Graphics Analogies on Recall of Concepts in Science

DTIC Science & Technology

1976-08-01

processing , in the Craik and Lockhart sense, were induced by this postlesson condition. 3. The fact that students were able to deal with both...higher scores on a graphics posttest in Experiment III. These results suggest that both shallow and deep processing , in the Craik and Lockhart ...graphics posttest in Experiment III. These results suggest that both shallow and deep processing , in the Cralk and Lockhart sense, were induced by
Graphic Arts: Process Camera, Stripping, and Platemaking. Fourth Edition. Teacher Edition [and] Student Edition.

ERIC Educational Resources Information Center

Multistate Academic and Vocational Curriculum Consortium, Stillwater, OK.

This publication contains both a teacher edition and a student edition of materials for a course in graphic arts that covers the process camera, stripping, and platemaking. The course introduces basic concepts and skills necessary for entry-level employment in a graphic communication occupation. The contents of the materials are tied to measurable…
Teaching to the Type

ERIC Educational Resources Information Center

Tucker, Maggie

2008-01-01

Before she became an art teacher, the author relates that she worked at a graphic design agency and there she learned to fully appreciate typefaces and how they influence messages. In the years that she taught middle school art, the author has incorporated some basics of type design into her graphics unit, along with calligraphy, printmaking, and…
Graphic Communications--Preparatory Area. Book I--Typography and Modern Typesetting. Teacher's Manual.

ERIC Educational Resources Information Center

Hertz, Andrew

Intended for use with a companion student manual, this teacher's guide lists procedures and teaching tips for each unit of a secondary or postsecondary course of study in typography and modern typesetting. Course objectives are listed for developing student skills in the following preparatory functions of the graphic communications industry: copy…
The Graphic Arts and the New Copyright Act.

ERIC Educational Resources Information Center

Overbeck, Wayne

This paper briefly summarizes the Copyright Act recently passed by the United States Congress as it relates to graphic arts and points out that the law ignores the major problem facing that field: the lack of copyright protection for typography and typeface designs. It then explains the reasoning used for denying protection to typography and…
75 FR 61772 - Certain Coated Paper Suitable for High-Quality Print Graphics Using Sheet-Fed Presses From China...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-10-06

... INTERNATIONAL TRADE COMMISSION [Investigation Nos. 701-TA-470-471 and 731-TA-1169-1170 (Final)] Certain Coated Paper Suitable for High-Quality Print Graphics Using Sheet-Fed Presses From China and Indonesia AGENCY: United States International Trade Commission. ACTION: Revised schedule for the subject...
75 FR 54650 - Certain Coated Paper Suitable for High-Quality Print Graphics Using Sheet-Fed Presses From China...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-09-08

... INTERNATIONAL TRADE COMMISSION [Investigation Nos. 701-TA-470-471 and 731-TA-1169-1170 (Final)] Certain Coated Paper Suitable for High-Quality Print Graphics Using Sheet-Fed Presses From China and Indonesia AGENCY: United States International Trade Commission. ACTION: Revised schedule for the subject...
Textbook Graphics and Maps: Keys to Learning.

ERIC Educational Resources Information Center

Danzer, Gerald A.

1980-01-01

Explains how social studies pupils can use an awareness of textbook design to become better students. Suggestions include reproducing the collage on an American history textbook as a large poster for classroom use and directing students to design a graphic unit opener in the same style as the ones in their textbooks. (DB)
Numerical simulation of disperse particle flows on a graphics processing unit

NASA Astrophysics Data System (ADS)

Sierakowski, Adam J.

In both nature and technology, we commonly encounter solid particles being carried within fluid flows, from dust storms to sediment erosion and from food processing to energy generation. The motion of uncountably many particles in highly dynamic flow environments characterizes the tremendous complexity of such phenomena. While methods exist for the full-scale numerical simulation of such systems, current computational capabilities require the simplification of the numerical task with significant approximation using closure models widely recognized as insufficient. There is therefore a fundamental need for the investigation of the underlying physical processes governing these disperse particle flows. In the present work, we develop a new tool based on the Physalis method for the first-principles numerical simulation of thousands of particles (a small fraction of an entire disperse particle flow system) in order to assist in the search for new reduced-order closure models. We discuss numerous enhancements to the efficiency and stability of the Physalis method, which introduces the influence of spherical particles to a fixed-grid incompressible Navier-Stokes flow solver using a local analytic solution to the flow equations. Our first-principles investigation demands the modeling of unresolved length and time scales associated with particle collisions. We introduce a collision model alongside Physalis, incorporating lubrication effects and proposing a new nonlinearly damped Hertzian contact model. By reproducing experimental studies from the literature, we document extensive validation of the methods. We discuss the implementation of our methods for massively parallel computation using a graphics processing unit (GPU). We combine Eulerian grid-based algorithms with Lagrangian particle-based algorithms to achieve computational throughput up to 90 times faster than the legacy implementation of Physalis for a single central processing unit. By avoiding all data communication between the GPU and the host system during the simulation, we utilize with great efficacy the GPU hardware with which many high performance computing systems are currently equipped. We conclude by looking forward to the future of Physalis with multi-GPU parallelization in order to perform resolved disperse flow simulations of more than 100,000 particles and further advance the development of reduced-order closure models.
Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit

PubMed Central

Lawrie, David S.

2017-01-01

Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689
HMI conventions for process control graphics.

PubMed

Pikaar, Ruud N

2012-01-01

Process operators supervise and control complex processes. To enable the operator to do an adequate job, instrumentation and process control engineers need to address several related topics, such as console design, information design, navigation, and alarm management. In process control upgrade projects, usually a 1:1 conversion of existing graphics is proposed. This paper suggests another approach, efficiently leading to a reduced number of new powerful process graphics, supported by a permanent process overview displays. In addition a road map for structuring content (process information) and conventions for the presentation of objects, symbols, and so on, has been developed. The impact of the human factors engineering approach on process control upgrade projects is illustrated by several cases.
Using parallel computing for the display and simulation of the space debris environment

NASA Astrophysics Data System (ADS)

Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.

2011-07-01

Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
Using parallel computing for the display and simulation of the space debris environment

NASA Astrophysics Data System (ADS)

Moeckel, Marek; Wiedemann, Carsten; Flegel, Sven Kevin; Gelhaus, Johannes; Klinkrad, Heiner; Krag, Holger; Voersmann, Peter

Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction of OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
Revision of Primary Series Maps

USGS Publications Warehouse

,

2000-01-01

In 1992, the U.S. Geological Survey (USGS) completed a 50-year effort to provide primary series map coverage of the United States. Many of these maps now need to be updated to reflect the construction of new roads and highways and other changes that have taken place over time. The USGS has formulated a graphic revision plan to help keep the primary series maps current. Primary series maps include 1:20,000-scale quadrangles of Puerto Rico, 1:24,000- or 1:25,000-scale quadrangles of the conterminous United States, Hawaii, and U.S. Territories, and 1:63,360-scale quadrangles of Alaska. The revision of primary series maps from new collection sources is accomplished using a variety of processes. The raster revision process combines the scanned content of paper maps with raster updating technologies. The vector revision process involves the automated plotting of updated vector files. Traditional processes use analog stereoplotters and manual scribing instruments on specially coated map separates. The ability to select from or combine these processes increases the efficiency of the National Mapping Division map revision program.
Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method

DTIC Science & Technology

2015-06-01

5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Write Is Right: Using Graphic Organizers to Improve Student Mathematical Problem Solving

ERIC Educational Resources Information Center

Zollman, Alan

2012-01-01

Teachers have used graphic organizers successfully in teaching the writing process. This paper describes graphic organizers and their potential mathematics benefits for both students and teachers, elucidates a specific graphic organizer adaptation for mathematical problem solving, and discusses results using the "four-corners-and-a-diamond"…

Building Regression Models: The Importance of Graphics.

ERIC Educational Resources Information Center

Dunn, Richard

1989-01-01

Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)
Mathematical Creative Activity and the Graphic Calculator

ERIC Educational Resources Information Center

Duda, Janina

2011-01-01

Teaching mathematics using graphic calculators has been an issue of didactic discussions for years. Finding ways in which graphic calculators can enrich the development process of creative activity in mathematically gifted students between the ages of 16-17 is the focus of this article. Research was conducted using graphic calculators with…
Defining Identities through Multiliteracies: EL Teens Narrate Their Immigration Experiences as Graphic Stories

ERIC Educational Resources Information Center

Danzak, Robin L.

2011-01-01

Based on a framework of identity-as-narrative and multiliteracies, this article describes "Graphic Journeys," a multimedia literacy project in which English learners (ELs) in middle school created graphic stories that expressed their families' immigration experiences. The process involved reading graphic novels, journaling, interviewing, and…
Monte Carlo MP2 on Many Graphical Processing Units.

PubMed

Doran, Alexander E; Hirata, So

2016-10-11

In the Monte Carlo second-order many-body perturbation (MC-MP2) method, the long sum-of-product matrix expression of the MP2 energy, whose literal evaluation may be poorly scalable, is recast into a single high-dimensional integral of functions of electron pair coordinates, which is evaluated by the scalable method of Monte Carlo integration. The sampling efficiency is further accelerated by the redundant-walker algorithm, which allows a maximal reuse of electron pairs. Here, a multitude of graphical processing units (GPUs) offers a uniquely ideal platform to expose multilevel parallelism: fine-grain data-parallelism for the redundant-walker algorithm in which millions of threads compute and share orbital amplitudes on each GPU; coarse-grain instruction-parallelism for near-independent Monte Carlo integrations on many GPUs with few and infrequent interprocessor communications. While the efficiency boost by the redundant-walker algorithm on central processing units (CPUs) grows linearly with the number of electron pairs and tends to saturate when the latter exceeds the number of orbitals, on a GPU it grows quadratically before it increases linearly and then eventually saturates at a much larger number of pairs. This is because the orbital constructions are nearly perfectly parallelized on a GPU and thus completed in a near-constant time regardless of the number of pairs. In consequence, an MC-MP2/cc-pVDZ calculation of a benzene dimer is 2700 times faster on 256 GPUs (using 2048 electron pairs) than on two CPUs, each with 8 cores (which can use only up to 256 pairs effectively). We also numerically determine that the cost to achieve a given relative statistical uncertainty in an MC-MP2 energy increases as O(n 3 ) or better with system size n, which may be compared with the O(n 5 ) scaling of the conventional implementation of deterministic MP2. We thus establish the scalability of MC-MP2 with both system and computer sizes.
Full Stokes finite-element modeling of ice sheets using a graphics processing unit

NASA Astrophysics Data System (ADS)

Seddik, H.; Greve, R.

2016-12-01

Thermo-mechanical simulation of ice sheets is an important approach to understand and predict their evolution in a changing climate. For that purpose, higher order (e.g., ISSM, BISICLES) and full Stokes (e.g., Elmer/Ice, http://elmerice.elmerfem.org) models are increasingly used to more accurately model the flow of entire ice sheets. In parallel to this development, the rapidly improving performance and capabilities of Graphics Processing Units (GPUs) allows to efficiently offload more calculations of complex and computationally demanding problems on those devices. Thus, in order to continue the trend of using full Stokes models with greater resolutions, using GPUs should be considered for the implementation of ice sheet models. We developed the GPU-accelerated ice-sheet model Sainō. Sainō is an Elmer (http://www.csc.fi/english/pages/elmer) derivative implemented in Objective-C which solves the full Stokes equations with the finite element method. It uses the standard OpenCL language (http://www.khronos.org/opencl/) to offload the assembly of the finite element matrix on the GPU. A mesh-coloring scheme is used so that elements with the same color (non-sharing nodes) are assembled in parallel on the GPU without the need for synchronization primitives. The current implementation shows that, for the ISMIP-HOM experiment A, during the matrix assembly in double precision with 8000, 87,500 and 252,000 brick elements, Sainō is respectively 2x, 10x and 14x faster than Elmer/Ice (when both models are run on a single processing unit). In single precision, Sainō is even 3x, 20x and 25x faster than Elmer/Ice. A detailed description of the comparative results between Sainō and Elmer/Ice will be presented, and further perspectives in optimization and the limitations of the current implementation.
Robot graphic simulation testbed

NASA Technical Reports Server (NTRS)

Cook, George E.; Sztipanovits, Janos; Biegl, Csaba; Karsai, Gabor; Springfield, James F.

1991-01-01

The objective of this research was twofold. First, the basic capabilities of ROBOSIM (graphical simulation system) were improved and extended by taking advantage of advanced graphic workstation technology and artificial intelligence programming techniques. Second, the scope of the graphic simulation testbed was extended to include general problems of Space Station automation. Hardware support for 3-D graphics and high processing performance make high resolution solid modeling, collision detection, and simulation of structural dynamics computationally feasible. The Space Station is a complex system with many interacting subsystems. Design and testing of automation concepts demand modeling of the affected processes, their interactions, and that of the proposed control systems. The automation testbed was designed to facilitate studies in Space Station automation concepts.
Waterborne Commerce of the United States, calendar year 1998. Part 5 : national summaries

DOT National Transportation Integrated Search

2000-03-01

Waterborne Commerce of the United States, WCUS, Part 5 is one of a series of publications which provides statistics on the foreign and domestic waterborne commerce moved on the United States waters. WCUS, Part 5 provides extensive graphic and tabular...
POM.gpu-v1.0: a GPU-based Princeton Ocean Model

NASA Astrophysics Data System (ADS)

Xu, S.; Huang, X.; Oey, L.-Y.; Xu, F.; Fu, H.; Zhang, Y.; Yang, G.

2015-09-01

Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I / O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.
QMCPACK: an open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids

NASA Astrophysics Data System (ADS)

Kim, Jeongnim; Baczewski, Andrew D.; Beaudet, Todd D.; Benali, Anouar; Chandler Bennett, M.; Berrill, Mark A.; Blunt, Nick S.; Josué Landinez Borda, Edgar; Casula, Michele; Ceperley, David M.; Chiesa, Simone; Clark, Bryan K.; Clay, Raymond C., III; Delaney, Kris T.; Dewing, Mark; Esler, Kenneth P.; Hao, Hongxia; Heinonen, Olle; Kent, Paul R. C.; Krogel, Jaron T.; Kylänpää, Ilkka; Li, Ying Wai; Lopez, M. Graham; Luo, Ye; Malone, Fionn D.; Martin, Richard M.; Mathuriya, Amrita; McMinis, Jeremy; Melton, Cody A.; Mitas, Lubos; Morales, Miguel A.; Neuscamman, Eric; Parker, William D.; Pineda Flores, Sergio D.; Romero, Nichols A.; Rubenstein, Brenda M.; Shea, Jacqueline A. R.; Shin, Hyeondeok; Shulenburger, Luke; Tillack, Andreas F.; Townsend, Joshua P.; Tubman, Norm M.; Van Der Goetz, Brett; Vincent, Jordan E.; ChangMo Yang, D.; Yang, Yubo; Zhang, Shuai; Zhao, Luning

2018-05-01

QMCPACK is an open source quantum Monte Carlo package for ab initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater–Jastrow type trial wavefunctions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary-field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performance computing architectures, including multicore central processing unit and graphical processing unit systems. We detail the program’s capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://qmcpack.org.
QMCPACK: an open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids.

PubMed

Kim, Jeongnim; Baczewski, Andrew T; Beaudet, Todd D; Benali, Anouar; Bennett, M Chandler; Berrill, Mark A; Blunt, Nick S; Borda, Edgar Josué Landinez; Casula, Michele; Ceperley, David M; Chiesa, Simone; Clark, Bryan K; Clay, Raymond C; Delaney, Kris T; Dewing, Mark; Esler, Kenneth P; Hao, Hongxia; Heinonen, Olle; Kent, Paul R C; Krogel, Jaron T; Kylänpää, Ilkka; Li, Ying Wai; Lopez, M Graham; Luo, Ye; Malone, Fionn D; Martin, Richard M; Mathuriya, Amrita; McMinis, Jeremy; Melton, Cody A; Mitas, Lubos; Morales, Miguel A; Neuscamman, Eric; Parker, William D; Pineda Flores, Sergio D; Romero, Nichols A; Rubenstein, Brenda M; Shea, Jacqueline A R; Shin, Hyeondeok; Shulenburger, Luke; Tillack, Andreas F; Townsend, Joshua P; Tubman, Norm M; Van Der Goetz, Brett; Vincent, Jordan E; Yang, D ChangMo; Yang, Yubo; Zhang, Shuai; Zhao, Luning

2018-05-16

QMCPACK is an open source quantum Monte Carlo package for ab initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wavefunctions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary-field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performance computing architectures, including multicore central processing unit and graphical processing unit systems. We detail the program's capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://qmcpack.org.
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit

PubMed Central

Xu, Liangliang; Xu, Nengxiong

2017-01-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points’ spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available. PMID:28989754
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit.

PubMed

Mei, Gang; Xu, Liangliang; Xu, Nengxiong

2017-09-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
Accelerating cardiac bidomain simulations using graphics processing units.

PubMed

Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G

2012-08-01

Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
Accelerating Cardiac Bidomain Simulations Using Graphics Processing Units

PubMed Central

Neic, Aurel; Liebmann, Manfred; Hoetzl, Elena; Mitchell, Lawrence; Vigmond, Edward J.; Haase, Gundolf

2013-01-01

Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6–20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20GPUs, 476 CPU cores were required on a national supercomputing facility. PMID:22692867
Homology modeling, docking studies and molecular dynamic simulations using graphical processing unit architecture to probe the type-11 phosphodiesterase catalytic site: a computational approach for the rational design of selective inhibitors.

PubMed

Cichero, Elena; D'Ursi, Pasqualina; Moscatelli, Marco; Bruno, Olga; Orro, Alessandro; Rotolo, Chiara; Milanesi, Luciano; Fossa, Paola

2013-12-01

Phosphodiesterase 11 (PDE11) is the latest isoform of the PDEs family to be identified, acting on both cyclic adenosine monophosphate and cyclic guanosine monophosphate. The initial reports of PDE11 found evidence for PDE11 expression in skeletal muscle, prostate, testis, and salivary glands; however, the tissue distribution of PDE11 still remains a topic of active study and some controversy. Given the sequence similarity between PDE11 and PDE5, several PDE5 inhibitors have been shown to cross-react with PDE11. Accordingly, many non-selective inhibitors, such as IBMX, zaprinast, sildenafil, and dipyridamole, have been documented to inhibit PDE11. Only recently, a series of dihydrothieno[3,2-d]pyrimidin-4(3H)-one derivatives proved to be selective toward the PDE11 isoform. In the absence of experimental data about PDE11 X-ray structures, we found interesting to gain a better understanding of the enzyme-inhibitor interactions using in silico simulations. In this work, we describe a computational approach based on homology modeling, docking, and molecular dynamics simulation to derive a predictive 3D model of PDE11. Using a Graphical Processing Unit architecture, it is possible to perform long simulations, find stable interactions involved in the complex, and finally to suggest guideline for the identification and synthesis of potent and selective inhibitors. © 2013 John Wiley & Sons A/S.
On the calculation of dynamic and heat loads on a three-dimensional body in a hypersonic flow

NASA Astrophysics Data System (ADS)

Bocharov, A. N.; Bityurin, V. A.; Evstigneev, N. M.; Fortov, V. E.; Golovin, N. N.; Petrovskiy, V. P.; Ryabkov, O. I.; Teplyakov, I. O.; Shustov, A. A.; Solomonov, Yu S.

2018-01-01

We consider a three-dimensional body in a hypersonic flow at zero angle of attack. Our aim is to estimate heat and aerodynamic loads on specific body elements. We are considering a previously developed code to solve coupled heat- and mass-transfer problem. The change of the surface shape is taken into account by formation of the iterative process for the wall material ablation. The solution is conducted on the multi-graphics-processing-unit (multi-GPU) cluster. Five Mach number points are considered, namely for M = 20-28. For each point we estimate body shape after surface ablation, heat loads on the surface and aerodynamic loads on the whole body and its elements. The latter is done using Gauss-type quadrature on the surface of the body. The comparison of the results for different Mach numbers is performed. We also estimate the efficiency of the Navier-Stokes code on multi-GPU and central processing unit architecture for the coupled heat and mass transfer problem.
Parallel processor-based raster graphics system architecture

DOEpatents

Littlefield, Richard J.

1990-01-01

An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.
Anvil Forecast Tool in the Advanced Weather Interactive Processing System

NASA Technical Reports Server (NTRS)

Barrett, Joe H., III; Hood, Doris

2009-01-01

Meteorologists from the 45th Weather Squadron (45 WS) and National Weather Service Spaceflight Meteorology Group (SMG) have identified anvil forecasting as one of their most challenging tasks when predicting the probability of violations of the Lightning Launch Commit Criteria and Space Shuttle Flight Rules. As a result, the Applied Meteorology Unit (AMU) was tasked to create a graphical overlay tool for the Meteorological Interactive Data Display System (MIDDS) that indicates the threat of thunderstorm anvil clouds, using either observed or model forecast winds as input. The tool creates a graphic depicting the potential location of thunderstorm anvils one, two, and three hours into the future. The locations are based on the average of the upper level observed or forecasted winds. The graphic includes 10 and 20 n mi standoff circles centered at the location of interest, as well as one-, two-, and three-hour arcs in the upwind direction. The arcs extend outward across a 30 sector width based on a previous AMU study that determined thunderstorm anvils move in a direction plus or minus 15 of the upper-level wind direction. The AMU was then tasked to transition the tool to the Advanced Weather Interactive Processing System (AWIPS). SMG later requested the tool be updated to provide more flexibility and quicker access to model data. This presentation describes the work performed by the AMU to transition the tool into AWIPS, as well as the subsequent improvements made to the tool.
A multi-level simulation platform of natural gas internal reforming solid oxide fuel cell-gas turbine hybrid generation system - Part II. Balancing units model library and system simulation

NASA Astrophysics Data System (ADS)

Bao, Cheng; Cai, Ningsheng; Croiset, Eric

2011-10-01

Following our integrated hierarchical modeling framework of natural gas internal reforming solid oxide fuel cell (IRSOFC), this paper firstly introduces the model libraries of main balancing units, including some state-of-the-art achievements and our specific work. Based on gPROMS programming code, flexible configuration and modular design are fully realized by specifying graphically all unit models in each level. Via comparison with the steady-state experimental data of Siemens-Westinghouse demonstration system, the in-house multi-level SOFC-gas turbine (GT) simulation platform is validated to be more accurate than the advanced power system analysis tool (APSAT). Moreover, some units of the demonstration system are designed reversely for analysis of a typically part-load transient process. The framework of distributed and dynamic modeling in most of units is significant for the development of control strategies in the future.
Using Graphic Organizers as a Tool for the Development of Scientific Language

ERIC Educational Resources Information Center

Mercuri, Sandra P.

2010-01-01

This observational study examines the effectiveness of graphic organizers two elementary teachers in California, United States use to teach the content and the academic language of science. The study was done during the 2006-2007 school year. The data was collected through field-notes and the audio recording of instructional activities, and they…

Creating Interactive Graphical Overlays in the Advanced Weather Interactive Processing System Using Shapefiles and DGM Files

NASA Technical Reports Server (NTRS)

Barrett, Joe H., III; Lafosse, Richard; Hood, Doris; Hoeth, Brian

2007-01-01

Graphical overlays can be created in real-time in the Advanced Weather Interactive Processing System (AWIPS) using shapefiles or Denver AWIPS Risk Reduction and Requirements Evaluation (DARE) Graphics Metafile (DGM) files. This presentation describes how to create graphical overlays on-the-fly for AWIPS, by using two examples of AWIPS applications that were created by the Applied Meteorology Unit (AMU) located at Cape Canaveral Air Force Station (CCAFS), Florida. The first example is the Anvil Threat Corridor Forecast Tool, which produces a shapefile that depicts a graphical threat corridor of the forecast movement of thunderstorm anvil clouds, based on the observed or forecast upper-level winds. This tool is used by the Spaceflight Meteorology Group (SMG) at Johnson Space Center, Texas and 45th Weather Squadron (45 WS) at CCAFS to analyze the threat of natural or space vehicle-triggered lightning over a location. The second example is a launch and landing trajectory tool that produces a DGM file that plots the ground track of space vehicles during launch or landing. The trajectory tool can be used by SMG and the 45 WS forecasters to analyze weather radar imagery along a launch or landing trajectory. The presentation will list the advantages and disadvantages of both file types for creating interactive graphical overlays in future AWIPS applications. Shapefiles are a popular format used extensively in Geographical Information Systems. They are usually used in AWIPS to depict static map backgrounds. A shapefile stores the geometry and attribute information of spatial features in a dataset (ESRI 1998). Shapefiles can contain point, line, and polygon features. Each shapefile contains a main file, index file, and a dBASE table. The main file contains a record for each spatial feature, which describes the feature with a list of its vertices. The index file contains the offset of each record from the beginning of the main file. The dBASE table contains records for each attribute. Attributes are commonly used to label spatial features. Shapefiles can be viewed, but not created in AWIPS. As a result, either third-party software can be installed on an AWIPS workstation, or new software must be written to create shapefiles in the correct format.
Support for fast comprehension of ICU data: visualization using metaphor graphics.

PubMed

Horn, W; Popow, C; Unterasinger, L

2001-01-01

The time-oriented analysis of electronic patient records on (neonatal) intensive care units is a tedious and time-consuming task. Graphic data visualization should make it easier for physicians to assess the overall situation of a patient and to recognize essential changes over time. Metaphor graphics are used to sketch the most relevant parameters for characterizing a patient's situation. By repetition of the graphic object in 24 frames the situation of the ICU patient is presented in one display, usually summarizing the last 24 h. VIE-VISU is a data visualization system which uses multiples to present the change in the patient's status over time in graphic form. Each multiple is a highly structured metaphor graphic object. Each object visualizes important ICU parameters from circulation, ventilation, and fluid balance. The design using multiples promotes a focus on stability and change. A stable patient is recognizable at first sight, continuous improvement or worsening condition are easy to analyze, drastic changes in the patient's situation get the viewers attention immediately.
Reactions to graphic health warnings in the United States.

PubMed

Nonnemaker, James M; Choiniere, Conrad J; Farrelly, Matthew C; Kamyab, Kian; Davis, Kevin C

2015-02-01

This study reports consumer reactions to the graphic health warnings selected by the Food and Drug Administration to be placed on cigarette packs in the United States. We recruited three sets of respondents for an experimental study from a national opt-in e-mail list sample: (i) current smokers aged 25 or older, (ii) young adult smokers aged 18-24 and (iii) youth aged 13-17 who are current smokers or who may be susceptible to initiation of smoking. Participants were randomly assigned to be exposed to a pack of cigarettes with one of nine graphic health warnings or with a text-only warning statement. All three age groups had overall strong negative emotional (ß = 4.7, P < 0.001 for adults; ß = 4.6, P < 0.001 for young adults and ß = 4.0, P < 0.001 for youth) and cognitive (ß = 2.4, P < 0.001 for adults; ß = 3.0, P < 0.001 for young adults and ß = 4.6, P < 0.001 for youth) reactions to the proposed labels. The strong negative emotional and cognitive reactions following a single exposure to the graphic health warnings suggest that, with repeated exposures over time, graphic health warnings may influence smokers' beliefs, intentions and behaviors. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Impact of Graphic and Text Warnings on Cigarette Packs: Findings from Four Countries over Five Years

PubMed Central

Borland, Ron; Wilson, Nick; Fong, Geoffrey T.; Hammond, David; Cummings, K. Michael; Yong, Hua-Hie; Hosking, Warwick; Hastings, Gerard; Thrasher, James; McNeill, Ann

2015-01-01

Objectives To examine the impact of health warnings on smokers by comparing the short-term impact of new graphic (2006) Australian warnings with: (i) earlier (2003) United Kingdom (UK) larger text-based warnings; (ii) and Canadian graphic warnings (late 2000); and secondarily, to extend our understanding of warning wear-out. Methods The International Tobacco Control Policy Evaluation Survey (ITC Project) follows prospective cohorts (with replenishment) of adult smokers annually (5 waves: 2002–2006), in Canada, United States, UK, and Australia (around 2000 per country per wave; total n=17,773). Measures were of pack warning salience (reading and noticing); cognitive responses (thoughts of harm and quitting); and two behavioural responses: forgoing cigarettes and avoiding the warnings. Results All four indicators of impact increased markedly among Australian smokers following the introduction of graphic warnings. Controlling for date of introduction, they stimulated more cognitive responses than the UK (text-only) changes, and were avoided more, did not significantly increase forgoing cigarettes, but were read and noticed less. The findings also extend previous work showing partial wear-out of both graphic and text-only warnings, but the Canadian warnings have more sustained effects than UK ones. Conclusions Australia’s new health warnings increased reactions that are prospectively predictive of cessation activity. Warning size increases warning effectiveness and graphic warnings may be superior to text-based warnings. While there is partial wear-out in the initial impact associated with all warnings, stronger warnings tend to sustain their effects for longer. These findings support arguments for governments to exceed minimum FCTC requirements on warnings. PMID:19561362
Engineering visualization utilizing advanced animation

NASA Technical Reports Server (NTRS)

Sabionski, Gunter R.; Robinson, Thomas L., Jr.

1989-01-01

Engineering visualization is the use of computer graphics to depict engineering analysis and simulation in visual form from project planning through documentation. Graphics displays let engineers see data represented dynamically which permits the quick evaluation of results. The current state of graphics hardware and software generally allows the creation of two types of 3D graphics. The use of animated video as an engineering visualization tool is presented. The engineering, animation, and videography aspects of animated video production are each discussed. Specific issues include the integration of staffing expertise, hardware, software, and the various production processes. A detailed explanation of the animation process reveals the capabilities of this unique engineering visualization method. Automation of animation and video production processes are covered and future directions are proposed.
SedCT: MATLAB™ tools for standardized and quantitative processing of sediment core computed tomography (CT) data collected using a medical CT scanner

NASA Astrophysics Data System (ADS)

Reilly, B. T.; Stoner, J. S.; Wiest, J.

2017-08-01

Computed tomography (CT) of sediment cores allows for high-resolution images, three-dimensional volumes, and down core profiles. These quantitative data are generated through the attenuation of X-rays, which are sensitive to sediment density and atomic number, and are stored in pixels as relative gray scale values or Hounsfield units (HU). We present a suite of MATLAB™ tools specifically designed for routine sediment core analysis as a means to standardize and better quantify the products of CT data collected on medical CT scanners. SedCT uses a graphical interface to process Digital Imaging and Communications in Medicine (DICOM) files, stitch overlapping scanned intervals, and create down core HU profiles in a manner robust to normal coring imperfections. Utilizing a random sampling technique, SedCT reduces data size and allows for quick processing on typical laptop computers. SedCTimage uses a graphical interface to create quality tiff files of CT slices that are scaled to a user-defined HU range, preserving the quantitative nature of CT images and easily allowing for comparison between sediment cores with different HU means and variance. These tools are presented along with examples from lacustrine and marine sediment cores to highlight the robustness and quantitative nature of this method.
Graphical Man/Machine Communications

DTIC Science & Technology

Progress is reported concerning the use of computer controlled graphical displays in the areas of radiaton diffusion and hydrodynamics, general...ventricular dynamics. Progress is continuing on the use of computer graphics in architecture. Some progress in halftone graphics is reported with no basic...developments presented. Colored halftone perspective pictures are being used to represent multivariable situations. Nonlinear waveform processing is
SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moriya, S; Sato, M; Tachibana, H

Purpose: The calculation time is a trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated to accelerate the convolution calculation using an ideal kernel on the Graphic Processing Units (GPU). Methods: The calculation was performed on the AMD graphics hardware of Dual FirePro D700 and our algorithm was implemented using the Aparapi that convert Java bytecode to OpenCL. The process of dose calculation was separated with the TERMA and KERMA steps. The dose deposited at the coordinate (x, y, z) was determined in the process. In the dose calculation runningmore » on the central processing unit (CPU) of Intel Xeon E5, the calculation loops were performed for all calculation points. On the GPU computation, all of the calculation processes for the points were sent to the GPU and the multi-thread computation was done. In this study, the dose calculation was performed in a water equivalent homogeneous phantom with 150{sup 3} voxels (2 mm calculation grid) and the calculation speed on the GPU to that on the CPU and the accuracy of PDD were compared. Results: The calculation time for the GPU and the CPU were 3.3 sec and 4.4 hour, respectively. The calculation speed for the GPU was 4800 times faster than that for the CPU. The PDD curve for the GPU was perfectly matched to that for the CPU. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable for time and may be more accurate in an inhomogeneous region. Intensity modulated arc therapy needs dose calculations for different gantry angles at many control points. Thus, it would be more practical that the kernel uses a coarse spacing technique if the calculation is faster while keeping the similar accuracy to a current treatment planning system.« less
An Electronic Pressure Profile Display system for aeronautic test facilities

NASA Technical Reports Server (NTRS)

Woike, Mark R.

1990-01-01

The NASA Lewis Research Center has installed an Electronic Pressure Profile Display system. This system provides for the real-time display of pressure readings on high resolution graphics monitors. The Electronic Pressure Profile Display system will replace manometer banks currently used in aeronautic test facilities. The Electronic Pressure Profile Display system consists of an industrial type Digital Pressure Transmitter (DPI) unit which interfaces with a host computer. The host computer collects the pressure data from the DPI unit, converts it into engineering units, and displays the readings on a high resolution graphics monitor in bar graph format. Software was developed to accomplish the above tasks and also draw facility diagrams as background information on the displays. Data transfer between host computer and DPT unit is done with serial communications. Up to 64 channels are displayed with one second update time. This paper describes the system configuration, its features, and its advantages over existing systems.
An electronic pressure profile display system for aeronautic test facilities

NASA Technical Reports Server (NTRS)

Woike, Mark R.

1990-01-01

The NASA Lewis Research Center has installed an Electronic Pressure Profile Display system. This system provides for the real-time display of pressure readings on high resolution graphics monitors. The Electronic Pressure Profile Display system will replace manometer banks currently used in aeronautic test facilities. The Electronic Pressure Profile Display system consists of an industrial type Digital Pressure Transmitter (DPT) unit which interfaces with a host computer. The host computer collects the pressure data from the DPT unit, converts it into engineering units, and displays the readings on a high resolution graphics monitor in bar graph format. Software was developed to accomplish the above tasks and also draw facility diagrams as background information on the displays. Data transfer between host computer and DPT unit is done with serial communications. Up to 64 channels are displayed with one second update time. This paper describes the system configuration, its features, and its advantages over existing systems.
Update On Ink Jet Proofing

NASA Astrophysics Data System (ADS)

Santos, Richard P.

1989-04-01

IRIS Graphics, Inc., is a new start-up company chartered to develop, manufacture, and market direct digital filmless color imaging systems. IRIS is pleased to have been the recipient of the Graphic Art Technologies Foundation INTERTEC '87 Award for innovative excellence. IRIS is extremely proud to have been given this honor. IRIS was incorporated in April 1984 and received its initial funding of approximately 1 million by September 1984. The first 2044 Beta unit was installed in August 1985, and the first 2044 sales were made in December 1985 to R. R. Donnelley, the largest printer in the United States, and to G. S. Litho, the largest U.S. color separation house. In May 1986, IRIS received an additional 3 million in its second round of financing. A smaller version of the 2044, the 2024 was introduced at Lasers In Graphics in September 1986. IRIS achieved additional financing in July 1987 and completed the introduction of the new breakthrough Series 3000 again at Lasers In Graphics in September 1987 in Orlando, Florida. IRIS occupies 20,000 square feet at its new location in Bedford, Massachusetts, which located off of Route 128 in the high technology area near Boston.
Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation

NASA Astrophysics Data System (ADS)

Bernaschi, M.; Parisi, G.; Parisi, L.

2011-06-01

We present a set of possible implementations for Graphics Processing Units (GPU) of the Over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops/s of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. Such results are compared with those obtained by means of a highly-tuned vector-parallel code on latest generation multi-core CPUs.
Nuflood, Version 1.x

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tasseff, Byron

2016-07-29

NUFLOOD Version 1.x is a surface-water hydrodynamic package designed for the simulation of overland flow of fluids. It consists of various routines to address a wide range of applications (e.g., rainfall-runoff, tsunami, storm surge) and real time, interactive visualization tools. NUFLOOD has been designed for general-purpose computers and workstations containing multi-core processors and/or graphics processing units. The software is easy to use and extensible, constructed in mind for instructors, students, and practicing engineers. NUFLOOD is intended to assist the water resource community in planning against water-related natural disasters.
Practical system for generating digital mixed reality video holograms.

PubMed

Song, Joongseok; Kim, Changseob; Park, Hanhoon; Park, Jong-Il

2016-07-10

We propose a practical system that can effectively mix the depth data of real and virtual objects by using a Z buffer and can quickly generate digital mixed reality video holograms by using multiple graphic processing units (GPUs). In an experiment, we verify that real objects and virtual objects can be merged naturally in free viewing angles, and the occlusion problem is well handled. Furthermore, we demonstrate that the proposed system can generate mixed reality video holograms at 7.6 frames per second. Finally, the system performance is objectively verified by users' subjective evaluations.
Remedial Action Assessment System: A computer-based methodology for conducting feasibility studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, M.K.; Buelt, J.L.; Stottlemyre, J.A.

1991-02-01

Because of the complexity and number of potential waste sites facing the US Department of Energy (DOE) for potential cleanup, DOE is supporting the development of a computer-based methodology to streamline the remedial investigation/feasibility study process. The Remedial Action Assessment System (RAAS), can be used for screening, linking, and evaluating established technology processes in support of conducting feasibility studies. It is also intended to do the same in support of corrective measures studies. The user interface employs menus, windows, help features, and graphical information while RAAS is in operation. Object-oriented programming is used to link unit processes into sets ofmore » compatible processes that form appropriate remedial alternatives. Once the remedial alternatives are formed, the RAAS methodology can evaluate them in terms of effectiveness, implementability, and cost. RAAS will access a user-selected risk assessment code to determine the reduction of risk after remedial action by each recommended alternative. The methodology will also help determine the implementability of the remedial alternatives at a site and access cost estimating tools to provide estimates of capital, operating, and maintenance costs. This paper presents the characteristics of two RAAS prototypes currently being developed. These include the RAAS Technology Information System, which accesses graphical, tabular and textual information about technologies, and the main RAAS methodology, which screens, links, and evaluates remedial technologies. 4 refs., 3 figs., 1 tab.« less
A new space-time information expression and analysis approach based on 3S technology: a case study of China's coastland

NASA Astrophysics Data System (ADS)

Cao, Bao; Luo, Hong; Gao, Zhenji

2009-10-01

Space-time Information Expression and Analysis (SIEA) uses vivid graphic images of thinking to deal with information units according to series distribution rules with a variety of arranging, which combined with the use of information technology, powerful data-processing capabilities to carry out analysis and integration of information units. In this paper, a new SIEA approach was proposed and its model was constructed. And basic units, methodologies and steps of SIEA were discussed. Taking China's coastland as an example, the new SIEA approach were applied for the parameters of air humidity, rainfall and surface temperature from the year 1981 to 2000. The case study shows that the parameters change within month alternation, but little change within year alternation. From the view of spatial distribution, it was significantly different for the parameters in north and south of China's coastland. The new SIEA approach proposed in this paper not only has the intuitive, image characteristics, but also can solved the problem that it is difficult to express the biophysical parameters of space-time distribution using traditional charts and tables. It can reveal the complexity of the phenomenon behind the movement of things and laws of nature. And it can quantitatively analyze the phenomenon and nature law of the parameters, which inherited the advantages of graphics of traditional ways of thinking. SIEA provides a new space-time analysis and expression approach, using comprehensive 3S technologies, for the research of Earth System Science.
Improved preconditioned conjugate gradient algorithm and application in 3D inversion of gravity-gradiometry data

NASA Astrophysics Data System (ADS)

Wang, Tai-Han; Huang, Da-Nian; Ma, Guo-Qing; Meng, Zhao-Hai; Li, Ye

2017-06-01

With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is becoming increasingly used in oil and gas exploration. In the fast processing and interpretation of large-scale high-precision data, the use of the graphics processing unit process unit (GPU) and preconditioning methods are very important in the data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique and the incomplete Choleksy decomposition conjugate gradient algorithm (ICCG). Since preparing the preconditioner requires extra time, a parallel implement based on GPU is proposed. The improved method is then applied in the inversion of noisecontaminated synthetic data to prove its adaptability in the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm based on NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times that of a serial program using a 2.0 GHz Central Processing Unit (CPU). Real airborne gravity-gradiometry data from Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method in fast inversion of 3D FTG data.
A system for the delivery of programmable, adaptive stimulation intensity envelopes for drop foot correction applications.

PubMed

Breen, P P; O'Keeffe, D T; Conway, R; Lyons, G M

2006-03-01

We describe the design of an intelligent drop foot stimulator unit for use in conjunction with a commercial neuromuscular electrical nerve stimulation (NMES) unit, the NT2000. The developed micro-controller unit interfaces to a personal computer (PC) and a graphical user interface (GUI) allows the clinician to graphically specify the shape of the stimulation intensity envelope required for a subject undergoing drop foot correction. The developed unit is based on the ADuC812S micro-controller evaluation board from Analog Devices and uses two force sensitive resistor (FSR) based foot-switches to control application of stimulus. The unit has the ability to display to the clinician how the stimulus intensity envelope is being delivered during walking using a data capture capability. The developed system has a built-in algorithm to dynamically adjust the delivery of stimulus to reflect changes both within the gait cycle and from cycle to cycle. Thus, adaptive control of stimulus intensity is achieved.
Low-cost compact ECG with graphic LCD and phonocardiogram system design.

PubMed

Kara, Sadik; Kemaloğlu, Semra; Kirbaş, Samil

2006-06-01

Till today, many different ECG devices are made in developing countries. In this study, low cost, small size, portable LCD screen ECG device, and phonocardiograph were designed. With designed system, heart sounds that take synchronously with ECG signal are heard as sensitive. Improved system consist three units; Unit 1, ECG circuit, filter and amplifier structure. Unit 2, heart sound acquisition circuit. Unit 3, microcontroller, graphic LCD and ECG signal sending unit to computer. Our system can be used easily in different departments of the hospital, health institution and clinics, village clinic and also in houses because of its small size structure and other benefits. In this way, it is possible that to see ECG signal and hear heart sounds as synchronously and sensitively. In conclusion, heart sounds are heard on the part of both doctor and patient because sounds are given to environment with a tiny speaker. Thus, the patient knows and hears heart sounds him/herself and is acquainted by doctor about healthy condition.
Spins Dynamics in a Dissipative Environment: Hierarchal Equations of Motion Approach Using a Graphics Processing Unit (GPU).

PubMed

Tsuchimoto, Masashi; Tanimura, Yoshitaka

2015-08-11

A system with many energy states coupled to a harmonic oscillator bath is considered. To study quantum non-Markovian system-bath dynamics numerically rigorously and nonperturbatively, we developed a computer code for the reduced hierarchy equations of motion (HEOM) for a graphics processor unit (GPU) that can treat the system as large as 4096 energy states. The code employs a Padé spectrum decomposition (PSD) for a construction of HEOM and the exponential integrators. Dynamics of a quantum spin glass system are studied by calculating the free induction decay signal for the cases of 3 × 2 to 3 × 4 triangular lattices with antiferromagnetic interactions. We found that spins relax faster at lower temperature due to transitions through a quantum coherent state, as represented by the off-diagonal elements of the reduced density matrix, while it has been known that the spins relax slower due to suppression of thermal activation in a classical case. The decay of the spins are qualitatively similar regardless of the lattice sizes. The pathway of spin relaxation is analyzed under a sudden temperature drop condition. The Compute Unified Device Architecture (CUDA) based source code used in the present calculations is provided as Supporting Information .

[Influence of the recording interval and a graphic organizer on the writing process/product and on other psychological variables].

PubMed

García Sánchez, Jesús N; Rodríguez Pérez, Celestino

2007-05-01

An experimental study of the influence of the recording interval and a graphic organizer on the processes of writing composition and on the final product is presented. We studied 326 participants, age 10 to 16 years old, by means of a nested design. Two groups were compared: one group was aided in the writing process with a graphic organizer and the other was not. Each group was subdivided into two further groups: one with a mean recording interval of 45 seconds and the other with approximately 90 seconds recording interval in a writing log. The results showed that the group aided by a graphic organizer obtained better results both in processes and writing product, and that the groups assessed with an average interval of 45 seconds obtained worse results. Implications for educational practice are discussed, and limitations and future perspectives are commented on.
Chromium: A Stress-Processing Framework for Interactive Rendering on Clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humphreys, G,; Houston, M.; Ng, Y.-R.

2002-01-11

We describe Chromium, a system for manipulating streams of graphics API commands on clusters of workstations. Chromium's stream filters can be arranged to create sort-first and sort-last parallel graphics architectures that, in many cases, support the same applications while using only commodity graphics accelerators. In addition, these stream filters can be extended programmatically, allowing the user to customize the stream transformations performed by nodes in a cluster. Because our stream processing mechanism is completely general, any cluster-parallel rendering algorithm can be either implemented on top of or embedded in Chromium. In this paper, we give examples of real-world applications thatmore » use Chromium to achieve good scalability on clusters of workstations, and describe other potential uses of this stream processing technology. By completely abstracting the underlying graphics architecture, network topology, and API command processing semantics, we allow a variety of applications to run in different environments.« less
Task-Analytic Design of Graphic Presentations

DTIC Science & Technology

1990-05-18

important premise of Larkin and Simon’s work is that, when comparing alternative presentations, it is fruitful to characterize graphic-based problem solving...using the same information-processing models used to help understand problem solving using other representations [Newell and Simon, 19721...luring execution of graphic presentation- 4 based problem -solving procedures. Chapter 2 reviews other work related to the problem of designing graphic
Interactive computer graphics system for structural sizing and analysis of aircraft structures

NASA Technical Reports Server (NTRS)

Bendavid, D.; Pipano, A.; Raibstein, A.; Somekh, E.

1975-01-01

A computerized system for preliminary sizing and analysis of aircraft wing and fuselage structures was described. The system is based upon repeated application of analytical program modules, which are interactively interfaced and sequence-controlled during the iterative design process with the aid of design-oriented graphics software modules. The entire process is initiated and controlled via low-cost interactive graphics terminals driven by a remote computer in a time-sharing mode.
Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids

NASA Astrophysics Data System (ADS)

Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu

2013-01-01

Numerical modeling of anisotropic media is a computationally intensive task since it brings additional complexity to the field problem in such a way that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good in handling floating point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using the NVIDIA CUDA (Compute Unified device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device which consists of 384 CUDA cores clocked at 1645 MHz with a standard desktop pc as the host platform. We compare the results from standard CPU implementation for its accuracy and speed and draw implications for simulation using the GPU paradigm.
Interactive, graphical processing unitbased evaluation of evacuation scenarios at the state scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S; Aaby, Brandon G; Yoginath, Srikanth B

2011-01-01

In large-scale scenarios, transportation modeling and simulation is severely constrained by simulation time. For example, few real- time simulators scale to evacuation traffic scenarios at the level of an entire state, such as Louisiana (approximately 1 million links) or Florida (2.5 million links). New simulation approaches are needed to overcome severe computational demands of conventional (microscopic or mesoscopic) modeling techniques. Here, a new modeling and execution methodology is explored that holds the potential to provide a tradeoff among the level of behavioral detail, the scale of transportation network, and real-time execution capabilities. A novel, field-based modeling technique and its implementationmore » on graphical processing units are presented. Although additional research with input from domain experts is needed for refining and validating the models, the techniques reported here afford interactive experience at very large scales of multi-million road segments. Illustrative experiments on a few state-scale net- works are described based on an implementation of this approach in a software system called GARFIELD. Current modeling cap- abilities and implementation limitations are described, along with possible use cases and future research.« less
A binary-decision-diagram-based two-bit arithmetic logic unit on a GaAs-based regular nanowire network with hexagonal topology.

PubMed

Zhao, Hong-Quan; Kasai, Seiya; Shiratori, Yuta; Hashizume, Tamotsu

2009-06-17

A two-bit arithmetic logic unit (ALU) was successfully fabricated on a GaAs-based regular nanowire network with hexagonal topology. This fundamental building block of central processing units can be implemented on a regular nanowire network structure with simple circuit architecture based on graphical representation of logic functions using a binary decision diagram and topology control of the graph. The four-instruction ALU was designed by integrating subgraphs representing each instruction, and the circuitry was implemented by transferring the logical graph structure to a GaAs-based nanowire network formed by electron beam lithography and wet chemical etching. A path switching function was implemented in nodes by Schottky wrap gate control of nanowires. The fabricated circuit integrating 32 node devices exhibits the correct output waveforms at room temperature allowing for threshold voltage variation.
Large Terrain Continuous Level of Detail 3D Visualization Tool

NASA Technical Reports Server (NTRS)

Myint, Steven; Jain, Abhinandan

2012-01-01

This software solved the problem of displaying terrains that are usually too large to be displayed on standard workstations in real time. The software can visualize terrain data sets composed of billions of vertices, and can display these data sets at greater than 30 frames per second. The Large Terrain Continuous Level of Detail 3D Visualization Tool allows large terrains, which can be composed of billions of vertices, to be visualized in real time. It utilizes a continuous level of detail technique called clipmapping to support this. It offloads much of the work involved in breaking up the terrain into levels of details onto the GPU (graphics processing unit) for faster processing.
Blind detection of giant pulses: GPU implementation

NASA Astrophysics Data System (ADS)

Ait-Allal, Dalal; Weber, Rodolphe; Dumez-Viou, Cédric; Cognard, Ismael; Theureau, Gilles

2012-01-01

Radio astronomical pulsar observations require specific instrumentation and dedicated signal processing to cope with the dispersion caused by the interstellar medium. Moreover, the quality of observations can be limited by radio frequency interference (RFI) generated by Telecommunications activity. This article presents the innovative pulsar instrumentation based on graphical processing units (GPU) which has been designed at the Nançay Radio Astronomical Observatory. In addition, for giant pulsar search, we propose a new approach which combines a hardware-efficient search method and some RFI mitigation capabilities. Although this approach is less sensitive than the classical approach, its advantage is that no a priori information on the pulsar parameters is required. The validation of a GPU implementation is under way.
GPU-accelerated computation of electron transfer.

PubMed

Höfinger, Siegfried; Acocella, Angela; Pop, Sergiu C; Narumi, Tetsu; Yasuoka, Kenji; Beu, Titus; Zerbetto, Francesco

2012-11-05

Electron transfer is a fundamental process that can be studied with the help of computer simulation. The underlying quantum mechanical description renders the problem a computationally intensive application. In this study, we probe the graphics processing unit (GPU) for suitability to this type of problem. Time-critical components are identified via profiling of an existing implementation and several different variants are tested involving the GPU at increasing levels of abstraction. A publicly available library supporting basic linear algebra operations on the GPU turns out to accelerate the computation approximately 50-fold with minor dependence on actual problem size. The performance gain does not compromise numerical accuracy and is of significant value for practical purposes. Copyright © 2012 Wiley Periodicals, Inc.
Distributed shared memory for roaming large volumes.

PubMed

Castanié, Laurent; Mion, Christophe; Cavin, Xavier; Lévy, Bruno

2006-01-01

We present a cluster-based volume rendering system for roaming very large volumes. This system allows to move a gigabyte-sized probe inside a total volume of several tens or hundreds of gigabytes in real-time. While the size of the probe is limited by the total amount of texture memory on the cluster, the size of the total data set has no theoretical limit. The cluster is used as a distributed graphics processing unit that both aggregates graphics power and graphics memory. A hardware-accelerated volume renderer runs in parallel on the cluster nodes and the final image compositing is implemented using a pipelined sort-last rendering algorithm. Meanwhile, volume bricking and volume paging allow efficient data caching. On each rendering node, a distributed hierarchical cache system implements a global software-based distributed shared memory on the cluster. In case of a cache miss, this system first checks page residency on the other cluster nodes instead of directly accessing local disks. Using two Gigabit Ethernet network interfaces per node, we accelerate data fetching by a factor of 4 compared to directly accessing local disks. The system also implements asynchronous disk access and texture loading, which makes it possible to overlap data loading, volume slicing and rendering for optimal volume roaming.
Low-SWaP coincidence processing for Geiger-mode LIDAR video

NASA Astrophysics Data System (ADS)

Schultz, Steven E.; Cervino, Noel P.; Kurtz, Zachary D.; Brown, Myron Z.

2015-05-01

Photon-counting Geiger-mode lidar detector arrays provide a promising approach for producing three-dimensional (3D) video at full motion video (FMV) data rates, resolution, and image size from long ranges. However, coincidence processing required to filter raw photon counts is computationally expensive, generally requiring significant size, weight, and power (SWaP) and also time. In this paper, we describe a laboratory test-bed developed to assess the feasibility of low-SWaP, real-time processing for 3D FMV based on Geiger-mode lidar. First, we examine a design based on field programmable gate arrays (FPGA) and demonstrate proof-of-concept results. Then we examine a design based on a first-of-its-kind embedded graphical processing unit (GPU) and compare performance with the FPGA. Results indicate feasibility of real-time Geiger-mode lidar processing for 3D FMV and also suggest utility for real-time onboard processing for mapping lidar systems.
Image Understanding and Intelligent Parallel Systems

DTIC Science & Technology

1991-05-09

a common user interface for the interactive , graphical manipulation of those histories, and...Circuits and Systems, August 1987. Yap, S.-K. and M.L. Scott, "PenGuin: A language for reactive graphical user interface programming," to appear, INTERACT 󈨞, Cambridge, United Kingdom, 1990. 74 ...of up to a factor of 100 over single-workstation implementations. User interfaces to large multiprocessor computers are a difficult issue addressed
CrossTalk. The Journal of Defense Software Engineering. Volume 25, Number 3

DTIC Science & Technology

2012-06-01

OMG) standard Business Process Modeling and Nota- tion ( BPMN ) [6] graphical notation. I will address each of these: identify and document steps...to a value stream map using BPMN and textual process narratives. The resulting process narratives or process metadata includes key information...objectives. Once the processes are identified we can graphically document them capturing the process using BPMN (see Figure 1). The BPMN models
Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

PubMed

Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

2015-01-01

The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
Payload specialist station study: Volume 2, part 3: Program analysis and planning for phase C/D

NASA Technical Reports Server (NTRS)

1976-01-01

The controls and displays (C&D) required at the Orbiter aft-flight deck (AFD) and the core C&D required at the Payload Specialist Station (PSS) are identified in this document. The AFD C&D Concept consists of a multifunction display system (MFDS) and elements of multiuse mission support equipment (MMSE). The MFDS consists of two CRTs, a display electronics unit (DEU), and a keyboard. The MMSE consists of a manual pointing controller (MPC), five digit numeric displays, 10 character alphanumeric legends, event timers, analog meters, rotary and toggle switches. The MMSE may be hardwired to the experiment, or interface with a data bus at the PSS for signal processing. The MFDS has video capability, with alphanumeric and graphic overlay features, on one CRT and alphanumeric and graphic (tricolor) capability on a second CRT. The DEU will have the capability to communicate, via redundant data buses, with both the spacelab experiment and subsystem computers.
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

NASA Technical Reports Server (NTRS)

Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.

2012-01-01

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

PubMed

Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

2015-01-01

To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
Exploring Communication Technology.

ERIC Educational Resources Information Center

Wood, Jimmie; Kellum, Mary

These teacher's materials for a seven-unit course were developed to help students develop technological literacy, career exploration, and problem-solving skills relative to the communication industries. The seven units are on an introduction to communication, verbal communication, design and sketching, drafting, graphic reproduction, photography,…
Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levine, Benjamin G., E-mail: ben.levine@temple.ed; Stone, John E., E-mail: johns@ks.uiuc.ed; Kohlmeyer, Axel, E-mail: akohlmey@temple.ed

2011-05-01

The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm aremore » presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.« less

Mechanical properties of bovine cortical bone based on the automated ball indentation technique and graphics processing method.

PubMed

Zhang, Airong; Zhang, Song; Bian, Cuirong

2018-02-01

Cortical bone provides the main form of support in humans and other vertebrates against various forces. Thus, capturing its mechanical properties is important. In this study, the mechanical properties of cortical bone were investigated by using automated ball indentation and graphics processing at both the macroscopic and microstructural levels under dry conditions. First, all polished samples were photographed under a metallographic microscope, and the area ratio of the circumferential lamellae and osteons was calculated through the graphics processing method. Second, fully-computer-controlled automated ball indentation (ABI) tests were performed to explore the micro-mechanical properties of the cortical bone at room temperature and a constant indenter speed. The indentation defects were examined with a scanning electron microscope. Finally, the macroscopic mechanical properties of the cortical bone were estimated with the graphics processing method and mixture rule. Combining ABI and graphics processing proved to be an effective tool to obtaining the mechanical properties of the cortical bone, and the indenter size had a significant effect on the measurement. The methods presented in this paper provide an innovative approach to acquiring the macroscopic mechanical properties of cortical bone in a nondestructive manner. Copyright © 2017 Elsevier Ltd. All rights reserved.
Portable x-ray fluorescence spectrometer for environmental monitoring of inorganic pollutants

NASA Technical Reports Server (NTRS)

Clark, III, Benton C. (Inventor); Thornton, Michael G. (Inventor)

1991-01-01

A portable x-ray fluorescence spectrometer has a portable sensor unit containing a battery, a high voltage power supply, an x-ray tube which produces a beam x-ray radiation directed toward a target sample, and a detector for fluorescent x-rays produced by the sample. If a silicon-lithium detector is used, the sensor unit also contains either a thermoelectric or thermochemical cooler, or a small dewar flask containing liquid nitrogen to cool the detector. A pulse height analyzer (PHA) generates a spectrum of data for each sample consisting of the number of fluorescent x-rays detected as a function of their energy level. The PHA can also store spectrum data for a number of samples in the field. A processing unit can be attached to the pulse height analyzer to upload and analyze the stored spectrum data for each sample. The processing unit provides a graphic display of the spectrum data for each sample, and provides qualitative and/or quantitative analysis of the elemental composition of the sample by comparing the peaks in the sample spectrum against known x-ray energies for various chemical elements. An optional filtration enclosure can be used to filter particles from a sample suspension, either in the form of a natural suspension or a chemically created precipitate. The sensor unit is then temporarily attached to the filtration unit to analyze the particles collected by the filter medium.
Pre and post processing using the IBM 3277 display station graphics attachment (RPQ7H0284)

NASA Technical Reports Server (NTRS)

Burroughs, S. H.; Lawlor, M. B.; Miller, I. M.

1978-01-01

A graphical interactive procedure operating under TSO and utilizing two CRT display terminals is shown to be an effective means of accomplishing mesh generation, establishing boundary conditions, and reviewing graphic output for finite element analysis activity.
Getting Graphic at the School Library.

ERIC Educational Resources Information Center

Kan, Kat

2003-01-01

Provides information for school libraries interested in acquiring graphic novels. Discusses theft prevention; processing and cataloging; maintaining the collection; what to choose, with two Web sites for more information on graphic novels for libraries; collection development decisions; and Japanese comics called Manga. Includes an annotated list…
A graphically oriented specification language for automatic code generation. GRASP/Ada: A Graphical Representation of Algorithms, Structure, and Processes for Ada, phase 1

NASA Technical Reports Server (NTRS)

Cross, James H., II; Morrison, Kelly I.; May, Charles H., Jr.; Waddel, Kathryn C.

1989-01-01

The first phase of a three-phase effort to develop a new graphically oriented specification language which will facilitate the reverse engineering of Ada source code into graphical representations (GRs) as well as the automatic generation of Ada source code is described. A simplified view of the three phases of Graphical Representations for Algorithms, Structure, and Processes for Ada (GRASP/Ada) with respect to three basic classes of GRs is presented. Phase 1 concentrated on the derivation of an algorithmic diagram, the control structure diagram (CSD) (CRO88a) from Ada source code or Ada PDL. Phase 2 includes the generation of architectural and system level diagrams such as structure charts and data flow diagrams and should result in a requirements specification for a graphically oriented language able to support automatic code generation. Phase 3 will concentrate on the development of a prototype to demonstrate the feasibility of this new specification language.
Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

NASA Astrophysics Data System (ADS)

Laracuente, Nicholas; Grossman, Carl

2013-03-01

We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College
Graphics Processing Unit-Accelerated Nonrigid Registration of MR Images to CT Images During CT-Guided Percutaneous Liver Tumor Ablations.

PubMed

Tokuda, Junichi; Plishker, William; Torabi, Meysam; Olubiyi, Olutayo I; Zaki, George; Tatli, Servet; Silverman, Stuart G; Shekher, Raj; Hata, Nobuhiko

2015-06-01

Accuracy and speed are essential for the intraprocedural nonrigid magnetic resonance (MR) to computed tomography (CT) image registration in the assessment of tumor margins during CT-guided liver tumor ablations. Although both accuracy and speed can be improved by limiting the registration to a region of interest (ROI), manual contouring of the ROI prolongs the registration process substantially. To achieve accurate and fast registration without the use of an ROI, we combined a nonrigid registration technique on the basis of volume subdivision with hardware acceleration using a graphics processing unit (GPU). We compared the registration accuracy and processing time of GPU-accelerated volume subdivision-based nonrigid registration technique to the conventional nonrigid B-spline registration technique. Fourteen image data sets of preprocedural MR and intraprocedural CT images for percutaneous CT-guided liver tumor ablations were obtained. Each set of images was registered using the GPU-accelerated volume subdivision technique and the B-spline technique. Manual contouring of ROI was used only for the B-spline technique. Registration accuracies (Dice similarity coefficient [DSC] and 95% Hausdorff distance [HD]) and total processing time including contouring of ROIs and computation were compared using a paired Student t test. Accuracies of the GPU-accelerated registrations and B-spline registrations, respectively, were 88.3 ± 3.7% versus 89.3 ± 4.9% (P = .41) for DSC and 13.1 ± 5.2 versus 11.4 ± 6.3 mm (P = .15) for HD. Total processing time of the GPU-accelerated registration and B-spline registration techniques was 88 ± 14 versus 557 ± 116 seconds (P < .000000002), respectively; there was no significant difference in computation time despite the difference in the complexity of the algorithms (P = .71). The GPU-accelerated volume subdivision technique was as accurate as the B-spline technique and required significantly less processing time. The GPU-accelerated volume subdivision technique may enable the implementation of nonrigid registration into routine clinical practice. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
Process Simulation of Gas Metal Arc Welding Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murray, Paul E.

2005-09-06

ARCWELDER is a Windows-based application that simulates gas metal arc welding (GMAW) of steel and aluminum. The software simulates the welding process in an accurate and efficient manner, provides menu items for process parameter selection, and includes a graphical user interface with the option to animate the process. The user enters the base and electrode material, open circuit voltage, wire diameter, wire feed speed, welding speed, and standoff distance. The program computes the size and shape of a square-groove or V-groove weld in the flat position. The program also computes the current, arc voltage, arc length, electrode extension, transfer ofmore » droplets, heat input, filler metal deposition, base metal dilution, and centerline cooling rate, in English or SI units. The simulation may be used to select welding parameters that lead to desired operation conditions.« less
Guide to making time-lapse graphics using the facilities of the National Magnetic Fusion Energy Computing Center

DOE Office of Scientific and Technical Information (OSTI.GOV)

Munro, J.K. Jr.

1980-05-01

The advent of large, fast computers has opened the way to modeling more complex physical processes and to handling very large quantities of experimental data. The amount of information that can be processed in a short period of time is so great that use of graphical displays assumes greater importance as a means of displaying this information. Information from dynamical processes can be displayed conveniently by use of animated graphics. This guide presents the basic techniques for generating black and white animated graphics, with consideration of aesthetic, mechanical, and computational problems. The guide is intended for use by someone whomore » wants to make movies on the National Magnetic Fusion Energy Computing Center (NMFECC) CDC-7600. Problems encountered by a geographically remote user are given particular attention. Detailed information is given that will allow a remote user to do some file checking and diagnosis before giving graphics files to the system for processing into film in order to spot problems without having to wait for film to be delivered. Source listings of some useful software are given in appendices along with descriptions of how to use it. 3 figures, 5 tables.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gallarno, George; Rogers, James H; Maxwell, Don E

The high computational capability of graphics processing units (GPUs) is enabling and driving the scientific discovery process at large-scale. The world s second fastest supercomputer for open science, Titan, has more than 18,000 GPUs that computational scientists use to perform scientific simu- lations and data analysis. Understanding of GPU reliability characteristics, however, is still in its nascent stage since GPUs have only recently been deployed at large-scale. This paper presents a detailed study of GPU errors and their impact on system operations and applications, describing experiences with the 18,688 GPUs on the Titan supercom- puter as well as lessons learnedmore » in the process of efficient operation of GPUs at scale. These experiences are helpful to HPC sites which already have large-scale GPU clusters or plan to deploy GPUs in the future.« less
Simultaneous reconstruction of multiple depth images without off-focus points in integral imaging using a graphics processing unit.

PubMed

Yi, Faliu; Lee, Jieun; Moon, Inkyu

2014-05-01

The reconstruction of multiple depth images with a ray back-propagation algorithm in three-dimensional (3D) computational integral imaging is computationally burdensome. Further, a reconstructed depth image consists of a focus and an off-focus area. Focus areas are 3D points on the surface of an object that are located at the reconstructed depth, while off-focus areas include 3D points in free-space that do not belong to any object surface in 3D space. Generally, without being removed, the presence of an off-focus area would adversely affect the high-level analysis of a 3D object, including its classification, recognition, and tracking. Here, we use a graphics processing unit (GPU) that supports parallel processing with multiple processors to simultaneously reconstruct multiple depth images using a lookup table containing the shifted values along the x and y directions for each elemental image in a given depth range. Moreover, each 3D point on a depth image can be measured by analyzing its statistical variance with its corresponding samples, which are captured by the two-dimensional (2D) elemental images. These statistical variances can be used to classify depth image pixels as either focus or off-focus points. At this stage, the measurement of focus and off-focus points in multiple depth images is also implemented in parallel on a GPU. Our proposed method is conducted based on the assumption that there is no occlusion of the 3D object during the capture stage of the integral imaging process. Experimental results have demonstrated that this method is capable of removing off-focus points in the reconstructed depth image. The results also showed that using a GPU to remove the off-focus points could greatly improve the overall computational speed compared with using a CPU.
A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

NASA Astrophysics Data System (ADS)

García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

2016-10-01

This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.
Graphic Warning Labels Elicit Affective and Thoughtful Responses from Smokers: Results of a Randomized Clinical Trial.

PubMed

Evans, Abigail T; Peters, Ellen; Strasser, Andrew A; Emery, Lydia F; Sheerin, Kaitlin M; Romer, Daniel

2015-01-01

Observational research suggests that placing graphic images on cigarette warning labels can reduce smoking rates, but field studies lack experimental control. Our primary objective was to determine the psychological processes set in motion by naturalistic exposure to graphic vs. text-only warnings in a randomized clinical trial involving exposure to modified cigarette packs over a 4-week period. Theories of graphic-warning impact were tested by examining affect toward smoking, credibility of warning information, risk perceptions, quit intentions, warning label memory, and smoking risk knowledge. Adults who smoked between 5 and 40 cigarettes daily (N = 293; mean age = 33.7), did not have a contra-indicated medical condition, and did not intend to quit were recruited from Philadelphia, PA and Columbus, OH. Smokers were randomly assigned to receive their own brand of cigarettes for four weeks in one of three warning conditions: text only, graphic images plus text, or graphic images with elaborated text. Data from 244 participants who completed the trial were analyzed in structural-equation models. The presence of graphic images (compared to text-only) caused more negative affect toward smoking, a process that indirectly influenced risk perceptions and quit intentions (e.g., image->negative affect->risk perception->quit intention). Negative affect from graphic images also enhanced warning credibility including through increased scrutiny of the warnings, a process that also indirectly affected risk perceptions and quit intentions (e.g., image->negative affect->risk scrutiny->warning credibility->risk perception->quit intention). Unexpectedly, elaborated text reduced warning credibility. Finally, graphic warnings increased warning-information recall and indirectly increased smoking-risk knowledge at the end of the trial and one month later. In the first naturalistic clinical trial conducted, graphic warning labels are more effective than text-only warnings in encouraging smokers to consider quitting and in educating them about smoking's risks. Negative affective reactions to smoking, thinking about risks, and perceptions of credibility are mediators of their impact. Clinicaltrials.gov NCT01782053.
Vital Signs for Instructional Design

ERIC Educational Resources Information Center

Ley, Kathryn; Gannon-Cook, Ruth

2014-01-01

The purpose of this study was to investigate the relationship between a collaborative design process for selecting instructional graphics and online learner perceptions of graphic appropriateness. At the end of their online graduate course, 9 students ranked how appropriately each of 25 graphics represented 1 of 8 human performance technology…
Information Sharing for Medical Triage Tasking During Mass Casualty/Humanitarian Operations

DTIC Science & Technology

2009-12-01

military patrol units or surreptitious " cloak and dagger " fact gathering missions to gain photographic/video graphic data for dissemination to the...fractured command and control organization and retarded deployment of resources. Tragedies such as Hurricane Katrina in 2005, the September 11 attacks of...with PKI certificates and HMAC protection from replay attacks and UDP flooding [17]. 3. Triage Graphical User Interface (GUI) Currently the GUI for
Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers.

PubMed

Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito

2016-11-15

A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection

PubMed Central

Chen, Yaw-Chung

2015-01-01

The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms. PMID:26437335
A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

PubMed

Lee, Chun-Liang; Lin, Yi-Shan; Chen, Yaw-Chung

2015-01-01

The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
Graphical Representation of Parallel Algorithmic Processes

DTIC Science & Technology

1990-12-01

interface with the AAARF main process . The source code for the AAARF class-common library is in the common subdi- rectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor
On the Road to Graphicacy: The Learning of Graphical Representation Systems

ERIC Educational Resources Information Center

Postigo, Yolanda; Pozo, Juan Ignacio

2004-01-01

This article examines the learning of different types of graphic information by subjects with different levels of education and knowledge of the content represented. Three levels of graphic information learning were distinguished (explicit, implicit, and conceptual information processing) and two experiments were conducted, looking at graph and…

Conceptual Learning with Multiple Graphical Representations: Intelligent Tutoring Systems Support for Sense-Making and Fluency-Building Processes

ERIC Educational Resources Information Center

Rau, Martina A.

2013-01-01

Most learning environments in the STEM disciplines use multiple graphical representations along with textual descriptions and symbolic representations. Multiple graphical representations are powerful learning tools because they can emphasize complementary aspects of complex learning contents. However, to benefit from multiple graphical…
Stable stress‐drop measurements and their variability: Implications for ground‐motion prediction

USGS Publications Warehouse

Hanks, Thomas C.; Baltay, Annemarie S.; Beroza, Gregory C.

2013-01-01

We estimate the arms‐stress drop, Graphic, (Hanks, 1979) using acceleration time records of 59 earthquakes from two earthquake sequences in eastern Honshu, Japan. These acceleration‐based static stress drops compare well to stress drops calculated for the same events by Baltay et al. (2011) using an empirical Green’s function (eGf) approach. This agreement supports the assumption that earthquake acceleration time histories in the bandwidth between the corner frequency and a maximum observed frequency can be considered white, Gaussian, noise. Although the Graphic is computationally simpler than the eGf‐based Graphic‐stress drop, and is used as the “stress parameter” to describe the earthquake source in ground‐motion prediction equations, we find that it only compares well to the Graphic at source‐station distances of ∼20 km or less because there is no consideration of whole‐path anelastic attenuation or scattering. In these circumstances, the correlation between the Graphic and Graphic is strong. Events with high and low stress drops obtained through the eGf method have similarly high and low Graphic. We find that the inter‐event standard deviation of stress drop, for the population of earthquakes considered, is similar for both methods, 0.40 for the Graphic method and 0.42 for the Graphic, in log10 units, provided we apply the ∼20 km distance restriction to Graphic. This indicates that the observed variability is inherent to the source, rather than attributable to uncertainties in stress‐drop estimates
The value of animations in biology teaching: a study of long-term memory retention.

PubMed

O'Day, Danton H

2007-01-01

Previous work has established that a narrated animation is more effective at communicating a complex biological process (signal transduction) than the equivalent graphic with figure legend. To my knowledge, no study has been done in any subject area on the effectiveness of animations versus graphics in the long-term retention of information, a primary and critical issue in studies of teaching and learning. In this study, involving 393 student responses, three different animations and two graphics-one with and one lacking a legend-were used to determine the long-term retention of information. The results show that students retain more information 21 d after viewing an animation without narration compared with an equivalent graphic whether or not that graphic had a legend. Students' comments provide additional insight into the value of animations in the pedagogical process, and suggestions for future work are proposed.
Symplectic multi-particle tracking on GPUs

NASA Astrophysics Data System (ADS)

Liu, Zhicong; Qiang, Ji

2018-05-01

A symplectic multi-particle tracking model is implemented on the Graphic Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase space structure and reduce non-physical effects in long term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation helps save more than a factor of two total computing time in comparison to the CPU implementation.
GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration.

PubMed

Sharp, G C; Kandasamy, N; Singh, H; Folkert, M

2007-10-07

This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm, and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve a substantial speedup--up to 80 times for FDK and 70 times for demons when compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent. Compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.
Acceleration of the matrix multiplication of Radiance three phase daylighting simulations with parallel computing on heterogeneous hardware of personal computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2013-05-23

Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of new approach wasmore » evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.« less
QMCPACK : an open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids

DOE PAGES

Kim, Jeongnim; Baczewski, Andrew T.; Beaudet, Todd D.; ...

2018-04-19

QMCPACK is an open source quantum Monte Carlo package for ab-initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wave functions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performancemore » computing architectures, including multicore central processing unit (CPU) and graphical processing unit (GPU) systems. We detail the program’s capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://www.qmcpack.org.« less
GPU accelerated manifold correction method for spinning compact binaries

NASA Astrophysics Data System (ADS)

Ran, Chong-xi; Liu, Song; Zhong, Shuang-ying

2018-04-01

The graphics processing unit (GPU) acceleration of the manifold correction algorithm based on the compute unified device architecture (CUDA) technology is designed to simulate the dynamic evolution of the Post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. The feasibility and the efficiency of parallel computation on GPU have been confirmed by various numerical experiments. The numerical comparisons show that the accuracy on GPU execution of manifold corrections method has a good agreement with the execution of codes on merely central processing unit (CPU-based) method. The acceleration ability when the codes are implemented on GPU can increase enormously through the use of shared memory and register optimization techniques without additional hardware costs, implying that the speedup is nearly 13 times as compared with the codes executed on CPU for phase space scan (including 314 × 314 orbits). In addition, GPU-accelerated manifold correction method is used to numerically study how dynamics are affected by the spin-induced quadrupole-monopole interaction for black hole binary system.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.

PubMed

Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J

2015-07-05

We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
QMCPACK : an open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jeongnim; Baczewski, Andrew T.; Beaudet, Todd D.

QMCPACK is an open source quantum Monte Carlo package for ab-initio electronic structure calculations. It supports calculations of metallic and insulating solids, molecules, atoms, and some model Hamiltonians. Implemented real space quantum Monte Carlo algorithms include variational, diffusion, and reptation Monte Carlo. QMCPACK uses Slater-Jastrow type trial wave functions in conjunction with a sophisticated optimizer capable of optimizing tens of thousands of parameters. The orbital space auxiliary field quantum Monte Carlo method is also implemented, enabling cross validation between different highly accurate methods. The code is specifically optimized for calculations with large numbers of electrons on the latest high performancemore » computing architectures, including multicore central processing unit (CPU) and graphical processing unit (GPU) systems. We detail the program’s capabilities, outline its structure, and give examples of its use in current research calculations. The package is available at http://www.qmcpack.org.« less
Computer-composite mapping for geologists

USGS Publications Warehouse

van Driel, J.N.

1980-01-01

A computer program for overlaying maps has been tested and evaluated as a means for producing geologic derivative maps. Four maps of the Sugar House Quadrangle, Utah, were combined, using the Multi-Scale Data Analysis and Mapping Program, in a single composite map that shows the relative stability of the land surface during earthquakes. Computer-composite mapping can provide geologists with a powerful analytical tool and a flexible graphic display technique. Digitized map units can be shown singly, grouped with different units from the same map, or combined with units from other source maps to produce composite maps. The mapping program permits the user to assign various values to the map units and to specify symbology for the final map. Because of its flexible storage, easy manipulation, and capabilities of graphic output, the composite-mapping technique can readily be applied to mapping projects in sedimentary and crystalline terranes, as well as to maps showing mineral resource potential. ?? 1980 Springer-Verlag New York Inc.
Surface and deep structures in graphics comprehension.

PubMed

Schnotz, Wolfgang; Baadte, Christiane

2015-05-01

Comprehension of graphics can be considered as a process of schema-mediated structure mapping from external graphics on internal mental models. Two experiments were conducted to test the hypothesis that graphics possess a perceptible surface structure as well as a semantic deep structure both of which affect mental model construction. The same content was presented to different groups of learners by graphics from different perspectives with different surface structures but the same deep structure. Deep structures were complementary: major features of the learning content in one experiment became minor features in the other experiment, and vice versa. Text was held constant. Participants were asked to read, understand, and memorize the learning material. Furthermore, they were either instructed to process the material from the perspective supported by the graphic or from an alternative perspective, or they received no further instruction. After learning, they were asked to recall the learning content from different perspectives by completing graphs of different formats as accurately as possible. Learners' recall was more accurate if the format of recall was the same as the learning format which indicates surface structure influences. However, participants also showed more accurate recall when they remembered the content from a perspective emphasizing the deep structure, regardless of the graphics format presented before. This included better recall of what they had not seen than of what they really had seen before. That is, deep structure effects overrode surface effects. Depending on context conditions, stimulation of additional cognitive processing by instruction had partially positive and partially negative effects.
A State Articulated Instructional Objectives Guide for Occupational Education Programs. State Pilot Model for Drafting (Graphic Communications). Part I--Basic. Part II--Specialty Programs. Section A (Mechanical Drafting and Design). Section B (Architectural Drafting and Design).

ERIC Educational Resources Information Center

North Carolina State Dept. of Community Colleges, Raleigh.

A two-part articulation instructional objective guide for drafting (graphic communications) is provided. Part I contains summary information on seven blocks (courses) of instruction. They are as follow: introduction; basic technical drafting; problem solving in graphics; reproduction processes; freehand drawing and sketching; graphics composition;…
SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).

PubMed

Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J

2012-06-01

To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform a Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images, and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution. The expense, however, is an increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processor Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to performCPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The time required to complete the full algorithm by the CPU and GPU were benchmarked and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to- GPU time was greater than 1.0 for all images, which indicates the GPU is performing the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in computational time associated with a FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability. © 2012 American Association of Physicists in Medicine.
Adaptation of a Control Center Development Environment for Industrial Process Control

NASA Technical Reports Server (NTRS)

Killough, Ronnie L.; Malik, James M.

1994-01-01

In the control center, raw telemetry data is received for storage, display, and analysis. This raw data must be combined and manipulated in various ways by mathematical computations to facilitate analysis, provide diversified fault detection mechanisms, and enhance display readability. A development tool called the Graphical Computation Builder (GCB) has been implemented which provides flight controllers with the capability to implement computations for use in the control center. The GCB provides a language that contains both general programming constructs and language elements specifically tailored for the control center environment. The GCB concept allows staff who are not skilled in computer programming to author and maintain computer programs. The GCB user is isolated from the details of external subsystem interfaces and has access to high-level functions such as matrix operators, trigonometric functions, and unit conversion macros. The GCB provides a high level of feedback during computation development that improves upon the often cryptic errors produced by computer language compilers. An equivalent need can be identified in the industrial data acquisition and process control domain: that of an integrated graphical development tool tailored to the application to hide the operating system, computer language, and data acquisition interface details. The GCB features a modular design which makes it suitable for technology transfer without significant rework. Control center-specific language elements can be replaced by elements specific to industrial process control.
[Soft- and hardware support for the setup for computer tracking of radiation teletherapy].

PubMed

Tarutin, I G; Piliavets, V I; Strakh, A G; Minenko, V F; Golubovskiĭ, A I

1983-06-01

A hard and soft ware computer assisted complex has been worked out for gamma-beam therapy. The complex included all radiotherapeutic units, including a Siemens program controlled betatron with an energy of 42 MEV computer ES-1022, a Medigraf system of the processing of graphic information, a Mars-256 system for control over the homogeneity of distribution of dose rate on the field of irradiation and a package of mathematical programs to select a plan of irradiation of various tumor sites. The prospects of the utilization of such complexes in the dosimetric support of radiation therapy are discussed.
Towards an interactive electromechanical model of the heart

PubMed Central

Talbot, Hugo; Marchesseau, Stéphanie; Duriez, Christian; Sermesant, Maxime; Cotin, Stéphane; Delingette, Hervé

2013-01-01

In this work, we develop an interactive framework for rehearsal of and training in cardiac catheter ablation, and for planning cardiac resynchronization therapy. To this end, an interactive and real-time electrophysiology model of the heart is developed to fit patient-specific data. The proposed interactive framework relies on two main contributions. First, an efficient implementation of cardiac electrophysiology is proposed, using the latest graphics processing unit computing techniques. Second, a mechanical simulation is then coupled to the electrophysiological signals to produce realistic motion of the heart. We demonstrate that pathological mechanical and electrophysiological behaviour can be simulated. PMID:24427533
Experiences in autotuning matrix multiplication for energy minimization on GPUs

DOE PAGES

Anzt, Hartwig; Haugen, Blake; Kurzak, Jakub; ...

2015-05-20

In this study, we report extensive results and analysis of autotuning the computationally intensive graphics processing units kernel for dense matrix–matrix multiplication in double precision. In contrast to traditional autotuning and/or optimization for runtime performance only, we also take the energy efficiency into account. For kernels achieving equal performance, we show significant differences in their energy balance. We also identify the memory throughput as the most influential metric that trades off performance and energy efficiency. Finally, as a result, the performance optimal case ends up not being the most efficient kernel in overall resource use.
GPU-based optimal control for RWM feedback in tokamaks

DOE PAGES

Clement, Mitchell; Hanson, Jeremy; Bialek, Jim; ...

2017-08-23

The design and implementation of a Graphics Processing Unit (GPU) based Resistive Wall Mode (RWM) controller to perform feedback control on the RWM using Linear Quadratic Gaussian (LQG) control is reported herein. Also, the control algorithm is based on a simplified DIII-D VALEN model. By using NVIDIA’s GPUDirect RDMA framework, the digitizer and output module are able to write and read directly to and from GPU memory, eliminating memory transfers between host and GPU. In conclusion, the system and algorithm was able to reduce plasma response excited by externally applied fields by 32% during development experiments.
GAPD: a GPU-accelerated atom-based polychromatic diffraction simulation code.

PubMed

E, J C; Wang, L; Chen, S; Zhang, Y Y; Luo, S N

2018-03-01

GAPD, a graphics-processing-unit (GPU)-accelerated atom-based polychromatic diffraction simulation code for direct, kinematics-based, simulations of X-ray/electron diffraction of large-scale atomic systems with mono-/polychromatic beams and arbitrary plane detector geometries, is presented. This code implements GPU parallel computation via both real- and reciprocal-space decompositions. With GAPD, direct simulations are performed of the reciprocal lattice node of ultralarge systems (∼5 billion atoms) and diffraction patterns of single-crystal and polycrystalline configurations with mono- and polychromatic X-ray beams (including synchrotron undulator sources), and validation, benchmark and application cases are presented.

GPU-based optimal control for RWM feedback in tokamaks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clement, Mitchell; Hanson, Jeremy; Bialek, Jim

The design and implementation of a Graphics Processing Unit (GPU) based Resistive Wall Mode (RWM) controller to perform feedback control on the RWM using Linear Quadratic Gaussian (LQG) control is reported herein. Also, the control algorithm is based on a simplified DIII-D VALEN model. By using NVIDIA’s GPUDirect RDMA framework, the digitizer and output module are able to write and read directly to and from GPU memory, eliminating memory transfers between host and GPU. In conclusion, the system and algorithm was able to reduce plasma response excited by externally applied fields by 32% during development experiments.
Improving Quantum Gate Simulation using a GPU

NASA Astrophysics Data System (ADS)

Gutierrez, Eladio; Romero, Sergio; Trenas, Maria A.; Zapata, Emilio L.

2008-11-01

Due to the increasing computing power of the graphics processing units (GPU), they are becoming more and more popular when solving general purpose algorithms. As the simulation of quantum computers results on a problem with exponential complexity, it is advisable to perform a parallel computation, such as the one provided by the SIMD multiprocessors present in recent GPUs. In this paper, we focus on an important quantum algorithm, the quantum Fourier transform (QTF), in order to evaluate different parallelization strategies on a novel GPU architecture. Our implementation makes use of the new CUDA software/hardware architecture developed recently by NVIDIA.
Scaling of data communications for an advanced supercomputer network

NASA Technical Reports Server (NTRS)

Levin, E.; Eaton, C. K.; Young, Bruce

1986-01-01

The goal of NASA's Numerical Aerodynamic Simulation (NAS) Program is to provide a powerful computational environment for advanced research and development in aeronautics and related disciplines. The present NAS system consists of a Cray 2 supercomputer connected by a data network to a large mass storage system, to sophisticated local graphics workstations and by remote communication to researchers throughout the United States. The program plan is to continue acquiring the most powerful supercomputers as they become available. The implications of a projected 20-fold increase in processing power on the data communications requirements are described.
Processing Information in Graphical Form.

ERIC Educational Resources Information Center

Curcio, Frances R.; Smith-Burke, M. Trika

The purpose of this exploratory, descriptive study was to examine how children process different tasks of comprehension presented in graphical form. During the Spring 1981, 8 fourth graders and 9 seventh graders were interviewed. The children were presented with graphs accompanied by six questions reflecting three levels of comprehension:…
Concept Learning through Image Processing.

ERIC Educational Resources Information Center

Cifuentes, Lauren; Yi-Chuan, Jane Hsieh

This study explored computer-based image processing as a study strategy for middle school students' science concept learning. Specifically, the research examined the effects of computer graphics generation on science concept learning and the impact of using computer graphics to show interrelationships among concepts during study time. The 87…
Injection-depth-locking axial motion guided handheld micro-injector using CP-SSOCT.

PubMed

Cheon, Gyeong Woo; Huang, Yong; Kwag, Hye Rin; Kim, Ki-Young; Taylor, Russell H; Gehlbach, Peter L; Kang, Jin U

2014-01-01

This paper presents a handheld micro-injector system using common-path swept source optical coherence tomography (CP-SSOCT) as a distal sensor with highly accurate injection-depth-locking. To achieve real-time, highly precise, and intuitive freehand control, the system used graphics processing unit (GPU) to process the oversampled OCT signal with high throughput and a smart customized motion monitoring control algorithm. A performance evaluation was conducted with 60-insertions and fluorescein dye injection tests to show how accurately the system can guide the needle and lock to the target depth. The evaluation tests show our system can guide the injection needle into the desired depth with 4.12 um average deviation error while injecting 50 nl of fluorescein dye.
Climate Prediction Center - Monitoring & Data Index

Science.gov Websites

Data North American Monsoon Experiment United States Climate Data & Graphics ENSO Impacts on the United States Previous ENSO Events El NiÃ±o Impacts on United States Climate El NiÃ±o Impacts State by State La NiÃ±a Impacts by Region El NiÃ±o's Influence on United States Precipitation Amounts El NiÃ±o
Implementation of GPU accelerated SPECT reconstruction with Monte Carlo-based scatter correction.

PubMed

Bexelius, Tobias; Sohlberg, Antti

2018-06-01

Statistical SPECT reconstruction can be very time-consuming especially when compensations for collimator and detector response, attenuation, and scatter are included in the reconstruction. This work proposes an accelerated SPECT reconstruction algorithm based on graphics processing unit (GPU) processing. Ordered subset expectation maximization (OSEM) algorithm with CT-based attenuation modelling, depth-dependent Gaussian convolution-based collimator-detector response modelling, and Monte Carlo-based scatter compensation was implemented using OpenCL. The OpenCL implementation was compared against the existing multi-threaded OSEM implementation running on a central processing unit (CPU) in terms of scatter-to-primary ratios, standardized uptake values (SUVs), and processing speed using mathematical phantoms and clinical multi-bed bone SPECT/CT studies. The difference in scatter-to-primary ratios, visual appearance, and SUVs between GPU and CPU implementations was minor. On the other hand, at its best, the GPU implementation was noticed to be 24 times faster than the multi-threaded CPU version on a normal 128 × 128 matrix size 3 bed bone SPECT/CT data set when compensations for collimator and detector response, attenuation, and scatter were included. GPU SPECT reconstructions show great promise as an every day clinical reconstruction tool.
Industrial Technology Modernization Program. Project 32. Factory Vision. Phase 2

DTIC Science & Technology

1988-04-01

instructions for the PWA’s, generating the numerical control (NC) program instructions for factory assembly equipment, controlling the process... generating the numerical control (NC) program instructions for factory assembly equipment, controlling the production process instructions and NC... Assembly Operations the "Create Production Process Program" will automatically generate a sequence of graphics pages (in paper mode), or graphics screens
Use of graphics in the design office at the Military Aircraft Division of the British Aircraft Corporation

NASA Technical Reports Server (NTRS)

Coles, W. A.

1975-01-01

The CAD/CAM interactive computer graphics system was described; uses to which it has been put were shown, and current developments of the system were outlined. The system supports batch, time sharing, and fully interactive graphic processing. Engineers using the system may switch between these methods of data processing and problem solving to make the best use of the available resources. It is concluded that the introduction of on-line computing in the form of teletypes, storage tubes, and fully interactive graphics has resulted in large increases in productivity and reduced timescales in the geometric computing, numerical lofting and part programming areas, together with a greater utilization of the system in the technical departments.
Distributed computation of graphics primitives on a transputer network

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1988-01-01

A method is developed for distributing the computation of graphics primitives on a parallel processing network. Off-the-shelf transputer boards are used to perform the graphics transformations and scan-conversion tasks that would normally be assigned to a single transputer based display processor. Each node in the network performs a single graphics primitive computation. Frequently requested tasks can be duplicated on several nodes. The results indicate that the current distribution of commands on the graphics network shows a performance degradation when compared to the graphics display board alone. A change to more computation per node for every communication (perform more complex tasks on each node) may cause the desired increase in throughput.
A Strategy for Making Content Reading Successful: Grades 4-6.

ERIC Educational Resources Information Center

Alvermann, Donna E.; Boothby, Paula R.

A graphic organizer is a tree diagram that consists of vocabulary related to one particular concept. A modified version of a graphic organizer contains empty slots that represent missing information and actively involves students during the reading process as opposed to before or after. This modified graphic organizer can provide both the…
Role of Graphics Tools in the Learning Design Process

ERIC Educational Resources Information Center

Laisney, Patrice; Brandt-Pomares, Pascale

2015-01-01

This paper discusses the design activities of students in secondary school in France. Graphics tools are now part of the capacity of design professionals. It is therefore apt to reflect on their integration into the technological education. Has the use of intermediate graphical tools changed students' performance, and if so in what direction, in…
Graphic Arts: Process Camera, Stripping, and Platemaking. Teacher Guide.

ERIC Educational Resources Information Center

Feasley, Sue C., Ed.

This curriculum guide is the second in a three-volume series of instructional materials for competency-based graphic arts instruction. Each publication is designed to include the technical content and tasks necessary for a student to be employed in an entry-level graphic arts occupation. Introductory materials include an instructional/task…
Critique and Process: Signature Pedagogies in the Graphic Design Classroom

ERIC Educational Resources Information Center

Motley, Phillip

2017-01-01

Like many disciplines in design and the visual fine arts, critique is a signature pedagogy in the graphic design classroom. It serves as both a formative and summative assessment while also giving students the opportunity to practice the habits of graphic design. Critiques help students become keen observers of relevant disciplinary criteria;…
A DDC Bibliography on Optical or Graphic Information Processing (Information Sciences Series). Volume I.

ERIC Educational Resources Information Center

Defense Documentation Center, Alexandria, VA.

This unclassified-unlimited bibliography contains 183 references, with abstracts, dealing specifically with optical or graphic information processing. Citations are grouped under three headings: display devices and theory, character recognition, and pattern recognition. Within each group, they are arranged in accession number (AD-number) sequence.…
An Interactive Graphics Program for Investigating Digital Signal Processing.

ERIC Educational Resources Information Center

Miller, Billy K.; And Others

1983-01-01

Describes development of an interactive computer graphics program for use in teaching digital signal processing. The program allows students to interactively configure digital systems on a monitor display and observe their system's performance by means of digital plots on the system's outputs. A sample program run is included. (JN)
The Use of Computer Graphics in the Design Process.

ERIC Educational Resources Information Center

Palazzi, Maria

This master's thesis examines applications of computer technology to the field of industrial design and ways in which technology can transform the traditional process. Following a statement of the problem, the history and applications of the fields of computer graphics and industrial design are reviewed. The traditional industrial design process…
A Prototype Lisp-Based Soft Real-Time Object-Oriented Graphical User Interface for Control System Development

NASA Technical Reports Server (NTRS)

Litt, Jonathan; Wong, Edmond; Simon, Donald L.

1994-01-01

A prototype Lisp-based soft real-time object-oriented Graphical User Interface for control system development is presented. The Graphical User Interface executes alongside a test system in laboratory conditions to permit observation of the closed loop operation through animation, graphics, and text. Since it must perform interactive graphics while updating the screen in real time, techniques are discussed which allow quick, efficient data processing and animation. Examples from an implementation are included to demonstrate some typical functionalities which allow the user to follow the control system's operation.
A graphical language for reliability model generation

NASA Technical Reports Server (NTRS)

Howell, Sandra V.; Bavuso, Salvatore J.; Haley, Pamela J.

1990-01-01

A graphical interface capability of the hybrid automated reliability predictor (HARP) is described. The graphics-oriented (GO) module provides the user with a graphical language for modeling system failure modes through the selection of various fault tree gates, including sequence dependency gates, or by a Markov chain. With this graphical input language, a fault tree becomes a convenient notation for describing a system. In accounting for any sequence dependencies, HARP converts the fault-tree notation to a complex stochastic process that is reduced to a Markov chain which it can then solve for system reliability. The graphics capability is available for use on an IBM-compatible PC, a Sun, and a VAX workstation. The GO module is written in the C programming language and uses the Graphical Kernel System (GKS) standard for graphics implementation. The PC, VAX, and Sun versions of the HARP GO module are currently in beta-testing.

GRASP/Ada: Graphical Representations of Algorithms, Structures, and Processes for Ada. The development of a program analysis environment for Ada: Reverse engineering tools for Ada, task 2, phase 3

NASA Technical Reports Server (NTRS)

Cross, James H., II

1991-01-01

The main objective is the investigation, formulation, and generation of graphical representations of algorithms, structures, and processes for Ada (GRASP/Ada). The presented task, in which various graphical representations that can be extracted or generated from source code are described and categorized, is focused on reverse engineering. The following subject areas are covered: the system model; control structure diagram generator; object oriented design diagram generator; user interface; and the GRASP library.
Graphical Language for Data Processing

NASA Technical Reports Server (NTRS)

Alphonso, Keith

2011-01-01

A graphical language for processing data allows processing elements to be connected with virtual wires that represent data flows between processing modules. The processing of complex data, such as lidar data, requires many different algorithms to be applied. The purpose of this innovation is to automate the processing of complex data, such as LIDAR, without the need for complex scripting and programming languages. The system consists of a set of user-interface components that allow the user to drag and drop various algorithmic and processing components onto a process graph. By working graphically, the user can completely visualize the process flow and create complex diagrams. This innovation supports the nesting of graphs, such that a graph can be included in another graph as a single step for processing. In addition to the user interface components, the system includes a set of .NET classes that represent the graph internally. These classes provide the internal system representation of the graphical user interface. The system includes a graph execution component that reads the internal representation of the graph (as described above) and executes that graph. The execution of the graph follows the interpreted model of execution in that each node is traversed and executed from the original internal representation. In addition, there are components that allow external code elements, such as algorithms, to be easily integrated into the system, thus making the system infinitely expandable.
Applying a visual language for image processing as a graphical teaching tool in medical imaging

NASA Astrophysics Data System (ADS)

Birchman, James J.; Tanimoto, Steven L.; Rowberg, Alan H.; Choi, Hyung-Sik; Kim, Yongmin

1992-05-01

Typical user interaction in image processing is with command line entries, pull-down menus, or text menu selections from a list, and as such is not generally graphical in nature. Although applying these interactive methods to construct more sophisticated algorithms from a series of simple image processing steps may be clear to engineers and programmers, it may not be clear to clinicians. A solution to this problem is to implement a visual programming language using visual representations to express image processing algorithms. Visual representations promote a more natural and rapid understanding of image processing algorithms by providing more visual insight into what the algorithms do than the interactive methods mentioned above can provide. Individuals accustomed to dealing with images will be more likely to understand an algorithm that is represented visually. This is especially true of referring physicians, such as surgeons in an intensive care unit. With the increasing acceptance of picture archiving and communications system (PACS) workstations and the trend toward increasing clinical use of image processing, referring physicians will need to learn more sophisticated concepts than simply image access and display. If the procedures that they perform commonly, such as window width and window level adjustment and image enhancement using unsharp masking, are depicted visually in an interactive environment, it will be easier for them to learn and apply these concepts. The software described in this paper is a visual programming language for imaging processing which has been implemented on the NeXT computer using NeXTstep user interface development tools and other tools in an object-oriented environment. The concept is based upon the description of a visual language titled `Visualization of Vision Algorithms' (VIVA). Iconic representations of simple image processing steps are placed into a workbench screen and connected together into a dataflow path by the user. As the user creates and edits a dataflow path, more complex algorithms can be built on the screen. Once the algorithm is built, it can be executed, its results can be reviewed, and operator parameters can be interactively adjusted until an optimized output is produced. The optimized algorithm can then be saved and added to the system as a new operator. This system has been evaluated as a graphical teaching tool for window width and window level adjustment, image enhancement using unsharp masking, and other techniques.
Medical image processing on the GPU - past, present and future.

PubMed

Eklund, Anders; Dufort, Paul; Forsberg, Daniel; LaConte, Stephen M

2013-12-01

Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing, are affordable and energy efficient. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms. This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations. The review covers GPU acceleration of basic image processing operations (filtering, interpolation, histogram estimation and distance transforms), the most commonly used algorithms in medical imaging (image registration, image segmentation and image denoising) and algorithms that are specific to individual modalities (CT, PET, SPECT, MRI, fMRI, DTI, ultrasound, optical imaging and microscopy). The review ends by highlighting some future possibilities and challenges. Copyright © 2013 Elsevier B.V. All rights reserved.
Fast optically sectioned fluorescence HiLo endomicroscopy.

PubMed

Ford, Tim N; Lim, Daryl; Mertz, Jerome

2012-02-01

We describe a nonscanning, fiber bundle endomicroscope that performs optically sectioned fluorescence imaging with fast frame rates and real-time processing. Our sectioning technique is based on HiLo imaging, wherein two widefield images are acquired under uniform and structured illumination and numerically processed to reject out-of-focus background. This work is an improvement upon an earlier demonstration of widefield optical sectioning through a flexible fiber bundle. The improved device features lateral and axial resolutions of 2.6 and 17 μm, respectively, a net frame rate of 9.5 Hz obtained by real-time image processing with a graphics processing unit (GPU) and significantly reduced motion artifacts obtained by the use of a double-shutter camera. We demonstrate the performance of our system with optically sectioned images and videos of a fluorescently labeled chorioallantoic membrane (CAM) in the developing G. gallus embryo. HiLo endomicroscopy is a candidate technique for low-cost, high-speed clinical optical biopsies.
Fast optically sectioned fluorescence HiLo endomicroscopy

NASA Astrophysics Data System (ADS)

Ford, Tim N.; Lim, Daryl; Mertz, Jerome

2012-02-01

We describe a nonscanning, fiber bundle endomicroscope that performs optically sectioned fluorescence imaging with fast frame rates and real-time processing. Our sectioning technique is based on HiLo imaging, wherein two widefield images are acquired under uniform and structured illumination and numerically processed to reject out-of-focus background. This work is an improvement upon an earlier demonstration of widefield optical sectioning through a flexible fiber bundle. The improved device features lateral and axial resolutions of 2.6 and 17 μm, respectively, a net frame rate of 9.5 Hz obtained by real-time image processing with a graphics processing unit (GPU) and significantly reduced motion artifacts obtained by the use of a double-shutter camera. We demonstrate the performance of our system with optically sectioned images and videos of a fluorescently labeled chorioallantoic membrane (CAM) in the developing G. gallus embryo. HiLo endomicroscopy is a candidate technique for low-cost, high-speed clinical optical biopsies.
Accelerating Biomedical Signal Processing Using GPU: A Case Study of Snore Sound Feature Extraction.

PubMed

Guo, Jian; Qian, Kun; Zhang, Gongxuan; Xu, Huijie; Schuller, Björn

2017-12-01

The advent of 'Big Data' and 'Deep Learning' offers both, a great challenge and a huge opportunity for personalised health-care. In machine learning-based biomedical data analysis, feature extraction is a key step for 'feeding' the subsequent classifiers. With increasing numbers of biomedical data, extracting features from these 'big' data is an intensive and time-consuming task. In this case study, we employ a Graphics Processing Unit (GPU) via Python to extract features from a large corpus of snore sound data. Those features can subsequently be imported into many well-known deep learning training frameworks without any format processing. The snore sound data were collected from several hospitals (20 subjects, with 770-990 MB per subject - in total 17.20 GB). Experimental results show that our GPU-based processing significantly speeds up the feature extraction phase, by up to seven times, as compared to the previous CPU system.
Molecular dynamics simulations investigating consecutive nucleation, solidification and grain growth in a twelve-million-atom Fe-system

NASA Astrophysics Data System (ADS)

Okita, Shin; Verestek, Wolfgang; Sakane, Shinji; Takaki, Tomohiro; Ohno, Munekazu; Shibuta, Yasushi

2017-09-01

Continuous processes of homogeneous nucleation, solidification and grain growth are spontaneously achieved from an undercooled iron melt without any phenomenological parameter in the molecular dynamics (MD) simulation with 12 million atoms. The nucleation rate at the critical temperature is directly estimated from the atomistic configuration by cluster analysis to be of the order of 1034 m-3 s-1. Moreover, time evolution of grain size distribution during grain growth is obtained by the combination of Voronoi and cluster analyses. The grain growth exponent is estimated to be around 0.3 from the geometric average of the grain size distribution. Comprehensive understanding of kinetic properties during continuous processes is achieved in the large-scale MD simulation by utilizing the high parallel efficiency of a graphics processing unit (GPU), which is shedding light on the fundamental aspects of production processes of materials from the atomistic viewpoint.
Identifying opportune landing sites in degraded visual environments with terrain and cultural databases

NASA Astrophysics Data System (ADS)

Moody, Marc; Fisher, Robert; Little, J. Kristin

2014-06-01

Boeing has developed a degraded visual environment navigational aid that is flying on the Boeing AH-6 light attack helicopter. The navigational aid is a two dimensional software digital map underlay generated by the Boeing™ Geospatial Embedded Mapping Software (GEMS) and fully integrated with the operational flight program. The page format on the aircraft's multi function displays (MFD) is termed the Approach page. The existing work utilizes Digital Terrain Elevation Data (DTED) and OpenGL ES 2.0 graphics capabilities to compute the pertinent graphics underlay entirely on the graphics processor unit (GPU) within the AH-6 mission computer. The next release will incorporate cultural databases containing Digital Vertical Obstructions (DVO) to warn the crew of towers, buildings, and power lines when choosing an opportune landing site. Future IRAD will include Light Detection and Ranging (LIDAR) point cloud generating sensors to provide 2D and 3D synthetic vision on the final approach to the landing zone. Collision detection with respect to terrain, cultural, and point cloud datasets may be used to further augment the crew warning system. The techniques for creating the digital map underlay leverage the GPU almost entirely, making this solution viable on most embedded mission computing systems with an OpenGL ES 2.0 capable GPU. This paper focuses on the AH-6 crew interface process for determining a landing zone and flying the aircraft to it.
Fast 2D flood modelling using GPU technology - recent applications and new developments

NASA Astrophysics Data System (ADS)

Crossley, Amanda; Lamb, Rob; Waller, Simon; Dunning, Paul

2010-05-01

In recent years there has been considerable interest amongst scientists and engineers in exploiting the potential of commodity graphics hardware for desktop parallel computing. The Graphics Processing Units (GPUs) that are used in PC graphics cards have now evolved into powerful parallel co-processors that can be used to accelerate the numerical codes used for floodplain inundation modelling. We report in this paper on experience over the past two years in developing and applying two dimensional (2D) flood inundation models using GPUs to achieve significant practical performance benefits. Starting with a solution scheme for the 2D diffusion wave approximation to the 2D Shallow Water Equations (SWEs), we have demonstrated the capability to reduce model run times in ‘real-world' applications using GPU hardware and programming techniques. We then present results from a GPU-based 2D finite volume SWE solver. A series of numerical test cases demonstrate that the model produces outputs that are accurate and consistent with reference results published elsewhere. In comparisons conducted for a real world test case, the GPU-based SWE model was over 100 times faster than the CPU version. We conclude with some discussion of practical experience in using the GPU technology for flood mapping applications, and for research projects investigating use of Monte Carlo simulation methods for the analysis of uncertainty in 2D flood modelling.
Downsizer - A Graphical User Interface-Based Application for Browsing, Acquiring, and Formatting Time-Series Data for Hydrologic Modeling

USGS Publications Warehouse

Ward-Garrison, Christian; Markstrom, Steven L.; Hay, Lauren E.

2009-01-01

The U.S. Geological Survey Downsizer is a computer application that selects, downloads, verifies, and formats station-based time-series data for environmental-resource models, particularly the Precipitation-Runoff Modeling System. Downsizer implements the client-server software architecture. The client presents a map-based, graphical user interface that is intuitive to modelers; the server provides streamflow and climate time-series data from over 40,000 measurement stations across the United States. This report is the Downsizer user's manual and provides (1) an overview of the software design, (2) installation instructions, (3) a description of the graphical user interface, (4) a description of selected output files, and (5) troubleshooting information.
Towards Autonomous Modular UAV Missions: The Detection, Geo-Location and Landing Paradigm

PubMed Central

Kyristsis, Sarantis; Antonopoulos, Angelos; Chanialakis, Theofilos; Stefanakis, Emmanouel; Linardos, Christos; Tripolitsiotis, Achilles; Partsinevelos, Panagiotis

2016-01-01

Nowadays, various unmanned aerial vehicle (UAV) applications become increasingly demanding since they require real-time, autonomous and intelligent functions. Towards this end, in the present study, a fully autonomous UAV scenario is implemented, including the tasks of area scanning, target recognition, geo-location, monitoring, following and finally landing on a high speed moving platform. The underlying methodology includes AprilTag target identification through Graphics Processing Unit (GPU) parallelized processing, image processing and several optimized locations and approach algorithms employing gimbal movement, Global Navigation Satellite System (GNSS) readings and UAV navigation. For the experimentation, a commercial and a custom made quad-copter prototype were used, portraying a high and a low-computational embedded platform alternative. Among the successful targeting and follow procedures, it is shown that the landing approach can be successfully performed even under high platform speeds. PMID:27827883
Towards Autonomous Modular UAV Missions: The Detection, Geo-Location and Landing Paradigm.

PubMed

Kyristsis, Sarantis; Antonopoulos, Angelos; Chanialakis, Theofilos; Stefanakis, Emmanouel; Linardos, Christos; Tripolitsiotis, Achilles; Partsinevelos, Panagiotis

2016-11-03

Nowadays, various unmanned aerial vehicle (UAV) applications become increasingly demanding since they require real-time, autonomous and intelligent functions. Towards this end, in the present study, a fully autonomous UAV scenario is implemented, including the tasks of area scanning, target recognition, geo-location, monitoring, following and finally landing on a high speed moving platform. The underlying methodology includes AprilTag target identification through Graphics Processing Unit (GPU) parallelized processing, image processing and several optimized locations and approach algorithms employing gimbal movement, Global Navigation Satellite System (GNSS) readings and UAV navigation. For the experimentation, a commercial and a custom made quad-copter prototype were used, portraying a high and a low-computational embedded platform alternative. Among the successful targeting and follow procedures, it is shown that the landing approach can be successfully performed even under high platform speeds.
The Process of Parallelizing the Conjunction Prediction Algorithm of ESA's SSA Conjunction Prediction Service Using GPGPU

NASA Astrophysics Data System (ADS)

Fehr, M.; Navarro, V.; Martin, L.; Fletcher, E.

2013-08-01

Space Situational Awareness[8] (SSA) is defined as the comprehensive knowledge, understanding and maintained awareness of the population of space objects, the space environment and existing threats and risks. As ESA's SSA Conjunction Prediction Service (CPS) requires the repetitive application of a processing algorithm against a data set of man-made space objects, it is crucial to exploit the highly parallelizable nature of this problem. Currently the CPS system makes use of OpenMP[7] for parallelization purposes using CPU threads, but only a GPU with its hundreds of cores can fully benefit from such high levels of parallelism. This paper presents the adaptation of several core algorithms[5] of the CPS for general-purpose computing on graphics processing units (GPGPU) using NVIDIAs Compute Unified Device Architecture (CUDA).
Quantitative computer simulations of extraterrestrial processing operations

NASA Technical Reports Server (NTRS)

Vincent, T. L.; Nikravesh, P. E.

1989-01-01

The automation of a small, solid propellant mixer was studied. Temperature control is under investigation. A numerical simulation of the system is under development and will be tested using different control options. Control system hardware is currently being put into place. The construction of mathematical models and simulation techniques for understanding various engineering processes is also studied. Computer graphics packages were utilized for better visualization of the simulation results. The mechanical mixing of propellants is examined. Simulation of the mixing process is being done to study how one can control for chaotic behavior to meet specified mixing requirements. An experimental mixing chamber is also being built. It will allow visual tracking of particles under mixing. The experimental unit will be used to test ideas from chaos theory, as well as to verify simulation results. This project has applications to extraterrestrial propellant quality and reliability.
System Integration of FastSPECT III, a Dedicated SPECT Rodent-Brain Imager Based on BazookaSPECT Detector Technology

PubMed Central

Miller, Brian W.; Furenlid, Lars R.; Moore, Stephen K.; Barber, H. Bradford; Nagarkar, Vivek V.; Barrett, Harrison H.

2010-01-01

FastSPECT III is a stationary, single-photon emission computed tomography (SPECT) imager designed specifically for imaging and studying neurological pathologies in rodent brain, including Alzheimer’s and Parkinsons’s disease. Twenty independent BazookaSPECT [1] gamma-ray detectors acquire projections of a spherical field of view with pinholes selected for desired resolution and sensitivity. Each BazookaSPECT detector comprises a columnar CsI(Tl) scintillator, image-intensifier, optical lens, and fast-frame-rate CCD camera. Data stream back to processing computers via firewire interfaces, and heavy use of graphics processing units (GPUs) ensures that each frame of data is processed in real time to extract the images of individual gamma-ray events. Details of the system design, imaging aperture fabrication methods, and preliminary projection images are presented. PMID:21218137
Introduction to the Special Issue on Digital Signal Processing in Radio Astronomy

NASA Astrophysics Data System (ADS)

Price, D. C.; Kocz, J.; Bailes, M.; Greenhill, L. J.

2016-03-01

Advances in astronomy are intimately linked to advances in digital signal processing (DSP). This special issue is focused upon advances in DSP within radio astronomy. The trend within that community is to use off-the-shelf digital hardware where possible and leverage advances in high performance computing. In particular, graphics processing units (GPUs) and field programmable gate arrays (FPGAs) are being used in place of application-specific circuits (ASICs); high-speed Ethernet and Infiniband are being used for interconnect in place of custom backplanes. Further, to lower hurdles in digital engineering, communities have designed and released general-purpose FPGA-based DSP systems, such as the CASPER ROACH board, ASTRON Uniboard, and CSIRO Redback board. In this introductory paper, we give a brief historical overview, a summary of recent trends, and provide an outlook on future directions.
Graphics supercomputer for computational fluid dynamics research

NASA Astrophysics Data System (ADS)

Liaw, Goang S.

1994-11-01

The objective of this project is to purchase a state-of-the-art graphics supercomputer to improve the Computational Fluid Dynamics (CFD) research capability at Alabama A & M University (AAMU) and to support the Air Force research projects. A cutting-edge graphics supercomputer system, Onyx VTX, from Silicon Graphics Computer Systems (SGI), was purchased and installed. Other equipment including a desktop personal computer, PC-486 DX2 with a built-in 10-BaseT Ethernet card, a 10-BaseT hub, an Apple Laser Printer Select 360, and a notebook computer from Zenith were also purchased. A reading room has been converted to a research computer lab by adding some furniture and an air conditioning unit in order to provide an appropriate working environments for researchers and the purchase equipment. All the purchased equipment were successfully installed and are fully functional. Several research projects, including two existing Air Force projects, are being performed using these facilities.
Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators

NASA Astrophysics Data System (ADS)

Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.

2015-12-01

Seismic wave propagation codes are essential tools to investigate a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients to improve the resolution of tomographic images to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited due to high computational costs. They thus need high-performance computing (HPC) facilities to improving the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphic cards while OpenCL was adopted by additional hardware accelerators, like e.g. AMD graphic cards, ARM-based processors as well as Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both CUDA and OpenCL languages within the source code package. Thus, seismic wave simulations are able now to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
Discovering epistasis in large scale genetic association studies by exploiting graphics cards.

PubMed

Chen, Gary K; Guo, Yunfei

2013-12-03

Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyzes genome-wide can quickly become intractable due to the fact that even modest size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, require tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.