Science.gov

Sample records for mobile graphics processing

  1. Evaluating Mobile Graphics Processing Units (GPUs) for Real-Time Resource Constrained Applications

    SciTech Connect

    Meredith, J; Conger, J; Liu, Y; Johnson, J

    2005-11-11

    Modern graphics processing units (GPUs) can provide tremendous performance boosts for some applications beyond what a single CPU can accomplish, and their performance is growing faster than that of CPUs. Mobile GPUs available for laptops have the small form factor and low power requirements suitable for embedded processing. We evaluated several desktop and mobile GPUs and CPUs on traditional and non-traditional graphics tasks, as well as on the most time-consuming pieces of a full hyperspectral imaging application. Accuracy remained high despite small differences in arithmetic operations such as rounding. Performance improvements are summarized here relative to a desktop Pentium 4 CPU.

  2. People detection method using graphics processing units for a mobile robot with an omnidirectional camera

    NASA Astrophysics Data System (ADS)

    Kang, Sungil; Roh, Annah; Nam, Bodam; Hong, Hyunki

    2011-12-01

    This paper presents a novel vision system for people detection using an omnidirectional camera mounted on a mobile robot. To determine regions of interest (ROIs), we compute a dense optical flow map on graphics processing units, which enables us to check consistency with the robot's ego-motion in a dynamic environment. Shape-based classification algorithms are then employed to sort the ROIs into humans and non-humans. The experimental results show that the proposed system detects people more precisely than previous methods.
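
    As a concrete illustration of the flow-based ROI step described above, the CUDA kernel below is a hypothetical sketch (not the authors' code; the kernel name, data layout, and threshold test are invented). It flags pixels whose measured optical flow deviates from the flow predicted by the robot's ego-motion; such regions become candidate ROIs for the shape-based classifier.

        // Hypothetical sketch: mark pixels whose measured optical flow
        // disagrees with the ego-motion-predicted flow by more than a
        // threshold; the resulting mask delimits candidate ROIs.
        __global__ void egoMotionResidualMask(const float2* measured,
                                              const float2* predicted,
                                              unsigned char* roiMask,
                                              int n, float threshold)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float dx = measured[i].x - predicted[i].x;
            float dy = measured[i].y - predicted[i].y;
            roiMask[i] = (dx * dx + dy * dy > threshold * threshold) ? 255 : 0;
        }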

  3. New APIs for mobile graphics

    NASA Astrophysics Data System (ADS)

    Pulli, Kari

    2006-02-01

    Progress in mobile graphics technology during the last five years has been swift, and it has followed a path similar to that on PCs: early proprietary software engines running on integer hardware paved the way to standards that provide a roadmap for graphics hardware acceleration. In this overview we cover five recent standards for 3D and 2D vector graphics on mobile devices. OpenGL ES is a low-level API for 3D graphics, meant for applications written in C or C++. M3G (JSR 184) is a high-level 3D API for mobile Java that can be implemented on top of OpenGL ES. Collada is a content interchange format and API that allows digital content creation tools to be combined and their results exported to different run-time systems, including OpenGL ES and M3G. Two new 2D vector graphics APIs mirror the relationship between OpenGL ES and M3G: OpenVG is a low-level API for C/C++ that can serve as a building block for JSR 226, the high-level 2D vector graphics API for mobile Java.

  4. Graphical Language for Data Processing

    NASA Technical Reports Server (NTRS)

    Alphonso, Keith

    2011-01-01

    A graphical language for processing data allows processing elements to be connected with virtual wires that represent data flows between processing modules. The processing of complex data, such as lidar data, requires many different algorithms to be applied, and the purpose of this innovation is to automate that processing without the need for complex scripting and programming languages. The system consists of a set of user-interface components that allow the user to drag and drop algorithmic and processing components onto a process graph. By working graphically, the user can completely visualize the process flow and create complex diagrams. The innovation also supports the nesting of graphs, so that one graph can be included in another as a single processing step. In addition to the user-interface components, the system includes a set of .NET classes that represent the graph internally. A graph execution component reads this internal representation and executes the graph following an interpreted model of execution, in which each node is traversed and executed directly from the internal representation. Finally, components are provided that allow external code elements, such as algorithms, to be easily integrated, making the system highly extensible.
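
    The interpreted, node-by-node execution model described above can be made concrete with a small sketch. The following self-contained C++ program is hypothetical (the innovation itself is built on .NET classes; every name here is invented): it executes a tiny process graph by recursively evaluating each node's inputs and caching results.

        // Hypothetical sketch of interpreted dataflow-graph execution.
        #include <cstdio>
        #include <functional>
        #include <map>
        #include <string>
        #include <vector>

        struct Node {
            std::string name;
            std::vector<std::string> inputs;  // upstream node names
            std::function<double(const std::vector<double>&)> op;
        };

        // Evaluate a node by first evaluating its inputs (depth-first),
        // caching each result so shared sub-graphs run only once.
        double evaluate(const std::string& name,
                        std::map<std::string, Node>& graph,
                        std::map<std::string, double>& cache)
        {
            if (cache.count(name)) return cache[name];
            Node& n = graph[name];
            std::vector<double> args;
            for (const auto& in : n.inputs)
                args.push_back(evaluate(in, graph, cache));
            return cache[name] = n.op(args);
        }

        int main()
        {
            std::map<std::string, Node> g;
            g["source"] = {"source", {},
                [](const std::vector<double>&) { return 3.0; }};
            g["scale"] = {"scale", {"source"},
                [](const std::vector<double>& a) { return 2.0 * a[0]; }};
            std::map<std::string, double> cache;
            std::printf("scale = %f\n", evaluate("scale", g, cache)); // 6.0
            return 0;
        }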

  5. Graphic Design in Libraries: A Conceptual Process

    ERIC Educational Resources Information Center

    Ruiz, Miguel

    2014-01-01

    Providing successful library services requires efficient and effective communication with users; therefore, it is important that content creators who develop visual materials understand key components of design and, specifically, develop a holistic graphic design process. Graphic design, as a form of visual communication, is the process of…

  6. Hyperspectral processing in graphical processing units

    NASA Astrophysics Data System (ADS)

    Winter, Michael E.; Winter, Edwin M.

    2011-06-01

    With the advent of the commercial 3D video card in the mid-1990s, we have seen an order-of-magnitude performance increase with each generation of new video cards. While these cards were designed primarily for visualization and video games, it became apparent after a short while that they could be used for scientific purposes. These graphical processing units (GPUs) are rapidly being incorporated into data processing tasks usually reserved for general-purpose computers, and many image processing problems have been found to scale well to modern GPU systems. We have implemented four popular hyperspectral processing algorithms (N-FINDR, linear unmixing, Principal Components, and the RX anomaly detection algorithm). These algorithms show an across-the-board speedup of at least a factor of 10, with some special cases showing extreme speedups of a hundred times or more.
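
    Of the four algorithms named above, linear unmixing is the most directly data-parallel: every pixel's spectrum is processed independently. The CUDA kernel below is a hypothetical sketch of that core step, not the authors' implementation; it assumes a precomputed pseudo-inverse endmember matrix, and all names and layouts are invented.

        // Hypothetical sketch: one thread per pixel multiplies the pixel's
        // spectrum by the pseudo-inverse endmember matrix to obtain its
        // endmember abundances.
        __global__ void unmixPixels(const float* spectra,    // nPixels x nBands
                                    const float* pinvE,      // nEnd x nBands
                                    float* abundances,       // nPixels x nEnd
                                    int nPixels, int nBands, int nEnd)
        {
            int p = blockIdx.x * blockDim.x + threadIdx.x;
            if (p >= nPixels) return;
            for (int e = 0; e < nEnd; ++e) {
                float acc = 0.0f;
                for (int b = 0; b < nBands; ++b)
                    acc += pinvE[e * nBands + b] * spectra[p * nBands + b];
                abundances[p * nEnd + e] = acc;
            }
        }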

  7. Graphics hardware accelerated panorama builder for mobile phones

    NASA Astrophysics Data System (ADS)

    Bordallo López, Miguel; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku

    2009-02-01

    Modern mobile communication devices frequently contain built-in cameras allowing users to capture high-resolution still images, but at the same time the imaging applications face both usability and throughput bottlenecks. The difficulty of taking ad hoc pictures of printed paper documents with a multi-megapixel cellular phone camera, a common business use case, illustrates these problems. The result can be examined only after several seconds and is often blurry, so a new picture is needed even though the viewfinder image had looked good. The process can be frustrating, with waits and no way for the user to predict the quality beforehand. The problems can be traced to the mismatch between processor speed and camera resolution, and to the application's interactivity demands. In this context we analyze the building of mosaic images of printed documents from frames selected from VGA-resolution (640x480 pixel) video. High interactivity is achieved by providing real-time feedback on quality while simultaneously guiding the user's actions. The graphics processing unit of the mobile device can be used to speed up the reconstruction computations. To demonstrate the viability of the concept, we present an interactive document-scanning application implemented on a Nokia N95 mobile phone.

  8. Cockpit weather graphics using mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Seth, Shashi

    1993-01-01

    Many new companies are pushing state-of-the-art technology to bring a revolution to the cockpits of General Aviation (GA) aircraft. The vision, according to Dr. Bruce Holmes, the Assistant Director for Aeronautics at the National Aeronautics and Space Administration's (NASA) Langley Research Center, is to provide a flight control system so advanced that the motor and cognitive skills you use to drive a car would be very similar to the ones you would use to fly an airplane. We at ViGYAN, Inc., are currently developing a system called the Pilot Weather Advisor (PWxA), which would be a part of such an advanced-technology flight management system. The PWxA provides graphical depictions of weather information in the cockpit of aircraft in near real time, through the use of broadcast satellite communications. The purpose of this system is to improve the safety and utility of GA aircraft operations. Considerable effort is being expended on research in the design of graphical weather systems, notably the works of Scanlon and Dash. The concept of providing pilots with graphical depictions of weather conditions, overlaid on geographical and navigational maps, is extremely powerful.

  9. Cockpit weather graphics using mobile satellite communications

    NASA Astrophysics Data System (ADS)

    Seth, Shashi

    Many new companies are pushing state-of-the-art technology to bring a revolution to the cockpits of General Aviation (GA) aircraft. The vision, according to Dr. Bruce Holmes, the Assistant Director for Aeronautics at the National Aeronautics and Space Administration's (NASA) Langley Research Center, is to provide a flight control system so advanced that the motor and cognitive skills you use to drive a car would be very similar to the ones you would use to fly an airplane. We at ViGYAN, Inc., are currently developing a system called the Pilot Weather Advisor (PWxA), which would be a part of such an advanced-technology flight management system. The PWxA provides graphical depictions of weather information in the cockpit of aircraft in near real time, through the use of broadcast satellite communications. The purpose of this system is to improve the safety and utility of GA aircraft operations. Considerable effort is being expended on research in the design of graphical weather systems, notably the works of Scanlon and Dash. The concept of providing pilots with graphical depictions of weather conditions, overlaid on geographical and navigational maps, is extremely powerful.

  10. Processing and Interpreting Spatial Information Represented Graphically.

    ERIC Educational Resources Information Center

    Winn, Bill

    This paper proposes that three properties of graphic materials can be manipulated by instructional designers to model cognitive processes and thus help students learn. These properties are: the distance of elements in the graphic display from each other, the orientation of the elements to each other, and the sequence in which the elements are…

  11. Optimization Techniques for 3D Graphics Deployment on Mobile Devices

    NASA Astrophysics Data System (ADS)

    Koskela, Timo; Vatjus-Anttila, Jarkko

    2015-03-01

    3D Internet technologies are becoming essential enablers in many application areas, including games, education, collaboration, navigation and social networking. The use of 3D Internet applications with mobile devices provides location-independent access and a richer use context, but it also introduces performance issues. One of the important challenges facing 3D Internet applications is therefore the deployment of 3D graphics on mobile devices. In this article, we present an extensive survey of optimization techniques for 3D graphics deployment on mobile devices and qualitatively analyze the applicability of each technique from the standpoints of visual quality, performance and energy consumption. The analysis focuses on optimization techniques related to data-driven 3D graphics deployment, because it supports off-line use, multi-user interaction, user-created 3D graphics and the creation of arbitrary 3D graphics. The outcome of the analysis facilitates the development and deployment of 3D Internet applications on mobile devices and provides guidelines for future research.

  12. Process and representation in graphical displays

    NASA Technical Reports Server (NTRS)

    Gillan, Douglas J.; Lewis, Robert; Rudisill, Marianne

    1990-01-01

    How people comprehend graphics is examined. Graphical comprehension involves the cognitive representation of information from a graphic display and the processing strategies that people apply to answer questions about graphics. Research on representation has examined both the features present in a graphic display and the cognitive representation of the graphic. The key features include the physical components of a graph, the relation between the figure and its axes, and the information in the graph. Tests of people's memory for graphs indicate that both the physical and informational aspects of a graph are important in its cognitive representation; however, the physical (or perceptual) features overshadow the information to a large degree. Processing strategies also involve a perception-information distinction. To answer simple questions (e.g., determining the value of a variable, comparing several variables, or determining the mean of a set of variables), people switch between two information processing strategies: (1) an arithmetic, look-up strategy, in which they use a graph much like a table, looking up values and performing arithmetic calculations; and (2) a perceptual strategy, in which they use the spatial characteristics of the graph to make comparisons and estimations. The user's choice of strategy depends on the task and the characteristics of the graph. A theory of graphic comprehension is presented.

  13. Process and representation in graphical displays

    NASA Technical Reports Server (NTRS)

    Gillan, Douglas J.; Lewis, Robert; Rudisill, Marianne

    1993-01-01

    Our initial model of graphic comprehension has focused on statistical graphs. Like other models of human-computer interaction, models of graphical comprehension can be used by human-computer interface designers and developers to create interfaces that present information in an efficient and usable manner. Our investigation of graph comprehension addresses two primary questions: how do people represent the information contained in a data graph, and how do they process information from the graph? For representation, the topics of focus are the features into which people decompose a graph and the representation of the graph in memory. Processing can be further analyzed through two questions: what overall processing strategies do people use, and what specific processing skills are required?

  14. HMI conventions for process control graphics.

    PubMed

    Pikaar, Ruud N

    2012-01-01

    Process operators supervise and control complex processes. To enable the operator to do an adequate job, instrumentation and process control engineers need to address several related topics, such as console design, information design, navigation, and alarm management. In process control upgrade projects, a 1:1 conversion of existing graphics is usually proposed. This paper suggests another approach, which efficiently leads to a reduced number of new, powerful process graphics supported by permanent process overview displays. In addition, a road map for structuring content (process information) and conventions for the presentation of objects, symbols, and so on has been developed. The impact of the human factors engineering approach on process control upgrade projects is illustrated by several cases.

  15. Graphics processing unit-assisted lossless decompression

    DOEpatents

    Loughry, Thomas A.

    2016-04-12

    Systems and methods for decompressing compressed data that has been compressed by way of a lossless compression algorithm are described herein. In a general embodiment, a graphics processing unit (GPU) is programmed to receive compressed data packets and decompress such packets in parallel. The compressed data packets are compressed representations of an image, and the lossless compression algorithm is a Rice compression algorithm.
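
    The patent's central idea, decompressing many independently compressed packets in parallel, maps naturally onto one GPU thread per packet. The CUDA kernel below is a hypothetical sketch of packet-parallel Rice decoding, not the patented implementation; it assumes a unary-coded quotient terminated by a 1 bit followed by a k-bit remainder, with precomputed per-packet bit offsets, and all names are invented.

        // Hypothetical sketch: each thread walks the bit stream of one
        // independently compressed packet and reconstructs its samples.
        __device__ int getBit(const unsigned char* buf, long bitPos)
        {
            return (buf[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
        }

        __global__ void riceDecodePackets(const unsigned char* stream,
                                          const long* packetBitOffset,
                                          unsigned int* out,
                                          int samplesPerPacket, int k,
                                          int nPackets)
        {
            int p = blockIdx.x * blockDim.x + threadIdx.x;
            if (p >= nPackets) return;
            long pos = packetBitOffset[p];
            for (int s = 0; s < samplesPerPacket; ++s) {
                unsigned int q = 0;
                while (getBit(stream, pos++) == 0) ++q;  // unary quotient
                unsigned int r = 0;
                for (int b = 0; b < k; ++b)              // k-bit remainder
                    r = (r << 1) | getBit(stream, pos++);
                out[p * samplesPerPacket + s] = (q << k) | r;
            }
        }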

  16. Graphics Processing Unit Assisted Thermographic Compositing

    NASA Technical Reports Server (NTRS)

    Ragasa, Scott; Russell, Samuel S.

    2012-01-01

    Objective: Develop a software application utilizing high performance computing techniques, including general-purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general-purpose fashion is allowing for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU, which yield significant increases in performance. These common computations have high degrees of data parallelism; that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area where GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.

  17. Interactive graphics, the design process, and education

    SciTech Connect

    Norton, F.J.

    1980-09-01

    The field of design and drafting is changing continuously - its parameters are ever shifting and its applications are increasing. The use of Computer-Aided Design (CAD) and Computer-Aided Manufacturing (CAM) is becoming increasingly common in industry. However, instruction in CAD and CAM has in general not been incorporated into university curricula. This paper addresses the need for increased instruction in interactive graphics at the student level, and particularly in conjunction with the design process used by engineers, designers, and drafters. The development of three-dimensional graphical models using CAD is seen as a vital part of product development. Applications to printed circuit design and numerical control (NC) operations are discussed. Effective educational programs in the use of CAD must relate to designers, users, and managers and may be developed either by industry or academia. Possible approaches to new programs include coursework, projects involving CAD, and special collaborative efforts between industry and academic institutions. 1 figure.

  18. Graphical analysis of power systems for mobile robotics

    NASA Astrophysics Data System (ADS)

    Raade, Justin William

    The field of mobile robotics places stringent demands on the power system. Energetic autonomy, or the ability to function for a useful operation time independent of any tether, refueling, or recharging, is a driving force in a robot designed for a field application. The focus of this dissertation is the development of two graphical analysis tools, namely Ragone plots and optimal hybridization plots, for the design of human scale mobile robotic power systems. These tools contribute to the intuitive understanding of the performance of a power system and expand the toolbox of the design engineer. Ragone plots are useful for graphically comparing the merits of different power systems for a wide range of operation times. They plot the specific power versus the specific energy of a system on logarithmic scales. The driving equations in the creation of a Ragone plot are derived in terms of several important system parameters. Trends at extreme operation times (both very short and very long) are examined. Ragone plot analysis is applied to the design of several power systems for high-power human exoskeletons. Power systems examined include a monopropellant-powered free piston hydraulic pump, a gasoline-powered internal combustion engine with hydraulic actuators, and a fuel cell with electric actuators. Hybrid power systems consist of two or more distinct energy sources that are used together to meet a single load. They can often outperform non-hybrid power systems in low duty-cycle applications or those with widely varying load profiles and long operation times. Two types of energy sources are defined: engine-like and capacitive. The hybridization rules for different combinations of energy sources are derived using graphical plots of hybrid power system mass versus the primary system power. Optimal hybridization analysis is applied to several power systems for low-power human exoskeletons. Hybrid power systems examined include a fuel cell and a solar panel coupled with
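
    To make the plot's geometry concrete, a standard worked relation (stated here for illustration, not quoted from the dissertation) links the two axes to operation time. Writing specific energy as $\hat{e} = E/m$ and specific power as $\hat{p} = P/m$, a constant-power mission of duration $t$ satisfies

        \hat{e} = \hat{p}\, t
        \quad\Longrightarrow\quad
        \log \hat{p} = \log \hat{e} - \log t,

    so on the log-log axes of a Ragone plot, loci of constant operation time are straight lines of unit slope: long missions favor energy-dense sources, while short bursts favor power-dense ones.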

  19. Faster catalog matching on Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Lee, M. A.; Budavári, T.

    2017-07-01

    One of the most fundamental problems in observational astronomy is the cross-identification of sources. Observations are made at different times in different wavelengths with separate instruments, resulting in a large set of independent observations. The scientific outcome is often limited by our ability to quickly perform associations across catalogs. The matching, however, is difficult scientifically, statistically, and computationally. The first two require detailed physical modeling and advanced probabilistic concepts; the last is due to the large volumes of data and the problem's combinatorial nature. To tackle the computational challenge and to prepare for future surveys, we developed a new implementation on Graphics Processing Units. Our solution scales across multiple devices and can process hundreds of trillions of crossmatch candidates per second in a single machine.
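
    The combinatorial core of cross-identification is testing enormous numbers of candidate pairs for angular proximity, which parallelizes trivially. The CUDA kernel below is a hypothetical sketch of that inner test, not the paper's multi-device implementation; it assumes sources stored as unit vectors and a precomputed candidate list, and all names are invented.

        // Hypothetical sketch: one thread tests one candidate pair, comparing
        // the cosine of the angular separation against the match radius.
        __global__ void crossMatchPairs(const float3* cat1, const float3* cat2,
                                        const int2* candidates,
                                        unsigned char* isMatch,
                                        int nCandidates, float cosRadius)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nCandidates) return;
            float3 a = cat1[candidates[i].x];
            float3 b = cat2[candidates[i].y];
            float c = a.x * b.x + a.y * b.y + a.z * b.z; // cos(separation)
            isMatch[i] = c > cosRadius;                  // within radius?
        }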

  1. Graphics Processing Unit Assisted Thermographic Compositing

    NASA Technical Reports Server (NTRS)

    Ragasa, Scott; McDougal, Matthew; Russell, Sam

    2013-01-01

    Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area where GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques.

  2. Partial wave analysis using graphics processing units

    NASA Astrophysics Data System (ADS)

    Berger, Niklaus; Beijiang, Liu; Jike, Wang

    2010-04-01

    Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples, however, the unbinned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics of up to two orders of magnitude compared to earlier experiments is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs), originally developed for 3D computer games, have an architecture of massively parallel single-instruction multiple-data floating point units that is almost ideally suited to the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.
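
    The per-event independence that makes these fits GPU-friendly can be shown in a few lines. The CUDA kernel below is a hypothetical sketch, not GPUPWA code (names and layout are invented): each thread forms the coherent sum of partial-wave amplitudes for one event and stores the log of the resulting intensity, the per-event term of an unbinned log-likelihood.

        #include <cuComplex.h>

        // Hypothetical sketch: log intensity of one event's coherent sum
        // over partial waves, log |sum_w c_w A_w(event)|^2.
        __global__ void logIntensity(const cuFloatComplex* amps,  // nEvents x nWaves
                                     const cuFloatComplex* coeff, // nWaves
                                     float* logI, int nEvents, int nWaves)
        {
            int e = blockIdx.x * blockDim.x + threadIdx.x;
            if (e >= nEvents) return;
            cuFloatComplex sum = make_cuFloatComplex(0.0f, 0.0f);
            for (int w = 0; w < nWaves; ++w)
                sum = cuCaddf(sum, cuCmulf(coeff[w], amps[e * nWaves + w]));
            float inten = cuCrealf(sum) * cuCrealf(sum)
                        + cuCimagf(sum) * cuCimagf(sum);
            logI[e] = logf(inten);
        }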

  3. Process control graphics for petrochemical plants

    SciTech Connect

    Lieber, R.E.

    1982-12-01

    Describes many specialized features of a computer control system, schematics/graphics in particular, which are vital to effectively running today's complex refineries and chemical plants. Illustrates such control systems as a full-graphic control house panel of the 1960s, a European refinery control house of the early 1970s, and the Ingolstadt refinery control house. Presents a diagram showing a shape library. Implementation of state-of-the-art control theory, distributed control, dual hi-way digital instrument systems, and many other person-machine interface developments have been prime factors in process control. Further developments in person-machine interfaces are in progress, including voice input/output, touch screens, and other entry devices. Color usage, angle of projection, control house lighting, and pattern recognition are all being studied by vendors, users, and academics. These studies involve psychologists concerned with "quality of life" factors, employee relations personnel concerned with labor contracts or restrictions, as well as operations personnel concerned with just getting the plant to run better.

  4. Graphics Processing Units for HEP trigger systems

    NASA Astrophysics Data System (ADS)

    Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.; Neri, I.; Paolucci, P. S.; Piandani, R.; Pontisso, L.; Rescigno, M.; Simula, F.; Sozzi, M.; Vicini, P.

    2016-07-01

    General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We discuss the use of online parallel computing on GPUs for a synchronous low-level trigger, focusing on the CERN NA62 experiment's trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.

  5. Kernel density estimation using graphical processing unit

    NASA Astrophysics Data System (ADS)

    Sunarko; Su'ud, Zaki

    2015-09-01

    Kernel density estimation for particles distributed over a 2-dimensional space is calculated using a single graphical processing unit (GTX 660Ti GPU) and the CUDA-C language. Parallel calculations are done for particles having a bivariate normal distribution, by assigning the calculations for equally spaced node points to each scalar processor in the GPU. The numbers of particles, blocks and threads are varied to identify a favorable configuration. Comparisons are obtained by performing the same calculation using 1, 2 and 4 processors on a 3.0 GHz CPU with MPICH 2.0 routines. Speedups attained with the GPU are in the range of 88 to 349 times compared to the multiprocessor CPU. Blocks of 128 threads are found to be the optimum configuration for this case.
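
    The parallel scheme described above, one thread per node point summing contributions from all particles, looks roughly like the following CUDA sketch (hypothetical; the kernel name, isotropic Gaussian kernel, and bandwidth handling are invented, not taken from the paper).

        // Hypothetical sketch: each thread evaluates the bivariate Gaussian
        // kernel density estimate at one node point.
        __global__ void kde2d(const float2* particles, int nParticles,
                              const float2* nodes, float* density, int nNodes,
                              float h) // kernel bandwidth
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nNodes) return;
            const float norm = 1.0f / (2.0f * 3.14159265f * h * h * nParticles);
            float sum = 0.0f;
            for (int p = 0; p < nParticles; ++p) {
                float dx = nodes[i].x - particles[p].x;
                float dy = nodes[i].y - particles[p].y;
                sum += expf(-(dx * dx + dy * dy) / (2.0f * h * h));
            }
            density[i] = norm * sum;
        }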

  6. Identification of Learning Processes by Means of Computer Graphics.

    ERIC Educational Resources Information Center

    Sorensen, Birgitte Holm

    1993-01-01

    Describes a development project for the use of computer graphics and video in connection with an inservice training course for primary education teachers in Denmark. Topics addressed include research approaches to computers; computer graphics in learning processes; activities relating to computer graphics; the role of the teacher; and student…

  7. Graphics Processing Unit Assisted Thermographic Compositing

    NASA Technical Reports Server (NTRS)

    Ragasa, Scott; McDougal, Matthew; Russell, Sam

    2012-01-01

    Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area where GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.

  8. A Relational Reasoning Approach to Text-Graphic Processing

    ERIC Educational Resources Information Center

    Danielson, Robert W.; Sinatra, Gale M.

    2017-01-01

    We propose that research on text-graphic processing could be strengthened by the inclusion of relational reasoning perspectives. We briefly outline four aspects of relational reasoning: "analogies," "anomalies," "antinomies", and "antitheses". Next, we illustrate how text-graphic researchers have been…

  9. Reading the Graphics: Reading Processes Prompted by the Graphics as Second Graders Read Informational Text

    ERIC Educational Resources Information Center

    Norman, Rebecca R.

    2010-01-01

    This dissertation is comprised of two manuscripts that resulted from a single study using verbal protocols to examine the reading processes prompted by the graphics as second graders read informational text. Verbal protocols have provided researchers with an understanding of the processes readers use as they read. Little is known, however, about…

  10. Parallelization of heterogeneous reactor calculations on a graphics processing unit

    NASA Astrophysics Data System (ADS)

    Malofeev, V. M.; Pal'shin, V. A.

    2016-12-01

    Parallelization is applied to the neutron calculations performed by the heterogeneous method on a graphics processing unit. The parallel algorithm of the modified TREC code is described. The efficiency of the parallel algorithm is evaluated.

  11. Parallelization of heterogeneous reactor calculations on a graphics processing unit

    SciTech Connect

    Malofeev, V. M.; Pal’shin, V. A.

    2016-12-15

    Parallelization is applied to the neutron calculations performed by the heterogeneous method on a graphics processing unit. The parallel algorithm of the modified TREC code is described. The efficiency of the parallel algorithm is evaluated.

  12. The New Digital Engineering Design and Graphics Process.

    ERIC Educational Resources Information Center

    Barr, R. E.; Krueger, T. J.; Aanstoos, T. A.

    2002-01-01

    Summarizes the digital engineering design process using software widely available for the educational setting. Points out that newer technology used in the field is not used in engineering graphics education. (DDR)

  13. Graphics

    ERIC Educational Resources Information Center

    Post, Susan

    1975-01-01

    An art teacher described an elective course in graphics which was designed to enlarge a student's knowledge of value, color, shape within a shape, transparency, line and texture. This course utilized the technique of working a multi-colored print from a single block that was first introduced by Picasso. (Author/RK)

  14. Graphic Arts: Book Two. Process Camera, Stripping, and Platemaking.

    ERIC Educational Resources Information Center

    Farajollahi, Karim; And Others

    The second of a three-volume set of instructional materials for a course in graphic arts, this manual consists of 10 instructional units dealing with the process camera, stripping, and platemaking. Covered in the individual units are the process camera and darkroom photography, line photography, half-tone photography, other darkroom techniques,…

  15. Graphic Arts: Process Camera, Stripping, and Platemaking. Third Edition.

    ERIC Educational Resources Information Center

    Crummett, Dan

    This document contains teacher and student materials for a course in graphic arts concentrating on camera work, stripping, and plate making in the printing process. Eight units of instruction cover the following topics: (1) the process camera and darkroom equipment; (2) line photography; (3) halftone photography; (4) other darkroom techniques; (5)…

  16. Diffusion tensor fiber tracking on graphics processing units.

    PubMed

    Mittmann, Adiel; Comunello, Eros; von Wangenheim, Aldo

    2008-10-01

    Diffusion tensor magnetic resonance imaging has been successfully applied to the process of fiber tracking, which determines the location of fiber bundles within the human brain. This process, however, can be quite lengthy when run on a regular workstation. We present a means of executing this process by making use of the graphics processing units of computers' video cards, which provide a low-cost parallel execution environment that algorithms like fiber tracking can benefit from. With this method we have achieved performance gains varying from 14 to 40 times on common computers. Because of accuracy issues inherent to current graphics processing units, we define a variation index in order to assess how close the results obtained with our method are to those generated by programs running on the central processing units of computers. This index shows that results produced by our method are acceptable when compared to those of traditional programs.
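
    Fiber tracking parallelizes naturally over seed points, which is what makes it a good GPU fit. The CUDA kernel below is a hypothetical sketch of per-seed streamline tracking, far simpler than the paper's method; it assumes a precomputed principal-direction field, nearest-neighbor sampling, and fixed Euler steps, and all names are invented.

        // Hypothetical sketch: each thread advances one seed along the local
        // principal diffusion direction, recording its trajectory.
        __global__ void trackFibers(const float3* principalDir, // nx*ny*nz grid
                                    int nx, int ny, int nz,
                                    const float3* seeds, float3* paths,
                                    int nSeeds, int maxSteps, float stepSize)
        {
            int s = blockIdx.x * blockDim.x + threadIdx.x;
            if (s >= nSeeds) return;
            float3 pos = seeds[s];
            for (int t = 0; t < maxSteps; ++t) {
                int ix = (int)pos.x, iy = (int)pos.y, iz = (int)pos.z;
                if (ix < 0 || iy < 0 || iz < 0 ||
                    ix >= nx || iy >= ny || iz >= nz)
                    break;                              // left the volume
                float3 d = principalDir[(iz * ny + iy) * nx + ix];
                pos.x += stepSize * d.x;
                pos.y += stepSize * d.y;
                pos.z += stepSize * d.z;
                paths[s * maxSteps + t] = pos;
            }
        }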

  17. The Graphical Representation of Algorithmic Processes. Volume 1

    DTIC Science & Technology

    1989-12-01

    OCR fragments from a DTIC technical report on graphically representing algorithmic processes. The recoverable content includes a table classifying graphical representation systems into static (PegaSys [18], Booch Diagrams [4]) and dynamic (BALSA [6], PV [5]) categories, with further entries for process/data flow diagrams and structure charts; a discussion of techniques that attempt to show data or control flow in a static graphical representation of an algorithm, noting that the PegaSys system straddles the fields of…; and citation fragments such as "Through PegaSys," Computer, 18(8):72-85 (August 1985), and Myers, Brad A., "Incense: A System for Displaying Data Structures," Computer Graphics, 17(3):115.

  18. An Interactive Graphics Program for Investigating Digital Signal Processing.

    ERIC Educational Resources Information Center

    Miller, Billy K.; And Others

    1983-01-01

    Describes development of an interactive computer graphics program for use in teaching digital signal processing. The program allows students to interactively configure digital systems on a monitor display and observe their system's performance by means of digital plots on the system's outputs. A sample program run is included. (JN)

  1. Digital-Computer Processing of Graphical Data. Final Report.

    ERIC Educational Resources Information Center

    Freeman, Herbert

    The final report of a two-year study concerned with the digital-computer processing of graphical data. Five separate investigations carried out under this study are described briefly, and a detailed bibliography, complete with abstracts, is included in which are listed the technical papers and reports published during the period of this program.…

  2. Engineering graphics and image processing at Langley Research Center

    NASA Technical Reports Server (NTRS)

    Voigt, Susan J.

    1985-01-01

    The objective of making raster graphics and image processing techniques readily available for the analysis and display of engineering and scientific data is stated. The approach is to develop and acquire tools and skills which are applied to support research activities in such disciplines as aeronautics and structures. A listing of grants and key personnel is given.

  3. Graphic Arts: Book Three. The Press and Related Processes.

    ERIC Educational Resources Information Center

    Farajollahi, Karim; And Others

    The third of a three-volume set of instructional materials for a graphic arts course, this manual consists of nine instructional units dealing with presses and related processes. Covered in the units are basic press fundamentals, offset press systems, offset press operating procedures, offset inks and dampening chemistry, preventive maintenance…

  4. The Use of Computer Graphics in the Design Process.

    ERIC Educational Resources Information Center

    Palazzi, Maria

    This master's thesis examines applications of computer technology to the field of industrial design and ways in which technology can transform the traditional process. Following a statement of the problem, the history and applications of the fields of computer graphics and industrial design are reviewed. The traditional industrial design process…

  5. Graphic Arts: The Press and Finishing Processes. Third Edition.

    ERIC Educational Resources Information Center

    Crummett, Dan

    This document contains teacher and student materials for a course in graphic arts concentrating on printing presses and the finishing process for publications. Seven units of instruction cover the following topics: (1) offset press systems; (2) offset inks and dampening chemistry; (3) offset press operating procedures; (4) preventive maintenance…

  6. Obliterable of graphics and correction of skew using Hough transform for mobile captured documents

    NASA Astrophysics Data System (ADS)

    Chethan, H. K.; Kumar, G. Hemantha

    2011-10-01

    Camera-based document analysis (CBDA) is an emerging field in computer vision and pattern recognition. Cameras are now incorporated into many kinds of electronic equipment, and handheld imaging devices such as digital cameras, mobile phones, and camera-equipped gaming devices are increasingly replacing the scanner, so they play a vital role in document capture. The goal of this work is to remove graphics from the document, a step that is vital for recognizing characters in mobile-captured documents. In this paper we propose a novel method for separating and removing graphics other than text, such as logos and animations, from the document, together with a method to reduce noise; finally, the skew of the textual content is estimated and corrected using the Hough transform. The experimental results show the efficacy of the approach compared to the results of well-known existing methods.

  7. Accelerating molecular dynamic simulation on graphics processing units.

    PubMed

    Friedrichs, Mark S; Eastman, Peter; Vaidyanathan, Vishal; Houston, Mike; Legrand, Scott; Beberg, Adam L; Ensign, Daniel L; Bruns, Christopher M; Pande, Vijay S

    2009-04-30

    We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core.
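
    The dominant cost in such simulations is the nonbonded force loop, which on a GPU is typically assigned one thread per atom. The CUDA kernel below is a hypothetical brute-force Lennard-Jones sketch (all names invented; a production code like the one described adds neighbor lists, cutoffs, electrostatics, and implicit solvent).

        // Hypothetical sketch: thread i accumulates Lennard-Jones forces on
        // atom i, F = 24*eps*[2(s/r)^12 - (s/r)^6]/r^2 * r_ij, over all j.
        __global__ void ljForces(const float4* pos, float4* force,
                                 int n, float epsilon, float sigma)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float3 f = make_float3(0.0f, 0.0f, 0.0f);
            float s2 = sigma * sigma;
            for (int j = 0; j < n; ++j) {
                if (j == i) continue;
                float dx = pos[i].x - pos[j].x;
                float dy = pos[i].y - pos[j].y;
                float dz = pos[i].z - pos[j].z;
                float r2 = dx * dx + dy * dy + dz * dz;
                float sr2 = s2 / r2;
                float sr6 = sr2 * sr2 * sr2;
                float fmag = 24.0f * epsilon * sr6 * (2.0f * sr6 - 1.0f) / r2;
                f.x += fmag * dx; f.y += fmag * dy; f.z += fmag * dz;
            }
            force[i] = make_float4(f.x, f.y, f.z, 0.0f);
        }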

  8. Efficient magnetohydrodynamic simulations on graphics processing units with CUDA

    NASA Astrophysics Data System (ADS)

    Wong, Hon-Cheng; Wong, Un-Hong; Feng, Xueshang; Tang, Zesheng

    2011-10-01

    Magnetohydrodynamic (MHD) simulations based on the ideal MHD equations have become a powerful tool for modeling phenomena in a wide range of applications including laboratory, astrophysical, and space plasmas. In general, high-resolution methods for solving the ideal MHD equations are computationally expensive, and Beowulf clusters or even supercomputers are often used to run the codes that implement these methods. With the advent of the Compute Unified Device Architecture (CUDA), modern graphics processing units (GPUs) provide an alternative approach to parallel computing for scientific simulations. In this paper we present, to the best of the authors' knowledge, the first implementation of MHD simulations entirely on GPUs with CUDA, named GPU-MHD, to accelerate the simulation process. GPU-MHD supports both single and double precision computations. A series of numerical tests have been performed to validate the correctness of our code. An accuracy evaluation comparing single and double precision computation results is also given. Performance measurements of both single and double precision are conducted on both the NVIDIA GeForce GTX 295 (GT200 architecture) and GTX 480 (Fermi architecture) graphics cards. These measurements show that our GPU-based implementation achieves between one and two orders of magnitude of improvement, depending on the graphics card used, the problem size, and the precision, when compared to the original serial CPU MHD implementation. In addition, we extend GPU-MHD to support visualization of the simulation results, so that the whole MHD simulation and visualization process can be performed entirely on GPUs.

  9. Adaptive-optics optical coherence tomography processing using a graphics processing unit.

    PubMed

    Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T

    2014-01-01

    Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities and moderate price compared to supercomputers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving this super-high-resolution technology closer to clinical viability.

  10. Adaptive-optics Optical Coherence Tomography Processing Using a Graphics Processing Unit*

    PubMed Central

    Shafer, Brandon A.; Kriske, Jeffery E.; Kocaoglu, Omer P.; Turner, Timothy L.; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T.

    2015-01-01

    Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities and moderate price compared to supercomputers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving this super-high-resolution technology closer to clinical viability. PMID:25570838

  11. Graphics processing unit based computation for NDE applications

    NASA Astrophysics Data System (ADS)

    Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.

    2012-05-01

    Advances in parallel processing in recent years are helping to reduce the cost of numerical simulation. Breakthroughs in graphics processing unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of the 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to the NDE community, namely heat diffusion and elastic wave propagation. The implementations are two-dimensional. Performance improvement of the GPU implementation against a serial CPU implementation is then discussed.
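
    A minimal version of the first of those two schemes shows the pattern. The CUDA kernel below is a hypothetical sketch of one explicit time step of 2-D heat diffusion, one thread per interior grid point (names are invented; alpha stands for the dimensionless coefficient k*dt/dx^2, which must satisfy the usual explicit-scheme stability limit).

        // Hypothetical sketch: five-point explicit finite difference update
        // for the 2-D heat equation.
        __global__ void heatStep(const float* T, float* Tnew,
                                 int nx, int ny, float alpha)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i < 1 || j < 1 || i >= nx - 1 || j >= ny - 1) return;
            int c = j * nx + i;
            Tnew[c] = T[c] + alpha * (T[c - 1] + T[c + 1]
                                    + T[c - nx] + T[c + nx] - 4.0f * T[c]);
        }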

  12. Accelerating VASP electronic structure calculations using graphic processing units.

    PubMed

    Hacene, Mohamed; Anciaux-Sedrakian, Ani; Rozanska, Xavier; Klahr, Diego; Guignon, Thomas; Fleurat-Lessard, Paul

    2012-12-15

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the most demanding code sections are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better suited to the chosen graphics accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor of between 3 and 8 when running on n (CPU core + GPU) pairs compared to n CPU cores only, without any accuracy loss.

  13. Graphics processing unit acceleration of computational electromagnetic methods

    NASA Astrophysics Data System (ADS)

    Inman, Matthew

    The use of Graphical Processing Units (GPUs) for scientific applications has been evolving and expanding for the past decade. GPUs provide an alternative to the CPU in the creation and execution of the numerical codes that are often relied upon to perform simulations in computational electromagnetics. While originally designed purely to display graphics on the user's monitor, GPUs today are essentially powerful floating point co-processors that can be programmed not only to render complex graphics, but also to perform the complex mathematical calculations often encountered in scientific computing. The GPUs currently being produced often contain hundreds of separate cores able to access large amounts of high-speed dedicated memory. By utilizing the power offered by such a specialized processor, it is possible to drastically speed up the calculations required in computational electromagnetics. This increase in speed allows for the use of GPU-based simulations in a variety of situations in which computational time has heretofore been a limiting factor, such as educational courses. Teaching electromagnetics often relies upon simple example problems because of the simulation times needed to analyze more complex ones. By adapting the methods for use on the GPU, simulations will be shown to allow demonstrations of more advanced problems than previously possible. Modules will be developed for a wide variety of teaching situations, utilizing the speed of the GPU to demonstrate various techniques and ideas previously unrealizable.

  14. Fast analytical scatter estimation using graphics processing units.

    PubMed

    Ingleby, Harry; Lippuner, Jonas; Rickey, Daniel W; Li, Yue; Elbakri, Idris

    2015-01-01

    To develop a fast patient-specific analytical estimator of first-order Compton and Rayleigh scatter in cone-beam computed tomography, implemented using graphics processing units. The authors developed an analytical estimator for first-order Compton and Rayleigh scatter in a cone-beam computed tomography geometry. The estimator was coded using NVIDIA's CUDA environment for execution on an NVIDIA graphics processing unit. Performance of the analytical estimator was validated by comparison with high-count Monte Carlo simulations for two different numerical phantoms. Monoenergetic analytical simulations were compared with monoenergetic and polyenergetic Monte Carlo simulations. Analytical and Monte Carlo scatter estimates were compared both qualitatively, from visual inspection of images and profiles, and quantitatively, using a scaled root-mean-square difference metric. Reconstruction of simulated cone-beam projection data of an anthropomorphic breast phantom illustrated the potential of this method as a component of a scatter correction algorithm. The monoenergetic analytical and Monte Carlo scatter estimates showed very good agreement. The monoenergetic analytical estimates showed good agreement for Compton single scatter and reasonable agreement for Rayleigh single scatter when compared with polyenergetic Monte Carlo estimates. For a voxelized phantom with dimensions 128 × 128 × 128 voxels and a detector with 256 × 256 pixels, the analytical estimator required 669 seconds for a single projection, using a single NVIDIA 9800 GX2 video card. Accounting for first order scatter in cone-beam image reconstruction improves the contrast to noise ratio of the reconstructed images. The analytical scatter estimator, implemented using graphics processing units, provides rapid and accurate estimates of single scatter and with further acceleration and a method to account for multiple scatter may be useful for practical scatter correction schemes.

  15. Line-by-line spectroscopic simulations on graphics processing units

    NASA Astrophysics Data System (ADS)

    Collange, Sylvain; Daumas, Marc; Defour, David

    2008-01-01

    We report here on software that performs line-by-line spectroscopic simulations on gases. Elaborate models (such as narrow-band and correlated-K) are accurate and efficient for bands where various components are not simultaneously and significantly active. Line-by-line is probably the most accurate model in the infrared for blends of gases that contain high proportions of H2O and CO2, as was the case for our prototype simulation. Our implementation on graphics processing units sustains a speedup close to 330 on computation-intensive tasks and 12 on memory-intensive tasks compared to implementations on one core of high-end processors. This speedup is due to data parallelism, efficient memory access for specific patterns, and some dedicated hardware operators only available in graphics processing units. It is obtained leaving most of the processor's resources available, and it would scale linearly with the number of graphics processing units in parallel machines. Line-by-line simulation coupled with simulation of fluid dynamics was long believed to be economically intractable, but our work shows that it can be done with affordable additional resources compared to what is necessary to perform simulations of fluid dynamics alone. Program summary: Program title: GPU4RE; Catalogue identifier: ADZY_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADZY_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 62 776; No. of bytes in distributed program, including test data, etc.: 1 513 247; Distribution format: tar.gz; Programming language: C++; Computer: x86 PC; Operating system: Linux, Microsoft Windows. Compilation requires either gcc/g++ under Linux or Visual C++ 2003/2005 and Cygwin under Windows. It has been tested using gcc 4.1.2 under Ubuntu Linux 7.04 and using Visual C
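
    The line-by-line pattern itself is simple to state: each spectral grid point accumulates contributions from every line in the database. The CUDA kernel below is a hypothetical sketch using normalized Lorentzian profiles (GPU4RE's real code handles far more physics; all names here are invented).

        // Hypothetical sketch: one thread per spectral point sums Lorentzian
        // line contributions, k(nu) += S * g / (pi * ((nu - nu0)^2 + g^2)).
        __global__ void lineByLine(const float* lineCenter,
                                   const float* lineStrength,
                                   const float* lineWidth, int nLines,
                                   const float* nu, float* absorption,
                                   int nPoints)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nPoints) return;
            float k = 0.0f;
            for (int l = 0; l < nLines; ++l) {
                float d = nu[i] - lineCenter[l];
                float g = lineWidth[l];
                k += lineStrength[l] * g / (3.14159265f * (d * d + g * g));
            }
            absorption[i] = k;
        }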

  16. Graphics Processing Unit Accelerated Hirsch-Fye Quantum Monte Carlo

    NASA Astrophysics Data System (ADS)

    Moore, Conrad; Abu Asal, Sameer; Rajagoplan, Kaushik; Poliakoff, David; Caprino, Joseph; Tomko, Karen; Thakur, Bhupender; Yang, Shuxiang; Moreno, Juana; Jarrell, Mark

    2012-02-01

    In Dynamical Mean Field Theory and its cluster extensions, such as the Dynamic Cluster Algorithm, the bottleneck of the algorithm is solving the self-consistency equations with an impurity solver. Hirsch-Fye Quantum Monte Carlo is one of the most commonly used impurity and cluster solvers. This work implements optimizations of the algorithm, such as enabling large data re-use, suitable for the Graphics Processing Unit (GPU) architecture. The GPU's sheer number of concurrent parallel computations and large bandwidth to many shared memories takes advantage of the inherent parallelism in the Green function update and measurement routines, and can substantially improve the efficiency of the Hirsch-Fye impurity solver.

  17. Porting a Hall MHD Code to a Graphic Processing Unit

    NASA Technical Reports Server (NTRS)

    Dorelli, John C.

    2011-01-01

    We present our experience porting a Hall MHD code to a Graphics Processing Unit (GPU). The code is a 2nd-order accurate MUSCL-Hancock scheme which makes use of an HLL Riemann solver to compute numerical fluxes and second-order finite differences to compute the Hall contribution to the electric field. The divergence of the magnetic field is controlled with Dedner's hyperbolic divergence cleaning method. Preliminary benchmark tests indicate a speedup (relative to a single Nehalem core) of 58x for a double precision calculation. We discuss scaling issues which arise when distributing work across multiple GPUs in a CPU-GPU cluster.

  18. Optimized Laplacian image sharpening algorithm based on graphic processing unit

    NASA Astrophysics Data System (ADS)

    Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah

    2014-12-01

    In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening on a CPU is considerably time-consuming, especially for large pictures. In this paper, we propose a parallel implementation of Laplacian sharpening based on the Compute Unified Device Architecture (CUDA), a computing platform for graphics processing units (GPUs), and analyze the impact of picture size on performance as well as the relationship between data transfer time and parallel computing time. Further, exploiting the characteristics of the GPU's different memories, an improved scheme of our method is developed, which uses shared memory instead of global memory and further increases efficiency. Experimental results show that the two novel algorithms outperform the traditional sequential method based on OpenCV in terms of computing speed.
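
    The shared-memory idea highlighted above is to stage an image tile, plus a one-pixel halo, in fast on-chip memory before applying the stencil. The CUDA kernel below is a hypothetical sketch of that scheme, not the paper's code; it assumes an 8-bit grayscale image, the sharpening stencil out = 5c - (n + s + e + w), and a launch with blockDim = (TILE+2, TILE+2).

        #define TILE 16

        // Hypothetical sketch: cooperative tile load (with halo) into shared
        // memory, then Laplacian sharpening on interior threads only.
        __global__ void laplacianSharpen(const unsigned char* in,
                                         unsigned char* out, int w, int h)
        {
            __shared__ float tile[TILE + 2][TILE + 2];
            int gx = blockIdx.x * TILE + threadIdx.x - 1; // halo-shifted
            int gy = blockIdx.y * TILE + threadIdx.y - 1;
            int cx = min(max(gx, 0), w - 1);              // clamp at borders
            int cy = min(max(gy, 0), h - 1);
            tile[threadIdx.y][threadIdx.x] = (float)in[cy * w + cx];
            __syncthreads();
            bool interior = threadIdx.x >= 1 && threadIdx.x <= TILE &&
                            threadIdx.y >= 1 && threadIdx.y <= TILE;
            if (!interior || gx >= w || gy >= h) return;
            float c = tile[threadIdx.y][threadIdx.x];
            float sharp = 5.0f * c - tile[threadIdx.y - 1][threadIdx.x]
                                   - tile[threadIdx.y + 1][threadIdx.x]
                                   - tile[threadIdx.y][threadIdx.x - 1]
                                   - tile[threadIdx.y][threadIdx.x + 1];
            out[gy * w + gx] = (unsigned char)fminf(fmaxf(sharp, 0.0f), 255.0f);
        }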

  19. Exploiting Graphics Processing Units for Computational Biology and Bioinformatics

    PubMed Central

    Payne, Joshua L.; Sinnott-Armstrong, Nicholas A.; Moore, Jason H.

    2010-01-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and Nvidia's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700. PMID:20658333

  20. Exploiting graphics processing units for computational biology and bioinformatics.

    PubMed

    Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

    2010-09-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.

  1. Solar physics applications of computer graphics and image processing

    NASA Technical Reports Server (NTRS)

    Altschuler, M. D.

    1985-01-01

    Computer graphics devices coupled with computers and carefully developed software provide new opportunities to achieve insight into the geometry and time evolution of scalar, vector, and tensor fields and to extract more information quickly and cheaply from the same image data. Two or more different fields which overlay in space can be calculated from the data (and the physics), then displayed from any perspective, and compared visually. The maximum regions of one field can be compared with the gradients of another. Time changing fields can also be compared. Images can be added, subtracted, transformed, noise filtered, frequency filtered, contrast enhanced, color coded, enlarged, compressed, parameterized, and histogrammed, in whole or section by section. Today it is possible to process multiple digital images to reveal spatial and temporal correlations and cross correlations. Data from different observatories taken at different times can be processed, interpolated, and transformed to a common coordinate system.

  2. Fast free-form deformation using graphics processing units.

    PubMed

    Modat, Marc; Ridgway, Gerard R; Taylor, Zeike A; Lehmann, Manja; Barnes, Josephine; Hawkes, David J; Fox, Nick C; Ourselin, Sébastien

    2010-06-01

    A large number of algorithms have been developed to perform non-rigid registration, which is a tool commonly used in medical image analysis. The free-form deformation algorithm is a well-established technique, but it is extremely time consuming. In this paper we present a parallel-friendly formulation of the algorithm suitable for graphics processing unit execution. Using our approach we perform registration of T1-weighted MR images in less than 1 min and show the same level of accuracy as a classical serial implementation when performing segmentation propagation. This technology could be of significant utility in time-critical applications such as image-guided interventions, or in the processing of large data sets. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.

  3. Graphics processing unit accelerated computation of digital holograms.

    PubMed

    Kang, Hoonjong; Yaraş, Fahri; Onural, Levent

    2009-12-01

    An approximation for fast digital hologram generation is implemented on a central processing unit (CPU), a graphics processing unit (GPU), and a multi-GPU computational platform. The computational performance of the method on each platform is measured and compared. The computational speed on the GPU platform is much faster than on a CPU, and the algorithm could be further accelerated on a multi-GPU platform. In addition, the accuracy of the algorithm for single- and double-precision arithmetic is evaluated. The quality of the reconstruction from the algorithm using single-precision arithmetic is comparable with the quality from the double-precision arithmetic, and thus the implementation using single-precision arithmetic on a multi-GPU platform can be used for holographic video displays.

  4. Solar physics applications of computer graphics and image processing

    NASA Technical Reports Server (NTRS)

    Altschuler, M. D.

    1985-01-01

    Computer graphics devices coupled with computers and carefully developed software provide new opportunities to achieve insight into the geometry and time evolution of scalar, vector, and tensor fields and to extract more information quickly and cheaply from the same image data. Two or more different fields which overlay in space can be calculated from the data (and the physics), then displayed from any perspective, and compared visually. The maximum regions of one field can be compared with the gradients of another. Time changing fields can also be compared. Images can be added, subtracted, transformed, noise filtered, frequency filtered, contrast enhanced, color coded, enlarged, compressed, parameterized, and histogrammed, in whole or section by section. Today it is possible to process multiple digital images to reveal spatial and temporal correlations and cross correlations. Data from different observatories taken at different times can be processed, interpolated, and transformed to a common coordinate system.

  5. Implementing wide baseline matching algorithms on a graphics processing unit.

    SciTech Connect

    Rothganger, Fredrick H.; Larson, Kurt W.; Gonzales, Antonio Ignacio; Myers, Daniel S.

    2007-10-01

    Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Though effective, the computational expense of these algorithms limits their application to many real-world problems. The performance of wide baseline matching algorithms may be improved by using a graphical processing unit as a fast multithreaded co-processor. In this paper, we present an implementation of the difference of Gaussian feature extractor, based on the CUDA system of GPU programming developed by NVIDIA, and implemented on their hardware. For a 2000x2000 pixel image, the GPU-based method executes nearly thirteen times faster than a comparable CPU-based method, with no significant loss of accuracy.

  6. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott

    2012-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core i7 gaming PC with an NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC will be presented. We will also discuss progress on optimizing the comprehensive three-dimensional, general-geometry GEM code.

  7. Graphics processing units accelerated semiclassical initial value representation molecular dynamics

    SciTech Connect

    Tamascelli, Dario; Dambrosio, Francesco Saverio; Conte, Riccardo; Ceotto, Michele

    2014-05-07

    This paper presents a Graphics Processing Units (GPUs) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the GPU implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered, and the GPU-calculated vibrational frequencies perfectly match the benchmark values. The computational time scaling of two GPUs (an NVIDIA Tesla C2075 and a Kepler K20) versus two CPUs (an Intel Core i5 and an Intel Xeon E5-2687W), and the critical issues related to the GPU implementation, are discussed. The resulting reduction in computational time and power consumption is significant, and semiclassical GPU calculations are shown to be environmentally friendly.

  8. Fast Docking on Graphics Processing Units via Ray-Casting

    PubMed Central

    Khar, Karen R.; Goldschmidt, Lukasz; Karanicolas, John

    2013-01-01

    Docking Approach using Ray Casting (DARC) is a structure-based computational method for carrying out virtual screening by docking small molecules into protein surface pockets. In a complementary study we find that DARC can be used to identify known inhibitors from large sets of decoy compounds, and can identify new compounds that are active in biochemical assays. Here, we describe our adaptation of DARC for use on Graphics Processing Units (GPUs), leading to a speedup of approximately 27-fold in typical-use cases over the corresponding calculations carried out using a CPU alone. This dramatic speedup of DARC will enable screening larger compound libraries, screening with more conformations of each compound, and including multiple receptor conformations when screening. We anticipate that all three of these enhanced approaches, which now become tractable, will lead to improved screening results. PMID:23976948

  9. Graphics Processing Units and High-Dimensional Optimization.

    PubMed

    Zhou, Hua; Lange, Kenneth; Suchard, Marc A

    2010-08-01

    This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent, block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board.
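
    As a concrete instance of the "separated" updates the authors favor, the classic Lee-Seung multiplicative updates for nonnegative matrix factorization V ≈ WH (a standard result, quoted here for illustration rather than taken from this record) are, in LaTeX notation:

    W_{ik} \leftarrow W_{ik}\,\frac{(V H^{\top})_{ik}}{(W H H^{\top})_{ik}},
    \qquad
    H_{kj} \leftarrow H_{kj}\,\frac{(W^{\top} V)_{kj}}{(W^{\top} W H)_{kj}}

    Each update is a few dense matrix products followed by purely elementwise work, so every entry can be revised by its own GPU thread while touching only a limited slice of the data, which is exactly the structure described above.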

  10. Accelerating Density Functional Calculations with Graphics Processing Unit.

    PubMed

    Yasuda, Koji

    2008-08-01

    An algorithm is presented for graphics processing units (GPUs), which execute single-precision arithmetic much faster than commodity microprocessors (CPUs), to calculate the exchange-correlation term in ab initio density functional calculations. The algorithm was implemented and applied to two molecules, taxol and valinomycin. The errors in the total energies were about 10(-5) a.u., which is accurate enough for practical usage. If the exchange-correlation term is split into a simple analytic model potential and the correction to it, and only the latter is calculated with the GPU, the energy error is decreased by an order of magnitude. The resulting time to compute the exchange-correlation term is smaller than it is on the latest CPU by a factor of 10, indicating that a GPU running the proposed algorithm accelerates the density functional calculation considerably.

  11. Fast Pattern Classification of Ventricular Arrhythmias Using Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Lopes, Noel; Ribeiro, Bernardete

    Graphics Processing Units (GPUs) can provide remarkable performance gains when compared to CPUs for computationally-intensive applications. In the biomedical area, most of the previous studies are focused on using Neural Networks (NNs) for pattern recognition of biomedical signals. However, long training times prevent them from being used in real time. This is critical for the fast detection of Ventricular Arrhythmias (VAs), which may cause cardiac arrest and sudden death. In this paper, we present a parallel implementation of the Back-Propagation (BP) and the Multiple Back-Propagation (MBP) algorithms which allowed significant training speedups. In our proposal, we explicitly specify data-parallel computations by defining special functions (kernels); therefore, we can use a fast evaluation strategy for reducing the computational cost without wasting memory resources. The performance of the pattern classification implementation is compared against other reported algorithms.

  12. Graphics Processing Units and High-Dimensional Optimization

    PubMed Central

    Zhou, Hua; Lange, Kenneth; Suchard, Marc A.

    2011-01-01

    This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent, block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on board. PMID:21847315

  13. High-throughput sequence alignment using Graphics Processing Units.

    PubMed

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-12-10

    The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  14. Graphic Arts: Process Camera, Stripping, and Platemaking. Teacher Guide.

    ERIC Educational Resources Information Center

    Feasley, Sue C., Ed.

    This curriculum guide is the second in a three-volume series of instructional materials for competency-based graphic arts instruction. Each publication is designed to include the technical content and tasks necessary for a student to be employed in an entry-level graphic arts occupation. Introductory materials include an instructional/task…

  15. Graphic Arts: The Press and Finishing Processes. Teacher Guide.

    ERIC Educational Resources Information Center

    Feasley, Sue C., Ed.

    This curriculum guide is the third in a three-volume series of instructional materials for competency-based graphic arts instruction. Each publication is designed to include the technical content and tasks necessary for a student to be employed in an entry-level graphic arts occupation. Introductory materials include an instructional/task analysis…

  16. Role of Graphics Tools in the Learning Design Process

    ERIC Educational Resources Information Center

    Laisney, Patrice; Brandt-Pomares, Pascale

    2015-01-01

    This paper discusses the design activities of students in secondary school in France. Graphics tools are now part of the capacity of design professionals. It is therefore apt to reflect on their integration into the technological education. Has the use of intermediate graphical tools changed students' performance, and if so in what direction, in…

  17. Role of Graphics Tools in the Learning Design Process

    ERIC Educational Resources Information Center

    Laisney, Patrice; Brandt-Pomares, Pascale

    2015-01-01

    This paper discusses the design activities of students in secondary school in France. Graphics tools are now part of the capacity of design professionals. It is therefore apt to reflect on their integration into the technological education. Has the use of intermediate graphical tools changed students' performance, and if so in what direction, in…

  18. A streaming-based solution for remote visualization of 3D graphics on mobile devices.

    PubMed

    Lamberti, Fabrizio; Sanna, Andrea

    2007-01-01

    Mobile devices such as Personal Digital Assistants, Tablet PCs, and cellular phones have greatly enhanced user capability to connect to remote resources. Although a large set of applications are now available bridging the gap between desktop and mobile devices, visualization of complex 3D models is still a hard task to accomplish without specialized hardware. This paper proposes a system where a cluster of PCs, equipped with accelerated graphics cards managed by the Chromium software, is able to handle remote visualization sessions based on MPEG video streaming involving complex 3D models. The proposed framework allows mobile devices such as smart phones, Personal Digital Assistants (PDAs), and Tablet PCs to visualize objects consisting of millions of textured polygons and voxels at a frame rate of 30 fps or more, depending on hardware resources at the server side and on multimedia capabilities at the client side. The server is able to concurrently manage multiple clients, computing a video stream for each one; the resolution and quality of each stream are tailored according to the screen resolution and bandwidth of the client. The paper investigates in depth issues related to latency, bit rate and quality of the generated stream, screen resolution, and the number of frames per second displayed.

  19. Matrix decomposition graphics processing unit solver for Poisson image editing

    NASA Astrophysics Data System (ADS)

    Lei, Zhao; Wei, Li

    2012-10-01

    In recent years, gradient-domain methods have been widely discussed in the image processing field, including seamless cloning and image stitching. These algorithms are commonly carried out by solving a large sparse linear system: the Poisson equation. However, solving the Poisson equation is a computation- and memory-intensive task, which makes it unsuitable for real-time image editing. A new matrix decomposition graphics processing unit (GPU) solver (MDGS) is proposed to settle the problem. A matrix decomposition method is used to distribute the work among GPU threads, so that MDGS takes full advantage of the computing power of current GPUs. Additionally, MDGS is a hybrid solver (combining both direct and iterative techniques) and has a two-level architecture. These features enable MDGS to generate solutions identical to those of the common Poisson methods and to achieve a high convergence rate in most cases. This approach is advantageous in terms of parallelizability and low memory consumption, enabling real-time image editing across a wide range of applications.
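
    MDGS itself combines direct and iterative techniques; as a point of reference only, the simplest iterative building block for the discrete Poisson equation, a plain Jacobi sweep, already parallelizes with one thread per pixel. The sketch below is illustrative, with hypothetical names, and is not the authors' method.

    #include <cuda_runtime.h>

    // One Jacobi sweep for the 5-point discrete Poisson equation  lap(x) = b.
    // Interior pixels only; the boundary is held fixed (Dirichlet), as in
    // seamless cloning, where boundary values come from the target image.
    __global__ void jacobiStep(const float* xOld, float* xNew,
                               const float* b, int w, int h)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i < 1 || j < 1 || i >= w - 1 || j >= h - 1) return;
        int idx = j * w + i;
        xNew[idx] = 0.25f * (xOld[idx - 1] + xOld[idx + 1] +
                             xOld[idx - w] + xOld[idx + w] - b[idx]);
    }

    // Iterate by ping-ponging xOld/xNew until the residual is small enough.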

  20. Simplification of 3D Graphics for Mobile Devices: Exploring the Trade-off Between Energy Savings and User Perceptions of Visual Quality

    NASA Astrophysics Data System (ADS)

    Vatjus-Anttila, Jarkko; Koskela, Timo; Lappalainen, Tuomas; Häkkilä, Jonna

    2017-03-01

    3D graphics have quickly become a popular form of media that can also be accessed with today's mobile devices. However, the use of 3D applications on mobile devices is typically a very energy-consuming task due to the processing complexity and the large file size of 3D graphics. As a result, their use may lead to rapid depletion of the limited battery life. In this paper, we investigate how much energy can be saved in the transmission and rendering of 3D graphics by simplifying geometry data. In this connection, we also examine users' perceptions of the visual quality of the simplified 3D models. The results of this paper provide new knowledge on the energy savings that can be gained through geometry simplification, as well as on how much the geometry can be simplified before the visual quality of 3D models becomes unacceptable for mobile users. Based on the results, it can be concluded that geometry simplification can provide significant energy savings for mobile devices without disturbing the users. When geometry simplification was combined with distance-based adjustment of detail, up to 52% energy savings were gained in our experiments compared to using only a single high-quality 3D model.

  1. Graphics processing units in bioinformatics, computational biology and systems biology.

    PubMed

    Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

    2017-09-01

    Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools here reviewed is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.

  2. Parallel Latent Semantic Analysis using a Graphics Processing Unit

    SciTech Connect

    Cui, Xiaohui; Potok, Thomas E; Cavanagh, Joseph M

    2009-01-01

    Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. In this paper, we present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture (CUDA) and Compute Unified Basic Linear Algebra Subprograms (CUBLAS). The performance of this implementation is compared to a traditional LSA implementation on the CPU using an optimized Basic Linear Algebra Subprograms library. For large matrices that have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version.
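
    In such an implementation the heavy kernels reduce to dense matrix products. The following self-contained example shows the basic pattern of offloading C = A x B to the GPU through CUBLAS; it illustrates the approach and is not the authors' code (CUBLAS expects column-major storage).

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>
    #include <cublas_v2.h>

    int main(void)
    {
        const int m = 512, k = 512, n = 512;    // C(m x n) = A(m x k) * B(k x n)
        size_t szA = (size_t)m * k * sizeof(float);
        size_t szB = (size_t)k * n * sizeof(float);
        size_t szC = (size_t)m * n * sizeof(float);

        float *hA = (float*)malloc(szA), *hB = (float*)malloc(szB);
        float *hC = (float*)malloc(szC);
        for (int i = 0; i < m * k; ++i) hA[i] = 1.0f;   // dummy data
        for (int i = 0; i < k * n; ++i) hB[i] = 2.0f;

        float *dA, *dB, *dC;
        cudaMalloc(&dA, szA); cudaMalloc(&dB, szB); cudaMalloc(&dC, szC);
        cudaMemcpy(dA, hA, szA, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, szB, cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C  (column-major)
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
        cublasDestroy(handle);

        cudaMemcpy(hC, dC, szC, cudaMemcpyDeviceToHost);
        printf("C[0] = %f (expected %f)\n", hC[0], 2.0f * k);

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }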

  3. Accelerating radio astronomy cross-correlation with graphics processing units

    NASA Astrophysics Data System (ADS)

    Clark, M. A.; LaPlante, P. C.; Greenhill, L. J.

    2013-05-01

    We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from 'large-N' arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on NVIDIA's Fermi architecture, sustaining up to 79% of the peak single-precision floating-point throughput. We compare performance obtained for hardware- and software-managed caches, observing significantly better performance for the latter. The high performance reported involves use of a multi-level data tiling strategy in memory and use of a pipelined algorithm with simultaneous computation and transfer of data from host to device memory. The speed of code development, flexibility, and low cost of the GPU implementations compared with application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) implementations have the potential to greatly shorten the cycle of correlator development and deployment, for cases where some power-consumption penalty can be tolerated.

  4. Accelerated Searches of Gravitational Waves Using Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Chung, Shin Kee; Wen, Linqing; Blair, David; Cannon, Kipp

    2010-06-01

    The existence of gravitational waves was predicted by Albert Einstein. Black hole and neutron star binary systems will produce strong gravitational waves through their inspiral and eventual merger. The analysis of the gravitational wave data is computationally intensive, requiring matched filtering of terabytes of data with a bank of at least 3000 numerical templates that represent predicted waveforms. We need to complete the analysis in real time (within the duration of the signal) in order to enable follow-up observations with conventional optical or radio telescopes. We report a novel application of graphics processing units (GPUs) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binary systems of compact objects. An overall speed-up of 16-fold has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with a standard central processing unit (CPU). We show that further improvements are possible and discuss the reduction in CPU number required for the detection of inspiral sources afforded by the use of GPUs.
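
    For context, the quantity being computed in such a search is the standard matched-filter signal-to-noise ratio (a textbook expression, not quoted from this record), where \tilde{s}(f) is the Fourier transform of the data, \tilde{h}(f) that of the template, and S_n(f) the one-sided noise power spectral density:

    \rho(t) = 4\,\mathrm{Re} \int_0^{\infty}
              \frac{\tilde{s}(f)\,\tilde{h}^{*}(f)}{S_n(f)}\,
              e^{2\pi i f t}\, df

    Evaluated for all t at once via an inverse FFT, this is one long data-parallel pass per template, which is why filtering against a bank of thousands of templates maps so well onto GPUs.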

  5. Accelerating sino-atrium computer simulations with graphic processing units.

    PubMed

    Zhang, Hong; Xiao, Zheng; Lin, Shien-fong

    2015-01-01

    Sino-atrial node cells (SANCs) play a significant role in rhythmic firing. To investigate their role in arrhythmia and their interactions with the atrium, computer simulations based on cellular dynamic mathematical models are generally used. However, the large-scale computation usually makes research difficult, given the limited computational power of Central Processing Units (CPUs). In this paper, an accelerating approach with Graphic Processing Units (GPUs) is proposed for a simulation consisting of the SAN tissue and the adjoining atrium. By using the operator splitting method, the computational task was made parallel. Three parallelization strategies were then put forward. The strategy with the shortest running time was further optimized by considering block size, data transfer and partition. The results showed that for a simulation with 500 SANCs and 30 atrial cells, the execution time taken by the non-optimized program decreased by 62% with respect to a serial program running on a CPU. The execution time decreased by 80% after the program was optimized. The larger the tissue was, the more significant the acceleration became. The results demonstrate the effectiveness of the proposed GPU-accelerating methods and their promising applications in more complicated biological simulations.

  6. Acceleration of cardiac tissue simulation with graphic processing units.

    PubMed

    Sato, Daisuke; Xie, Yuanfang; Weiss, James N; Qu, Zhilin; Garfinkel, Alan; Sanderson, Allen R

    2009-09-01

    In this technical note we show the promise of using graphic processing units (GPUs) to accelerate simulations of electrical wave propagation in cardiac tissue, one of the more demanding computational problems in cardiology. We have found that the computational speed of two-dimensional (2D) tissue simulations with a single commercially available GPU is about 30 times faster than with a single 2.0 GHz Advanced Micro Devices (AMD) Opteron processor. We have also simulated wave conduction in the three-dimensional (3D) anatomic heart with GPUs where we found the computational speed with a single GPU is 1.6 times slower than with a 32-central processing unit (CPU) Opteron cluster. However, a cluster with two or four GPUs is faster than the CPU-based cluster. These results demonstrate that a commodity personal computer is able to perform a whole heart simulation of electrical wave conduction within times that enable the investigators to interact more easily with their simulations.

  7. Handling geophysical flows: Numerical modelling using Graphical Processing Units

    NASA Astrophysics Data System (ADS)

    Garcia-Navarro, Pilar; Lacasta, Asier; Juez, Carmelo; Morales-Hernandez, Mario

    2016-04-01

    Computational tools may help engineers in the assessment of sediment transport during decision-making processes. The main requirements are that the numerical results be accurate and that simulation models be fast. The present work is based on the 2D shallow water equations in combination with the 2D Exner equation [1]. The accuracy of the resulting numerical model was discussed in previous work. Regarding the speed of the computation, the Exner equation slows down the already costly 2D shallow water model, as the number of variables to solve is increased and the numerical stability is more restrictive. On the other hand, the movement of poorly sorted material over steep areas constitutes a hazardous environmental problem, and computational tools help in the prediction of such landslides [2]. In order to overcome this problem, this work proposes the use of Graphical Processing Units (GPUs) to decrease the simulation time significantly [3, 4]. The numerical scheme implemented on the GPU is based on a finite volume scheme. The mathematical model and the numerical implementation are compared against experimental and field data. In addition, the computational times obtained with the graphical hardware technology are compared against Single-Core (sequential) and Multi-Core (parallel) CPU implementations. References: [Juez et al. (2014)] Juez, C., Murillo, J., & García-Navarro, P. (2014) A 2D weakly-coupled and efficient numerical model for transient shallow flow and movable bed. Advances in Water Resources. 71, 93-109. [Juez et al. (2013)] Juez, C., Murillo, J., & García-Navarro, P. (2013) 2D simulation of granular flow over irregular steep slopes using global and local coordinates. Journal of Computational Physics. 225, 166-204. [Lacasta et al. (2014)] Lacasta, A., Morales-Hernández, M., Murillo, J., & García-Navarro, P. (2014) An optimized GPU implementation of a 2D free surface simulation model on unstructured meshes. Advances in Engineering Software. 78, 1-15. [Lacasta

  8. Retrospective Study on Mathematical Modeling Based on Computer Graphic Processing

    NASA Astrophysics Data System (ADS)

    Zhang, Kai Li

    Graphics and image making is an important field of computer application, in which visualization software has been widely used for its convenience and speed. However, modeling designers have found such software limited in function and flexibility because it lacks a mathematical modeling platform. Non-visualization graphics software that has appeared recently gives graphics and image design a very good mathematical modeling platform. In this paper, a polished pyramid is established by a multivariate spline function algorithm, validating that the non-visualization software performs well in mathematical modeling.

  9. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott; Chen, Yang

    2013-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and CUDA Fortran. A mixed implementation of both OpenACC and CUDA is demonstrated; CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third-generation Core i7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10x or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speed-ups comparable to or better than those of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented, and we will discuss progress on optimizing the comprehensive three-dimensional, general-geometry GEM code.

  10. Graphics processing unit-based alignment of protein interaction networks.

    PubMed

    Xie, Jiang; Zhou, Zhonghua; Ma, Jin; Xiang, Chaojuan; Nie, Qing; Zhang, Wu

    2015-08-01

    Network alignment is an important bridge to understanding human protein-protein interactions (PPIs) and functions through model organisms. However, the underlying subgraph isomorphism problem complicates and increases the time required to align protein interaction networks (PINs). Parallel computing technology is an effective solution to the challenge of aligning large-scale networks via sequential computing. In this study, the typical Hungarian-Greedy Algorithm (HGA) is used as an example for PIN alignment. The authors propose an HGA with 2-nearest neighbours (HGA-2N) and implement its graphics processing unit (GPU) acceleration. Numerical experiments demonstrate that HGA-2N can find alignments that are close to those found by HGA while dramatically reducing computing time. The GPU implementation of HGA-2N optimises the parallel pattern, computing mode and storage mode, and it improves the computing time ratio between the CPU and GPU compared with HGA when large-scale networks are considered. By using HGA-2N on GPUs, conserved PPIs can be observed, and potential PPIs can be predicted. Among the predictions based on 25 common Gene Ontology terms, 42.8% can be found in the Human Protein Reference Database. Furthermore, a new method of reconstructing phylogenetic trees is introduced, which shows the same relationships among five herpes viruses that are obtained using other methods.

  11. Multilevel Summation of Electrostatic Potentials Using Graphics Processing Units.

    PubMed

    Hardy, David J; Stone, John E; Schulten, Klaus

    2009-03-01

    Physical and engineering practicalities involved in microprocessor design have resulted in flat performance growth for traditional single-core microprocessors. The urgent need for continuing increases in the performance of scientific applications requires the use of many-core processors and accelerators such as graphics processing units (GPUs). This paper discusses GPU acceleration of the multilevel summation method for computing electrostatic potentials and forces for a system of charged atoms, which is a problem of paramount importance in biomolecular modeling applications. We present and test a new GPU algorithm for the long-range part of the potentials that computes a cutoff pair potential between lattice points, essentially convolving a fixed 3-D lattice of "weights" over all sub-cubes of a much larger lattice. The implementation exploits the different memory subsystems provided on the GPU to stream optimally sized data sets through the multiprocessors. We demonstrate for the full multilevel summation calculation speedups of up to 26 using a single GPU and 46 using multiple GPUs, enabling the computation of a high-resolution map of the electrostatic potential for a system of 1.5 million atoms in under 12 seconds.

  12. Graphics processing unit-accelerated quantitative trait Loci detection.

    PubMed

    Chapuis, Guillaume; Filangi, Olivier; Elsen, Jean-Michel; Lavenier, Dominique; Le Roy, Pascale

    2013-09-01

    Mapping quantitative trait loci (QTL) using genetic marker information is a time-consuming analysis that has interested the mapping community in recent decades. The increasing amount of genetic marker data allows one to consider ever more precise QTL analyses while increasing the demand for computation. Part of the difficulty of detecting QTLs resides in finding appropriate critical values or threshold values, above which a QTL effect is considered significant. Different approaches exist to determine these thresholds, using either empirical methods or algebraic approximations. In this article, we present a new implementation of existing software, QTLMap, which takes advantage of the data-parallel nature of the problem by offsetting heavy computations to a graphics processing unit (GPU). Developments on the GPU were implemented using CUDA technology. This new implementation performs up to 75 times faster than the previous multicore implementation, while maintaining the same results and level of precision (Double Precision) and computing both QTL values and thresholds. This speedup allows one to perform more complex analyses, such as linkage disequilibrium linkage analyses (LDLA) and multiQTL analyses, in a reasonable time frame.

  13. Efficient graphics processing unit-based voxel carving for surveillance

    NASA Astrophysics Data System (ADS)

    Ober-Gecks, Antje; Zwicker, Marius; Henrich, Dominik

    2016-07-01

    A graphics processing unit (GPU)-based implementation of a space carving method for the reconstruction of the photo hull is presented. In particular, the generalized voxel coloring with item buffer approach is transferred to the GPU. The fast computation on the GPU is realized by an incrementally calculated standard deviation within the likelihood ratio test, which is applied as color consistency criterion. A fast and efficient computation of complete voxel-pixel projections is provided using volume rendering methods. This generates a speedup of the iterative carving procedure while considering all given pixel color information. Different volume rendering methods, such as texture mapping and raycasting, are examined. The termination of the voxel carving procedure is controlled through an anytime concept. The photo hull algorithm is examined for its applicability to real-world surveillance scenarios as an online reconstruction method. For this reason, a GPU-based redesign of a visual hull algorithm is provided that utilizes geometric knowledge about known static occluders of the scene in order to create a conservative and complete visual hull that includes all given objects. This visual hull approximation serves as input for the photo hull algorithm.

  14. Use of general purpose graphics processing units with MODFLOW.

    PubMed

    Hughes, Joseph D; White, Jeremy T

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.
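
    The inner loop of such a preconditioned conjugate gradient solver is dominated by sparse matrix-vector products over the compressed sparse row structure described above. A minimal sketch of that building block, one thread per matrix row (illustrative only, not the UPCG code itself):

    #include <cuda_runtime.h>

    // y = A * x with A in compressed sparse row (CSR) form.
    // rowPtr has n + 1 entries; row i owns entries [rowPtr[i], rowPtr[i+1]).
    __global__ void csrSpMV(int n, const int* rowPtr, const int* colInd,
                            const double* val, const double* x, double* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;
        double sum = 0.0;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
            sum += val[k] * x[colInd[k]];
        y[row] = sum;
    }

    The remaining operations of the conjugate gradient iteration (dot products and vector updates) are likewise data parallel, which is why the authors keep all basic linear algebra on the GPGPU between memory copies.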

  15. Use of general purpose graphics processing units with MODFLOW

    USGS Publications Warehouse

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.

  16. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    SciTech Connect

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to a traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from the norm when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  17. Point-Cloud Compression for Vehicle-Based Mobile Mapping Systems Using Portable Network Graphics

    NASA Astrophysics Data System (ADS)

    Kohira, K.; Masuda, H.

    2017-09-01

    A mobile mapping system is effective for capturing dense point-clouds of roads and roadside objects. Point-clouds of urban areas, residential areas, and arterial roads are useful for maintenance of infrastructure, map creation, and automatic driving. However, the data size of point-clouds measured over large areas is enormous. A large storage capacity is required to store such point-clouds, and heavy loads are placed on the network if point-clouds are transferred through it. Therefore, it is desirable to reduce the data size of point-clouds without deterioration of quality. In this research, we propose a novel point-cloud compression method for vehicle-based mobile mapping systems. In our compression method, point-clouds are mapped onto 2D pixels using GPS time and the parameters of the laser scanner. Then, the images are encoded in the Portable Network Graphics (PNG) format and compressed using the PNG algorithm. In our experiments, our method could efficiently compress point-clouds without deteriorating the quality.

  18. Accelerating chemical database searching using graphics processing units.

    PubMed

    Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N; Yang, Eric

    2011-08-22

    The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPU) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of a ~$500 ordinary video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.
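
    Fast on-GPU decompression is what makes the lossless fingerprints practical. As an illustration of the bit-level work involved (a generic sketch with hypothetical names, not the authors' implementation), an Elias gamma code stores a positive integer v as floor(log2 v) zero bits followed by the binary digits of v, and can be decoded per thread as follows:

    #include <cuda_runtime.h>

    // Read bit p (MSB-first within each 32-bit word) from a packed bitstream.
    __device__ unsigned getBit(const unsigned* bits, unsigned p)
    {
        return (bits[p >> 5] >> (31 - (p & 31))) & 1u;
    }

    // Decode one Elias-gamma-coded positive integer starting at bit *pos;
    // advances *pos past the decoded symbol.
    __device__ unsigned decodeGamma(const unsigned* bits, unsigned* pos)
    {
        int zeros = 0;
        while (getBit(bits, (*pos)++) == 0u)   // unary prefix: count of payload bits
            ++zeros;
        unsigned v = 1u;                       // the terminating 1 is the leading bit
        for (int i = 0; i < zeros; ++i)
            v = (v << 1) | getBit(bits, (*pos)++);
        return v;
    }

    // In a search kernel, each thread would decode one compressed fingerprint
    // on the fly and compare it against the query, avoiding a full
    // decompression pass through global memory.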

  19. Massively Parallel Latent Semantic Analyzes using a Graphics Processing Unit

    SciTech Connect

    Cavanagh, Joseph M; Cui, Xiaohui

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms. The performance of this implementation is compared to a traditional LSA implementation on the CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits the GPU has for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from the norm when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  20. Area-delay trade-offs of texture decompressors for a graphics processing unit

    NASA Astrophysics Data System (ADS)

    Novoa Súñer, Emilio; Ituero, Pablo; López-Vallejo, Marisa

    2011-05-01

    Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on the implementation details of the hardware architecture behind GPUs. For instance, the way texture is handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work presents a comparative study of the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90 nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of the decompressors and thus determine the suitability of the algorithms for systems with limited hardware resources.

  1. A Block-Asynchronous Relaxation Method for Graphics Processing Units

    SciTech Connect

    Antz, Hartwig; Tomov, Stanimire; Dongarra, Jack; Heuveline, Vincent

    2011-11-30

    In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time, and the total time-to-solution. Analyzing the results, we observe that even for our most basic asynchronous relaxation scheme, despite its lower convergence rate compared to the Gauss-Seidel relaxation (which we expected), the asynchronous iteration running on GPUs is still able to provide solution approximations of certain accuracy in considerably shorter time than Gauss-Seidel running on CPUs. Hence, it overcompensates for the slower convergence by exploiting the scalability and the good fit of the asynchronous schemes for the highly parallel GPU architectures. Further, enhancing the most basic asynchronous approach with hybrid schemes (using multiple iterations within the "subdomain" handled by a GPU thread block and Jacobi-like asynchronous updates across the "boundaries", subject to tuning various parameters), we manage not only to recover the loss of global convergence but often to accelerate convergence by up to a factor of two (compared to the effective but difficult-to-parallelize Gauss-Seidel type of schemes), while keeping the execution time of a global iteration practically the same. This shows the high potential of the asynchronous methods not only as a stand-alone numerical solver for linear systems of equations fulfilling certain convergence conditions but, more importantly, as a smoother in multigrid methods. Due to the explosion of parallelism in today's architecture designs, the significance of and the need for asynchronous methods such as the ones described in this work are expected to grow.

  2. Flocking-based Document Clustering on the Graphics Processing Unit

    SciTech Connect

    Cui, Xiaohui; Potok, Thomas E; Patton, Robert M; ST Charles, Jesse Lee

    2008-01-01

    Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity, O(n²). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, is highly parallel, and such algorithms have found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GeForce 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant: performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.

  3. Viscoelastic Finite Difference Modeling Using Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Fabien-Ouellet, G.; Gloaguen, E.; Giroux, B.

    2014-12-01

    Full waveform seismic modeling requires a huge amount of computing power that still challenges today's technology. This limits the applicability of powerful processing approaches in seismic exploration like full-waveform inversion. This paper explores the use of Graphics Processing Units (GPU) to compute a time-domain finite-difference solution to the viscoelastic wave equation. The aim is to investigate whether the adoption of GPU technology can significantly reduce the computing time of simulations. The code presented herein is based on the freely accessible 2D software of Bohlen (2002), provided under a GNU General Public License. The implementation uses a second-order centred difference scheme to approximate time derivatives, and staggered-grid schemes with centred differences of order 2, 4, 6, 8, and 12 for the spatial derivatives. The code is fully parallel and written using the Message Passing Interface (MPI), and it thus supports simulations of vast seismic models on a cluster of CPUs. To port the code of Bohlen (2002) to GPUs, the OpenCL framework was chosen for its ability to work on both CPUs and GPUs and its adoption by most GPU manufacturers. In our implementation, OpenCL works in conjunction with MPI, which allows computations on a cluster of GPUs for large-scale model simulations. We tested our code for model sizes between 100² and 6000² elements. Comparison shows a decrease in computation time of more than two orders of magnitude between the GPU implementation run on an AMD Radeon HD 7950 and the CPU implementation run on a 2.26 GHz Intel Xeon Quad-Core. The speed-up varies depending on the order of the finite-difference approximation and generally increases for higher orders. Increasing speed-ups are also obtained for increasing model size, which can be explained by kernel overheads and delays introduced by memory transfers to and from the GPU through the PCI-E bus. Those tests indicate that the GPU memory size
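
    The authors' code is written in OpenCL; purely to illustrate the structure of a second-order staggered-grid update, the CUDA-flavoured sketch below advances one velocity component from the stress divergence. The grid layout, halo handling, and the omission of the viscoelastic memory variables are simplifying assumptions, not the Bohlen (2002) conventions.

    ```cuda
    // Second-order staggered-grid update of the horizontal particle
    // velocity vx from the stress divergence (elastic part only; the
    // memory variables of the viscoelastic scheme are omitted).
    __global__ void update_vx(float* vx, const float* sxx, const float* sxz,
                              int nx, int nz, float dt, float dx, float dz,
                              float rho)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // x index
        int k = blockIdx.y * blockDim.y + threadIdx.y;  // z index
        if (i < 1 || i >= nx - 1 || k < 1 || k >= nz - 1) return;
        int id = k * nx + i;
        float dsxx_dx = (sxx[id] - sxx[id - 1])  / dx;  // backward diff in x
        float dsxz_dz = (sxz[id] - sxz[id - nx]) / dz;  // backward diff in z
        vx[id] += dt / rho * (dsxx_dx + dsxz_dz);
    }
    ```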

  4. Graphic Arts: Process Camera, Stripping, and Platemaking. Fourth Edition. Teacher Edition [and] Student Edition.

    ERIC Educational Resources Information Center

    Multistate Academic and Vocational Curriculum Consortium, Stillwater, OK.

    This publication contains both a teacher edition and a student edition of materials for a course in graphic arts that covers the process camera, stripping, and platemaking. The course introduces basic concepts and skills necessary for entry-level employment in a graphic communication occupation. The contents of the materials are tied to measurable…

  6. Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

    NASA Astrophysics Data System (ADS)

    Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

    2016-05-01

    This study introduces a practical approach to developing a real-time signal processing chain for a general phased array radar on NVIDIA GPUs (Graphics Processing Units), using CUDA (Compute Unified Device Architecture) libraries such as cuBLAS and cuFFT, which are adopted from open-source libraries and optimized for NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance, benchmarked as computation time for various input data cube sizes, is compared across GPUs and CPUs. Through the analysis, it is demonstrated that GPGPU (general-purpose GPU) real-time processing of array radar data is possible with relatively low-cost commercial GPUs.
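
    A processing chain of this kind rests on batched library calls; the minimal sketch below runs a batch of per-pulse FFTs with cuFFT, with all sizes chosen purely for illustration rather than taken from the paper.

    ```cuda
    #include <cufft.h>
    #include <cuda_runtime.h>

    // Batched FFTs across one dimension of a radar data cube -- the
    // basic cuFFT pattern a GPU signal-processing chain builds on.
    int main()
    {
        const int nfft  = 1024;   // samples per pulse (hypothetical)
        const int batch = 256;    // range gates transformed at once

        cufftComplex* d_data;
        cudaMalloc(&d_data, sizeof(cufftComplex) * nfft * batch);
        // ... copy one slice of the data cube into d_data ...

        cufftHandle plan;
        cufftPlan1d(&plan, nfft, CUFFT_C2C, batch);
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  // in-place

        cufftDestroy(plan);
        cudaFree(d_data);
        return 0;
    }
    ```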

  7. High Speed Data Processing for Imaging MS-Based Molecular Histology Using Graphical Processing Units

    NASA Astrophysics Data System (ADS)

    Jones, Emrys A.; van Zeijl, René J. M.; Andrén, Per E.; Deelder, André M.; Wolters, Lex; McDonnell, Liam A.

    2012-04-01

    Imaging MS enables the distributions of hundreds of biomolecular ions to be determined directly from tissue samples. The application of multivariate methods, to identify pixels possessing correlated MS profiles, is referred to as molecular histology as tissues can be annotated on the basis of the MS profiles. The application of imaging MS-based molecular histology to larger tissue series, for clinical applications, requires significantly increased computational capacity in order to efficiently analyze the very large, highly dimensional datasets. Such datasets are highly suited to processing using graphical processor units, a very cost-effective solution for high speed processing. Here we demonstrate up to 13× speed improvements for imaging MS-based molecular histology using off-the-shelf components, and demonstrate equivalence with CPU based calculations. It is then discussed how imaging MS investigations may be designed to fully exploit the high speed of graphical processor units.

  8. Software Graphics Processing Unit (sGPU) for Deep Space Applications

    NASA Technical Reports Server (NTRS)

    McCabe, Mary; Salazar, George; Steele, Glen

    2015-01-01

    A graphics processing capability will be required for deep space missions and must support a range of applications, from safety-critical vehicle health status to telemedicine for crew health. However, preliminary radiation testing of commercial graphics processing cards suggests they cannot operate in the deep space radiation environment. Investigation into a Software Graphics Processing Unit (sGPU) composed of commercial-equivalent radiation-hardened/tolerant single-board computers, field programmable gate arrays, and safety-critical display software shows promising results. Preliminary performance of approximately 30 frames per second (FPS) has been achieved. Use of multi-core processors may provide a significant increase in performance.

  9. Commercial Off-The-Shelf (COTS) Graphics Processing Board (GPB) Radiation Test Evaluation Report

    NASA Technical Reports Server (NTRS)

    Salazar, George A.; Steele, Glen F.

    2013-01-01

    Large round trip communications latency for deep space missions will require more onboard computational capabilities to enable the space vehicle to undertake many tasks that have traditionally been ground-based mission control responsibilities. As a result, visual display graphics will be required to provide simpler vehicle situational awareness through graphical representations, as well as to provide capabilities never before used in a space mission, such as augmented reality for in-flight maintenance or telepresence activities. These capabilities will require graphics processors and associated support electronic components for high computational graphics processing. In an effort to understand the performance of commercial graphics card electronics operating in the expected radiation environment, a preliminary test was performed on five commercial off-the-shelf (COTS) graphics cards. This paper discusses the preliminary evaluation test results of five COTS graphics processing cards tested to the International Space Station (ISS) low earth orbit radiation environment. Three of the five graphics cards were tested to a total dose of 6000 rads (Si). The test articles, test configuration, preliminary results, and recommendations are discussed.

  10. Grace: A cross-platform micromagnetic simulator on graphics processing units

    NASA Astrophysics Data System (ADS)

    Zhu, Ru

    2015-12-01

    A micromagnetic simulator running on graphics processing units (GPUs) is presented. Unlike the GPU implementations of other research groups, which predominantly run on NVIDIA's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware-platform independent. It runs on GPUs from vendors including NVIDIA, AMD, and Intel, and achieves a significant performance boost compared with previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paves the way for running large micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.

  11. Harnessing graphics processing units for improved neuroimaging statistics.

    PubMed

    Eklund, Anders; Villani, Mattias; Laconte, Stephen M

    2013-09-01

    Simple models and algorithms based on restrictive assumptions are often used in the field of neuroimaging for studies involving functional magnetic resonance imaging, voxel based morphometry, and diffusion tensor imaging. Nonparametric statistical methods or flexible Bayesian models can be applied rather easily to yield more trustworthy results. The spatial normalization step required for multisubject studies can also be improved by taking advantage of more robust algorithms for image registration. A common drawback of algorithms based on weaker assumptions, however, is the increase in computational complexity. In this short overview, we will therefore present some examples of how inexpensive PC graphics hardware, normally used for demanding computer games, can be used to enable practical use of more realistic models and accurate algorithms, such that the outcome of neuroimaging studies really can be trusted.

  12. Mobile Devices and GPU Parallelism in Ionospheric Data Processing

    NASA Astrophysics Data System (ADS)

    Mascharka, D.; Pankratius, V.

    2015-12-01

    Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of
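
    The vertical-TEC computation described builds on the standard dual-frequency relation; the sketch below evaluates the slant TEC per epoch in a CUDA kernel. Receiver and satellite biases, which the app estimates separately, are deliberately omitted, and the kernel name and data layout are assumptions.

    ```cuda
    // Slant TEC from the dual-frequency pseudorange difference, one epoch
    // per thread, using the standard relation
    //   STEC = f1^2 f2^2 / (40.3 (f1^2 - f2^2)) * (P2 - P1)  [electrons/m^2]
    // and converting to TEC units (1 TECU = 1e16 electrons/m^2).
    __global__ void slant_tec(const double* P1, const double* P2,
                              double* tecu, int n)
    {
        const double f1 = 1575.42e6, f2 = 1227.60e6;  // GPS L1/L2 (Hz)
        const double k  = f1 * f1 * f2 * f2
                        / (40.3 * (f1 * f1 - f2 * f2));
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            tecu[i] = k * (P2[i] - P1[i]) / 1e16;
    }
    ```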

  13. Graphics processing unit-based high-frame-rate color Doppler ultrasound processing.

    PubMed

    Chang, Li-Wen; Hsu, Ke-Hsin; Li, Pai-Chi

    2009-09-01

    Color Doppler ultrasound is a routinely used diagnostic tool for assessing blood flow information in real time. The required signal processing is computationally intensive, involving autocorrelation, linear filtering, median filtering, and thresholding. Because of the large amount of data and high computational requirement, color Doppler signal processing has been mainly implemented on custom-designed hardware, with software-based implementation, particularly on a general-purpose CPU, not being successful. In this paper, we describe the use of a graphics processing unit for implementing signal-processing algorithms for color Doppler ultrasound that achieves a frame rate of 160 fps for frames comprising 500 scan lines x 128 range samples, with each scan line being obtained from an ensemble size of 8 with an 8-tap FIR clutter filter.
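
    A minimal sketch of the autocorrelation stage is shown below, using the lag-one (Kasai-type) estimator commonly used for color Doppler. The data layout and kernel name are assumptions, and the clutter filtering, median filtering, and thresholding stages mentioned above are omitted.

    ```cuda
    #include <cuda_runtime.h>
    #include <math.h>

    // Kasai lag-one autocorrelation velocity estimate: one thread per
    // scan-line/range-sample position, with the `ens` complex baseband
    // samples of each position stored contiguously (ens = 8 above).
    __global__ void kasai(const float2* iq,   // [npos * ens]
                          float* vel, int npos, int ens)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= npos) return;
        float re = 0.f, im = 0.f;
        const float2* z = iq + p * ens;
        for (int k = 0; k + 1 < ens; ++k) {
            // accumulate conj(z_k) * z_{k+1}
            re += z[k].x * z[k + 1].x + z[k].y * z[k + 1].y;
            im += z[k].x * z[k + 1].y - z[k].y * z[k + 1].x;
        }
        vel[p] = atan2f(im, re);   // proportional to axial velocity
    }
    ```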

  14. Development of a Flow Solver with Complex Kinetics on the Graphic Processing Units

    DTIC Science & Technology

    2011-09-22

    This report describes the development of a flow solver on Graphic Processing Units (GPU) to model reactive gas mixtures with detailed chemical kinetics. The solver incorporates high-order finite volume methods. Different approaches to implementing a fast kinetics solver on the GPU were explored, and the details of the implementation are discussed.

  15. iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM

    SciTech Connect

    Battye, T. Geoff G.; Kontogiannis, Luke; Johnson, Owen; Powell, Harold R.; Leslie, Andrew G. W.

    2011-04-01

    A new graphical user interface to the MOSFLM program has been developed to simplify the processing of macromolecular diffraction data. The interface, iMOSFLM, allows data processing via a series of clearly defined tasks and provides visual feedback on the progress of each stage. iMOSFLM is a graphical user interface to the diffraction data-integration program MOSFLM. It is designed to simplify data processing by dividing the process into a series of steps, which are normally carried out sequentially. Each step has its own display pane, allowing control over parameters that influence that step and providing graphical feedback to the user. Suitable values for integration parameters are set automatically, but additional menus provide a detailed level of control for experienced users. The image display and the interfaces to the different tasks (indexing, strategy calculation, cell refinement, integration and history) are described. The most important parameters for each step and the best way of assessing success or failure are discussed.

  16. Student Thinking Processes While Constructing Graphic Representations of Textbook Content: What Insights Do Think-Alouds Provide?

    ERIC Educational Resources Information Center

    Scott, D. Beth; Dreher, Mariam Jean

    2016-01-01

    This study examined the thinking processes students engage in while constructing graphic representations of textbook content. Twenty-eight students who either used graphic representations in a routine manner during social studies instruction or learned to construct graphic representations based on the rhetorical patterns used to organize textbook…

  18. Acceleration of integral imaging based incoherent Fourier hologram capture using graphic processing unit.

    PubMed

    Jeong, Kyeong-Min; Kim, Hee-Seung; Hong, Sung-In; Lee, Sung-Keun; Jo, Na-Young; Kim, Yong-Soo; Lim, Hong-Gi; Park, Jae-Hyeung

    2012-10-08

    Speed enhancement of integral-imaging-based incoherent Fourier hologram capture using a graphics processing unit is reported. The integral-imaging-based method enables exact hologram capture of real-existing three-dimensional objects under regular incoherent illumination. In our implementation, we apply a parallel computation scheme using the graphics processing unit, accelerating the processing speed. Using the enhanced speed of hologram capture, we also implement a pseudo-real-time hologram capture and optical reconstruction system. The overall operation speed is measured to be 1 frame per second.

  19. High throughput transmission optical projection tomography using low cost graphics processing unit.

    PubMed

    Vinegoni, Claudio; Fexon, Lyuba; Feruglio, Paolo Fumene; Pivovarov, Misha; Figueiredo, Jose-Luiz; Nahrendorf, Matthias; Pozzo, Antonio; Sbarbati, Andrea; Weissleder, Ralph

    2009-12-07

    We implement the use of a graphics processing unit (GPU) in order to achieve real-time data processing for high-throughput transmission optical projection tomography imaging. By implementing the GPU we have obtained a 300-fold performance enhancement in comparison to a CPU workstation implementation. This enables on-the-fly reconstructions, allowing for high-throughput imaging.
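
    Reconstruction in transmission tomography of this kind is typically dominated by backprojection. As a hedged illustration, not the authors' code, the kernel below backprojects a filtered sinogram with one thread per image pixel, assuming a centred geometry and a row-major [nang x ndet] sinogram layout.

    ```cuda
    // Backprojection step of filtered backprojection: each thread
    // accumulates, for one image pixel, the linearly interpolated
    // filtered-sinogram values over all projection angles.
    __global__ void backproject(const float* sino, float* img,
                                const float* cosA, const float* sinA,
                                int nang, int ndet, int nx, int ny)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= nx || y >= ny) return;
        float fx = x - 0.5f * nx, fy = y - 0.5f * ny;
        float sum = 0.f;
        for (int a = 0; a < nang; ++a) {
            float t = fx * cosA[a] + fy * sinA[a] + 0.5f * ndet;
            int t0 = (int)floorf(t);
            if (t0 < 0 || t0 + 1 >= ndet) continue;
            float w = t - t0;                 // linear interpolation weight
            sum += (1.f - w) * sino[a * ndet + t0]
                 + w         * sino[a * ndet + t0 + 1];
        }
        img[y * nx + x] = sum * (3.14159265f / nang);
    }
    ```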

  20. Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration.

    PubMed

    Alerstam, Erik; Svensson, Tomas; Andersson-Engels, Stefan

    2008-01-01

    General-purpose computing on graphics processing units (GPGPU) is shown to dramatically increase the speed of Monte Carlo simulations of photon migration. In a standard simulation of time-resolved photon migration in a semi-infinite geometry, the proposed methodology executed on a low-cost graphics processing unit (GPU) is a factor of 1000 faster than simulation performed on a single standard processor. In addition, we address important technical aspects of GPU-based simulations of photon migration. The technique is expected to become a standard method in Monte Carlo simulations of photon migration.
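
    The inner loop such simulations parallelize looks roughly like the CUDA sketch below, which propagates one photon packet per thread through a semi-infinite homogeneous medium using cuRAND. Anisotropic (Henyey-Greenstein) scattering, boundary reflection, and time-resolved detector logging are omitted, so this is a structural sketch rather than a faithful reimplementation.

    ```cuda
    #include <curand_kernel.h>

    // Each thread propagates one photon packet with implicit-capture
    // weighting: exponential free paths, weight deposition on each
    // interaction, and isotropic re-direction.
    __global__ void photon_mc(float* absorbed, int nphot,
                              float mu_a, float mu_s,
                              unsigned long long seed)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        if (id >= nphot) return;
        curandState rng;
        curand_init(seed, id, 0, &rng);

        float x = 0.f, y = 0.f, z = 0.f;      // launched at the surface
        float ux = 0.f, uy = 0.f, uz = 1.f;   // pointing into the medium
        float w = 1.f;
        const float mu_t = mu_a + mu_s;

        while (w > 1e-4f) {
            float s = -logf(curand_uniform(&rng)) / mu_t;  // free path
            x += s * ux; y += s * uy; z += s * uz;
            if (z < 0.f) break;               // escaped through the surface
            atomicAdd(absorbed, w * mu_a / mu_t);  // deposit absorbed weight
            w *= mu_s / mu_t;
            // isotropic re-direction
            float ct = 2.f * curand_uniform(&rng) - 1.f;
            float st = sqrtf(1.f - ct * ct);
            float ph = 6.2831853f * curand_uniform(&rng);
            ux = st * cosf(ph); uy = st * sinf(ph); uz = ct;
        }
    }
    ```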

  1. ReaDDyMM: Fast Interacting Particle Reaction-Diffusion Simulations Using Graphical Processing Units

    PubMed Central

    Biedermann, Johann; Ullrich, Alexander; Schöneberg, Johannes; Noé, Frank

    2015-01-01

    ReaDDy is a modular particle simulation package combining off-lattice reaction kinetics with arbitrary particle interaction forces. Here we present a graphical processing unit implementation of ReaDDy that employs the fast multiplatform molecular dynamics package OpenMM. A speedup of up to two orders of magnitude is demonstrated, giving us access to timescales of multiple seconds on single graphical processing units. This opens up the possibility of simulating cellular signal transduction events while resolving all protein copies. PMID:25650912

  2. Graphical user interface for image acquisition and processing

    DOEpatents

    Goldberg, Kenneth A.

    2002-01-01

    An event-driven GUI-based image acquisition interface for the IDL programming environment, designed for CCD camera control and image acquisition directly into the IDL environment, where image manipulation and data analysis can be performed, together with a toolbox of real-time analysis applications. Running the image acquisition hardware directly from IDL removes the necessity of first saving images in one program and then importing the data into IDL for analysis in a second step. Bringing the data directly into IDL creates an opportunity for the implementation of IDL image processing and display functions in real time. The program allows control over the available charge-coupled device (CCD) detector parameters, data acquisition, file saving and loading, and image manipulation and processing, all from within IDL. The program is built using IDL's widget libraries to control the on-screen display and user interface.

  3. Accelerating Malware Detection via a Graphics Processing Unit

    DTIC Science & Technology

    2010-09-01

    The Portable Executable (PE) format is an updated version of the Common Object File Format (COFF) [Mic06].

  4. Text and Illustration Processing System (TIPS) User’s Manual. Volume II. Graphics Processing System.

    DTIC Science & Technology

    1981-08-01

    Besides eliminating unwanted edge data, the system permits any rectangular area of a graphic to be extracted into a separate graphic file. Any graphic file may be input to…

  5. Discrete-Event Execution Alternatives on General Purpose Graphical Processing Units

    SciTech Connect

    Perumalla, Kalyan S

    2006-01-01

    Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of time-stepped simulations, little is known about the applicability of GPGPUs to discrete event simulation (DES). Here, we identify some of the issues & challenges that the GPGPU stream-based interface raises for DES, and present some possible approaches to moving DES to GPGPUs. Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.

  6. A graphically oriented specification language for automatic code generation. GRASP/Ada: A Graphical Representation of Algorithms, Structure, and Processes for Ada, phase 1

    NASA Technical Reports Server (NTRS)

    Cross, James H., II; Morrison, Kelly I.; May, Charles H., Jr.; Waddel, Kathryn C.

    1989-01-01

    The first phase of a three-phase effort to develop a new graphically oriented specification language which will facilitate the reverse engineering of Ada source code into graphical representations (GRs) as well as the automatic generation of Ada source code is described. A simplified view of the three phases of Graphical Representations for Algorithms, Structure, and Processes for Ada (GRASP/Ada) with respect to three basic classes of GRs is presented. Phase 1 concentrated on the derivation of an algorithmic diagram, the control structure diagram (CSD) (CRO88a) from Ada source code or Ada PDL. Phase 2 includes the generation of architectural and system level diagrams such as structure charts and data flow diagrams and should result in a requirements specification for a graphically oriented language able to support automatic code generation. Phase 3 will concentrate on the development of a prototype to demonstrate the feasibility of this new specification language.

  7. Validation of Mobility of Pedestrians with Low Vision Using Graphic Floor Signs and Voice Guides.

    PubMed

    Omori, Kiyohiro; Yanagihara, Takao; Kitagawa, Hiroshi; Ikeda, Norihiro

    2015-01-01

    Some people with low vision, as well as elderly persons, tend to walk while watching the floor nearby; as a result, they often overlook suspended signs or find them hard to read. In this study, we propose two kinds of voice guides, and an experiment is conducted with low-vision participants using these voice guides together with graphic floor signs in order to investigate the effectiveness of the combinations. In the clock position method (CP), the direction of each nearby facility is described using the analogy of a 12-hour clock face. In the numbering method (NU), nearby facilities are numbered in clockwise order, but the directions themselves are only illustrated on a crossing sign. The experiment showed that both voice guides are effective for pedestrians with low vision. NU is used as a complement to the graphic floor signs, whereas CP can be used independently of them; with CP, however, there is a risk of pedestrians mistaking the reference direction defined by the sounding speaker in some environments.

  8. [Mobile phone-computer wireless interactive graphics transmission technology and its medical application].

    PubMed

    Huang, Shuo; Liu, Jing

    2010-05-01

    The application of clinical digital medical imaging has raised many tough issues to tackle, such as data storage, management, and information sharing. Here we investigated a mobile-phone-based medical image management system capable of personal medical imaging information storage, management, and comprehensive health information analysis. The technologies underlying the management system, spanning wireless transmission, the capabilities of the phone in mobile health care, and the management of a mobile medical database, are discussed. Taking the transmission of medical infrared images between phone and computer as an example, the working principle of the present system is demonstrated.

  9. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

    PubMed Central

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards must be installed in personal computers or servers with desktop CPUs, and the cost and power consumption of constructing such a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains four ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been applied to several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, demonstrating that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio by comparing the STK platform with desktop CPUs and GPUs. In this work, an embedded GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary first. Then, two job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed that the speedup ratios reached 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing six TK1s with a single TK1. The MTK platform is thus shown to be useful for multiple sequence alignments. PMID:28835734

  11. Graphic model of the processes involved in the production of casegood furniture

    Treesearch

    Kristen G. Hoff; Subhash C. Sarin; R. Bruce Anderson

    1992-01-01

    Imports from foreign furniture manufacturers are on the rise, and American manufacturers must take advantage of recent technological advances to regain their lost market share. To facilitate the implementation of these technologies for improving productivity and quality, a graphic model of the wood furniture production process is presented using the IDEF modeling...

  12. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

    USDA-ARS?s Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...

  13. A DDC Bibliography on Optical or Graphic Information Processing (Information Sciences Series). Volume I.

    ERIC Educational Resources Information Center

    Defense Documentation Center, Alexandria, VA.

    This unclassified-unlimited bibliography contains 183 references, with abstracts, dealing specifically with optical or graphic information processing. Citations are grouped under three headings: display devices and theory, character recognition, and pattern recognition. Within each group, they are arranged in accession number (AD-number) sequence.…

  14. Evaluating the Performance of Processing Medical Volume Data on Graphics Hardware

    NASA Astrophysics Data System (ADS)

    Raspe, Matthias; Lorenz, Guido; Müller, Stefan

    With the broad availability and increasing performance of commodity graphics processors (GPU), non-graphical applications have become an active field of research. However, leveraging this performance for advanced applications combining hardware and software implementations takes more than efficient shader programming: the data transfer is often the main limiting factor. We therefore investigate the applicability of commodity graphics hardware to medical data processing and propose a GPU-based framework for representing computations on volume data. We also show the clear performance gain of different operations compared to CPU algorithms and discuss their context. Not only is the much higher performance of hardware implementations attractive, but the computation results can also be visualized directly, i.e., without introducing an overhead, thus allowing for literally interactive applications.

  15. Mesh-particle interpolations on graphics processing units and multicore central processing units.

    PubMed

    Rossinelli, Diego; Conti, Christian; Koumoutsakos, Petros

    2011-06-13

    Particle-mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh-particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques. The single-precision results for the multicore CPU implementation show an acceleration of 45-70×, depending on system size, and an acceleration of 85-155× for the GPU implementation over an efficient single-threaded C++ implementation. In double precision, we observe a performance improvement of 30-40× for the multicore CPU implementation and 20-45× for the GPU implementation. With respect to the 16-threaded standard C++ implementation, the present CPU technique leads to a performance increase of roughly 2.8-3.7× in single precision and 1.7-2.4× in double precision, whereas the GPU technique leads to an improvement of 9× in single precision and 2.2-2.8× in double precision.
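
    As a concrete illustration of the memory-bandwidth-bound gather discussed above, the sketch below performs a mesh-to-particle (M2P) resampling with bilinear weights, one particle per thread. Unit grid spacing and the row-major field layout are assumptions, not the paper's implementation.

    ```cuda
    // Mesh-to-particle (M2P) gather: each thread resamples the mesh field
    // at one particle location with bilinear interpolation.
    __global__ void m2p(const float* field, const float2* xp, float* up,
                        int np, int nx, int ny)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= np) return;
        float px = xp[p].x, py = xp[p].y;       // position in grid units
        int ix = (int)floorf(px), iy = (int)floorf(py);
        if (ix < 0 || ix + 1 >= nx || iy < 0 || iy + 1 >= ny) {
            up[p] = 0.f;                        // outside the mesh
            return;
        }
        float wx = px - ix, wy = py - iy;       // bilinear weights
        const float* f = field + iy * nx + ix;
        up[p] = (1.f - wx) * (1.f - wy) * f[0]
              + wx         * (1.f - wy) * f[1]
              + (1.f - wx) * wy         * f[nx]
              + wx         * wy         * f[nx + 1];
    }
    ```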

  16. Fast blood flow visualization of high-resolution laser speckle imaging data using graphics processing unit.

    PubMed

    Liu, Shusen; Li, Pengcheng; Luo, Qingming

    2008-09-15

    Laser speckle contrast analysis (LASCA) is a non-invasive, full-field optical technique that produces two-dimensional maps of blood flow in biological tissue by analyzing speckle images captured by a CCD camera. Because of the heavy computation required for speckle contrast analysis, video-frame-rate visualization of blood flow, which is essential for medical use, is hardly achievable for high-resolution image data using the CPU (central processing unit) of an ordinary PC (personal computer). In this paper, we introduce the GPU (graphics processing unit) into our data processing framework for laser speckle contrast imaging to achieve fast, high-resolution blood flow visualization on PCs by exploiting the high floating-point processing power of commodity graphics hardware. Using the GPU, a 12-60-fold performance enhancement is obtained in comparison to optimized CPU implementations.
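
    The per-pixel statistic at the heart of LASCA is the local speckle contrast K = sigma/mean; a minimal CUDA sketch computing it over an N x N window, one thread per output pixel, follows. The window size and border clamping are assumptions for illustration.

    ```cuda
    // Spatial speckle contrast K = sigma / mean over an NxN window,
    // one thread per output pixel, with clamped borders.
    __global__ void speckle_contrast(const float* img, float* K,
                                     int w, int h, int N)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;
        int r = N / 2, cnt = 0;
        float s = 0.f, s2 = 0.f;
        for (int dy = -r; dy <= r; ++dy)
            for (int dx = -r; dx <= r; ++dx) {
                int xx = min(max(x + dx, 0), w - 1);
                int yy = min(max(y + dy, 0), h - 1);
                float v = img[yy * w + xx];
                s += v; s2 += v * v; ++cnt;
            }
        float mean = s / cnt;
        float var  = fmaxf(s2 / cnt - mean * mean, 0.f);
        K[y * w + x] = sqrtf(var) / (mean + 1e-12f);
    }
    ```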

  17. Graphical enhancement to support PCA-based process monitoring and fault diagnosis.

    PubMed

    Ralston, Patricia; DePuy, Gail; Graham, James H

    2004-10-01

    Principal component analysis (PCA) for process modeling and multivariate statistical techniques for monitoring, fault detection, and diagnosis are becoming more common in published research, but are still underutilized in practice. This paper summarizes an in-depth case study on a chemical process with 20 monitored process variables, one of which reflects product quality. The analysis is performed using the PLS_Toolbox 2.01 with MATLAB, augmented with software which automates the analysis and implements a statistical enhancement that uses confidence limits on the residuals of each variable for fault detection rather than just confidence limits on an overall residual. The newly developed graphical interface identifies and displays each variable's contribution to the faulty behavior of the process; and it aids greatly in analyzing results. The case study analyzed within shows that using the statistical enhancement can reduce the fault detection time, and the automated graphical interface implements the enhancement easily.

  18. Graphics to H.264 video encoding for 3D scene representation and interaction on mobile devices using region of interest

    NASA Astrophysics Data System (ADS)

    Le, Minh Tuan; Nguyen, Congdu; Yoon, Dae-Il; Jung, Eun Ku; Jia, Jie; Kim, Hae-Kwang

    2007-12-01

    In this paper, we propose a method of 3D-graphics-to-video encoding and streaming that is embedded in a remote interactive 3D visualization system for rapidly presenting a 3D scene on mobile devices without having to download it from the server. In particular, a 3D-graphics-to-video framework is presented that increases the visual quality of regions of interest (ROI) in the video by allocating more bits to the ROI during H.264 video encoding. The ROI are identified by projecting 3D objects onto a 2D plane during rasterization. The system allows users to navigate the 3D scene and interact with objects of interest to query their descriptions. We developed an adaptive media streaming server that can provide an adaptive video stream, in terms of object-based quality, to the client according to the user's preferences and the variation in network bandwidth. Results show that with ROI mode selection, the PSNR of the test samples changes only slightly while the visual quality of the objects of interest increases markedly.

  19. GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models

    PubMed Central

    Mukherjee, Chiranjit; Rodriguez, Abel

    2016-01-01

    Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated examples in which we compare our stochastic search with a Markov chain Monte Carlo algorithm in moderate-dimensional settings. These experiments show that our stochastic search largely outperforms the Markov chain Monte Carlo algorithm both in computing time and in the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset in which Markov chain Monte Carlo algorithms are too slow to be practically useful. PMID:28626348

  1. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures

    PubMed Central

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.

    2012-01-01

    We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient’s skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures. PMID:24027616

  4. Parallel computing for simultaneous iterative tomographic imaging by graphics processing units

    NASA Astrophysics Data System (ADS)

    Bello-Maldonado, Pedro D.; López, Ricardo; Rogers, Colleen; Jin, Yuanwei; Lu, Enyue

    2016-05-01

    In this paper, we address the problem of accelerating inversion algorithms for nonlinear acoustic tomographic imaging by parallel computing on graphics processing units (GPUs). Nonlinear inversion algorithms for tomographic imaging often rely on iterative algorithms for solving an inverse problem and are thus computationally intensive. We study the simultaneous iterative reconstruction technique (SIRT) for the multiple-input-multiple-output (MIMO) tomography algorithm, which enables parallel computation over the grid points as well as parallel execution of multiple source excitations. Using GPUs and the Compute Unified Device Architecture (CUDA) programming model, an overall improvement of 26.33x was achieved when combining both approaches, compared with sequential algorithms. Furthermore, we propose an adaptive iterative relaxation factor and the use of non-uniform weights to improve the overall convergence of the algorithm. Using these techniques, fast computations can be performed in parallel without loss of image quality during the reconstruction process.
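
    For illustration, one SIRT iteration can be split into two embarrassingly parallel passes, over measurements and then over grid points. The dense-matrix sketch below is a hedged simplification: it reproduces neither the MIMO forward operator nor the adaptive relaxation factor and non-uniform weights proposed above.

    ```cuda
    // Pass 1: one thread per measurement i computes r_i = b_i - (A x)_i.
    __global__ void residuals(const float* A, const float* x, const float* b,
                              float* r, int m, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= m) return;
        float ax = 0.f;
        for (int j = 0; j < n; ++j) ax += A[i * n + j] * x[j];
        r[i] = b[i] - ax;
    }

    // Pass 2: one thread per grid point j applies the relaxed update
    //   x_j += lambda * sum_i A_ij r_i / colsum_j,  colsum_j = sum_i A_ij.
    __global__ void update(const float* A, const float* colsum, float* x,
                           const float* r, int m, int n, float lambda)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j >= n) return;
        float acc = 0.f;
        for (int i = 0; i < m; ++i) acc += A[i * n + j] * r[i];
        x[j] += lambda * acc / fmaxf(colsum[j], 1e-12f);
    }
    ```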

  5. Graphics Processing Unit-Based Bioheat Simulation to Facilitate Rapid Decision Making Associated with Cryosurgery Training.

    PubMed

    Keelan, Robert; Zhang, Hong; Shimada, Kenji; Rabin, Yoed

    2016-04-01

    This study focuses on the implementation of an efficient numerical technique for cryosurgery simulations on a graphics processing unit as an alternative means of accelerating runtime. It is part of an ongoing effort to develop computerized training tools for cryosurgery, with prostate cryosurgery as a developmental model. The ability to perform rapid simulations of various test cases is critical to facilitating sound decision making associated with medical training. Consistent with clinical practice, the training tool aims at correlating the frozen-region contour and the corresponding temperature field with the target region shape. The current study focuses on the feasibility of graphics-processing-unit-based computation using C++ Accelerated Massive Parallelism (C++ AMP) as one possible implementation. Benchmark results on a variety of computation platforms display between 3-fold acceleration (laptop) and 13-fold acceleration (gaming computer) of cryosurgery simulation in comparison with the more common implementation on a multicore central processing unit. While the general concept of graphics-processing-unit-based simulation is not new, its application to phase-change problems, combined with the unique requirements of cryosurgery optimization, represents the core contribution of the current study.

  6. TRIIG - Time-lapse reproduction of images through interactive graphics. [digital processing of quality hard copy

    NASA Technical Reports Server (NTRS)

    Buckner, J. D.; Council, H. W.; Edwards, T. R.

    1974-01-01

    Description of the hardware and software implementing the system of time-lapse reproduction of images through interactive graphics (TRIIG). The system produces a quality hard copy of processed images in a fast and inexpensive manner. This capability allows for optimal development of processing software through the rapid viewing of many image frames in an interactive mode. Three critical optical devices are used to reproduce an image: an Optronics photo reader/writer, the Adage Graphics Terminal, and Polaroid Type 57 high speed film. Typical sources of digitized images are observation satellites, such as ERTS or Mariner, computer coupled electron microscopes for high-magnification studies, or computer coupled X-ray devices for medical research.

  7. Impact of memory bottleneck on the performance of graphics processing units

    NASA Astrophysics Data System (ADS)

    Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong

    2015-12-01

    Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources of the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and the clock frequency. According to our simulations, GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, the performance saturates when memory bottleneck problems occur due to huge data requests to the memory. The performance of GPUs can be improved further as the memory bottleneck is reduced by changing GPU parameters dynamically.

  8. iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM

    PubMed Central

    Battye, T. Geoff G.; Kontogiannis, Luke; Johnson, Owen; Powell, Harold R.; Leslie, Andrew G. W.

    2011-01-01

    iMOSFLM is a graphical user interface to the diffraction data-integration program MOSFLM. It is designed to simplify data processing by dividing the process into a series of steps, which are normally carried out sequentially. Each step has its own display pane, allowing control over parameters that influence that step and providing graphical feedback to the user. Suitable values for integration parameters are set automatically, but additional menus provide a detailed level of control for experienced users. The image display and the interfaces to the different tasks (indexing, strategy calculation, cell refinement, integration and history) are described. The most important parameters for each step and the best way of assessing success or failure are discussed. PMID:21460445

  9. Finite Element Optimization for Nondestructive Evaluation on a Graphics Processing Unit for Ground Vehicle Hull Inspection

    DTIC Science & Technology

    2013-08-22

    This is a reprint of the paper presented under the same title by Victor U. Karthik (ECE Department, Michigan State University, East Lansing, MI), with co-author affiliations at the University of Toledo (Toledo, OH) and the US Army TARDEC (Warren, MI). Distribution Statement A: approved for public release.

  10. Signal processing for ION mobility spectrometers

    NASA Technical Reports Server (NTRS)

    Taylor, S.; Hinton, M.; Turner, R.

    1995-01-01

    Signal processing techniques for systems based upon Ion Mobility Spectrometry will be discussed in the light of 10 years of experience in the design of real-time IMS. Among the topics to be covered are compensation techniques for variations in the number density of the gas - the use of an internal standard (a reference peak) or pressure and temperature sensors. Sources of noise and methods for noise reduction will be discussed together with resolution limitations and the ability of deconvolution techniques to improve resolving power. The use of neural networks (either by themselves or as a component part of a processing system) will be reviewed.
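
    As one concrete example of the sensor-based compensation mentioned above, reduced mobility normalizes the measured mobility to standard pressure and temperature. A minimal host-side helper under the usual drift-tube relations is sketched below; all parameter names and values are illustrative.

    ```cuda
    // Reduced-mobility compensation for gas number density: drift-tube
    // length L (cm), drift voltage V (volts), measured drift time td (s),
    // ambient pressure P (Torr), and temperature T (K).
    //   K  = L^2 / (V * td)                   mobility, cm^2 V^-1 s^-1
    //   K0 = K * (273.15 / T) * (P / 760)     normalized to STP
    double reduced_mobility(double L, double V, double td,
                            double P, double T)
    {
        double K = (L * L) / (V * td);
        return K * (273.15 / T) * (P / 760.0);
    }
    ```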

  11. A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

    NASA Astrophysics Data System (ADS)

    Griffiths, M. K.; Fedun, V.; Erdélyi, R.

    2015-03-01

    Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.

  12. Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

    PubMed

    Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

    2013-12-01

    Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using the graphics-processing-unit parallel framework known as the Compute Unified Device Architecture (CUDA). A series of simulation experiments is carried out to test the accuracy and the accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced, by a factor of 38.9, with a GTX 580 graphics card.

  13. Graphics processing unit-based quantitative second-harmonic generation imaging.

    PubMed

    Kabir, Mohammad Mahfuzul; Jonayat, A S M; Patel, Sanjay; Toussaint, Kimani C

    2014-09-01

    We adapt a graphics processing unit (GPU) to dynamic quantitative second-harmonic generation imaging. We demonstrate the temporal advantage of the GPU-based approach by computing the number of frames analyzed per second from SHG image videos showing varying fiber orientations. In comparison to our previously reported CPU-based approach, our GPU-based image analysis results in ∼10× improvement in computational time. This work can be adapted to other quantitative, nonlinear imaging techniques and provides a significant step toward obtaining quantitative information from fast in vivo biological processes.

  14. Fast extended focused imaging in digital holography using a graphics processing unit.

    PubMed

    Wang, Le; Zhao, Jianlin; Di, Jianglei; Jiang, Hongzhen

    2011-05-01

    We present a simple and effective method for reconstructing extended focused images in digital holography using a graphics processing unit (GPU). The Fresnel transform method is simplified by an algorithm named fast Fourier transform (FFT) pruning with frequency shift. The pixel-size consistency problem is then solved by a coordinate transformation combined with subpixel resampling and the pruned, frequency-shifted FFT. With the assistance of the GPU, we implemented an improved parallel version of this method, which obtains about a 300-500-fold speedup compared with central processing unit (CPU) code.
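
    A minimal sketch of the conventional single-FFT Fresnel reconstruction that such GPU holography codes accelerate (the paper's pruned, frequency-shifted FFT is a further optimization not reproduced here); the hologram is assumed square, and lambda, z and the pixel pitch dx are illustrative.

      // Multiply the hologram by the Fresnel quadratic phase, then one 2-D FFT
      // (via cuFFT) completes the reconstruction at distance z.
      #include <cufft.h>
      #include <cuda_runtime.h>
      #include <math_constants.h>

      __global__ void chirp(cufftComplex* h, int n, float lambda, float z, float dx)
      {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          if (x >= n || y >= n) return;
          float u = (x - n / 2) * dx, v = (y - n / 2) * dx;
          float phi = CUDART_PI_F / (lambda * z) * (u * u + v * v);
          cufftComplex c = h[y * n + x];
          cufftComplex q = {cosf(phi), sinf(phi)};
          h[y * n + x] = {c.x * q.x - c.y * q.y, c.x * q.y + c.y * q.x};
      }

      void fresnel_reconstruct(cufftComplex* d_holo, int n, float lambda, float z, float dx)
      {
          dim3 b(16, 16), g((n + 15) / 16, (n + 15) / 16);
          chirp<<<g, b>>>(d_holo, n, lambda, z, dx);   // quadratic phase factor
          cufftHandle plan;
          cufftPlan2d(&plan, n, n, CUFFT_C2C);
          cufftExecC2C(plan, d_holo, d_holo, CUFFT_FORWARD);
          cufftDestroy(plan);
      }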

  15. Fast high-resolution computer-generated hologram computation using multiple graphics processing unit cluster system.

    PubMed

    Takada, Naoki; Shimobaba, Tomoyoshi; Nakayama, Hirotaka; Shiraki, Atsushi; Okada, Naohisa; Oikawa, Minoru; Masuda, Nobuyuki; Ito, Tomoyoshi

    2012-10-20

    To overcome the computational complexity of a computer-generated hologram (CGH), we implement an optimized CGH computation in our multi-graphics processing unit cluster system. Our system can calculate a CGH of 6,400×3,072 pixels from a three-dimensional (3D) object composed of 2,048 points in 55 ms. Furthermore, in the case of a 3D object composed of 4,096 points, our system is 553 times faster than a conventional central processing unit (using eight threads).
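
    The inner loop such systems parallelize is, in textbook form, a per-pixel sum of Fresnel phase contributions from every object point. The hedged CUDA sketch below assigns one thread per hologram pixel; the point layout and parameters are our illustrative assumptions, not the cluster code described above.

      #include <cuda_runtime.h>

      struct Point { float x, y, z, amp; };    // object point and its amplitude

      __global__ void cgh(float* holo, int w, int h, const Point* pts, int np,
                          float lambda, float pitch)
      {
          int px = blockIdx.x * blockDim.x + threadIdx.x;
          int py = blockIdx.y * blockDim.y + threadIdx.y;
          if (px >= w || py >= h) return;
          float x = (px - w / 2) * pitch, y = (py - h / 2) * pitch;
          float sum = 0.0f;
          for (int j = 0; j < np; ++j) {       // every point lights every pixel
              float dx = x - pts[j].x, dy = y - pts[j].y;
              float phi = 3.14159265f / (lambda * pts[j].z) * (dx * dx + dy * dy);
              sum += pts[j].amp * __cosf(phi); // fast cosine intrinsic
          }
          holo[py * w + px] = sum;
      }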

  17. Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications

    NASA Astrophysics Data System (ADS)

    Broussard, Randy P.; Ives, Robert W.

    2011-01-01

    In the past two years the processing power of video graphics cards has quadrupled and is approaching supercomputer levels. State-of-the-art graphical processing units (GPUs) boast theoretical computational performance in the range of 1.5 trillion floating point operations per second (1.5 teraflops). This processing power is readily accessible to the scientific community at relatively small cost. High-level programming languages are now available that give access to the internal architecture of the graphics card, allowing greater algorithm optimization. This research takes the memory-access-intensive portions of an image-based iris identification algorithm and hosts them on a GPU using the C++-compatible CUDA language. The selected segmentation algorithm uses basic image processing techniques such as image inversion, value squaring, thresholding, dilation and erosion, as well as memory- and computationally intensive calculations such as the circular Hough transform. Portions of the iris segmentation algorithm were accelerated by a factor of 77 over the 2008 GPU results. Some parts of the algorithm ran at speeds over 1600 times faster than their CPU counterparts. Strengths and limitations of the GPU single-instruction, multiple-data (SIMD) architecture are discussed. Memory access times, instruction execution times, programming details and code samples are presented as part of the research.
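
    To make the memory-access discussion concrete, here is a hedged sketch of a circular Hough voting kernel of the kind such a segmentation stage uses: one thread per edge pixel, voting into a centre accumulator with global atomics. The single candidate radius and all names are our simplifications, not the paper's code.

      #include <cuda_runtime.h>

      __global__ void hough_votes(const unsigned char* edges, int w, int h,
                                  unsigned int* acc, float r, int nTheta)
      {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          if (x >= w || y >= h || edges[y * w + x] == 0) return;
          for (int t = 0; t < nTheta; ++t) {
              float th = 6.2831853f * t / nTheta;
              int cx = (int)(x - r * __cosf(th));    // candidate circle centre
              int cy = (int)(y - r * __sinf(th));
              if (cx >= 0 && cx < w && cy >= 0 && cy < h)
                  atomicAdd(&acc[cy * w + cx], 1u);  // contended global-memory atomics
          }
      }

    The scattered atomicAdd traffic is exactly the kind of memory-access cost the abstract refers to; implementations commonly mitigate it with shared-memory sub-accumulators.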

  18. Advanced colour processing for mobile devices

    NASA Astrophysics Data System (ADS)

    Gillich, Eugen; Dörksen, Helene; Lohweg, Volker

    2015-02-01

    Mobile devices such as smartphones are going to play an important role in professional image-processing tasks. However, mobile systems were not designed for such applications, especially in terms of image-processing requirements like stability and robustness. One major drawback is the built-in automatic white balance: it is necessary for many applications, but of no use when applied to shiny surfaces. Such issues appear when image acquisition takes place under differently coloured illumination in different environments, resulting in inhomogeneous appearances of the same subject. In our paper we show a new approach for handling the complex task of generating a low-noise and sharp image without spatial filtering. Our method is based on an analysis of the spectral and saturation distributions of the colour channels. Furthermore, the RGB space is transformed into a more convenient space, a particular HSI space. We generate the greyscale image by a control procedure that takes the colour channels into account. This leads to an adaptive colour-mixing model with reduced noise. The optimized images are used to show how, e.g., image classification benefits from our colour-adaptation approach.
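
    The paper's exact control procedure is not spelled out in the abstract, so the CUDA kernel below is only a plausible sketch of a saturation-weighted greyscale mix in an HSI-style space; the weighting rule and all names are our assumptions.

      #include <cuda_runtime.h>

      __global__ void adaptive_grey(const uchar3* rgb, float* grey, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float r = rgb[i].x / 255.0f, g = rgb[i].y / 255.0f, b = rgb[i].z / 255.0f;
          float I = (r + g + b) / 3.0f;                 // HSI intensity
          float m = fminf(r, fminf(g, b));
          float S = (I > 0.0f) ? 1.0f - m / I : 0.0f;   // HSI saturation
          // Blend toward the minimum channel for strongly saturated
          // (illumination-tinted) pixels; keep plain intensity elsewhere.
          grey[i] = (1.0f - S) * I + S * m;
      }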

  19. Systems Biology Graphical Notation: Process Description language Level 1 Version 1.3.

    PubMed

    Moodie, Stuart; Le Novère, Nicolas; Demir, Emek; Mi, Huaiyu; Villéger, Alice

    2015-09-04

    The Systems Biology Graphical Notation (SBGN) is an international community effort for standardized graphical representations of biological pathways and networks. The goal of SBGN is to provide unambiguous pathway and network maps for readers with different scientific backgrounds, as well as to support the efficient and accurate exchange of biological knowledge between different research communities, industry, and other players in systems biology. Three SBGN languages, Process Description (PD), Entity Relationship (ER) and Activity Flow (AF), allow for the representation of different aspects of biological and biochemical systems at different levels of detail. The SBGN Process Description language represents biological entities and the processes between these entities within a network. SBGN PD focuses on the mechanistic description and temporal dependencies of biological interactions and transformations. The nodes (elements) are split into entity nodes describing, e.g., metabolites, proteins, genes and complexes, and process nodes describing, e.g., reactions and associations. The edges (connections) provide descriptions of relationships (or influences) between the nodes, such as consumption, production, stimulation and inhibition. Among the three SBGN languages, PD is the closest to the metabolic and regulatory pathways found in the biological literature and textbooks, but its well-defined semantics offer superior precision in expressing biological knowledge.

  20. Creating Interactive Graphical Overlays in the Advanced Weather Interactive Processing System Using Shapefiles and DGM Files

    NASA Technical Reports Server (NTRS)

    Barrett, Joe H., III; Lafosse, Richard; Hood, Doris; Hoeth, Brian

    2007-01-01

    Graphical overlays can be created in real time in the Advanced Weather Interactive Processing System (AWIPS) using shapefiles or Denver AWIPS Risk Reduction and Requirements Evaluation (DARE) Graphics Metafile (DGM) files. This presentation describes how to create graphical overlays on the fly for AWIPS, using two examples of AWIPS applications created by the Applied Meteorology Unit (AMU) located at Cape Canaveral Air Force Station (CCAFS), Florida. The first example is the Anvil Threat Corridor Forecast Tool, which produces a shapefile depicting a graphical threat corridor of the forecast movement of thunderstorm anvil clouds, based on observed or forecast upper-level winds. This tool is used by the Spaceflight Meteorology Group (SMG) at Johnson Space Center, Texas, and by the 45th Weather Squadron (45 WS) at CCAFS to analyze the threat of natural or space vehicle-triggered lightning over a location. The second example is a launch and landing trajectory tool that produces a DGM file plotting the ground track of space vehicles during launch or landing. The trajectory tool can be used by SMG and 45 WS forecasters to analyze weather radar imagery along a launch or landing trajectory. The presentation will list the advantages and disadvantages of both file types for creating interactive graphical overlays in future AWIPS applications. Shapefiles are a popular format used extensively in geographic information systems, and are usually used in AWIPS to depict static map backgrounds. A shapefile stores the geometry and attribute information of spatial features in a dataset (ESRI 1998). Shapefiles can contain point, line, and polygon features. Each shapefile contains a main file, an index file, and a dBASE table. The main file contains a record for each spatial feature, which describes the feature with a list of its vertices. The index file contains the offset of each record from the beginning of the main file. The dBASE table contains one record of attribute data for each spatial feature.

  1. Employing OpenCL to Accelerate Ab Initio Calculations on Graphics Processing Units.

    PubMed

    Kussmann, Jörg; Ochsenfeld, Christian

    2017-06-13

    We present an extension of our graphics processing units (GPU)-accelerated quantum chemistry package to employ OpenCL compute kernels, which can be executed on a wide range of computing devices like CPUs, Intel Xeon Phi, and AMD GPUs. Here, we focus on the use of AMD GPUs and discuss differences as compared to CUDA-based calculations on NVIDIA GPUs. First illustrative timings are presented for hybrid density functional theory calculations using serial as well as parallel compute environments. The results show that AMD GPUs are as fast or faster than comparable NVIDIA GPUs and provide a viable alternative for quantum chemical applications.

  2. Accelerating the Gillespie τ-Leaping Method Using Graphics Processing Units

    PubMed Central

    Komarov, Ivan; D’Souza, Roshan M.; Tapia, Jose-Juan

    2012-01-01

    The Gillespie τ-Leaping Method is an approximate algorithm that is faster than the exact Direct Method (DM) because the simulation progresses with larger time steps. However, the procedure to compute the time leap τ is quite expensive. In this paper, we explore the acceleration of the τ-Leaping Method using graphics processing units (GPUs) for ultra-large reaction networks. We have developed data structures and algorithms that take advantage of the unique hardware architecture and available libraries. Our results show that we obtain a performance gain of over 60x when compared with the best conventional implementations. PMID:22715366
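
    The channel-parallel core of a GPU τ-leaping step can be sketched in a few lines: one thread per reaction channel draws its number of firings from a Poisson distribution with the CURAND device API. Propensities, τ and the RNG states (seeded elsewhere with curand_init) are assumed precomputed, and the state-update bookkeeping is omitted; this is an illustration, not the paper's data structures.

      #include <curand_kernel.h>

      __global__ void tau_leap_firings(const double* propensity, double tau,
                                       unsigned int* firings, curandState* states,
                                       int nChannels)
      {
          int j = blockIdx.x * blockDim.x + threadIdx.x;
          if (j >= nChannels) return;
          curandState local = states[j];
          // K_j ~ Poisson(a_j * tau): number of times channel j fires this leap.
          firings[j] = curand_poisson(&local, propensity[j] * tau);
          states[j] = local;                  // persist RNG state across leaps
      }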

  3. Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units.

    PubMed

    Asadchev, Andrey; Allada, Veerendra; Felder, Jacob; Bode, Brett M; Gordon, Mark S; Windus, Theresa L

    2010-03-09

    An implementation is presented of an uncontracted Rys quadrature algorithm for electron repulsion integrals, including up to g functions on graphical processing units (GPUs). The general GPU programming model, the challenges associated with implementing the Rys quadrature on these highly parallel emerging architectures, and a new approach to implementing the quadrature are outlined. The performance of the implementation is evaluated for single and double precision on two different types of GPU devices. The performance obtained is on par with the matrix-vector routine from the CUDA basic linear algebra subroutines (CUBLAS) library.

  4. Monte Carlo Simulations of Random Frustrated Systems on Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Feng, Sheng; Fang, Ye; Hall, Sean; Papke, Ariane; Thomasson, Cade; Tam, Ka-Ming; Moreno, Juana; Jarrell, Mark

    2012-02-01

    We study the implementation of classical Monte Carlo simulation for random frustrated models using the multithreaded computing environment provided by the Compute Unified Device Architecture (CUDA) on modern graphics processing units (GPUs) with hundreds of cores and high memory bandwidth. The key to optimizing GPU performance lies in the proper handling of data structures. Utilizing multi-spin coding, we obtain an efficient GPU implementation of the parallel tempering Monte Carlo simulation for the Edwards-Anderson spin glass model (the data layout is sketched below). In typical simulations, we find a speed-up of over two thousand times compared with a single-threaded CPU implementation.
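
    The following hedged sketch shows the multi-spin-coding layout itself: bit k of each word holds spin i of replica k, so one XOR plus a population count processes 32 replicas at once. A 1-D chain is shown for brevity; since parallel-tempering replicas share the same disorder, each bondSign word is 0 for J = +1 or ~0u for J = -1. All names are illustrative.

      #include <cuda_runtime.h>

      __global__ void unsatisfied_bonds(const unsigned int* spins,
                                        const unsigned int* bondSign,
                                        unsigned int* count, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n - 1) return;
          // XOR = 1 where neighbouring spins differ; the bond sign flips the test.
          unsigned int frustrated = (spins[i] ^ spins[i + 1]) ^ bondSign[i];
          count[i] = __popc(frustrated);      // unsatisfied bonds across 32 replicas
      }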

  5. Graphics processing unit implementation of lattice Boltzmann models for flowing soft systems.

    PubMed

    Bernaschi, Massimo; Rossi, Ludovico; Benzi, Roberto; Sbragaglia, Mauro; Succi, Sauro

    2009-12-01

    A graphics processing unit (GPU) implementation of the multicomponent lattice Boltzmann equation with multirange interactions for soft-glassy materials ["glassy" lattice Boltzmann (LB)] is presented. Performance measurements for flows under shear indicate a GPU/CPU speed-up in excess of 10 for 1024² grids. Such a significant speed-up permits multimillion-time-step simulations of 1024² grids within tens of hours of GPU time, thereby considerably expanding the scope of the glassy LB toward the investigation of the long-time relaxation properties of soft-flowing glassy materials.

  6. Solution of relativistic quantum optics problems using clusters of graphical processing units

    SciTech Connect

    Gordon, D. F.; Hafizi, B.; Helle, M. H.

    2014-06-15

    Numerical solution of relativistic quantum optics problems requires high performance computing due to the rapid oscillations in a relativistic wavefunction. Clusters of graphical processing units are used to accelerate the computation of a time-dependent relativistic wavefunction in an arbitrary external potential. The stationary states in a Coulomb potential and uniform magnetic field are determined analytically and numerically, so that they can be used as initial conditions in fully time-dependent calculations. Relativistic energy levels in extreme magnetic fields are recovered as a means of validation. The relativistic ionization rate is computed for an ion illuminated by a laser field near the usual barrier-suppression threshold, and the ionizing wavefunction is displayed.

  7. General Purpose Graphics Processing Unit Based High-Rate Rice Decompression and Reed-Solomon Decoding

    SciTech Connect

    Loughry, Thomas A.

    2015-02-01

    As the volume of data acquired by space-based sensors increases, mission data compression/decompression and forward error correction code processing performance must likewise scale. This competency development effort explored using the general purpose graphics processing unit (GPGPU) to accomplish high-rate Rice decompression and high-rate Reed-Solomon (RS) decoding at the satellite mission ground station. Each algorithm was implemented and benchmarked on a single GPGPU. Distributed processing across one to four GPGPUs was also investigated. The results show that the GPGPU has considerable potential for performing satellite communication data signal processing, with threefold or better performance improvements and up to a tenfold reduction in cost over custom hardware, at least in the case of Rice decompression and Reed-Solomon decoding.

  8. Implementation and performance of a general purpose graphics processing unit in hyperspectral image analysis

    NASA Astrophysics Data System (ADS)

    van der Werff, H. M. A.; Bakker, W. H.

    2014-02-01

    A graphics processing unit (GPU) can perform massively parallel computations at relatively low cost. Software interfaces like NVIDIA CUDA allow for General Purpose computing on a GPU (GPGPU). Wrappers of the CUDA libraries for higher-level programming languages such as MATLAB and IDL allow its use in image processing. In this paper, we implement GPGPU in IDL with two distance measures frequently used in image classification, Euclidean distance and spectral angle, and apply these to hyperspectral imagery. First, we vary the data volume of a synthetic dataset by changing the number of image pixels, spectral bands and classification endmembers to determine speed-up and to find the smallest data volume that would still benefit from using graphics hardware. Then we process real datasets that are too large to fit in the GPU memory, and study the effect of the resulting extra data transfers on computing performance. We show that our GPU algorithms outperform the same algorithms for a central processing unit (CPU), that a significant speed-up can already be obtained on relatively small datasets, and that data transfers in large datasets do not significantly influence performance. Given that no specific knowledge of parallel computing is required for this implementation, remote sensing scientists should now be able to implement and use GPGPU for their data analysis.
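
    As a concrete illustration of the two distance measures named above, the CUDA kernel below compares one pixel spectrum per thread against every endmember and keeps the smallest spectral angle. The paper's implementation is in IDL with CUDA wrappers; this standalone kernel and its array names are our own.

      #include <cuda_runtime.h>

      __global__ void spectral_angle(const float* img,   // nPix x nBands, row-major
                                     const float* end,   // nEnd x nBands
                                     int* label, int nPix, int nBands, int nEnd)
      {
          int p = blockIdx.x * blockDim.x + threadIdx.x;
          if (p >= nPix) return;
          float best = 1e30f; int bestE = -1;
          for (int e = 0; e < nEnd; ++e) {
              float dot = 0.0f, npx = 0.0f, nen = 0.0f;
              for (int b = 0; b < nBands; ++b) {
                  float xv = img[p * nBands + b], yv = end[e * nBands + b];
                  dot += xv * yv; npx += xv * xv; nen += yv * yv;
              }
              // Clamp guards against acosf(>1) caused by rounding.
              float ang = acosf(fminf(1.0f, dot / sqrtf(npx * nen)));
              if (ang < best) { best = ang; bestE = e; }
          }
          label[p] = bestE;   // index of the spectrally closest endmember
      }

    Swapping the angle for a per-band squared-difference sum gives the Euclidean-distance variant with the same parallel structure.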

  9. Real-time image edge enhancement with a spiral phase filter and graphic processing unit.

    PubMed

    Zhong, Zhi; Gao, Pengjun; Shan, Mingguang; Wang, Ying; Zhang, Yabin

    2014-07-01

    Isotropic image edge enhancement with high contrast can be achieved using a spiral phase filter (SPF) in a 4f optical system. However, real-time application of edge enhancement with an SPF has generally been limited by the requirement for coherent light or a complex phase-shifting operation. In this paper, we demonstrate a real-time image edge enhancement method using an SPF and a graphics processing unit (GPU). By implementing the virtual spiral phase filtering process on the GPU, we are able to speed up the whole procedure by more than 8.3× with respect to CPU processing, and ultimately achieve video rate for megapixel images. In particular, our implementation achieves even higher speedups when processing multiple images. These developments increase the potential for image edge enhancement of moving objects.

  10. Rapid learning-based video stereolization using graphic processing unit acceleration

    NASA Astrophysics Data System (ADS)

    Sun, Tian; Jung, Cheolkon; Wang, Lei; Kim, Joongkyu

    2016-09-01

    Video stereolization has received much attention in recent years due to the lack of stereoscopic three-dimensional (3-D) content. Although video stereolization can enrich stereoscopic 3-D content, it is hard to achieve automatic two-dimensional-to-3-D conversion at low computational cost. We propose rapid learning-based video stereolization using graphics processing unit (GPU) acceleration. We first generate an initial depth map based on learning from examples. Then, we refine the depth map using saliency and cross-bilateral filtering to make object boundaries clear. Finally, we perform depth-image-based rendering to generate stereoscopic 3-D views. To accelerate the computation of video stereolization, we provide a parallelizable hybrid GPU-central processing unit (CPU) solution suitable for running on the GPU. Experimental results demonstrate that the proposed method is nearly 180 times faster than CPU-based processing and achieves performance comparable to state-of-the-art methods.

  11. High-resolution, real-time three-dimensional shape measurement on graphics processing unit

    NASA Astrophysics Data System (ADS)

    Karpinsky, Nikolaus; Hoke, Morgan; Chen, Vincent; Zhang, Song

    2014-02-01

    A three-dimensional (3-D) shape measurement system that can simultaneously achieve 3-D shape acquisition, reconstruction, and display at 30 frames per second (fps) with 480,000 measurement points per frame is presented. The entire processing pipeline was realized on a graphics processing unit (GPU) without the need for substantial central processing unit (CPU) power, making it achievable on a portable device, namely a laptop computer. Furthermore, the system is extremely inexpensive compared with similar state-of-the-art systems, making it accessible to the general public. Specifically, advanced GPU techniques such as multipass rendering and offscreen rendering were used in conjunction with direct memory access to achieve the aforementioned performance. The developed system, implementation details, and experimental results verifying the performance of the proposed technique are presented.

  12. Modified graphical autocatalytic set model of combustion process in circulating fluidized bed boiler

    NASA Astrophysics Data System (ADS)

    Yusof, Nurul Syazwani; Bakar, Sumarni Abu; Ismail, Razidah

    2014-07-01

    A circulating fluidized bed (CFB) boiler is a device for generating steam by burning fossil fuels in a furnace operating under a special hydrodynamic condition. An autocatalytic set has provided a graphical model of the chemical reactions that occur during the combustion process in a CFB. Eight important chemical substances, known as species, are represented as nodes, and catalytic relationships between nodes are represented by the edges of the graph. In this paper, the model is extended and modified by considering other relevant chemical reactions that also occur during the process. Catalytic relationships among the species in the model are discussed. The result reveals that the modified model gives a fuller explanation of the relationships among the species at the initial time t.

  13. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts computing power from the hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment package based on BWA, to accelerate the alignment of sequencing reads generated by these instruments against a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude boost in alignment throughput compared with a CPU core, while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions: BarraCUDA is designed to take advantage of GPU parallelism to accelerate the alignment of the millions of sequencing reads generated by NGS instruments. In doing so, we can, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community can benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  14. Accelerated multidimensional radiofrequency pulse design for parallel transmission using concurrent computation on multiple graphics processing units.

    PubMed

    Deng, Weiran; Yang, Cungeng; Stenger, V Andrew

    2011-02-01

    Multidimensional radiofrequency (RF) pulses are of current interest because of their promise for improving high-field imaging and for optimizing parallel transmission methods. One major drawback is that the computation time of numerically designed multidimensional RF pulses increases rapidly with their resolution and number of transmitters. This is critical because the construction of multidimensional RF pulses often needs to be in real time. The use of graphics processing units for computations is a recent approach for accelerating image reconstruction applications. We propose the use of graphics processing units for the design of multidimensional RF pulses including the utilization of parallel transmitters. Using a desktop computer with four NVIDIA Tesla C1060 computing processors, we found acceleration factors on the order of 20 for standard eight-transmitter two-dimensional spiral RF pulses with a 64 × 64 excitation resolution and a 10-μsec dwell time. We also show that even greater acceleration factors can be achieved for more complex RF pulses. Copyright © 2010 Wiley-Liss, Inc.

  15. Graphical processing unit implementation of an integrated shape-based active contour: Application to digital pathology.

    PubMed

    Ali, Sahirzeeshan; Madabhushi, Anant

    2011-01-01

    Commodity graphics hardware has become a cost-effective parallel platform for solving many general computational problems. In medical imaging, and more so in digital pathology, segmentation of multiple structures on high-resolution images is often a complex and computationally expensive task. Shape-based level set segmentation has recently emerged as a natural solution to segmenting overlapping and occluded objects. However, the flexibility of the level set method has traditionally come at the price of long computation times, which might limit its clinical utility: even moderately sized images can require several hours of computation. Hence there is a clear need to accelerate these segmentation schemes. In this paper, we present a parallel implementation of a computationally heavy segmentation scheme on a graphical processing unit (GPU). The segmentation scheme incorporates level sets with shape priors to segment multiple overlapping nuclei from very large digital pathology images. We report a speedup of 19× compared to multithreaded C and MATLAB-based implementations of the same scheme, albeit with a slight reduction in accuracy. Our GPU-based segmentation scheme was rigorously and quantitatively evaluated on the problem of nuclei segmentation and overlap resolution on digitized histopathology images corresponding to breast and prostate biopsy tissue specimens.

  16. Using Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

    NASA Astrophysics Data System (ADS)

    O'Connor, A. S.; Justice, B.; Harris, A. T.

    2013-12-01

    Graphics processing units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. They are general-purpose parallel processors with support for a variety of programming interfaces, including industry-standard languages such as C. GPU implementations of algorithms well suited to parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant speed improvements for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Component Analysis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide a 50x-100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA-based GPU processing framework for accelerating the ENVI and IDL processes that can best take advantage of parallelization. Testing performed by Exelis VIS shows that orthorectification of a WorldView-1 35,000 x 35,000 pixel image can take as long as two hours; with GPU orthorectification, the same process takes three minutes. By speeding up image processing, imagery can be put to use by first responders and by scientists making rapid discoveries with near-real-time data, and data centers gain the operational capacity to process and disseminate data quickly.
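
    Orthorectification maps so well to a GPU because every output pixel is independent. A hedged CUDA sketch of that pattern: one thread per output map pixel fetches its precomputed source coordinates (from the sensor model, which is not shown) and resamples the source image bilinearly; the array names are ours, not Exelis VIS code.

      #include <cuda_runtime.h>

      __global__ void ortho_resample(const float* src, int sw, int sh,
                                     const float* srcX, const float* srcY,
                                     float* dst, int dw, int dh)
      {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          if (x >= dw || y >= dh) return;
          float fx = srcX[y * dw + x], fy = srcY[y * dw + x];   // sensor-model lookup
          int x0 = (int)floorf(fx), y0 = (int)floorf(fy);
          if (x0 < 0 || y0 < 0 || x0 + 1 >= sw || y0 + 1 >= sh) {
              dst[y * dw + x] = 0.0f;                           // outside the source image
              return;
          }
          float ax = fx - x0, ay = fy - y0;                     // bilinear weights
          dst[y * dw + x] =
              (1 - ax) * (1 - ay) * src[y0 * sw + x0] +
              ax       * (1 - ay) * src[y0 * sw + x0 + 1] +
              (1 - ax) * ay       * src[(y0 + 1) * sw + x0] +
              ax       * ay       * src[(y0 + 1) * sw + x0 + 1];
      }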

  17. Speedup for quantum optimal control from automatic differentiation based on graphics processing units

    NASA Astrophysics Data System (ADS)

    Leung, Nelson; Abdelhafez, Mohamed; Koch, Jens; Schuster, David

    2017-04-01

    We implement a quantum optimal control algorithm based on automatic differentiation and harness the acceleration afforded by graphics processing units (GPUs). Automatic differentiation allows us to specify advanced optimization criteria and incorporate them in the optimization process with ease. We show that the use of GPUs can speed up calculations by more than an order of magnitude. Our strategy facilitates efficient numerical simulations on affordable desktop computers and the exploration of a host of optimization constraints and system parameters relevant to real-life experiments. We demonstrate optimization of quantum evolution based on fine-grained evaluation of performance at each intermediate time step, thus enabling more intricate control of the evolution path, suppression of departures from the truncated model subspace, and minimization of the physical time needed to perform high-fidelity state preparation and unitary gates.

  18. Denoising NMR time-domain signal by singular-value decomposition accelerated by graphics processing units.

    PubMed

    Man, Pascal P; Bonhomme, Christian; Babonneau, Florence

    2014-01-01

    We present a post-processing method that decreases NMR spectrum noise without line-shape distortion, thereby increasing the signal-to-noise (S/N) ratio of a spectrum. This method, called the Cadzow enhancement procedure, is based on the singular-value decomposition of the time-domain signal. We also provide software whose execution takes a few seconds for typical data when run on a modern graphics processing unit. We tested the procedure not only on the low-sensitivity nucleus 29Si in hybrid materials but also on the low-gyromagnetic-ratio quadrupolar nucleus 87Sr in the reference sample Sr(NO3)2. Improving the spectrum S/N ratio facilitates the determination of the T/Q ratio of hybrid materials. The method is also applicable to simulated spectra, resulting in shorter simulation times for powder averaging. An estimate of the number of singular values needed for denoising is also provided.

  19. Near-real-time simulations of bioelectric activity in small mammalian hearts using graphical processing units.

    PubMed

    Vigmond, Edward J; Boyle, Patrick M; Leon, L; Plank, Gernot

    2009-01-01

    Simulations of cardiac bioelectric phenomena remain a significant challenge despite continual advancements in computational machinery. Spanning large temporal and spatial ranges demands millions of nodes to accurately depict geometry, and a comparable number of timesteps to capture dynamics. This study explores a new hardware computing paradigm, the graphics processing unit (GPU), to accelerate cardiac models, and analyzes the results in the context of simulating a small mammalian heart in real time. The ODEs associated with membrane ionic flow were computed on a traditional CPU and compared with GPU performance for one to four parallel processing units. The scalability of solving the PDE responsible for tissue coupling was examined on a cluster using up to 128 cores. Results indicate that the GPU implementation was between 9 and 17 times faster than the CPU implementation and scaled similarly. Solving the PDE was still 160 times slower than real time.

  20. Acceleration of early-photon fluorescence molecular tomography with graphics processing units.

    PubMed

    Wang, Xin; Zhang, Bin; Cao, Xu; Liu, Fei; Luo, Jianwen; Bai, Jing

    2013-01-01

    Fluorescence molecular tomography (FMT) with early photons can improve the spatial resolution and fidelity of the reconstructed results. However, its computational scale is large, which limits its applications. In this paper, we introduce an acceleration strategy for early-photon FMT using graphics processing units (GPUs). The whole FMT solution was divided into several modules and the time consumption of each module was studied. In this strategy, the two most time-consuming modules (the Gd and W modules) were accelerated on the GPU, while the other modules remained coded in MATLAB. Several simulation studies with a heterogeneous digital mouse atlas were performed to confirm the performance of the acceleration strategy. The results confirmed the feasibility of the strategy and showed that the processing speed was improved significantly.

  1. Fast direct reconstruction strategy of dynamic fluorescence molecular tomography using graphics processing units

    NASA Astrophysics Data System (ADS)

    Chen, Maomao; Zhang, Jiulou; Cai, Chuangjian; Gao, Yang; Luo, Jianwen

    2016-06-01

    Dynamic fluorescence molecular tomography (DFMT) is a valuable method for evaluating the metabolic processes of contrast agents in different organs in vivo, and direct reconstruction methods can improve the temporal resolution of DFMT. However, challenges remain due to the large time consumption of the direct reconstruction methods. An acceleration strategy using graphics processing units (GPUs) is presented. The conjugate gradient optimization procedure in the direct reconstruction method is programmed using the compute unified device architecture and then accelerated on the GPU. Numerical simulations and in vivo experiments are performed to validate the feasibility of the strategy. The results demonstrate that, compared with the traditional method, the proposed strategy can reduce the time consumption by ˜90% without a degradation of quality.
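
    The vector algebra of one conjugate-gradient iteration maps naturally onto GPU BLAS calls. The sketch below expresses it with cuBLAS as an illustration (the paper reports its own CUDA implementation, which may differ); the application-specific matrix-vector product A*p is assumed already computed into d_Ap, and all names are ours.

      #include <cublas_v2.h>

      // One CG iteration: updates x, r and the search direction p in place.
      // rr_old is r.r from the previous iteration; rr_new returns the new r.r.
      void cg_iteration(cublasHandle_t h, int n,
                        float* d_x, float* d_r, float* d_p, const float* d_Ap,
                        float rr_old, float* rr_new)
      {
          float pAp, alpha, malpha, beta, one = 1.0f;
          cublasSdot(h, n, d_p, 1, d_Ap, 1, &pAp);
          alpha = rr_old / pAp;  malpha = -alpha;
          cublasSaxpy(h, n, &alpha,  d_p,  1, d_x, 1);   // x += alpha * p
          cublasSaxpy(h, n, &malpha, d_Ap, 1, d_r, 1);   // r -= alpha * A p
          cublasSdot(h, n, d_r, 1, d_r, 1, rr_new);
          beta = *rr_new / rr_old;
          cublasSscal(h, n, &beta, d_p, 1);              // p = r + beta * p
          cublasSaxpy(h, n, &one, d_r, 1, d_p, 1);
      }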

  2. Real-time liquid-crystal atmosphere turbulence simulator with graphic processing unit.

    PubMed

    Hu, Lifa; Xuan, Li; Li, Dayu; Cao, Zhaoliang; Mu, Quanquan; Liu, Yonggang; Peng, Zenghui; Lu, Xinghai

    2009-04-27

    To generate time-evolving atmosphere turbulence in real time, a phase-generating method for our liquid-crystal (LC) atmosphere turbulence simulator (ATS) is derived based on the Fourier series (FS) method. A real matrix expression for generating turbulence phases is given and calculated with a graphic processing unit (GPU), the GeForce 8800 Ultra. A liquid crystal on silicon (LCOS) with 256x256 pixels is used as the turbulence simulator. The total time to generate a turbulence phase is about 7.8 ms for calculation and readout with the GPU. A parallel processing method of calculating and sending a picture to the LCOS is used to improve the simulating speed of our LC ATS. Therefore, the real-time turbulence phase-generation frequency of our LC ATS is up to 128 Hz. To our knowledge, it is the highest speed used to generate a turbulence phase in real time.

  3. Computation of the Density Matrix in Electronic Structure Theory in Parallel on Multiple Graphics Processing Units.

    PubMed

    Cawkwell, M J; Wood, M A; Niklasson, Anders M N; Mniszewski, S M

    2014-12-09

    The algorithm developed in Cawkwell, M. J. et al. [J. Chem. Theory Comput. 2012, 8, 4094] for the computation of the density matrix in electronic structure theory on a graphics processing unit (GPU) using the second-order spectral projection (SP2) method [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] has been efficiently parallelized over multiple GPUs on a single compute node. The parallel implementation provides significant speed-ups with respect to the single-GPU version with no loss of accuracy. The performance and accuracy of the parallel GPU-based algorithm are compared with the performance of the SP2 algorithm and traditional matrix diagonalization methods on a multicore central processing unit (CPU).

  4. Simplified electroholographic color reconstruction system using graphics processing unit and liquid crystal display projector.

    PubMed

    Shiraki, Atsushi; Takada, Naoki; Niwa, Masashi; Ichihashi, Yasuyuki; Shimobaba, Tomoyoshi; Masuda, Nobuyuki; Ito, Tomoyoshi

    2009-08-31

    We have constructed a simple color electroholography system with excellent cost performance. It uses a graphics processing unit (GPU) and a liquid crystal display (LCD) projector. The structure of the GPU is suitable for calculating computer-generated holograms (CGHs). The calculation speed of the GPU is approximately 1,500 times faster than that of a central processing unit. The LCD projector is an inexpensive, high-performance device for displaying CGHs. It has high-definition LCD panels for red, green and blue, and thus can easily be used for color electroholography. For a three-dimensional object consisting of 1,000 points, our system succeeded in real-time color holographic reconstruction at a rate of 30 frames per second.

  5. Fast crustal deformation computing method for multiple computations accelerated by a graphics processing unit cluster

    NASA Astrophysics Data System (ADS)

    Yamaguchi, Takuma; Ichimura, Tsuyoshi; Yagi, Yuji; Agata, Ryoichiro; Hori, Takane; Hori, Muneo

    2017-08-01

    As high-resolution observational data become more common, the demand for numerical simulations of crustal deformation using 3-D high-fidelity modelling is increasing. To increase the efficiency of performing numerical simulations with high computation costs, we developed a fast solver using heterogeneous computing, with graphics processing units (GPUs) and central processing units, and then used the solver in crustal deformation computations. The solver was based on an iterative solver and was devised so that a large proportion of the computation was calculated more quickly using GPUs. To confirm the utility of the proposed solver, we demonstrated a numerical simulation of the coseismic slip distribution estimation, which requires 360,000 crustal deformation computations with 82,196,106 degrees of freedom.

  6. Real-time digital holographic microscopy using the graphic processing unit.

    PubMed

    Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi

    2008-08-04

    Digital holographic microscopy (DHM) is a well-known and powerful method that allows both the amplitude and the phase of a specimen to be observed simultaneously. To obtain a reconstructed image from a hologram, numerous calculations of the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the fast Fourier transform (FFT) algorithm. However, real-time reconstruction from a hologram is difficult even if a recent central processing unit (CPU) is used to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphics processing unit (GPU) with many stream processors, which allows its use as a highly parallel processor. The computational speed of the Fresnel diffraction on the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from 512 x 512 pixel holograms at 24 frames per second.

  7. Graphics processing unit parallel accelerated solution of the discrete ordinates for photon transport in biological tissues.

    PubMed

    Peng, Kuan; Gao, Xinbo; Qu, Xiaochao; Ren, Nunu; Chen, Xueli; He, Xiaowei; Wang, Xiaorei; Liang, Jimin; Tian, Jie

    2011-07-20

    As a widely used numerical solution of the radiation transport equation (RTE), the discrete ordinates method can predict the propagation of photons through biological tissues more accurately than the diffusion equation. The discrete ordinates method reduces the RTE to a series of differential equations that can be solved by source iteration (SI). However, the tremendous time consumption of SI, which is partly caused by the expensive computation of each SI step, limits its applications. In this paper, we present a graphics processing unit (GPU) parallel accelerated SI method for the discrete ordinates. Utilizing the calculation independence at the levels of the discrete ordinate equation and the spatial element, the proposed method reduces the time cost of each SI step by parallel calculation. The photon reflection at the boundary was calculated based on the results of the last SI step to ensure calculation independence at the level of the discrete ordinate equation. An element sweeping strategy was proposed to detect calculation independence at the level of the spatial element. A GPU parallel framework called the compute unified device architecture was employed to carry out the parallel computation. The simulation experiments, which were carried out with a cylindrical phantom and a numerical mouse, indicated that the time cost of each SI step can be reduced by up to a factor of 228 by the proposed method with a GTX 260 graphics card. © 2011 Optical Society of America

  8. GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

    PubMed Central

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128

  9. Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units.

    PubMed

    Fang, Qianqian; Boas, David A

    2009-10-26

    We report a parallel Monte Carlo algorithm accelerated by graphics processing units (GPU) for modeling time-resolved photon migration in arbitrary 3D turbid media. By taking advantage of the massively parallel threads and low-memory latency, this algorithm allows many photons to be simulated simultaneously in a GPU. To further improve the computational efficiency, we explored two parallel random number generators (RNG), including a floating-point-only RNG based on a chaotic lattice. An efficient scheme for boundary reflection was implemented, along with the functions for time-resolved imaging. For a homogeneous semi-infinite medium, good agreement was observed between the simulation output and the analytical solution from the diffusion theory. The code was implemented with CUDA programming language, and benchmarked under various parameters, such as thread number, selection of RNG and memory access pattern. With a low-cost graphics card, this algorithm has demonstrated an acceleration ratio above 300 when using 1792 parallel threads over conventional CPU computation. The acceleration ratio drops to 75 when using atomic operations. These results render the GPU-based Monte Carlo simulation a practical solution for data analysis in a wide range of diffuse optical imaging applications, such as human brain or small-animal imaging.
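
    The per-photon pattern such codes parallelize is easy to sketch: each thread owns one photon and draws exponentially distributed free-path lengths from a device-side RNG. The kernel below uses CURAND's uniform generator for illustration (the paper benchmarks its own RNG choices, including a chaotic-lattice generator); movement, scattering, absorption and detection are omitted, and all names are ours.

      #include <curand_kernel.h>

      __global__ void propagate(curandState* states, float mu_t,
                                float* pathLen, int nPhotons, int nSteps)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= nPhotons) return;
          curandState s = states[i];          // seeded elsewhere with curand_init
          float total = 0.0f;
          for (int k = 0; k < nSteps; ++k) {
              // Exponential free path: -ln(U)/mu_t, with U ~ uniform(0,1].
              total += -__logf(curand_uniform(&s)) / mu_t;
              // A full code would move the photon, scatter and absorb here.
          }
          pathLen[i] = total;
          states[i] = s;
      }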

  10. A software architecture for multi-cellular system simulations on graphics processing units.

    PubMed

    Jeannin-Girardon, Anne; Ballet, Pascal; Rodin, Vincent

    2013-09-01

    The first aim of simulation in a virtual environment is to help biologists better understand the simulated system. The cost of such simulation is significantly lower than that of in vivo experimentation. However, the inherent complexity of biological systems makes them hard to simulate on non-parallel architectures: models might be made of sub-models and take several scales into account, and the number of simulated entities may be quite large. Today, graphics cards are used for general-purpose computing, which has been made easier thanks to frameworks like CUDA and OpenCL. Parallelizing models may nevertheless not be easy: parallel programming skills are often required, and several hardware architectures may be used to execute models. In this paper, we present the software architecture we built in order to implement various models able to simulate multi-cellular systems. This architecture is modular and implements data structures adapted to graphics processing unit architectures. It allows efficient simulation of biological mechanisms.

  11. Real-time nonlinear finite element analysis for surgical simulation using graphics processing units.

    PubMed

    Taylor, Zeike A; Cheng, Mario; Ourselin, Sébastien

    2007-01-01

    Clinical employment of biomechanical modelling techniques in areas of medical image analysis and surgical simulation is often hindered by conflicting requirements for high fidelity in the modelling approach and high solution speeds. We report the development of techniques for high-speed nonlinear finite element (FE) analysis for surgical simulation. We employ a previously developed nonlinear total Lagrangian explicit FE formulation which offers significant computational advantages for soft tissue simulation. However, the key contribution of the work is the presentation of a fast graphics processing unit (GPU) solution scheme for the FE equations. To the best of our knowledge this represents the first GPU implementation of a nonlinear FE solver. We show that the present explicit FE scheme is well-suited to solution via highly parallel graphics hardware, and that even a midrange GPU allows significant solution speed gains (up to 16.4x) compared with equivalent CPU implementations. For the models tested the scheme allows real-time solution of models with up to 16000 tetrahedral elements. The use of GPUs for such purposes offers a cost-effective high-performance alternative to expensive multi-CPU machines, and may have important applications in medical image analysis and surgical simulation.

  12. High-speed nonlinear finite element analysis for surgical simulation using graphics processing units.

    PubMed

    Taylor, Z A; Cheng, M; Ourselin, S

    2008-05-01

    The use of biomechanical modelling, especially in conjunction with finite element analysis, has become common in many areas of medical image analysis and surgical simulation. Clinical employment of such techniques is hindered by conflicting requirements for high fidelity in the modelling approach, and fast solution speeds. We report the development of techniques for high-speed nonlinear finite element analysis for surgical simulation. We use a fully nonlinear total Lagrangian explicit finite element formulation which offers significant computational advantages for soft tissue simulation. However, the key contribution of the work is the presentation of a fast graphics processing unit (GPU) solution scheme for the finite element equations. To the best of our knowledge, this represents the first GPU implementation of a nonlinear finite element solver. We show that the present explicit finite element scheme is well suited to solution via highly parallel graphics hardware, and that even a midrange GPU allows significant solution speed gains (up to 16.8x) compared with equivalent CPU implementations. For the models tested the scheme allows real-time solution of models with up to 16,000 tetrahedral elements. The use of GPUs for such purposes offers a cost-effective high-performance alternative to expensive multi-CPU machines, and may have important applications in medical image analysis and surgical simulation.

  14. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain–Computer Interface Feature Extraction

    PubMed Central

    Wilson, J. Adam; Williams, Justin C.

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain–computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix–matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times. PMID:19636394
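
    The first offloaded step, the spatial filter, is a single matrix-matrix multiply and maps directly onto one cuBLAS call, as in the hedged sketch below: Y (filters x samples) = W (filters x channels) * X (channels x samples). The dimensions and names are illustrative, and cuBLAS assumes column-major storage.

      #include <cublas_v2.h>

      void spatial_filter(cublasHandle_t h, int nFilt, int nChan, int nSamp,
                          const float* d_W, const float* d_X, float* d_Y)
      {
          const float one = 1.0f, zero = 0.0f;
          cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                      nFilt, nSamp, nChan,
                      &one,  d_W, nFilt,   // W: nFilt x nChan, column-major
                             d_X, nChan,   // X: nChan x nSamp
                      &zero, d_Y, nFilt);  // Y: nFilt x nSamp
      }

    One practical benefit of this formulation is that the same call scales transparently with channel count, which matters for the high-channel-count arrays the abstract describes.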

  16. SU-E-P-59: A Graphical Interface for XCAT Phantom Configuration, Generation and Processing

    SciTech Connect

    Myronakis, M; Cai, W; Dhou, S; Cifter, F; Lewis, J; Hurwitz, M

    2015-06-15

    Purpose: To design a comprehensive open-source, publicly available graphical user interface (GUI) to facilitate the configuration, generation, processing and use of the 4D Extended Cardiac-Torso (XCAT) phantom. Methods: The XCAT phantom includes over 9000 anatomical objects as well as respiratory, cardiac and tumor motion. It is widely used for research studies in medical imaging and radiotherapy. The phantom generation process involves the configuration of a text script to parameterize the geometry, motion, and composition of the whole body and objects within it, and to generate simulated PET or CT images. To avoid the need for manual editing or script writing, our MATLAB-based GUI uses slider controls, drop-down lists, buttons and graphical text input to parameterize and process the phantom. Results: Our GUI can be used to: a) generate parameter files; b) generate the voxelized phantom; c) combine the phantom with a lesion; d) display the phantom; e) produce average and maximum intensity images from the phantom output files; f) incorporate irregular patient breathing patterns; and g) generate DICOM files containing phantom images. The GUI provides local help information using tool-tip strings for the currently selected phantom, minimizing the need for external documentation. The DICOM generation feature is intended to simplify the process of importing the phantom images into radiotherapy treatment planning systems or other clinical software. Conclusion: The GUI simplifies and automates the use of the XCAT phantom for imaging-based research projects in medical imaging or radiotherapy. This has the potential to accelerate research conducted with the XCAT phantom and to ease the learning curve for new users. This tool does not include the XCAT phantom software itself. We would like to acknowledge funding from MRA, Varian Medical Systems Inc.

  17. [Applying graphics processing unit in real-time signal processing and visualization of ophthalmic Fourier-domain OCT system].

    PubMed

    Liu, Qiaoyan; Li, Yuejie; Xu, Qiujing; Zhao, Jincheng; Wang, Liwei; Gao, Yonghe

    2013-01-01

    This investigation introduces graphics processing unit (GPU)-based compute unified device architecture (CUDA) technology into the signal processing of an ophthalmic Fourier-domain optical coherence tomography (FD-OCT) imaging system. It realizes parallel data processing, using CUDA to optimize the relevant operations and algorithms, in order to resolve the technical bottlenecks that currently prevent real-time ophthalmic imaging in OCT systems. Laboratory results showed that, with the GPU as a general-purpose parallel processor, data processing in GPU+CPU mode is dozens of times faster than serial computing and imaging on a traditional CPU platform for the same workload, which meets the clinical requirements for two-dimensional real-time imaging.

  18. Implementation and optimization of ultrasound signal processing algorithms on mobile GPU

    NASA Astrophysics Data System (ADS)

    Kong, Woo Kyu; Lee, Wooyoul; Kim, Kyu Cheol; Yoo, Yangmo; Song, Tai-Kyong

    2014-03-01

    A general-purpose graphics processing unit (GPGPU) has been used for improving computing power in medical ultrasound imaging systems. Recently, mobile GPUs have become powerful enough to handle 3D games and videos at high frame rates on Full HD or HD resolution displays. This paper proposes a method to implement ultrasound signal processing on a mobile GPU available in a high-end smartphone (Galaxy S4, Samsung Electronics, Seoul, Korea) with programmable shaders on the OpenGL ES 2.0 platform. To maximize the performance of the mobile GPU, the shader design was optimized and load sharing between the vertex and fragment shaders was performed. The beamformed data were captured from a tissue-mimicking phantom (Model 539 Multipurpose Phantom, ATS Laboratories, Inc., Bridgeport, CT, USA) by using a commercial ultrasound imaging system equipped with a research package (Ultrasonix Touch, Ultrasonix, Richmond, BC, Canada). The real-time performance was evaluated by frame rates while varying the range of signal processing blocks. The implementation of ultrasound signal processing on OpenGL ES 2.0 was verified by analyzing the PSNR against a MATLAB gold standard with the same signal path. The CNR was also analyzed to verify the method. From the evaluations, the proposed mobile GPU-based processing method showed no significant difference from the MATLAB processing (i.e., PSNR<52.51 dB). Comparable CNR results were obtained from both processing methods (i.e., 11.31). From the mobile GPU implementation, a frame rate of 57.6 Hz was achieved. The total execution time was 17.4 ms, which was faster than the acquisition time (i.e., 34.4 ms). These results indicate that the mobile GPU-based processing method can support real-time ultrasound B-mode processing on a smartphone.

  19. Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit.

    PubMed

    Watanabe, Yuuki; Itagaki, Toshiki

    2009-01-01

    Fourier domain optical coherence tomography (FD-OCT) requires resampling of spectrally resolved depth information from wavelength to wave number, and the subsequent application of the inverse Fourier transform. The display rates of OCT images are much slower than the image acquisition rates due to processing speed limitations on most computers. We demonstrate a real-time display of processed OCT images using a linear-in-wave-number (linear-k) spectrometer and a graphics processing unit (GPU). We use a linear-k spectrometer combining a diffractive grating with 1200 lines/mm and an F2 equilateral prism in the 840-nm spectral region to avoid the resampling calculation. The calculations of the fast Fourier transform (FFT) are accelerated by the GPU with many stream processors, which realizes highly parallel processing. A display rate of 27.9 frames/sec for processed images (2048 FFT size x 1000 lateral A-scans) is achieved in our OCT system using a line scan CCD camera operated at 27.9 kHz.
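
    With resampling handled optically by the linear-k spectrometer, the per-frame computation reduces essentially to a batched FFT, one transform per A-scan. A hedged cuFFT sketch under the sizes quoted above (2048-point transforms, 1000 lateral A-scans; the in-place layout and function name are assumptions, not the paper's code):

      // Batched FFT of one OCT frame on the GPU (cuFFT).
      // Each of the 1000 lateral A-scans is a 2048-point spectrum stored
      // contiguously; no resampling is needed with a linear-k spectrometer.
      // Sketch only: windowing and background subtraction are omitted.
      #include <cufft.h>

      void process_frame(cufftComplex *d_spectra /* 1000 x 2048 */)
      {
          cufftHandle plan;
          const int n = 2048, batch = 1000;
          cufftPlan1d(&plan, n, CUFFT_C2C, batch);  // one plan, many A-scans
          cufftExecC2C(plan, d_spectra, d_spectra, CUFFT_INVERSE);  // in place
          cufftDestroy(plan);
      }

    In a streaming system the plan would of course be created once and reused for every frame, since plan creation costs far more than the transform itself.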

  20. Parallel multigrid solver of radiative transfer equation for photon transport via graphics processing unit.

    PubMed

    Gao, Hao; Phan, Lan; Lin, Yuting

    2012-09-01

    A graphics processing unit-based parallel multigrid solver for a radiative transfer equation with vacuum boundary condition or reflection boundary condition is presented for heterogeneous media with complex geometry based on two-dimensional triangular meshes or three-dimensional tetrahedral meshes. The computational complexity of this parallel solver is linearly proportional to the degrees of freedom in both angular and spatial variables, while the full multigrid method is utilized to minimize the number of iterations. The overall gain of speed is roughly 30 to 300 fold with respect to our prior multigrid solver, depending on the underlying regime and the parallelization. The numerical validations are presented with the MATLAB codes at https://sites.google.com/site/rtefastsolver/.

  1. Mendel-GPU: haplotyping and genotype imputation on graphics processing units

    PubMed Central

    Chen, Gary K.; Wang, Kai; Stram, Alex H.; Sobel, Eric M.; Lange, Kenneth

    2012-01-01

    Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22954633

  2. Mendel-GPU: haplotyping and genotype imputation on graphics processing units.

    PubMed

    Chen, Gary K; Wang, Kai; Stram, Alex H; Sobel, Eric M; Lange, Kenneth

    2012-11-15

    In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. gary.k.chen@usc.edu. Supplementary data are available at Bioinformatics online.

  3. Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Ramirez, Andres; Rahnemoonfar, Maryam

    2017-04-01

    A hyperspectral image provides a multidimensional data structure rich in information, consisting of hundreds of spectral bands. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in long computation times. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the use of parallel hardware and a parallel programming model, which is simpler to handle than other low-level parallel programming models. Additionally, Hadoop was used as an open-source implementation of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU system and the following test cases: a combined CPU and GPU case, a CPU-only case, and a case where no dimensionality reduction was applied.

  4. A graphical method to evaluate predominant geochemical processes occurring in groundwater systems for radiocarbon dating

    USGS Publications Warehouse

    Han, Liang-Feng; Plummer, L. Niel; Aggarwal, Pradeep

    2012-01-01

    A graphical method is described for identifying geochemical reactions needed in the interpretation of radiocarbon age in groundwater systems. Graphs are constructed by plotting the measured 14C, δ13C, and concentration of dissolved inorganic carbon and are interpreted according to specific criteria to recognize water samples that are consistent with a wide range of processes, including geochemical reactions, carbon isotopic exchange, 14C decay, and mixing of waters. The graphs are used to provide a qualitative estimate of radiocarbon age, to deduce the hydrochemical complexity of a groundwater system, and to compare samples from different groundwater systems. Graphs of chemical and isotopic data from a series of previously-published groundwater studies are used to demonstrate the utility of the approach. Ultimately, the information derived from the graphs is used to improve geochemical models for adjustment of radiocarbon ages in groundwater systems.

  5. Parallel multigrid solver of radiative transfer equation for photon transport via graphics processing unit

    PubMed Central

    Phan, Lan; Lin, Yuting

    2012-01-01

    Abstract. A graphics processing unit–based parallel multigrid solver for a radiative transfer equation with vacuum boundary condition or reflection boundary condition is presented for heterogeneous media with complex geometry based on two-dimensional triangular meshes or three-dimensional tetrahedral meshes. The computational complexity of this parallel solver is linearly proportional to the degrees of freedom in both angular and spatial variables, while the full multigrid method is utilized to minimize the number of iterations. The overall gain of speed is roughly 30 to 300 fold with respect to our prior multigrid solver, depending on the underlying regime and the parallelization. The numerical validations are presented with the MATLAB codes at https://sites.google.com/site/rtefastsolver/. PMID:23085905

  6. Particle-In-Cell simulations of high pressure plasmas using graphics processing units

    NASA Astrophysics Data System (ADS)

    Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter

    2009-10-01

    Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Particularly plasmas at very low gas pressures are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming -- certain stability conditions have to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. Such simulations take a very long time to run on standard computers and require the help of computer clusters or supercomputers. Recent advances in the field of graphics processing units (GPUs) provide every personal computer with a highly parallel multiprocessor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
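
    For orientation, the particle push in a PIC cycle parallelizes naturally with one thread per particle. A minimal, hypothetical CUDA sketch of an electrostatic leapfrog push (illustrative only; the field gather, charge deposition, and the collision handling that dominates at high pressure are omitted):

      // One-thread-per-particle leapfrog push, electrostatic 1D case.
      // E holds the field already gathered to each particle; qm = q/m.
      // Hypothetical kernel, not the authors' implementation.
      __global__ void push(float *x, float *v, const float *E,
                           float qm, float dt, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          v[i] += qm * E[i] * dt;   // accelerate
          x[i] += v[i] * dt;        // drift
      }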

  7. Automated Code Engine for Graphical Processing Units: Application to the Effective Core Potential Integrals and Gradients.

    PubMed

    Song, Chenchen; Wang, Lee-Ping; Martínez, Todd J

    2016-01-12

    We present an automated code engine (ACE) that automatically generates optimized kernels for computing integrals in electronic structure theory on a given graphical processing unit (GPU) computing platform. The code generator in ACE creates multiple code variants with different memory and floating point operation trade-offs. A graph representation is created as the foundation of the code generation, which allows the code generator to be extended to various types of integrals. The code optimizer in ACE determines the optimal code variant and GPU configurations for a given GPU computing platform by scanning over all possible code candidates and then choosing the best-performing code candidate for each kernel. We apply ACE to the optimization of effective core potential integrals and gradients. It is observed that the best code candidate varies with differing angular momentum, floating point precision, and type of GPU being used, which shows that the ACE may be a powerful tool in adapting to fast evolving GPU architectures.

  8. On the use of graphics processing units (GPUs) for molecular dynamics simulation of spherical particles

    NASA Astrophysics Data System (ADS)

    Hidalgo, R. C.; Kanzaki, T.; Alonso-Marroquin, F.; Luding, S.

    2013-06-01

    General-purpose computation on Graphics Processing Units (GPU) on personal computers has recently become an attractive alternative to parallel computing on clusters and supercomputers. We present the GPU-implementation of an accurate molecular dynamics algorithm for a system of spheres. The new hybrid CPU-GPU implementation takes into account all the degrees of freedom, including the quaternion representation of 3D rotations. For additional versatility, the contact interaction between particles is defined using a force law of enhanced generality, which accounts for the elastic and dissipative interactions, and the hard-sphere interaction parameters are translated to the soft-sphere parameter set. We prove that the algorithm complies with the statistical mechanical laws by examining the homogeneous cooling of a granular gas with rotation. The results are in excellent agreement with well established mean-field theories for low-density hard sphere systems. This GPU technique dramatically reduces user waiting time, compared with a traditional CPU implementation.

  9. Real-time spatiotemporal division multiplexing electroholography with a single graphics processing unit utilizing movie features.

    PubMed

    Niwase, Hiroaki; Takada, Naoki; Araki, Hiromitsu; Nakayama, Hirotaka; Sugiyama, Atsushi; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

    2014-11-17

    We propose a real-time spatiotemporal division multiplexing electroholography method that utilizes the features of movies. The proposed method spatially divides a 3-D object into plural parts and periodically selects a divided part in each frame, thereby reconstructing a three-dimensional (3-D) movie of the original object. Computer-generated holograms of the selected part are calculated by a single graphics processing unit and sequentially displayed on a spatial light modulator. Visual continuity enables the viewer to perceive a reconstructed movie of the original 3-D object. The proposed method realized a real-time reconstructed movie of a 3-D object composed of 11,646 points at over 30 frames per second (fps). We also displayed a reconstructed movie of a 3-D object composed of 44,647 points at about 10 fps.

  10. Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs).

    PubMed

    Palenstijn, W J; Batenburg, K J; Sijbers, J

    2011-11-01

    Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands. Parallelization must be employed to maintain acceptable running times. Graphics Processing Units (GPUs) have been demonstrated to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper by Xu et al. (2010), a GPU implementation strategy was presented that obtains a speedup of an order of magnitude over a previously proposed GPU-based electron tomography implementation. In this technical note, we demonstrate that by making alternative design decisions in the GPU implementation, an additional speedup can be obtained, again of an order of magnitude. By carefully considering memory access locality when dividing the workload among blocks of threads, the GPU's cache is used more efficiently, making more effective use of the available memory bandwidth. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. Acceleration of the GAMESS-UK electronic structure package on graphical processing units.

    PubMed

    Wilkinson, Karl A; Sherwood, Paul; Guest, Martyn F; Naidoo, Kevin J

    2011-07-30

    The approach used to calculate the two-electron integrals by many electronic structure packages, including the generalized atomic and molecular electronic structure system-UK, has been designed for CPU-based compute units. We redesigned the two-electron compute algorithm for acceleration on a graphical processing unit (GPU). We report the acceleration strategy and illustrate it on the (ss|ss) type integrals. This strategy is general for Fortran-based codes and uses the Accelerator compiler from Portland Group International and GPU-based accelerators from Nvidia. The evaluation of (ss|ss) type integrals within calculations using Hartree-Fock ab initio methods and density functional theory is accelerated by single and quad GPU hardware systems by factors of 43 and 153, respectively. The overall speedup for a single self-consistent field cycle is at least a factor of eight on a single GPU compared with that of a single CPU.

  12. FAST CALCULATION OF THE LOMB-SCARGLE PERIODOGRAM USING GRAPHICS PROCESSING UNITS

    SciTech Connect

    Townsend, R. H. D.

    2010-12-15

    I introduce a new code for fast calculation of the Lomb-Scargle periodogram that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate key parts of its source. Benchmarking calculations indicate no significant differences in accuracy compared to an equivalent CPU-based code. However, the differences in performance are pronounced; running on a low-end GPU, the code can match eight CPU cores, and on a high-end GPU it is faster by a factor approaching 30. Applications of the code include analysis of long photometric time series obtained by ongoing satellite missions and upcoming ground-based monitoring facilities, and Monte Carlo simulation of periodogram statistical properties.
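
    The computation parallelizes with one thread per trial frequency, each thread scanning the full time series. A hedged CUDA sketch of the classical Lomb-Scargle formulation (a generic illustration, not the paper's source; the series x is assumed to be mean-subtracted on the host):

      // One thread per frequency: accumulate the Lomb-Scargle sums over
      // the unevenly sampled series (t, x), x already mean-subtracted.
      __global__ void lomb_scargle(const float *t, const float *x, int n,
                                   const float *freq, float *power, int nf)
      {
          int k = blockIdx.x * blockDim.x + threadIdx.x;
          if (k >= nf) return;
          float w = 2.0f * 3.14159265f * freq[k];
          float s2 = 0.0f, c2 = 0.0f;
          for (int j = 0; j < n; ++j) {
              s2 += sinf(2.0f * w * t[j]);
              c2 += cosf(2.0f * w * t[j]);
          }
          float tau = 0.5f * atan2f(s2, c2) / w;   // tan(2*w*tau) = S2/C2
          float xc = 0.0f, xs = 0.0f, cc = 0.0f, ss = 0.0f;
          for (int j = 0; j < n; ++j) {
              float c = cosf(w * (t[j] - tau));
              float s = sinf(w * (t[j] - tau));
              xc += x[j] * c;  cc += c * c;
              xs += x[j] * s;  ss += s * s;
          }
          power[k] = 0.5f * (xc * xc / cc + xs * xs / ss);
      }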

  13. Efficient implementation of effective core potential integrals and gradients on graphical processing units

    NASA Astrophysics Data System (ADS)

    Song, Chenchen; Wang, Lee-Ping; Sachse, Torsten; Preiß, Julia; Presselt, Martin; Martínez, Todd J.

    2015-07-01

    Effective core potential integral and gradient evaluations are accelerated via implementation on graphical processing units (GPUs). Two simple formulas are proposed to estimate the upper bounds of the integrals, and these are used for screening. A sorting strategy is designed to balance the workload between GPU threads properly. Significant improvements in performance and reduced scaling with system size are observed when combining the screening and sorting methods, and the calculations are highly efficient for systems containing up to 10 000 basis functions. The GPU implementation preserves the precision of the calculation; the ground state Hartree-Fock energy achieves good accuracy for CdSe and ZnTe nanocrystals, and energy is well conserved in ab initio molecular dynamics simulations.

  14. Stochastic proximity embedding on graphics processing units: taking multidimensional scaling to a new scale.

    PubMed

    Yang, Eric; Liu, Pu; Rassokhin, Dimitrii N; Agrafiotis, Dimitris K

    2011-11-28

    Stochastic proximity embedding (SPE) was developed as a method for efficiently calculating lower dimensional embeddings of high-dimensional data sets. Rather than using a global minimization scheme, SPE relies upon updating the distances of randomly selected points in an iterative fashion. This was found to generate embeddings of comparable quality to those obtained using classical multidimensional scaling algorithms. However, SPE is able to obtain these results in O(n) rather than O(n²) time and thus is much better suited to large data sets. In an effort both to speed up SPE and utilize it for even larger problems, we have created a multithreaded implementation which takes advantage of the growing general computing power of graphics processing units (GPUs). The use of GPUs allows the embedding of data sets containing millions of data points in interactive time scales.
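
    The heart of SPE is a cheap pairwise update that moves the embedded distance of a sampled pair toward its input-space proximity. A hedged CUDA sketch of one batch of such updates (an assumed parallelization in which the host pre-selects disjoint pairs so no two threads touch the same point; not the authors' code):

      // One pre-selected disjoint pair (i, j) per thread: nudge the
      // embedded coordinates so that |x_i - x_j| approaches d[p].
      __global__ void spe_update(float *x /* npts x dim, row-major */,
                                 const int2 *pairs, const float *d,
                                 float lambda, float eps,
                                 int npairs, int dim)
      {
          int p = blockIdx.x * blockDim.x + threadIdx.x;
          if (p >= npairs) return;
          int i = pairs[p].x, j = pairs[p].y;
          float r2 = 0.0f;
          for (int k = 0; k < dim; ++k) {
              float diff = x[i * dim + k] - x[j * dim + k];
              r2 += diff * diff;
          }
          float r = sqrtf(r2);
          float c = 0.5f * lambda * (d[p] - r) / (r + eps);  // SPE step
          for (int k = 0; k < dim; ++k) {
              float diff = x[i * dim + k] - x[j * dim + k];
              x[i * dim + k] += c * diff;
              x[j * dim + k] -= c * diff;
          }
      }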

  15. ASAMgpu V1.0 - a moist fully compressible atmospheric model using graphics processing units (GPUs)

    NASA Astrophysics Data System (ADS)

    Horn, S.

    2012-03-01

    In this work the three dimensional compressible moist atmospheric model ASAMgpu is presented. The calculations are done using graphics processing units (GPUs). To ensure platform independence OpenGL and GLSL are used, so that the model runs on any hardware supporting fragment shaders. The MPICH2 library enables interprocess communication, allowing the usage of more than one GPU through domain decomposition. Time integration is done with an explicit three step Runge-Kutta scheme with a time-splitting algorithm for the acoustic waves. The results for four test cases are shown in this paper: a rising dry heat bubble, a cold bubble induced density flow, a rising moist heat bubble in a saturated environment, and a DYCOMS-II case.

  16. ASAMgpu V1.0 - a moist fully compressible atmospheric model using graphics processing units (GPUs)

    NASA Astrophysics Data System (ADS)

    Horn, S.

    2011-10-01

    In this work the three dimensional compressible moist atmospheric model ASAMgpu is presented. The calculations are done using graphics processing units (GPUs). To ensure platform independence OpenGL and GLSL are used, so that the model runs on any hardware supporting fragment shaders. The MPICH2 library enables interprocess communication, allowing the usage of more than one GPU through domain decomposition. Time integration is done with an explicit three step Runge-Kutta scheme with a time-splitting algorithm for the acoustic waves. The results for four test cases are shown in this paper: a rising dry heat bubble, a cold bubble induced density flow, a rising moist heat bubble in a saturated environment, and a DYCOMS-II case.

  17. Using graphics processing units to accelerate perturbation Monte Carlo simulation in a turbid medium

    NASA Astrophysics Data System (ADS)

    Cai, Fuhong; He, Sailing

    2012-04-01

    We report a fast perturbation Monte Carlo (PMC) algorithm accelerated by graphics processing units (GPUs). The two-step PMC simulation [Opt. Lett. 36, 2095 (2011)] is performed by storing the seeds instead of the photon trajectories, so the requirement in computer random-access memory (RAM) becomes minimal. The two-step PMC is thus extremely well suited to implementation on a GPU. In a standard simulation of spatially resolved photon migration in a turbid medium, the acceleration ratio between the GPU and a conventional CPU is about 1000. Furthermore, since the two-step PMC algorithm records the effective seeds (those associated with the photons that reach a region of interest) and then re-runs the MC simulation based on the recorded effective seeds, the radiative transfer equation (RTE) can be solved by two-step PMC not only with an arbitrary change in the absorption coefficient, but also with a large change in the scattering coefficient.

  18. Using graphics processing units to accelerate perturbation Monte Carlo simulation in a turbid medium.

    PubMed

    Cai, Fuhong; He, Sailing

    2012-04-01

    We report a fast perturbation Monte Carlo (PMC) algorithm accelerated by graphics processing units (GPUs). The two-step PMC simulation [Opt. Lett. 36, 2095 (2011)] is performed by storing the seeds instead of the photon trajectories, so the requirement in computer random-access memory (RAM) becomes minimal. The two-step PMC is thus extremely well suited to implementation on a GPU. In a standard simulation of spatially resolved photon migration in a turbid medium, the acceleration ratio between the GPU and a conventional CPU is about 1000. Furthermore, since the two-step PMC algorithm records the effective seeds (those associated with the photons that reach a region of interest) and then re-runs the MC simulation based on the recorded effective seeds, the radiative transfer equation (RTE) can be solved by two-step PMC not only with an arbitrary change in the absorption coefficient, but also with a large change in the scattering coefficient.

  19. Using general-purpose computing on graphics processing units (GPGPU) to accelerate the ordinary kriging algorithm

    NASA Astrophysics Data System (ADS)

    Gutiérrez de Ravé, E.; Jiménez-Hornero, F. J.; Ariza-Villaverde, A. B.; Gómez-López, J. M.

    2014-03-01

    Spatial interpolation methods have been applied in many disciplines, ordinary kriging being one of the methods most frequently used. However, kriging carries a computational cost that scales as the cube of the number of data points, so one of the most pressing problems in geostatistical simulations is that of developing methods that can reduce the computational time. Calculating the weights and then the estimate for each unknown point is the most time-consuming step in ordinary kriging. This work investigates the potential reduction in execution time obtained by selecting the suitable operations involved in this step to be parallelized using general-purpose computing on graphics processing units (GPGPU) and the Compute Unified Device Architecture (CUDA). The study was performed through comparisons between graphics and central processing units on two different machines, a personal computer (GPU, GeForce 9500, and CPU, AMD Athlon X2 4600) and a server (GPU, Tesla C1060, and CPU, Xeon 5600). In addition, two data types (float and double) were considered in the executions. The experimental results indicate that parallel implementation of the matrix inverse using GPGPU and CUDA is enough to reduce the execution time of the weights calculation and estimation for each unknown point and, as a result, the overall run time of ordinary kriging. In addition, suitable array dimensions for using the available parallelized code were determined for each case. Thus, significant time savings can be obtained compared to parallelizing a wider set of operations. This demonstrates the value of carrying out this kind of study for other matrix-based interpolation methods.

  20. Applying a visual language for image processing as a graphical teaching tool in medical imaging

    NASA Astrophysics Data System (ADS)

    Birchman, James J.; Tanimoto, Steven L.; Rowberg, Alan H.; Choi, Hyung-Sik; Kim, Yongmin

    1992-05-01

    Typical user interaction in image processing is with command line entries, pull-down menus, or text menu selections from a list, and as such is not generally graphical in nature. Although applying these interactive methods to construct more sophisticated algorithms from a series of simple image processing steps may be clear to engineers and programmers, it may not be clear to clinicians. A solution to this problem is to implement a visual programming language using visual representations to express image processing algorithms. Visual representations promote a more natural and rapid understanding of image processing algorithms by providing more visual insight into what the algorithms do than the interactive methods mentioned above can provide. Individuals accustomed to dealing with images will be more likely to understand an algorithm that is represented visually. This is especially true of referring physicians, such as surgeons in an intensive care unit. With the increasing acceptance of picture archiving and communications system (PACS) workstations and the trend toward increasing clinical use of image processing, referring physicians will need to learn more sophisticated concepts than simply image access and display. If the procedures that they perform commonly, such as window width and window level adjustment and image enhancement using unsharp masking, are depicted visually in an interactive environment, it will be easier for them to learn and apply these concepts. The software described in this paper is a visual programming language for image processing which has been implemented on the NeXT computer using NeXTstep user interface development tools and other tools in an object-oriented environment. The concept is based upon the description of a visual language titled `Visualization of Vision Algorithms' (VIVA). Iconic representations of simple image processing steps are placed into a workbench screen and connected together into a dataflow path by the user.

  1. Real time 3D structural and Doppler OCT imaging on graphics processing units

    NASA Astrophysics Data System (ADS)

    Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

    2013-03-01

    In this report the application of graphics processing unit (GPU) programming for real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging, with implementation of Doppler algorithms for visualization of the flows in capillary vessels, is presented. Generally, the time needed to process FdOCT data on the main processor of the computer (CPU) constitutes the main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Recently developed GPUs, which offer very high computational power, provide a solution to this problem. Taking advantage of them for massively parallel data processing allows for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allows for complete processing and visualization of 2D data consisting of 2000 A-scans generated from 2048-pixel spectra at a frame rate of about 120 fps. The 3D imaging in the same mode, for volume data built of 220 × 100 A-scans, is performed at a rate of about 8 frames per second. In this paper the software architecture, the organization of the threads, and the optimizations applied are described. For illustration, screen shots recorded during real-time imaging of a phantom (homogeneous water solution of Intralipid in a glass capillary) and of the human eye in vivo are presented.
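
    A common phase-resolved route to the Doppler signal is the phase difference between complex A-scans acquired at the same lateral position, which is an embarrassingly parallel per-pixel operation. A hedged CUDA sketch of that step (illustrative only; the abstract does not state which Doppler estimator the software uses):

      // Phase difference between two adjacent complex A-scans a and b
      // (post-FFT depth profiles); axial velocity is proportional to dphi.
      #include <cuComplex.h>

      __global__ void doppler_phase(const cuFloatComplex *a,
                                    const cuFloatComplex *b,
                                    float *dphi, int npix)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= npix) return;
          cuFloatComplex p = cuCmulf(b[i], cuConjf(a[i]));  // b * conj(a)
          dphi[i] = atan2f(cuCimagf(p), cuCrealf(p));       // radians
      }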

  2. Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit.

    PubMed

    Van der Jeught, Sam; Bradu, Adrian; Podoleanu, Adrian Gh

    2010-01-01

    Fourier domain optical coherence tomography (FD-OCT) requires either a linear-in-wavenumber spectrometer or a computationally heavy software algorithm to recalibrate the acquired optical signal from wavelength to wavenumber. The first method is sensitive to the position of the prism in the spectrometer, while the second method drastically slows down the system speed when it is implemented on a serially oriented central processing unit. We implement the full resampling process on a commercial graphics processing unit (GPU), distributing the necessary calculations to many stream processors that operate in parallel. A comparison between several recalibration methods is made in terms of performance and image quality. The GPU is also used to accelerate the fast Fourier transform (FFT) and to remove the background noise, thereby achieving full GPU-based signal processing without the need for extra resampling hardware. A display rate of 25 frames/sec is achieved for processed images (1,024 x 1,024 pixels) using a line-scan charge-coupled device (CCD) camera operating at 25.6 kHz.
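
    The recalibration itself is a gather with interpolation, which parallelizes trivially across output samples. A minimal CUDA sketch of the linear-interpolation variant, one of several recalibration methods such a comparison might cover (the index and fraction tables are assumed to be precomputed once on the host from the spectrometer calibration):

      // Wavelength-to-wavenumber resampling by linear interpolation.
      // For output sample i, idx[i] and frac[i] locate the corresponding
      // position in the wavelength-sampled spectrum.
      __global__ void resample(const float *spec_lambda, float *spec_k,
                               const int *idx, const float *frac, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          int j = idx[i];
          spec_k[i] = (1.0f - frac[i]) * spec_lambda[j]
                    + frac[i] * spec_lambda[j + 1];
      }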

  3. Visual displays that directly interface and provide read-outs of molecular states via molecular graphics processing units.

    PubMed

    Poje, Julia E; Kastratovic, Tamara; Macdonald, Andrew R; Guillermo, Ana C; Troetti, Steven E; Jabado, Omar J; Fanning, M Leigh; Stefanovic, Darko; Macdonald, Joanne

    2014-08-25

    The monitoring of molecular systems usually requires sophisticated technologies to interpret nanoscale events into electronic-decipherable signals. We demonstrate a new method for obtaining read-outs of molecular states that uses graphics processing units made from molecular circuits. Because they are made from molecules, the units are able to directly interact with molecular systems. We developed deoxyribozyme-based graphics processing units able to monitor nucleic acids and output alphanumerical read-outs via a fluorescent display. Using this design we created a molecular 7-segment display, a molecular calculator able to add and multiply small numbers, and a molecular automaton able to diagnose Ebola and Marburg virus sequences. These molecular graphics processing units provide insight for the construction of autonomous biosensing devices, and are essential components for the development of molecular computing platforms devoid of electronics. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. [Influence of the recording interval and a graphic organizer on the writing process/product and on other psychological variables].

    PubMed

    García Sánchez, Jesús N; Rodríguez Pérez, Celestino

    2007-05-01

    An experimental study of the influence of the recording interval and a graphic organizer on the processes of writing composition and on the final product is presented. We studied 326 participants, aged 10 to 16 years, by means of a nested design. Two groups were compared: one group was aided in the writing process with a graphic organizer and the other was not. Each group was subdivided into two further groups: one with a mean recording interval of 45 seconds and the other with an approximately 90-second recording interval in a writing log. The results showed that the group aided by a graphic organizer obtained better results both in the processes and in the written product, and that the groups assessed with an average interval of 45 seconds obtained worse results. Implications for educational practice are discussed, and limitations and future perspectives are commented on.

  5. Process industries - graphic arts, paint, plastics, and textiles: all cousins under the skin

    NASA Astrophysics Data System (ADS)

    Simon, Frederick T.

    2002-06-01

    The origin and selection of colors in the process industries differs depending upon how the creative process is applied and the capabilities of the manufacturing process. The fashion industry (clothing), with its textile suppliers, is the leader in color innovation. Color may be introduced into textile products at several stages in the manufacturing process, from fiber through yarn and finally into fabric. The paint industry is divided into two major applications: automotive and trade sales. Automotive colors are selected by stylists who are in the employ of the automobile manufacturers. Trade sales paint, on the other hand, can be decided by paint manufacturers or by individuals who patronize custom mixing facilities. Plastics colors are for the most part decided by the industrial designers who include color as part of the design. Graphic arts (printing) is a burgeoning industry that uses color in image reproduction and package design. Except for text, printed material in color today has become the norm rather than the exception.

  6. Real-time speckle variance swept-source optical coherence tomography using a graphics processing unit

    PubMed Central

    Lee, Kenneth K. C.; Mariampillai, Adrian; Yu, Joe X. Z.; Cadotte, David W.; Wilson, Brian C.; Standish, Beau A.; Yang, Victor X. D.

    2012-01-01

    Abstract: Advances in swept source laser technology continue to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit that has been optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real-time prior to SV calculations in order to reduce decorrelation from stationary structures induced by the bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario of human fingernail root imaging where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second. PMID:22808428

  7. Real-time blood flow visualization using the graphics processing unit.

    PubMed

    Yang, Owen; Cuccia, David; Choi, Bernard

    2011-01-01

    Laser speckle imaging (LSI) is a technique in which coherent light incident on a surface produces a reflected speckle pattern that is related to the underlying movement of optical scatterers, such as red blood cells, indicating blood flow. Image-processing algorithms can be applied to produce speckle flow index (SFI) maps of relative blood flow. We present a novel algorithm that employs the NVIDIA Compute Unified Device Architecture (CUDA) platform to perform laser speckle image processing on the graphics processing unit. Software written in C was integrated with CUDA and integrated into a LabVIEW Virtual Instrument (VI) that is interfaced with a monochrome CCD camera able to acquire high-resolution raw speckle images at nearly 10 fps. With the CUDA code integrated into the LabVIEW VI, the processing and display of SFI images were also performed at ∼10 fps. We present three video examples depicting real-time flow imaging during a reactive hyperemia maneuver, with fluid flow through an in vitro phantom, and a demonstration of real-time LSI during laser surgery of a port wine stain birthmark.
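
    The per-pixel work behind an SFI map is a windowed speckle-contrast computation. A hedged CUDA sketch of the usual LSI chain (a generic illustration with an assumed (2r+1)x(2r+1) window and the simple definition SFI = 1/K^2; not the authors' exact kernel):

      // Spatial speckle contrast K = sigma/mu over a (2r+1)^2 window,
      // then a simple speckle flow index SFI = 1/K^2 per pixel.
      __global__ void sfi_map(const float *raw, float *sfi,
                              int w, int h, int r)
      {
          int px = blockIdx.x * blockDim.x + threadIdx.x;
          int py = blockIdx.y * blockDim.y + threadIdx.y;
          if (px < r || py < r || px >= w - r || py >= h - r) return;
          float sum = 0.0f, sum2 = 0.0f;
          int n = (2 * r + 1) * (2 * r + 1);
          for (int dy = -r; dy <= r; ++dy)
              for (int dx = -r; dx <= r; ++dx) {
                  float v = raw[(py + dy) * w + (px + dx)];
                  sum += v;
                  sum2 += v * v;
              }
          float mu = sum / n;
          float k2 = (sum2 / n - mu * mu) / (mu * mu + 1e-12f);  // K^2
          sfi[py * w + px] = 1.0f / (k2 + 1e-12f);  // relative flow
      }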

  8. Real-time blood flow visualization using the graphics processing unit

    NASA Astrophysics Data System (ADS)

    Yang, Owen; Cuccia, David; Choi, Bernard

    2011-01-01

    Laser speckle imaging (LSI) is a technique in which coherent light incident on a surface produces a reflected speckle pattern that is related to the underlying movement of optical scatterers, such as red blood cells, indicating blood flow. Image-processing algorithms can be applied to produce speckle flow index (SFI) maps of relative blood flow. We present a novel algorithm that employs the NVIDIA Compute Unified Device Architecture (CUDA) platform to perform laser speckle image processing on the graphics processing unit. Software written in C was integrated with CUDA and integrated into a LabVIEW Virtual Instrument (VI) that is interfaced with a monochrome CCD camera able to acquire high-resolution raw speckle images at nearly 10 fps. With the CUDA code integrated into the LabVIEW VI, the processing and display of SFI images were also performed at ~10 fps. We present three video examples depicting real-time flow imaging during a reactive hyperemia maneuver, with fluid flow through an in vitro phantom, and a demonstration of real-time LSI during laser surgery of a port wine stain birthmark.

  9. Atmospheric process evaluation of mobile source emissions

    SciTech Connect

    1995-07-01

    During the past two decades there has been a considerable effort in the US to develop and introduce an alternative to the use of gasoline and conventional diesel fuel for transportation. The primary motives for this effort have been twofold: energy security and improvement in air quality, most notably ozone, or smog. The anticipated improvement in air quality is associated with a decrease in the atmospheric reactivity, and sometimes a decrease in the mass emission rate, of the organic gas and NOx emissions from alternative fuels when compared to conventional transportation fuels. Quantification of these air quality impacts is a prerequisite to decisions on adopting alternative fuels. The purpose of this report is to present a critical review of the procedures and data base used to assess the impact on ambient air quality of mobile source emissions from alternative and conventional transportation fuels and to make recommendations as to how this process can be improved. Alternative transportation fuels are defined as methanol, ethanol, CNG, LPG, and reformulated gasoline. Most of the discussion centers on light-duty alternative-fuel vehicles (AFVs) operating on these fuels. Other advanced transportation technologies and fuels, such as hydrogen, electric vehicles, and fuel cells, will not be discussed. However, the issues raised herein can also be applied to these technologies and other classes of vehicles, such as heavy-duty diesels (HDDs). An evaluation of the overall impact of AFVs on society requires consideration of a number of complex issues. It involves the development of new vehicle technology associated with engines, fuel systems, and emission control technology; the implementation of the necessary fuel infrastructure; and an appropriate understanding of the economic, health, safety, and environmental impacts associated with the use of these fuels. This report addresses the steps necessary to properly evaluate the impact of AFVs on ozone air quality.

  10. Adiabatic/nonadiabatic state-to-state reactive scattering dynamics implemented on graphics processing units.

    PubMed

    Zhang, Pei-Yu; Han, Ke-Li

    2013-09-12

    An efficient graphics processing unit (GPU) version of a time-dependent wavepacket code is developed for atom-diatom state-to-state reactive scattering processes. The propagation of the wavepacket is entirely calculated on GPUs employing the split-operator method after preparation of the initial wavepacket on the central processing unit (CPU). An additional split-operator method is introduced in the rotational part of the Hamiltonian to decrease communication between GPUs without losing accuracy of state-to-state information. The code is tested by calculating the differential cross sections of the H + H2 reaction and state-resolved reaction probabilities of nonadiabatic triplet-singlet transitions of O((3)P,(1)D) + H2 for the total angular momentum J = 0. Global speedups of 22.11, 38.80, and 44.80 are found when comparing the parallel computation on one GPU, on two GPUs with the exact rotational operator, and on two GPUs with an approximate rotational operator, respectively, against serial computation on the CPU.

  11. Optical diagnostics of a single evaporating droplet using fast parallel computing on graphics processing units

    NASA Astrophysics Data System (ADS)

    Jakubczyk, D.; Migacz, S.; Derkachov, G.; Woźniak, M.; Archer, J.; Kolwas, K.

    2016-09-01

    We report on the first application of graphics processing unit (GPU) accelerated computing technology to improve the performance of numerical methods used for the optical characterization of evaporating microdroplets. Single microdroplets of various liquids with different volatility and molecular weight (glycerine, glycols, water, etc.), as well as mixtures of liquids and diverse suspensions, evaporate inside the electrodynamic trap under a chosen temperature and composition of the atmosphere. The series of scattering patterns recorded from the evaporating microdroplets are processed by fitting complete Mie theory predictions with a gradientless lookup table method. We showed that computations on GPUs can be effectively applied to inverse scattering problems. In particular, our technique accelerated the Mie scattering calculations over 800 times compared to a single-core processor in a Matlab environment, and almost 100 times compared to the corresponding code in C. Additionally, we overcame the problem of time-consuming data post-processing when some of the parameters (particularly the refractive index) of an investigated liquid are uncertain. Our program allows us to track the parameters characterizing the evaporating droplet nearly simultaneously with the progress of evaporation.

  12. Parallel particle swarm optimization on a graphics processing unit with application to trajectory optimization

    NASA Astrophysics Data System (ADS)

    Wu, Q.; Xiong, F.; Wang, F.; Xiong, Y.

    2016-10-01

    In order to reduce the computational time, a fully parallel implementation of the particle swarm optimization (PSO) algorithm on a graphics processing unit (GPU) is presented. Instead of being executed on the central processing unit (CPU) sequentially, PSO is executed in parallel via the GPU on the compute unified device architecture (CUDA) platform. The processes of fitness evaluation, updating of velocity and position of all particles are all parallelized and introduced in detail. Comparative studies on the optimization of four benchmark functions and a trajectory optimization problem are conducted by running PSO on the GPU (GPU-PSO) and CPU (CPU-PSO). The impact of design dimension, number of particles and size of the thread-block in the GPU and their interactions on the computational time is investigated. The results show that the computational time of the developed GPU-PSO is much shorter than that of CPU-PSO, with comparable accuracy, which demonstrates the remarkable speed-up capability of GPU-PSO.
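
    The velocity and position updates are independent across particles and dimensions, so the usual mapping is one thread per particle-dimension element. A hedged CUDA sketch of the standard PSO update rule (illustrative; the fitness-evaluation kernels and random-number generation, e.g. via cuRAND, are not shown):

      // Standard PSO update, one thread per (particle, dimension) element.
      // pbest holds personal bests, gbest the global best position, and
      // r1, r2 are pre-generated uniform random numbers in [0, 1).
      __global__ void pso_update(float *x, float *v,
                                 const float *pbest, const float *gbest,
                                 const float *r1, const float *r2,
                                 float w, float c1, float c2,
                                 int n_particles, int dim)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n_particles * dim) return;
          int d = i % dim;  // dimension index within the particle
          v[i] = w * v[i]
               + c1 * r1[i] * (pbest[i] - x[i])
               + c2 * r2[i] * (gbest[d] - x[i]);
          x[i] += v[i];
      }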

  13. Four-dimensional structural and Doppler optical coherence tomography imaging on graphics processing units.

    PubMed

    Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczynska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

    2012-10-01

    The authors present the application of graphics processing unit (GPU) programming for real-time three-dimensional (3-D) Fourier domain optical coherence tomography (FdOCT) imaging with implementation of flow visualization algorithms. One of the limitations of FdOCT is data processing time, which is generally longer than data acquisition time. Utilizing additional algorithms, such as Doppler analysis, further increases computation time. General-purpose computing on GPUs (GPGPU) has been used successfully for structural OCT imaging, but real-time 3-D imaging of flows has so far not been presented. We have developed software for structural and Doppler OCT processing capable of visualization of two-dimensional (2-D) data (2000 A-scans, 2048 pixels per spectrum) with an image refresh rate higher than 120 Hz. The 3-D imaging of 100×100 A-scans data is performed at a rate of about 9 volumes per second. We describe the software architecture, organization of threads, and optimization. Screen shots recorded during real-time imaging of a flow phantom and the human eye are presented.

  14. Graphics processing unit-based dispersion encoded full-range frequency-domain optical coherence tomography.

    PubMed

    Wang, Ling; Hofer, Bernd; Guggenheim, Jeremy A; Povazay, Boris

    2012-07-01

    Dispersion encoded full-range (DEFR) frequency-domain optical coherence tomography (FD-OCT) and its enhanced version, fast DEFR, utilize dispersion mismatch between sample and reference arm to eliminate the ambiguity in OCT signals caused by non-complex valued spectral measurement, thereby numerically doubling the usable information content. By iteratively suppressing asymmetrically dispersed complex conjugate artifacts of OCT-signal pulses the complex valued signal can be recovered without additional measurements, thus doubling the spatial signal range to cover the full positive and negative sampling range. Previously the computational complexity and low processing speed limited application of DEFR to smaller amounts of data and did not allow for interactive operation at high resolution. We report a graphics processing unit (GPU)-based implementation of fast DEFR, which significantly improves reconstruction speed by a factor of more than 90 with respect to CPU-based processing and thereby overcomes these limitations. Implemented on a commercial low-cost GPU, a display line rate of ∼21,000 depth scans/s for 2048 samples/depth scan using 10 iterations of the fast DEFR algorithm has been achieved, sufficient for real-time visualization in situ.

  15. Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units.

    PubMed

    Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley

    2011-05-01

    Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
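
    Since transfers rather than arithmetic set the ceiling, the standard remedy is to overlap PCIe copies with kernel execution using pinned host memory and multiple streams. A minimal CUDA sketch of that pattern (hypothetical names; `process` stands in for the FD-OCT processing kernels, and h_src is assumed to be allocated with cudaHostAlloc):

      // Ping-pong two streams so the copy of chunk c+1 overlaps the
      // processing of chunk c. Operations within a stream serialize,
      // so reusing d_buf[k] in stream s[k] is safe.
      #include <cuda_runtime.h>

      extern __global__ void process(float *chunk, int n);

      void pipeline(const float *h_src /* pinned */, float *d_buf[2],
                    int chunk, int nchunks)
      {
          cudaStream_t s[2];
          cudaStreamCreate(&s[0]);
          cudaStreamCreate(&s[1]);
          for (int c = 0; c < nchunks; ++c) {
              int k = c & 1;
              cudaMemcpyAsync(d_buf[k], h_src + (size_t)c * chunk,
                              chunk * sizeof(float),
                              cudaMemcpyHostToDevice, s[k]);
              process<<<(chunk + 255) / 256, 256, 0, s[k]>>>(d_buf[k], chunk);
          }
          cudaDeviceSynchronize();
          cudaStreamDestroy(s[0]);
          cudaStreamDestroy(s[1]);
      }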

  16. Graphics processing unit-based dispersion encoded full-range frequency-domain optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Wang, Ling; Hofer, Bernd; Guggenheim, Jeremy A.; Považay, Boris

    2012-07-01

    Dispersion encoded full-range (DEFR) frequency-domain optical coherence tomography (FD-OCT) and its enhanced version, fast DEFR, utilize dispersion mismatch between sample and reference arm to eliminate the ambiguity in OCT signals caused by non-complex valued spectral measurement, thereby numerically doubling the usable information content. By iteratively suppressing asymmetrically dispersed complex conjugate artifacts of OCT-signal pulses the complex valued signal can be recovered without additional measurements, thus doubling the spatial signal range to cover the full positive and negative sampling range. Previously the computational complexity and low processing speed limited application of DEFR to smaller amounts of data and did not allow for interactive operation at high resolution. We report a graphics processing unit (GPU)-based implementation of fast DEFR, which significantly improves reconstruction speed by a factor of more than 90 with respect to CPU-based processing and thereby overcomes these limitations. Implemented on a commercial low-cost GPU, a display line rate of ~21,000 depth scans/s for 2048 samples/depth scan using 10 iterations of the fast DEFR algorithm has been achieved, sufficient for real-time visualization in situ.

  17. Developing extensible lattice-Boltzmann simulators for general-purpose graphics-processing units

    SciTech Connect

    Walsh, S C; Saar, M O

    2011-12-21

    Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior. These methods are well suited to parallel implementation, particularly on the single-instruction multiple data (SIMD) parallel processing environments found in computer graphics processing units (GPUs). Although more recent programming tools dramatically improve the ease with which GPU programs can be written, the programming environment still lacks the flexibility available to more traditional CPU programs. In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations. It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package, LBHydra. The performance of the automatically generated code is compared to that of equivalent purpose-written codes for single-phase, multiple-phase, and multiple-component flows. The flexibility of the new method is demonstrated by simulating a rising, dissolving droplet in a porous medium with user-generated lattice-Boltzmann models and subroutines.
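
    For context on what such generated modules compute, the innermost building block of a lattice-Boltzmann update is the collision step. A hedged CUDA sketch of a single-relaxation-time (BGK) collision in a structure-of-arrays layout (a generic illustration under assumed names, not LBHydra code):

      // BGK collision: relax each of the Q distributions per cell toward
      // its equilibrium. SoA (direction-major) layout keeps the per-cell
      // accesses coalesced across a warp. Streaming and the computation
      // of feq are omitted.
      __global__ void bgk_collide(float *f, const float *feq,
                                  float tau, int ncells, int Q)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= ncells) return;
          float omega = 1.0f / tau;
          for (int q = 0; q < Q; ++q) {
              int idx = q * ncells + i;
              f[idx] -= omega * (f[idx] - feq[idx]);
          }
      }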

  18. Mobile Monitoring Data Processing and Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  19. Mobile Monitoring Data Processing & Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  20. Mobile Monitoring Data Processing & Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  1. Mobile Monitoring Data Processing and Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  2. A performance comparison of different graphics processing units running direct N-body simulations

    NASA Astrophysics Data System (ADS)

    Capuzzo-Dolcetta, R.; Spera, M.

    2013-11-01

    Hybrid computational architectures based on the joint power of Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering, physics, etc. In this paper we present a performance comparison of various GPUs available on the market when applied to the numerical integration of the classic, gravitational, N-body problem. To do this, we developed an OpenCL version of the parallel code HiGPUs for these tests, because this portable version is the only one able to run on GPUs of different makes. The main general result is that we confirm the reliability, speed, and cheapness of GPUs when applied to the examined kind of problems, i.e., when the forces to evaluate depend on the mutual distances, as happens in gravitational physics and molecular dynamics. More specifically, we find that even the cheap GPUs built for gaming applications perform very well in terms of computing speed in scientific applications and, although with some limitations concerning on-board memory, can be a good choice for building a cheap and efficient machine for scientific computing.
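
    The kernel being benchmarked in this class of problems is the direct O(N^2) force summation. A minimal CUDA sketch of the gravitational case (a bare illustration without the shared-memory tiling a production code such as HiGPUs would use; names and the softening scheme are assumptions):

      // Direct-summation gravitational acceleration, one thread per body.
      // pos packs position and mass as (x, y, z, m); eps2 is the squared
      // Plummer softening, which also zeroes the i == j self term.
      __global__ void accel(const float4 *pos, float3 *acc,
                            float eps2, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float4 pi = pos[i];
          float3 a = make_float3(0.0f, 0.0f, 0.0f);
          for (int j = 0; j < n; ++j) {
              float4 pj = pos[j];
              float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
              float r2 = dx * dx + dy * dy + dz * dz + eps2;
              float inv = rsqrtf(r2);
              float s = pj.w * inv * inv * inv;  // m_j / r^3
              a.x += s * dx;  a.y += s * dy;  a.z += s * dz;
          }
          acc[i] = a;
      }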

  3. Accelerated rescaling of single Monte Carlo simulation runs with the Graphics Processing Unit (GPU).

    PubMed

    Yang, Owen; Choi, Bernard

    2013-01-01

    To interpret fiber-based and camera-based measurements of remitted light from biological tissues, researchers typically use analytical models, such as the diffusion approximation to light transport theory, or stochastic models, such as Monte Carlo modeling. To achieve rapid (ideally real-time) measurement of tissue optical properties, especially in clinical situations, there is a critical need to accelerate Monte Carlo simulation runs. In this manuscript, we report on our approach using the Graphics Processing Unit (GPU) to accelerate rescaling of single Monte Carlo runs to rapidly calculate diffuse reflectance values for different sets of tissue optical properties. We selected MATLAB to enable non-specialists in C and CUDA-based programming to use the generated open-source code. We developed a software package with four abstraction layers. To calculate a set of diffuse reflectance values from a simulated tissue with homogeneous optical properties, our rescaling GPU-based approach achieves a reduction in computation time of several orders of magnitude as compared to other GPU-based approaches. Specifically, our GPU-based approach generated a diffuse reflectance value in 0.08 ms. The transfer time from CPU to GPU memory is currently a limiting factor with GPU-based calculations. However, for calculation of multiple diffuse reflectance values, our GPU-based approach still can lead to processing that is ~3400 times faster than other GPU-based approaches.

  4. The feasibility of genome-scale biological network inference using Graphics Processing Units.

    PubMed

    Thiagarajan, Raghuram; Alavi, Amir; Podichetty, Jagdeep T; Bazil, Jason N; Beard, Daniel A

    2017-01-01

    Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. For example, both inverse identification and forward simulations of genome-scale gene regulatory network models pose compute-intensive problems. This issue is addressed here by combining the processing power of Graphics Processing Units (GPUs) and a parallel reverse engineering algorithm for inference of regulatory networks. It is shown that, given an appropriate data set, information on genome-scale networks (systems of 1000 or more state variables) can be inferred using a reverse-engineering algorithm in a matter of days on a small-scale modern GPU cluster.

  5. Graphics Processing Unit (GPU) Acceleration of the Goddard Earth Observing System Atmospheric Model

    NASA Technical Reports Server (NTRS)

    Putnam, William

    2011-01-01

    The Goddard Earth Observing System 5 (GEOS-5) is the atmospheric model used by the Global Modeling and Assimilation Office (GMAO) for a variety of applications, from long-term climate prediction at relatively coarse resolution, to data assimilation and numerical weather prediction, to very high-resolution cloud-resolving simulations. GEOS-5 is being ported to a graphics processing unit (GPU) cluster at the NASA Center for Climate Simulation (NCCS). By utilizing GPU co-processor technology, we expect to increase the throughput of GEOS-5 by at least an order of magnitude, and accelerate the process of scientific exploration across all scales of global modeling, including: (1) the large-scale, high-end application of non-hydrostatic, global, cloud-resolving modeling at 10- to 1-kilometer (km) global resolutions; (2) intermediate-resolution seasonal climate and weather prediction at 50- to 25-km resolution on small clusters of GPUs; and (3) long-range, coarse-resolution climate modeling, enabled on a small box of GPUs for the individual researcher. After being ported to the GPU cluster, the primary physics components and the dynamical core of GEOS-5 have demonstrated a potential speedup of 15-40 times over conventional processor cores. Performance improvements of this magnitude reduce the required scalability of 1-km, global, cloud-resolving models from an unfathomable 6 million cores to an attainable 200,000 GPU-enabled cores.

  6. Spatial resolution recovery utilizing multi-ray tracing and graphic processing unit in PET image reconstruction.

    PubMed

    Liang, Yicheng; Peng, Hao

    2015-02-07

    Depth-of-interaction (DOI) poses a major challenge for a PET system to achieve uniform spatial resolution across the field-of-view, particularly for small animal and organ-dedicated PET systems. In this work, we implemented an analytical method to model the system matrix for resolution recovery, which was then incorporated in PET image reconstruction on a graphical processing unit platform, due to its parallel processing capacity. The method utilizes the concepts of virtual DOI layers and multi-ray tracing to calculate the coincidence detection response function for a given line-of-response. The accuracy of the proposed method was validated for a small-bore PET insert to be used for simultaneous PET/MR breast imaging. In addition, performance comparisons were studied among the following three cases: 1) no physical DOI and no resolution modeling; 2) two physical DOI layers and no resolution modeling; and 3) no physical DOI design but with different numbers of virtual DOI layers. The image quality was quantitatively evaluated in terms of spatial resolution (full-width-half-maximum and position offset), contrast recovery coefficient and noise. The results indicate that the proposed method has the potential to be used as an alternative to other physical DOI designs and achieve comparable imaging performance, while reducing detector/system design cost and complexity.

  7. Density-fitted singles and doubles coupled cluster on graphics processing units

    NASA Astrophysics Data System (ADS)

    DePrince, A. Eugene, III; Kennedy, Matthew R.; Sumpter, Bobby G.; Sherrill, C. David

    2014-03-01

    We adapt an algorithm for singles and doubles coupled cluster (CCSD) that uses density fitting or Cholesky decomposition (CD) in the construction and contraction of all electron repulsion integrals (ERIs) for use on heterogeneous compute nodes consisting of a multicore central processing unit (CPU) and at least one graphics processing unit (GPU). The use of approximate three-index ERIs ameliorates two of the major difficulties in designing scientific algorithms for GPUs: (1) the extremely limited global memory on the devices and (2) the overhead associated with data motion across the bus. For the benzene trimer described by an aug-cc-pVDZ basis set, the use of a single NVIDIA Tesla C2070 (Fermi) GPU accelerates a CD-CCSD computation by a factor of 2.1, relative to the multicore CPU-only algorithm that uses six highly efficient Intel Core i7-3930K CPU cores. The use of two Fermi GPUs provides an acceleration of 2.89, which is comparable to that observed when using a single NVIDIA Kepler K20c GPU (2.73).

  8. Fast ray-tracing of human eye optics on Graphics Processing Units.

    PubMed

    Wei, Qi; Patkar, Saket; Pai, Dinesh K

    2014-05-01

    We present a new technique for simulating retinal image formation by tracing a large number of rays from objects in three dimensions as they pass through the optic apparatus of the eye to the retina. Simulating human optics is useful for understanding basic questions of vision science and for studying vision defects and their corrections. Because of the complexity of computing such simulations accurately, most previous efforts used simplified analytical models of the normal eye. This makes them less effective for modeling vision disorders associated with abnormal shapes of the ocular structures, which are hard to represent precisely with analytical surfaces. We have developed a computer simulator that can simulate ocular structures of arbitrary shapes, for instance represented by polygon meshes. Topographic and geometric measurements of the cornea, lens, and retina from keratometer or medical imaging data can be integrated for individualized examination. We utilize parallel processing on modern Graphics Processing Units (GPUs) to efficiently compute retinal images by tracing millions of rays. A stable retinal image can be generated within minutes. We simulated depth-of-field, accommodation, chromatic aberrations, as well as astigmatism and its correction. We also show application of the technique to patient-specific vision correction by incorporating geometric models of the orbit reconstructed from clinical medical images.

  9. Simulating 3-D lung dynamics using a programmable graphics processing unit.

    PubMed

    Santhanam, Anand P; Hamza-Lup, Felix G; Rolland, Jannick P

    2007-09-01

    Medical simulations of lung dynamics promise to be effective tools for teaching and training clinical and surgical procedures related to lungs. Their effectiveness may be greatly enhanced when visualized in an augmented reality (AR) environment. However, the computational requirements of AR environments limit the availability of the central processing unit (CPU) for the lung dynamics simulation for different breathing conditions. In this paper, we present a method for computing lung deformations in real time by taking advantage of the programmable graphics processing unit (GPU). This saves CPU time for other AR-associated tasks such as tracking, communication, and interaction management. We consider an approach to simulating three-dimensional (3-D) lung dynamics using Green's formulation for the upright position, and extend this approach to other orientations as well as to the subsequent changes in breathing. Specifically, the proposed extension presents a computational optimization and its implementation on a GPU. Results show that the computational requirements for simulating the deformation of a 3-D lung model are significantly reduced for point-based rendering.

  10. High-Throughput Characterization of Porous Materials Using Graphics Processing Units.

    PubMed

    Kim, Jihan; Martin, Richard L; Rübel, Oliver; Haranczyk, Maciej; Smit, Berend

    2012-05-08

    We have developed a high-throughput graphics processing unit (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations, where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH4 and CO2) and materials' framework atoms. Using a parallel flood fill central processing unit (CPU) algorithm, inaccessible regions inside the framework structures are identified and blocked, based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than those considered in earlier studies. For structures selected from such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple Grand Canonical Monte Carlo (GCMC) simulations concurrently within the GPU.
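
    The energy-grid stage described above maps naturally onto one GPU thread per grid point. Here is a minimal, hypothetical CUDA sketch of a Lennard-Jones plus Coulomb grid kernel; the structure, parameter values, and units are illustrative assumptions, not the authors' code.

      #include <cuda_runtime.h>
      #include <cstdio>

      struct Atom { float x, y, z, eps, sig, q; };  // LJ parameters and partial charge

      // One thread per grid point: guest-framework interaction energy,
      // Lennard-Jones plus Coulomb (kcal/mol and angstrom units assumed).
      __global__ void energyGrid(const Atom* a, int nAtoms, float* grid,
                                 int nx, int ny, int nz, float h, float qGuest) {
          int idx = blockIdx.x * blockDim.x + threadIdx.x;
          if (idx >= nx * ny * nz) return;
          float gx = (idx % nx) * h;
          float gy = ((idx / nx) % ny) * h;
          float gz = (idx / (nx * ny)) * h;
          float e = 0.f;
          for (int k = 0; k < nAtoms; ++k) {
              float dx = a[k].x - gx, dy = a[k].y - gy, dz = a[k].z - gz;
              float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;
              float s2 = a[k].sig * a[k].sig / r2;
              float s6 = s2 * s2 * s2;
              e += 4.f * a[k].eps * (s6 * s6 - s6);         // Lennard-Jones
              e += 332.06f * qGuest * a[k].q * rsqrtf(r2);  // Coulomb
          }
          grid[idx] = e;
      }

      int main() {
          const int nx = 32, ny = 32, nz = 32, nA = 64;
          Atom* a; float* g;
          cudaMallocManaged(&a, nA * sizeof(Atom));
          cudaMallocManaged(&g, nx * ny * nz * sizeof(float));
          for (int k = 0; k < nA; ++k) a[k] = Atom{ 0.3f * k, 1.f, 2.f, 0.1f, 3.4f, 0.f };
          energyGrid<<<(nx * ny * nz + 255) / 256, 256>>>(a, nA, g, nx, ny, nz, 0.5f, 0.f);
          cudaDeviceSynchronize();
          printf("E at grid origin = %f\n", g[0]);
          cudaFree(a); cudaFree(g);
          return 0;
      }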

  11. Fast Monte Carlo simulations of ultrasound-modulated light using a graphics processing unit.

    PubMed

    Leung, Terence S; Powell, Samuel

    2010-01-01

    Ultrasound-modulated optical tomography (UOT) is based on "tagging" light in turbid media with focused ultrasound. In comparison to diffuse optical imaging, UOT can potentially offer a better spatial resolution. The existing Monte Carlo (MC) model for simulating ultrasound-modulated light is central processing unit (CPU) based and has been employed in several UOT related studies. We reimplemented the MC model with a graphics processing unit (GPU, NVIDIA GeForce 9800) that can execute the algorithm up to 125 times faster than its CPU (Intel Core Quad) counterpart for a particular set of optical and acoustic parameters. We also show that the incorporation of ultrasound propagation in photon migration modeling increases the computational time considerably, by a factor of at least 6, in one case, even with a GPU. With slight adjustment to the code, MC simulations were also performed to demonstrate the effect of ultrasonic modulation on the speckle pattern generated by the light model (available as animation). This was computed in 4 s with our GPU implementation as compared to 290 s using the CPU.

  12. Real-time lossy compression of hyperspectral images using iterative error analysis on graphics processing units

    NASA Astrophysics Data System (ADS)

    Sánchez, Sergio; Plaza, Antonio

    2012-06-01

    Hyperspectral image compression is an important task in remotely sensed Earth Observation as the dimensionality of this kind of image data is ever increasing. This requires on-board compression in order to optimize the downlink connection when sending the data to Earth. A successful algorithm for lossy compression of remotely sensed hyperspectral data is the iterative error analysis (IEA) algorithm, which applies an iterative process that allows controlling the amount of information loss and the compression ratio depending on the number of iterations. This algorithm, which is based on spectral unmixing concepts, can be computationally expensive for hyperspectral images with high dimensionality. In this paper, we develop a new parallel implementation of the IEA algorithm for hyperspectral image compression on graphics processing units (GPUs). The proposed implementation is tested on several different GPUs from NVIDIA, and is shown to exhibit real-time performance in the analysis of Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) data sets collected over different locations. The proposed algorithm and its parallel GPU implementation represent a significant advance towards real-time onboard (lossy) compression of hyperspectral data where the quality of the compression can also be adjusted in real time.

  13. Acceleration of iterative Navier-Stokes solvers on graphics processing units

    NASA Astrophysics Data System (ADS)

    Tomczak, Tadeusz; Zadarnowska, Katarzyna; Koza, Zbigniew; Matyka, Maciej; Mirosław, Łukasz

    2013-04-01

    While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results of the semi-implicit method for pressure linked equations (SIMPLE) and the pressure implicit with operator splitting (PISO) methods implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of the full porting over a partial acceleration of these algorithms run on unstructured meshes. We found that the full-port strategy requires adjusting the internal data structures to the new hardware and proposed a convenient format for storing internal data structures on GPUs. Our implementation is validated on standard steady and unsteady problems and its computational efficiency is checked by comparing its results and run times with those of some standard software (OpenFOAM) run on central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.

  14. A full graphics processing unit implementation of uncertainty-aware drainage basin delineation

    NASA Astrophysics Data System (ADS)

    Eränen, David; Oksanen, Juha; Westerholm, Jan; Sarjakoski, Tapani

    2014-12-01

    Terrain analysis based on modern, high-resolution Digital Elevation Models (DEMs) has become quite time consuming because of the large amounts of data involved. Additionally, when the propagation of uncertainties during the analysis process is investigated using the Monte Carlo method, the run time of the algorithm can increase by a factor of between 100 and 1000, depending on the desired accuracy of the result. This increase in run time constitutes a major barrier if uncertainty-aware terrain analysis is to become more widely used. In this paper, we evaluate the use of Graphics Processing Units (GPUs) in uncertainty-aware drainage basin delineation. All computations are run on a GPU, including the creation of the realizations of a stationary DEM uncertainty model, stream burning, pit filling, flow direction calculation, and the actual delineation of the drainage basins. On average, our GPU version is approximately 11 times faster than a sequential, one-core CPU version performing the same task.
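
    As an illustration of one stage of such a pipeline, the following hypothetical CUDA sketch computes the D8 flow direction (the steepest downslope neighbor) with one thread per DEM cell; in an uncertainty-aware run, a kernel like this would be applied to each Monte Carlo realization of the DEM. Names and sizes are illustrative, not the authors' code.

      #include <cuda_runtime.h>
      #include <cstdio>

      // One thread per interior cell: D8 flow direction = index (0..7) of the
      // steepest downslope neighbor, or -1 for a pit or flat. Border cells are skipped.
      __global__ void d8FlowDir(const float* dem, int* dir, int w, int h) {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;
          const int dx[8] = { 1, 1, 0,-1,-1,-1, 0, 1 };
          const int dy[8] = { 0, 1, 1, 1, 0,-1,-1,-1 };
          float z = dem[y * w + x], best = 0.f;
          int bestDir = -1;
          for (int k = 0; k < 8; ++k) {
              float drop = z - dem[(y + dy[k]) * w + (x + dx[k])];
              float dist = (dx[k] && dy[k]) ? 1.41421356f : 1.f;  // diagonal step length
              float slope = drop / dist;
              if (slope > best) { best = slope; bestDir = k; }
          }
          dir[y * w + x] = bestDir;
      }

      int main() {
          const int w = 64, h = 64;
          float* dem; int* dir;
          cudaMallocManaged(&dem, w * h * sizeof(float));
          cudaMallocManaged(&dir, w * h * sizeof(int));
          for (int y = 0; y < h; ++y)
              for (int x = 0; x < w; ++x)
                  dem[y * w + x] = (float)(x + y);   // uniform slope, stand-in DEM
          dim3 bl(16, 16), gr((w + 15) / 16, (h + 15) / 16);
          d8FlowDir<<<gr, bl>>>(dem, dir, w, h);
          cudaDeviceSynchronize();
          printf("dir at (32,32) = %d\n", dir[32 * w + 32]);
          cudaFree(dem); cudaFree(dir);
          return 0;
      }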

  15. Spatial resolution recovery utilizing multi-ray tracing and graphic processing unit in PET image reconstruction

    NASA Astrophysics Data System (ADS)

    Liang, Yicheng; Peng, Hao

    2015-02-01

    Depth-of-interaction (DOI) poses a major challenge for a PET system to achieve uniform spatial resolution across the field-of-view, particularly for small animal and organ-dedicated PET systems. In this work, we implemented an analytical method to model the system matrix for resolution recovery, which was then incorporated in PET image reconstruction on a graphical processing unit platform, due to its parallel processing capacity. The method utilizes the concepts of virtual DOI layers and multi-ray tracing to calculate the coincidence detection response function for a given line-of-response. The accuracy of the proposed method was validated for a small-bore PET insert to be used for simultaneous PET/MR breast imaging. In addition, performance comparisons were studied among the following three cases: 1) no physical DOI and no resolution modeling; 2) two physical DOI layers and no resolution modeling; and 3) no physical DOI design but with different numbers of virtual DOI layers. The image quality was quantitatively evaluated in terms of spatial resolution (full-width-half-maximum and position offset), contrast recovery coefficient and noise. The results indicate that the proposed method has the potential to be used as an alternative to other physical DOI designs and achieve comparable imaging performance, while reducing detector/system design cost and complexity.

  16. VACTIV: A graphical dialog based program for an automatic processing of line and band spectra

    NASA Astrophysics Data System (ADS)

    Zlokazov, V. B.

    2013-05-01

    The program VACTIV (Visual ACTIV) has been developed for the automatic analysis of spectrum-like distributions, in particular gamma-ray or alpha spectra, and is a standard graphical dialog based Windows XX application, driven by menu, mouse and keyboard. On the one hand, it is a conversion of an existing Fortran program ACTIV [1] to the DELPHI language; on the other hand, it is a transformation of the sequential syntax of Fortran programming to a new object-oriented style, based on the organization of event interactions. The new features implemented in the algorithms of both versions are the following: both an analytical function and a graphical curve can be used as the peak model; the peak search algorithm recognizes not only Gaussian peaks but also peaks of irregular form, both narrow (2-4 channels) and broad (50-100 channels); and the regularization technique used in the fitting guarantees a stable solution in the most complicated cases of strongly overlapping or weak peaks. The graphical dialog interface of VACTIV is much more convenient than the batch mode of ACTIV. [1] V.B. Zlokazov, Computer Physics Communications 28 (1982) 27-37. New version program summary - Program title: VACTIV; Catalogue identifier: ABAC_v2_0; Licensing provisions: none; Programming language: DELPHI 5-7 Pascal; Computer: IBM PC series; Operating system: Windows XX; RAM: 1 MB; Keywords: nuclear physics, spectrum decomposition, least squares analysis, graphical dialog, object-oriented programming; Classification: 17.6; Catalogue identifier of previous version: ABAC_v1_0; Journal reference of previous version: Comput. Phys. Commun. 28 (1982) 27; Does the new version supersede the previous version?: Yes. Nature of problem: VACTIV is intended for the precise analysis of arbitrary spectrum-like distributions, e.g. gamma-ray and X-ray spectra, and allows the user to carry out the full cycle of automatic processing of such spectra, i.e. calibration, automatic peak search

  17. Graphical Technique to Support the Teaching/Learning Process of Software Process Reference Models

    NASA Astrophysics Data System (ADS)

    Espinosa-Curiel, Ismael Edrein; Rodríguez-Jacobo, Josefina; Fernández-Zepeda, José Alberto

    In this paper, we propose a set of diagrams to visualize software process reference models (PRM). The diagrams, called dimods, are a combination of visual and process modeling techniques such as rich pictures, mind maps, IDEF and RAD diagrams. We show the use of this technique by designing a set of dimods for the Mexican Software Industry Process Model (MoProSoft). Additionally, we perform an evaluation of the usefulness of dimods. The result of the evaluation shows that dimods may be a support tool that facilitates the understanding, memorization, and learning of software PRMs in both software development organizations and universities. The results also show that dimods may have advantages over the traditional description methods for these types of models.

  18. On the Social Psychology of Social Mobility Processes.

    ERIC Educational Resources Information Center

    Kerckhoff, Alan C.

    1989-01-01

    Discusses two types of research--the "new structuralism" approach and "work and personality" studies--on the occupational attainment aspect of social mobility. Suggests that a life course approach to social mobility processes may provide a basis for integrating the structural and social psychological perspectives. Contains 25…

  19. Mobile Ultrasound Plane Wave Beamforming on iPhone or iPad using Metal-based GPU Processing

    NASA Astrophysics Data System (ADS)

    Hewener, Holger J.; Tretbar, Steffen H.

    Mobile and cost-effective ultrasound devices are being used in point-of-care scenarios and in the trauma room. To reduce the costs of such devices, we have already presented the possibilities of consumer devices like the Apple iPad for full signal processing of raw data for ultrasound image generation. By using technologies like plane wave imaging to generate a full image with only one excitation/reception event, the acquisition times and power consumption of ultrasound imaging can be reduced on low-power mobile devices based on consumer electronics, realizing the transition from FPGA- or ASIC-based beamforming to more flexible software beamforming. The massively parallel beamforming processing can be done with Apple's "Metal" framework for advanced graphics and general-purpose GPU processing on the iOS platform. We were able to integrate the beamforming reconstruction into our mobile ultrasound processing application with imaging rates of up to 70 Hz on iPad Air 2 hardware.
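
    The paper implements its beamformer in Apple's Metal; the same delay-and-sum structure is sketched below in CUDA for concreteness. Each thread forms one image pixel of a 0-degree plane-wave transmission by summing appropriately delayed channel samples. All names and parameters (sampling rate, element pitch, grid spacing) are illustrative assumptions, not the authors' code.

      #include <cuda_runtime.h>
      #include <cstdio>

      // One thread per pixel: for a 0-degree plane wave, the transmit delay to
      // depth z is z/c and the receive delay from (x, z) back to the element at
      // xe is sqrt(z^2 + (x - xe)^2)/c; sum the RF sample at that total delay.
      __global__ void dasBeamform(const float* rf, float* img,
                                  int nElem, int nSamp, float fs, float c,
                                  float pitch, int nx, int nz, float dx, float dz) {
          int px = blockIdx.x * blockDim.x + threadIdx.x;
          int pz = blockIdx.y * blockDim.y + threadIdx.y;
          if (px >= nx || pz >= nz) return;
          float x = px * dx, z = pz * dz, sum = 0.f;
          for (int e = 0; e < nElem; ++e) {
              float xe = e * pitch;
              float rx = sqrtf(z * z + (x - xe) * (x - xe));
              int s = (int)((z + rx) / c * fs);          // tx delay + rx delay, in samples
              if (s >= 0 && s < nSamp) sum += rf[e * nSamp + s];
          }
          img[pz * nx + px] = sum;
      }

      int main() {
          const int nElem = 64, nSamp = 2048, nx = 128, nz = 128;
          float *rf, *img;
          cudaMallocManaged(&rf, nElem * nSamp * sizeof(float));
          cudaMallocManaged(&img, nx * nz * sizeof(float));
          for (int i = 0; i < nElem * nSamp; ++i) rf[i] = (i % nSamp == 512) ? 1.f : 0.f;
          dim3 bl(16, 16), gr((nx + 15) / 16, (nz + 15) / 16);
          dasBeamform<<<gr, bl>>>(rf, img, nElem, nSamp, 40e6f, 1540.f,
                                  0.0003f, nx, nz, 0.0003f, 0.0003f);
          cudaDeviceSynchronize();
          printf("pixel(64,32) = %f\n", img[32 * nx + 64]);
          cudaFree(rf); cudaFree(img);
          return 0;
      }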

  20. Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures.

    PubMed

    Genovese, Luigi; Ospici, Matthieu; Deutsch, Thierry; Méhaut, Jean-François; Neelov, Alexey; Goedecker, Stefan

    2009-07-21

    We present the implementation of a full electronic structure calculation code on a hybrid parallel architecture with graphic processing units (GPUs). This implementation is performed on a free software code based on Daubechies wavelets. Such code shows very good performances, systematic convergence properties, and an excellent efficiency on parallel computers. Our GPU-based acceleration fully preserves all these properties. In particular, the code is able to run on many cores which may or may not have a GPU associated, and thus on parallel and massive parallel hybrid machines. With double precision calculations, we may achieve considerable speedup, between a factor of 20 for some operations and a factor of 6 for the whole density functional theory code.

  1. Large-scale analytical Fourier transform of photomask layouts using graphics processing units

    NASA Astrophysics Data System (ADS)

    Sakamoto, Julia A.

    2015-10-01

    Compensation of lens-heating effects during the exposure scan in an optical lithographic system requires knowledge of the heating profile in the pupil of the projection lens. A necessary component in the accurate estimation of this profile is the total integrated distribution of light, relying on the squared modulus of the Fourier transform (FT) of the photomask layout for individual process layers. Requiring a layout representation in pixelated image format, the most common approach is to compute the FT numerically via the fast Fourier transform (FFT). However, the file size for a standard 26-mm × 33-mm mask with 5-nm pixels is an overwhelming 137 TB in single precision; the data importing process alone, prior to FFT computation, can render this method highly impractical. A more feasible solution is to handle layout data in a highly compact format with vertex locations of mask features (polygons), which correspond to elements in an integrated circuit, as well as pattern symmetries and repetitions (e.g., GDSII format). Provided the polygons can decompose into shapes for which analytical FT expressions are possible, the analytical approach dramatically reduces computation time and alleviates the burden of importing extensive mask data. Algorithms have been developed for importing and interpreting hierarchical layout data and computing the analytical FT on a graphics processing unit (GPU) for rapid parallel processing, not assuming incoherent imaging. Testing was performed on the active layer of a 392-μm × 297-μm virtual chip test structure with 43 substructures distributed over six hierarchical levels. The factor of improvement of the analytical over the numerical approach for importing layout data, performing CPU-GPU memory transfers, and executing the FT on a single NVIDIA Tesla K20X GPU was 1.6×10^4, 4.9×10^3, and 3.8×10^3, respectively. Various ideas for algorithm enhancements will be discussed.
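
    The analytical approach rests on closed-form transforms of elementary shapes. For an axis-aligned rectangle of widths (wx, wy) centered at (cx, cy), F(u,v) = wx·wy·sinc(wx·u)·sinc(wy·v)·exp(-2πi(u·cx + v·cy)) with sinc(t) = sin(πt)/(πt). A minimal, hypothetical CUDA sketch (not the author's code) evaluates this sum over rectangles with one thread per frequency sample:

      #include <cuda_runtime.h>
      #include <cstdio>
      #include <cmath>

      struct Rect { float cx, cy, wx, wy; };   // center and widths of a mask rectangle

      __device__ float sincf(float t) {        // sin(pi t)/(pi t)
          if (fabsf(t) < 1e-8f) return 1.f;
          float p = 3.14159265f * t;
          return sinf(p) / p;
      }

      // One thread per frequency sample (u, v): sum the closed-form rectangle FTs.
      __global__ void maskFT(const Rect* r, int nRect, float2* F,
                             const float* u, const float* v, int nFreq) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= nFreq) return;
          float re = 0.f, im = 0.f;
          for (int k = 0; k < nRect; ++k) {
              float amp = r[k].wx * r[k].wy * sincf(r[k].wx * u[i]) * sincf(r[k].wy * v[i]);
              float ph = -2.f * 3.14159265f * (u[i] * r[k].cx + v[i] * r[k].cy);
              re += amp * cosf(ph);
              im += amp * sinf(ph);
          }
          F[i] = make_float2(re, im);
      }

      int main() {
          const int nRect = 2, nFreq = 4;
          Rect* r; float2* F; float *u, *v;
          cudaMallocManaged(&r, nRect * sizeof(Rect));
          cudaMallocManaged(&F, nFreq * sizeof(float2));
          cudaMallocManaged(&u, nFreq * sizeof(float));
          cudaMallocManaged(&v, nFreq * sizeof(float));
          r[0] = Rect{ 0.f, 0.f, 2.f, 1.f };   // two stand-in layout rectangles
          r[1] = Rect{ 3.f, 1.f, 1.f, 1.f };
          for (int i = 0; i < nFreq; ++i) { u[i] = 0.1f * i; v[i] = 0.f; }
          maskFT<<<1, 32>>>(r, nRect, F, u, v, nFreq);
          cudaDeviceSynchronize();
          printf("F(0,0) = %f (should equal total area 3)\n", F[0].x);
          cudaFree(r); cudaFree(F); cudaFree(u); cudaFree(v);
          return 0;
      }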

  2. Accelerating resolution-of-the-identity second-order Møller-Plesset quantum chemistry calculations with graphical processing units.

    PubMed

    Vogt, Leslie; Olivares-Amaya, Roberto; Kermes, Sean; Shao, Yihan; Amador-Bedolla, Carlos; Aspuru-Guzik, Alan

    2008-03-13

    The modification of a general purpose code for quantum mechanical calculations of molecular properties (Q-Chem) to use a graphical processing unit (GPU) is reported. A 4.3x speedup of the resolution-of-the-identity second-order Møller-Plesset perturbation theory (RI-MP2) execution time is observed in single point energy calculations of linear alkanes. The code modification is accomplished using the compute unified basic linear algebra subprograms (CUBLAS) library for an NVIDIA Quadro FX 5600 graphics card. Furthermore, speedups of other matrix algebra based electronic structure calculations are anticipated as a result of using a similar approach.
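
    The expensive step in RI-MP2 is a large matrix-matrix multiplication over the auxiliary index, which is exactly what CUBLAS accelerates. A minimal sketch of such a contraction with cublasDgemm is shown below; the matrix dimensions and contents are illustrative stand-ins, not the Q-Chem integrals (compile with -lcublas).

      #include <cublas_v2.h>
      #include <cuda_runtime.h>
      #include <cstdio>

      // The bottleneck contraction (ia|jb) = sum_Q B[Q][ia] * B[Q][jb] is one
      // large GEMM: V = B^T * B, with B stored nAux x nOV in column-major order.
      int main() {
          const int nAux = 512;   // auxiliary basis size (index Q), stand-in value
          const int nOV  = 2048;  // occupied x virtual pairs (index ia), stand-in value
          double *B, *V;
          cudaMallocManaged(&B, sizeof(double) * nAux * nOV);
          cudaMallocManaged(&V, sizeof(double) * nOV * nOV);
          for (long i = 0; i < (long)nAux * nOV; ++i) B[i] = 1e-3 * (i % 97);
          cublasHandle_t h;
          cublasCreate(&h);
          const double one = 1.0, zero = 0.0;
          cublasDgemm(h, CUBLAS_OP_T, CUBLAS_OP_N, nOV, nOV, nAux,
                      &one, B, nAux, B, nAux, &zero, V, nOV);
          cudaDeviceSynchronize();
          printf("V(0,0) = %g\n", V[0]);
          cublasDestroy(h);
          cudaFree(B); cudaFree(V);
          return 0;
      }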

  3. Image Processing for Multiple-Target Tracking on a Graphics Processing Unit

    DTIC Science & Technology

    2009-03-01

    software. An MTT system was developed in MATLAB to provide baseline performance metrics for processing 24-bit, 1920×1080 color video footage filmed at...processor cores. These cores are capable of executing one warp (or set of 32 threads) at a time. The NVIDIA GTX 280 has 30 SIMT multiprocessor cores, which...footage is shot at 30 frames per second, 1920×1080 pixels, and uses progressive scan (the entire image is updated each frame). The experiments use a

  4. Interactive Computing and Graphics in Undergraduate Digital Signal Processing. Microcomputing Working Paper Series F 84-9.

    ERIC Educational Resources Information Center

    Onaral, Banu; And Others

    This report describes the development of a Drexel University electrical and computer engineering course on digital filter design that used interactive computing and graphics, and was one of three courses in a senior-level sequence on digital signal processing (DSP). Interactive and digital analysis/design routines and the interconnection of these…

  5. Accounting for Students' Schemes in the Development of a Graphical Process for Solving Polynomial Inequalities in Instrumented Activity

    ERIC Educational Resources Information Center

    Rivera, Ferdinand D.

    2007-01-01

    This paper provides an instrumental account of precalculus students' graphical process for solving polynomial inequalities. It is carried out in terms of the students' instrumental schemes as mediated by handheld graphing calculators and in cooperation with their classmates in a classroom setting. The ethnographic narrative relays an instrumental…

  6. Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units

    NASA Astrophysics Data System (ADS)

    Rath, N.; Kato, S.; Levesque, J. P.; Mauel, M. E.; Navratil, G. A.; Peng, Q.

    2014-04-01

    Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.

  7. Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units

    SciTech Connect

    Rath, N.; Levesque, J. P.; Mauel, M. E.; Navratil, G. A.; Peng, Q.; Kato, S.

    2014-04-15

    Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.

  8. Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units.

    PubMed

    Rath, N; Kato, S; Levesque, J P; Mauel, M E; Navratil, G A; Peng, Q

    2014-04-01

    Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.
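
    The per-cycle work in such a control loop is typically a small, launch-once kernel over all channels. The following hypothetical CUDA sketch applies a short FIR filter to the newest window of each of forty input channels, one thread per channel; the filter, data layout, and sizes are illustrative assumptions, not the HBT-EP code.

      #include <cuda_runtime.h>
      #include <cstdio>

      // One thread per output channel: convolve the FIR taps against the most
      // recent samples of that channel (channel-major layout).
      __global__ void firPerChannel(const float* x, float* y,
                                    const float* taps, int nTaps,
                                    int nChan, int nSamp) {
          int ch = blockIdx.x * blockDim.x + threadIdx.x;
          if (ch >= nChan) return;
          float acc = 0.f;
          const float* xc = x + ch * nSamp;
          for (int k = 0; k < nTaps; ++k)
              acc += taps[k] * xc[nSamp - 1 - k];   // filter output at the latest sample
          y[ch] = acc;
      }

      int main() {
          const int nChan = 40, nSamp = 64, nTaps = 8;
          float *x, *y, *taps;
          cudaMallocManaged(&x, nChan * nSamp * sizeof(float));
          cudaMallocManaged(&y, nChan * sizeof(float));
          cudaMallocManaged(&taps, nTaps * sizeof(float));
          for (int i = 0; i < nChan * nSamp; ++i) x[i] = 1.f;
          for (int k = 0; k < nTaps; ++k) taps[k] = 1.f / nTaps;  // moving average
          firPerChannel<<<1, 64>>>(x, y, taps, nTaps, nChan, nSamp);
          cudaDeviceSynchronize();
          printf("y[0] = %f\n", y[0]);   // = 1.0 for a unit input
          cudaFree(x); cudaFree(y); cudaFree(taps);
          return 0;
      }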

  9. Graphical CONOPS Prototype to Demonstrate Emerging Methods, Processes, and Tools at ARDEC

    DTIC Science & Technology

    2013-07-17

    continue the investigation of graphical 3D gaming environments in the construction of a shared mental model during concept development. A result of the...using a game development environment. This task was always envisioned as part of a larger CONOPS research agenda. As research progressed, the potential...research is a continuation of research initiated in August 2011. The goal of the research was to continue the investigation of graphical 3D gaming

  10. TMSEEG: A MATLAB-Based Graphical User Interface for Processing Electrophysiological Signals during Transcranial Magnetic Stimulation.

    PubMed

    Atluri, Sravya; Frehlich, Matthew; Mei, Ye; Garcia Dominguez, Luis; Rogasch, Nigel C; Wong, Willy; Daskalakis, Zafiris J; Farzan, Faranak

    2016-01-01

    Concurrent recording of electroencephalography (EEG) during transcranial magnetic stimulation (TMS) is an emerging and powerful tool for studying brain health and function. Despite a growing interest in adaptation of TMS-EEG across neuroscience disciplines, its widespread utility is limited by signal processing challenges. These challenges arise due to the nature of TMS and the sensitivity of EEG to artifacts that often mask TMS-evoked potentials (TEPs). With an increase in the complexity of data processing methods and a growing interest in multi-site data integration, analysis of TMS-EEG data requires the development of a standardized method to recover TEPs from various sources of artifacts. This article introduces TMSEEG, an open-source MATLAB application comprised of multiple algorithms organized to facilitate a step-by-step procedure for TMS-EEG signal processing. Using a modular design and interactive graphical user interface (GUI), this toolbox aims to streamline TMS-EEG signal processing for both novice and experienced users. Specifically, TMSEEG provides: (i) targeted removal of TMS-induced and general EEG artifacts; (ii) a step-by-step modular workflow with flexibility to modify existing algorithms and add customized algorithms; (iii) a comprehensive display and quantification of artifacts; (iv) quality control check points with visual feedback of TEPs throughout the data processing workflow; and (v) capability to label and store a database of artifacts. In addition to these features, the software architecture of TMSEEG ensures minimal user effort in initial setup and configuration of parameters for each processing step. This is partly accomplished through a close integration with EEGLAB, a widely used open-source toolbox for EEG signal processing. In this article, we introduce TMSEEG, validate its features and demonstrate its application in extracting TEPs across several single- and multi-pulse TMS protocols. As the first open-source GUI-based pipeline

  11. TMSEEG: A MATLAB-Based Graphical User Interface for Processing Electrophysiological Signals during Transcranial Magnetic Stimulation

    PubMed Central

    Atluri, Sravya; Frehlich, Matthew; Mei, Ye; Garcia Dominguez, Luis; Rogasch, Nigel C.; Wong, Willy; Daskalakis, Zafiris J.; Farzan, Faranak

    2016-01-01

    Concurrent recording of electroencephalography (EEG) during transcranial magnetic stimulation (TMS) is an emerging and powerful tool for studying brain health and function. Despite a growing interest in adaptation of TMS-EEG across neuroscience disciplines, its widespread utility is limited by signal processing challenges. These challenges arise due to the nature of TMS and the sensitivity of EEG to artifacts that often mask TMS-evoked potentials (TEPs). With an increase in the complexity of data processing methods and a growing interest in multi-site data integration, analysis of TMS-EEG data requires the development of a standardized method to recover TEPs from various sources of artifacts. This article introduces TMSEEG, an open-source MATLAB application comprised of multiple algorithms organized to facilitate a step-by-step procedure for TMS-EEG signal processing. Using a modular design and interactive graphical user interface (GUI), this toolbox aims to streamline TMS-EEG signal processing for both novice and experienced users. Specifically, TMSEEG provides: (i) targeted removal of TMS-induced and general EEG artifacts; (ii) a step-by-step modular workflow with flexibility to modify existing algorithms and add customized algorithms; (iii) a comprehensive display and quantification of artifacts; (iv) quality control check points with visual feedback of TEPs throughout the data processing workflow; and (v) capability to label and store a database of artifacts. In addition to these features, the software architecture of TMSEEG ensures minimal user effort in initial setup and configuration of parameters for each processing step. This is partly accomplished through a close integration with EEGLAB, a widely used open-source toolbox for EEG signal processing. In this article, we introduce TMSEEG, validate its features and demonstrate its application in extracting TEPs across several single- and multi-pulse TMS protocols. As the first open-source GUI-based pipeline

  12. Accelerating image reconstruction in three-dimensional optoacoustic tomography on graphics processing units

    PubMed Central

    Wang, Kun; Huang, Chao; Kao, Yu-Jiun; Chou, Cheng-Ying; Oraevsky, Alexander A.; Anastasio, Mark A.

    2013-01-01

    Purpose: Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional imaging models. One important reason is because 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming techniques. Methods: Parallelization strategies are proposed to accelerate a filtered backprojection (FBP) algorithm and two different pairs of projection/backprojection operations that correspond to two different numerical imaging models. The algorithms are designed to fully exploit the parallel computing power of graphics processing units (GPUs). In order to evaluate the parallelization strategies for the projection/backprojection pairs, an iterative image reconstruction algorithm is implemented. Computer simulation and experimental studies are conducted to investigate the computational efficiency and numerical accuracy of the developed algorithms. Results: The GPU implementations improve the computational efficiency by factors of 1000, 125, and 250 for the FBP algorithm and the two pairs of projection/backprojection operators, respectively. Accurate images are reconstructed by use of the FBP and iterative image reconstruction algorithms from both computer-simulated and experimental data. Conclusions: Parallelization strategies for 3D OAT image reconstruction are proposed for the first time. These GPU-based implementations significantly reduce the computational time for 3D image reconstruction, complementing our earlier work on 3D OAT iterative image reconstruction. PMID:23387778

  13. An Optimized Multicolor Point-Implicit Solver for Unstructured Grid Applications on Graphics Processing Units

    NASA Technical Reports Server (NTRS)

    Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana

    2016-01-01

    In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructured-grid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large tightly-coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
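
    The idea behind a multicolor point-implicit sweep is that unknowns of the same color do not couple, so a whole color can be updated in parallel. A minimal, hypothetical CUDA sketch for the scalar (block size 1) case is shown below, using red-black coloring of a 1-D Poisson system; the actual solver operates on block-sparse systems with variable block sizes, and this is not the authors' code.

      #include <cuda_runtime.h>
      #include <cstdio>

      // One thread per row of a given color: Gauss-Seidel-like point update
      // x_i = (b_i - sum_{j != i} A_ij x_j) / A_ii. Safe in parallel because
      // rows sharing a color have no mutual coupling. CSR storage.
      __global__ void colorSweep(const int* rowPtr, const int* colIdx,
                                 const float* val, const float* b, float* x,
                                 const int* rowsOfColor, int nRowsColor) {
          int t = blockIdx.x * blockDim.x + threadIdx.x;
          if (t >= nRowsColor) return;
          int i = rowsOfColor[t];
          float diag = 1.f, rhs = b[i];
          for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k) {
              int j = colIdx[k];
              if (j == i) diag = val[k];
              else        rhs -= val[k] * x[j];
          }
          x[i] = rhs / diag;
      }

      int main() {
          const int n = 8;                  // 1-D Poisson: tridiagonal (-1, 2, -1)
          int *rp, *ci, *red, *black; float *v, *b, *x;
          cudaMallocManaged(&rp, (n + 1) * sizeof(int));
          cudaMallocManaged(&ci, 3 * n * sizeof(int));
          cudaMallocManaged(&v, 3 * n * sizeof(float));
          cudaMallocManaged(&b, n * sizeof(float));
          cudaMallocManaged(&x, n * sizeof(float));
          cudaMallocManaged(&red, n * sizeof(int));
          cudaMallocManaged(&black, n * sizeof(int));
          int nnz = 0, nRed = 0, nBlack = 0;
          for (int i = 0; i < n; ++i) {
              rp[i] = nnz;
              if (i > 0)     { ci[nnz] = i - 1; v[nnz++] = -1.f; }
              ci[nnz] = i; v[nnz++] = 2.f;
              if (i < n - 1) { ci[nnz] = i + 1; v[nnz++] = -1.f; }
              b[i] = 1.f; x[i] = 0.f;
              if (i % 2) black[nBlack++] = i; else red[nRed++] = i;  // two-coloring
          }
          rp[n] = nnz;
          for (int it = 0; it < 100; ++it) {   // alternate the two colors
              colorSweep<<<1, 256>>>(rp, ci, v, b, x, red, nRed);
              colorSweep<<<1, 256>>>(rp, ci, v, b, x, black, nBlack);
          }
          cudaDeviceSynchronize();
          printf("x[0] = %f\n", x[0]);
          return 0;
      }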

  14. Multidimensional upwind hydrodynamics on unstructured meshes using graphics processing units - I. Two-dimensional uniform meshes

    NASA Astrophysics Data System (ADS)

    Paardekooper, S.-J.

    2017-08-01

    We present a new method for numerical hydrodynamics which uses a multidimensional generalization of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a consequence its ability to recognize multidimensional stationary states that are not hydrostatic. A second novelty is the focus on graphics processing units (GPUs). By tailoring the algorithms specifically to GPUs, we are able to get speedups of 100-250 compared to a desktop machine. We compare the multidimensional upwind scheme to a traditional, dimensionally split implementation of the Roe solver on several test problems, and we find that the new method significantly outperforms the Roe solver in almost all cases. This comes with increased computational costs per time-step, which makes the new method approximately a factor of 2 slower than a dimensionally split scheme acting on a structured grid.

  15. Efficient molecular dynamics simulations with many-body potentials on graphics processing units

    NASA Astrophysics Data System (ADS)

    Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari

    2017-09-01

    Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which could result in write conflict between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread and is free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).
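
    The key point of the conflict-free scheme is that the thread responsible for atom i accumulates only atom i's force, so no atomic operations are needed. The hypothetical CUDA sketch below illustrates this accumulation pattern with a simple pairwise Lennard-Jones force rather than the Tersoff potential; names and parameters are illustrative, and GPUMD itself should be consulted for the many-body case.

      #include <cuda_runtime.h>
      #include <cstdio>

      // One thread per atom accumulates only that atom's force: no atomics and
      // no write conflicts, the same per-atom scheme the paper applies to
      // many-body potentials via an explicit pairwise force expression.
      __global__ void ljForces(const float4* pos, float3* f, int n,
                               float eps, float sig, float rc2) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float fx = 0.f, fy = 0.f, fz = 0.f;
          for (int j = 0; j < n; ++j) {
              if (j == i) continue;
              float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
              float r2 = dx*dx + dy*dy + dz*dz;
              if (r2 > rc2) continue;                           // cutoff
              float s2 = sig * sig / r2, s6 = s2 * s2 * s2;
              float fr = 24.f * eps * (2.f * s6 * s6 - s6) / r2; // |F|/r
              fx += fr * dx; fy += fr * dy; fz += fr * dz;
          }
          f[i] = make_float3(fx, fy, fz);
      }

      int main() {
          const int n = 256;
          float4* p; float3* f;
          cudaMallocManaged(&p, n * sizeof(float4));
          cudaMallocManaged(&f, n * sizeof(float3));
          for (int i = 0; i < n; ++i) p[i] = make_float4(1.2f * i, 0.f, 0.f, 0.f);
          ljForces<<<(n + 127) / 128, 128>>>(p, f, n, 1.f, 1.f, 9.f);
          cudaDeviceSynchronize();
          printf("f[0].x = %f\n", f[0].x);
          cudaFree(p); cudaFree(f);
          return 0;
      }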

  16. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
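
    For contrast with the warp-per-trajectory scheme of the paper, the simplest GPU layout for the exact GSSA runs one trajectory per thread, which already illustrates the two random draws (waiting time and reaction choice) of the direct method. The hypothetical CUDA sketch below simulates the reversible isomerization A <-> B; rates and counts are illustrative, and this is not the authors' algorithm.

      #include <cuda_runtime.h>
      #include <curand_kernel.h>
      #include <cstdio>

      // One trajectory per thread: Gillespie direct method for A <-> B with
      // propensities a1 = k1*A and a2 = k2*B.
      __global__ void gssa(int* finalA, int nTraj, float k1, float k2,
                           int A0, int B0, float tEnd, unsigned seed) {
          int t = blockIdx.x * blockDim.x + threadIdx.x;
          if (t >= nTraj) return;
          curandState rng;
          curand_init(seed, t, 0, &rng);
          int A = A0, B = B0;
          float time = 0.f;
          while (true) {
              float a1 = k1 * A, a2 = k2 * B, a0 = a1 + a2;
              if (a0 <= 0.f) break;
              time += -logf(curand_uniform(&rng)) / a0;          // exponential waiting time
              if (time > tEnd) break;
              if (curand_uniform(&rng) * a0 < a1) { --A; ++B; }  // fire A -> B
              else                                { ++A; --B; }  // fire B -> A
          }
          finalA[t] = A;
      }

      int main() {
          const int nTraj = 4096;
          int* A;
          cudaMallocManaged(&A, nTraj * sizeof(int));
          gssa<<<(nTraj + 127) / 128, 128>>>(A, nTraj, 1.0f, 0.5f, 100, 0, 10.f, 1234);
          cudaDeviceSynchronize();
          long sum = 0;
          for (int i = 0; i < nTraj; ++i) sum += A[i];
          printf("mean A at t = 10: %f\n", (double)sum / nTraj);
          cudaFree(A);
          return 0;
      }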

  17. Accelerating large-scale protein structure alignments with graphics processing units.

    PubMed

    Pang, Bin; Zhao, Nan; Becchi, Michela; Korkin, Dmitry; Shyu, Chi-Ren

    2012-02-22

    Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU.

  18. Simulation of Coarse-Grained Protein-Protein Interactions with Graphics Processing Units.

    PubMed

    Tunbridge, Ian; Best, Robert B; Gain, James; Kuttel, Michelle M

    2010-11-09

    We report a hybrid parallel central and graphics processing units (CPU-GPU) implementation of a coarse-grained model for replica exchange Monte Carlo (REMC) simulations of protein assemblies. We describe the design, optimization, validation, and benchmarking of our algorithms, particularly the parallelization strategy, which is specific to the requirements of GPU hardware. Performance evaluation of our hybrid implementation shows scaled speedup as compared to a single-core CPU; reference simulations of small 100 residue proteins have a modest speedup of 4, while large simulations with thousands of residues are up to 1400 times faster. Importantly, the combination of coarse-grained models with highly parallel GPU hardware vastly increases the length- and time-scales accessible for protein simulation, making it possible to simulate much larger systems of interacting proteins than have previously been attempted. As a first step toward the simulation of the assembly of an entire viral capsid, we have demonstrated that the chosen coarse-grained model, together with REMC sampling, is capable of identifying the correctly bound structure, for a pair of fragments from the human hepatitis B virus capsid. Our parallel solution can easily be generalized to other interaction functions and other types of macromolecules and has implications for the parallelization of similar N-body problems that require random access lookups.

  19. Developing a multiscale, multi-resolution agent-based brain tumor model by graphics processing units.

    PubMed

    Zhang, Le; Jiang, Beini; Wu, Yukun; Strouthos, Costas; Sun, Phillip Zhe; Su, Jing; Zhou, Xiaobo

    2011-12-16

    Multiscale agent-based modeling (MABM) has been widely used to simulate Glioblastoma Multiforme (GBM) and its progression. At the intracellular level, the MABM approach employs a system of ordinary differential equations to describe quantitatively specific intracellular molecular pathways that determine phenotypic switches among cells (e.g. from migration to proliferation and vice versa). At the intercellular level, MABM describes cell-cell interactions by a discrete module. At the tissue level, partial differential equations are employed to model the diffusion of chemoattractants, which are the input factors of the intracellular molecular pathway. Moreover, multiscale analysis makes it possible to explore the molecules that play important roles in determining the cellular phenotypic switches that in turn drive the whole GBM expansion. However, owing to limited computational resources, MABM is currently a theoretical biological model that uses relatively coarse grids to simulate a few cancer cells in a small slice of brain cancer tissue. In order to improve this theoretical model to simulate and predict actual GBM cancer progression in real time, a graphics processing unit (GPU)-based parallel computing algorithm was developed and combined with the multi-resolution design to speed up the MABM. The simulated results demonstrated that the GPU-based, multi-resolution and multiscale approach can accelerate the previous MABM around 30-fold with relatively fine grids in a large extracellular matrix. Therefore, the new model has great potential for simulating and predicting real-time GBM progression, if real experimental data are incorporated.

  1. Accelerating large-scale protein structure alignments with graphics processing units

    PubMed Central

    2012-01-01

    Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132

  2. Accelerating All-Atom Normal Mode Analysis with Graphics Processing Unit.

    PubMed

    Liu, Li; Liu, Xiaofeng; Gong, Jiayu; Jiang, Hualiang; Li, Honglin

    2011-06-14

    All-atom normal mode analysis (NMA) is an efficient way to predict the collective motions in a given macromolecule, which is essential for the understanding of protein biological function and drug design. However, the calculations are limited in time scale mainly because the required diagonalization of the Hessian matrix by Householder-QR transformation is a computationally exhausting task. In this paper, we demonstrate the parallel computing power of the graphics processing unit (GPU) in NMA by mapping Householder-QR transformation onto GPU using Compute Unified Device Architecture (CUDA). The results revealed that the GPU-accelerated all-atom NMA could reduce the runtime of diagonalization significantly and achieved over 20× speedup over CPU-based NMA. In addition, we analyzed the influence of precision on both the performance and the accuracy of GPU. Although the performance of GPU with double precision is weaker than that with single precision in theory, more accurate results and an acceptable speedup of double precision were obtained in our approach by reducing the data transfer time to a minimum. Finally, the inherent drawbacks of GPU and the corresponding solution to deal with the limitation in computational scale are also discussed in this study.
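
    On current CUDA toolkits, the symmetric eigensolve that dominates NMA can also be delegated to cuSOLVER rather than hand-coding the Householder-QR transformation as the authors did. A minimal sketch for a small dense matrix standing in for the Hessian follows (compile with -lcusolver); sizes and values are illustrative.

      #include <cusolverDn.h>
      #include <cuda_runtime.h>
      #include <cstdio>

      // Eigendecomposition of a small symmetric "Hessian" with cuSOLVER's syevd;
      // for NMA, the lowest nonzero eigenvalues correspond to the softest modes.
      int main() {
          const int n = 4;
          double hA[n * n] = { 4,1,0,0,  1,3,1,0,  0,1,2,1,  0,0,1,1 };  // symmetric stand-in
          double *A, *W;
          int *info;
          cudaMallocManaged(&A, n * n * sizeof(double));
          cudaMallocManaged(&W, n * sizeof(double));
          cudaMallocManaged(&info, sizeof(int));
          for (int i = 0; i < n * n; ++i) A[i] = hA[i];
          cusolverDnHandle_t h;
          cusolverDnCreate(&h);
          int lwork = 0;
          cusolverDnDsyevd_bufferSize(h, CUSOLVER_EIG_MODE_VECTOR,
                                      CUBLAS_FILL_MODE_LOWER, n, A, n, W, &lwork);
          double* work;
          cudaMalloc(&work, lwork * sizeof(double));
          cusolverDnDsyevd(h, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_LOWER,
                           n, A, n, W, work, lwork, info);
          cudaDeviceSynchronize();
          printf("lowest eigenvalue: %f, info = %d\n", W[0], *info);
          cusolverDnDestroy(h);
          cudaFree(A); cudaFree(W); cudaFree(work); cudaFree(info);
          return 0;
      }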

  3. Parallel design of JPEG-LS encoder on graphics processing units

    NASA Astrophysics Data System (ADS)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
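
    Of the CUDA techniques listed, the parallel prefix sum is the one that turns independently encoded blocks into a single bitstream: an exclusive scan over per-block bit lengths yields every block's write offset. A minimal, hypothetical sketch using Thrust (the block lengths are stand-in values, not JPEG-LS output):

      #include <thrust/device_vector.h>
      #include <thrust/scan.h>
      #include <cstdio>

      int main() {
          int sample[8] = { 120, 96, 200, 64, 64, 180, 90, 110 };   // bits per block
          thrust::device_vector<int> bitLen(sample, sample + 8);
          thrust::device_vector<int> offset(8);
          // Exclusive scan: offset[i] = total bits of all blocks before block i,
          // so all blocks can be packed into the output stream concurrently.
          thrust::exclusive_scan(bitLen.begin(), bitLen.end(), offset.begin());
          for (int i = 0; i < 8; ++i)
              printf("block %d writes at bit offset %d\n", i, (int)offset[i]);
          return 0;
      }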

  4. Accelerated Molecular Dynamics Simulations with the AMOEBA Polarizable Force Field on Graphics Processing Units.

    PubMed

    Lindert, Steffen; Bucher, Denis; Eastman, Peter; Pande, Vijay; McCammon, J Andrew

    2013-11-12

    The accelerated molecular dynamics (aMD) method has recently been shown to enhance the sampling of biomolecules in molecular dynamics (MD) simulations, often by several orders of magnitude. Here, we describe an implementation of the aMD method for the OpenMM application layer that takes full advantage of graphics processing units (GPUs) computing. The aMD method is shown to work in combination with the AMOEBA polarizable force field (AMOEBA-aMD), allowing the simulation of long time-scale events with a polarizable force field. Benchmarks are provided to show that the AMOEBA-aMD method is efficiently implemented and produces accurate results in its standard parametrization. For the BPTI protein, we demonstrate that the protein structure described with AMOEBA remains stable even on the extended time scales accessed at high levels of accelerations. For the DNA repair metalloenzyme endonuclease IV, we show that the use of the AMOEBA force field is a significant improvement over fixed charged models for describing the enzyme active-site. The new AMOEBA-aMD method is publicly available (http://wiki.simtk.org/openmm/VirtualRepository) and promises to be interesting for studying complex systems that can benefit from both the use of a polarizable force field and enhanced sampling.

  5. Density-fitted singles and doubles coupled cluster on graphics processing units

    SciTech Connect

    Sherrill, David; Sumpter, Bobby G; DePrince, III, A. Eugene

    2014-01-01

    We adapt an algorithm for singles and doubles coupled cluster (CCSD) that uses density fitting (DF) or Cholesky decomposition (CD) in the construction and contraction of all electron repulsion integrals (ERIs) for use on heterogeneous compute nodes consisting of a multicore CPU and at least one graphics processing unit (GPU). The use of approximate 3-index ERIs ameliorates two of the major difficulties in designing scientific algorithms for GPUs: (i) the extremely limited global memory on the devices and (ii) the overhead associated with data motion across the PCI bus. For the benzene trimer described by an aug-cc-pVDZ basis set, the use of a single NVIDIA Tesla C2070 (Fermi) GPU accelerates a CD-CCSD computation by a factor of 2.1, relative to the multicore CPU-only algorithm that uses 6 highly efficient Intel Core i7-3930K CPU cores. The use of two Fermi GPUs provides an acceleration of 2.89, which is comparable to that observed when using a single NVIDIA Kepler K20c GPU (2.73).

  6. The application of projected conjugate gradient solvers on graphical processing units

    SciTech Connect

    Lin, Youzuo; Renaut, Rosemary

    2011-01-26

    Graphical processing units introduce the capability for large-scale computation at the desktop. The numerical results presented verify that the efficiency and accuracy of basic linear algebra subroutines of all levels are comparable when implemented in CUDA and Jacket, but experimental results demonstrate that the level-three subroutines offer the greatest potential for improving the efficiency of basic numerical algorithms. We consider the solution of sets of linear equations with multiple right-hand sides using Krylov subspace-based solvers. For the multiple right-hand side case, it is more efficient to use a block implementation of the conjugate gradient algorithm than to solve each system independently. Jacket is used for the implementation. Furthermore, including projection from one system to another improves efficiency. A relevant example, for which simulated results are provided, is the reconstruction of a three-dimensional medical image volume acquired from a positron emission tomography scanner. Efficiency of the reconstruction is improved by using projection across nearby slices.

  7. NBSymple, a double parallel, symplectic N-body code running on graphic processing units

    NASA Astrophysics Data System (ADS)

    Capuzzo-Dolcetta, R.; Mastrobuono-Battisti, A.; Maschietti, D.

    2011-07-01

    We present and discuss the characteristics and performance, in terms of both computational speed and precision, of a numerical code which integrates the equations of motion of N 'particles' interacting via Newtonian gravitation and moving in a smooth external galactic field. The force evaluation on every particle is done by direct summation of the contributions of all the other particles in the system, avoiding truncation error. The time integration is done with second-order and sixth-order symplectic schemes. The code, NBSymple, has been parallelized twice: the Compute Unified Device Architecture (CUDA) is used to make the all-pair force evaluation as fast as possible on high-performance NVIDIA TESLA C1060 Graphics Processing Units, while the O(N) computations are distributed over various CPUs by means of the OpenMP Application Program Interface. The code works in either single- or double-precision floating-point arithmetic. The use of single precision exploits the GPU's performance best but, of course, limits the precision of the simulation in some critical situations. We find a good compromise in using a software reconstruction of double precision for those variables that are most critical for the overall precision of the code. The code is available on the web site astrowww.phys.uniroma1.it/dolcetta/nbsymple.html.
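
    The all-pair force evaluation offloaded to the GPU can be pictured as one thread per particle accumulating softened Newtonian accelerations over all others. A minimal CUDA sketch of that pattern (ours, not NBSymple's code):

        #include <cuda_runtime.h>

        // One thread per particle: direct O(N^2) summation of Newtonian
        // accelerations, with Plummer softening eps2 to avoid singularities.
        // pos[i] = (x, y, z, mass) packed in a float4; units with G = 1.
        __global__ void directSum(const float4 *pos, float3 *acc,
                                  int n, float eps2)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;

            float4 pi = pos[i];
            float3 a = make_float3(0.f, 0.f, 0.f);

            for (int j = 0; j < n; ++j) {
                float4 pj = pos[j];
                float dx = pj.x - pi.x;
                float dy = pj.y - pi.y;
                float dz = pj.z - pi.z;
                float r2 = dx*dx + dy*dy + dz*dz + eps2;
                float inv = rsqrtf(r2);
                float s = pj.w * inv * inv * inv;   // m_j / r^3
                a.x += s * dx;  a.y += s * dy;  a.z += s * dz;
            }
            acc[i] = a;
        }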

  8. NBSymple: A Double Parallel, Symplectic N-body Code Running on Graphic Processing Units

    NASA Astrophysics Data System (ADS)

    Capuzzo-Dolcetta, R.; Mastrobuono-Battisti, A.

    2010-10-01

    NBSymple is a numerical code which integrates the equations of motion of N 'particles' interacting via Newtonian gravitation and moving in a smooth external galactic field. The force evaluation on every particle is done by direct summation of the contributions of all the other particles in the system, avoiding truncation error. The time integration is done with second-order and sixth-order symplectic schemes. NBSymple has been parallelized twice: the Compute Unified Device Architecture is used to make the all-pair force evaluation as fast as possible on high-performance NVIDIA TESLA C1060 Graphics Processing Units, while the O(N) computations are distributed over various CPUs by means of the OpenMP Application Program Interface. The code works in either single- or double-precision floating-point arithmetic. The use of single precision exploits the GPU's performance best but, of course, limits the precision of the simulation in some critical situations. We find a good compromise in using a software reconstruction of double precision for those variables that are most critical for the overall precision of the code.

  9. Space Object Collision Probability via Monte Carlo on the Graphics Processing Unit

    NASA Astrophysics Data System (ADS)

    Vittaldev, Vivek; Russell, Ryan P.

    2017-09-01

    Fast and accurate collision probability computations are essential for protecting space assets. Monte Carlo (MC) simulation is the most accurate but computationally intensive method. A Graphics Processing Unit (GPU) is used to parallelize the computation and reduce the overall runtime. Using MC techniques to compute the collision probability is common in the literature as the benchmark. An optimized implementation on the GPU, however, is a challenging problem and is the main focus of the current work. The MC simulation takes samples from the uncertainty distributions of the Resident Space Objects (RSOs) at any time during a time window of interest and outputs the separations at closest approach. Therefore, any uncertainty propagation method may be used and the collision probability is automatically computed as a function of RSO collision radii. Integration using a fixed time step and a quartic interpolation after every Runge-Kutta step ensures that no close approaches are missed. Two orders of magnitude speedups over a serial CPU implementation are shown, and speedups improve moderately with higher fidelity dynamics. The tool makes the MC approach tractable on a single workstation, and can be used as a final product, or for verifying surrogate and analytical collision probability methods.
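
    The counting step of such a Monte Carlo approach is simple to picture: draw samples of the relative miss vector, threshold on the combined collision radius, and divide hits by samples. The toy kernel below illustrates only that step; the Gaussian miss-distance model and all names are our own simplifications, not the authors' propagation scheme.

        #include <curand_kernel.h>

        // Toy MC collision counter: each thread draws one sample of the
        // relative miss vector at closest approach from a 3-D Gaussian
        // (a stand-in for the propagated RSO uncertainty) and tests it
        // against the combined collision radius R. Real tools propagate
        // full dynamics; this only illustrates the counting step.
        __global__ void mcCollisions(unsigned long long seed, int nSamples,
                                     float3 mean, float3 sigma, float R,
                                     unsigned int *hits)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nSamples) return;

            curandState st;
            curand_init(seed, i, 0, &st);

            float dx = mean.x + sigma.x * curand_normal(&st);
            float dy = mean.y + sigma.y * curand_normal(&st);
            float dz = mean.z + sigma.z * curand_normal(&st);

            if (dx*dx + dy*dy + dz*dz < R*R)
                atomicAdd(hits, 1u);  // probability = hits / nSamples
        }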

  10. Graphics processing unit accelerated one-dimensional blood flow computation in the human arterial tree.

    PubMed

    Itu, Lucian; Sharma, Puneet; Kamen, Ali; Suciu, Constantin; Comaniciu, Dorin

    2013-12-01

    One-dimensional blood flow models have been used extensively for computing pressure and flow waveforms in the human arterial circulation. We propose an improved numerical implementation based on a graphics processing unit (GPU) to accelerate the execution time of the one-dimensional model. A novel parallel hybrid CPU-GPU algorithm with compact copy operations (PHCGCC) and a parallel GPU-only (PGO) algorithm are developed, which are compared against previously introduced PHCG versions, a single-threaded CPU-only algorithm and a multi-threaded CPU-only algorithm. Different second-order numerical schemes (Lax-Wendroff and Taylor series) are evaluated for the numerical solution of the one-dimensional model, and the computational setups include physiologically motivated non-periodic (Windkessel) and periodic (structured tree) boundary conditions (BC) and elastic and viscoelastic wall laws. Both the PHCGCC and the PGO implementations improved the execution time significantly. The speed-up values over the single-threaded CPU-only implementation range from 5.26× to 8.10×, whereas the speed-up values over the multi-threaded CPU-only implementation range from 1.84× to 4.02×. The PHCGCC algorithm performs best for an elastic wall law with non-periodic BC and for viscoelastic wall laws, whereas the PGO algorithm performs best for an elastic wall law with periodic BC. Copyright © 2013 John Wiley & Sons, Ltd.
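
    As a reminder of the stencil involved, one Lax-Wendroff step for the model equation u_t + a u_x = 0 is sketched below in CUDA; the paper applies second-order schemes of this kind to the coupled area/flow system, and all names here are ours.

        #include <cuda_runtime.h>

        // One Lax-Wendroff step for linear advection u_t + a u_x = 0:
        //   u_i^{n+1} = u_i - (C/2)(u_{i+1} - u_{i-1})
        //                   + (C^2/2)(u_{i+1} - 2 u_i + u_{i-1})
        // with Courant number C = a*dt/dx. One thread per interior point.
        __global__ void laxWendroffStep(const double *u, double *uNew,
                                        int n, double C)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i <= 0 || i >= n - 1) return;  // boundaries handled separately

            double dm = u[i] - u[i - 1];
            double dp = u[i + 1] - u[i];
            uNew[i] = u[i] - 0.5 * C * (dp + dm) + 0.5 * C * C * (dp - dm);
        }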

  11. Accelerating frequency-domain diffuse optical tomographic image reconstruction using graphics processing units.

    PubMed

    Prakash, Jaya; Chandrasekharan, Venkittarayan; Upendra, Vishwajith; Yalavarthy, Phaneendra K

    2010-01-01

    Diffuse optical tomographic image reconstruction uses advanced numerical models that are computationally too costly to implement in real time. Graphics processing units (GPUs) offer massive parallelization on the desktop that can accelerate these computations. An open-source GPU-accelerated linear algebra library package is used to compute the most intensive matrix-matrix calculations and matrix decompositions that are used in solving the system of linear equations. These open-source functions were integrated into the existing frequency-domain diffuse optical image reconstruction algorithms to evaluate the acceleration capability of the GPUs (NVIDIA Tesla C1060) with increasing reconstruction problem sizes. These studies indicate that single-precision computations are sufficient for diffuse optical tomographic image reconstruction. The acceleration per iteration can be up to 40× using GPUs compared to traditional CPUs in the case of three-dimensional reconstruction, where the reconstruction problem is more underdetermined, making GPUs more attractive in clinical settings. The current limitation of these GPUs is the available onboard memory (4 GB), which restricts the reconstruction to no more than 13,377 optical parameters.

  12. Accelerating image reconstruction in three-dimensional optoacoustic tomography on graphics processing units.

    PubMed

    Wang, Kun; Huang, Chao; Kao, Yu-Jiun; Chou, Cheng-Ying; Oraevsky, Alexander A; Anastasio, Mark A

    2013-02-01

    Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional imaging models. One important reason is that 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming techniques. Parallelization strategies are proposed to accelerate a filtered backprojection (FBP) algorithm and two different pairs of projection/backprojection operations that correspond to two different numerical imaging models. The algorithms are designed to fully exploit the parallel computing power of graphics processing units (GPUs). In order to evaluate the parallelization strategies for the projection/backprojection pairs, an iterative image reconstruction algorithm is implemented. Computer simulation and experimental studies are conducted to investigate the computational efficiency and numerical accuracy of the developed algorithms. The GPU implementations improve the computational efficiency by factors of 1000, 125, and 250 for the FBP algorithm and the two pairs of projection/backprojection operators, respectively. Accurate images are reconstructed by use of the FBP and iterative image reconstruction algorithms from both computer-simulated and experimental data. Parallelization strategies for 3D OAT image reconstruction are proposed for the first time. These GPU-based implementations significantly reduce the computational time for 3D image reconstruction, complementing our earlier work on 3D OAT iterative image reconstruction.

  13. Graphics processing unit (GPU)-accelerated particle filter framework for positron emission tomography image reconstruction.

    PubMed

    Yu, Fengchao; Liu, Huafeng; Hu, Zhenghui; Shi, Pengcheng

    2012-04-01

    As a consequence of the random nature of photon emissions and detections, the data collected by a positron emission tomography (PET) imaging system can be shown to be Poisson distributed. Meanwhile, there have been considerable efforts within the tracer kinetic modeling communities aimed at establishing the relationship between the PET data and physiological parameters that affect the uptake and metabolism of the tracer. Both statistical and physiological models are important to PET reconstruction. The majority of previous efforts are based on simplified, nonphysical mathematical expressions, such as Poisson modeling of the measured data, which is, on the whole, completed without consideration of the underlying physiology. In this paper, we propose a graphics processing unit (GPU)-accelerated reconstruction strategy that can take both the statistical model and the physiological model into consideration with the aid of state-space evolution equations. The proposed strategy formulates the organ activity distribution through tracer kinetics models and the photon-counting measurements through observation equations, thus making it possible to unify these two constraints into a general framework. In order to accelerate reconstruction, GPU-based parallel computing is introduced. Experiments with Zubal thorax phantom data, Monte Carlo-simulated phantom data, and real phantom data show the power of the method. Furthermore, thanks to the computing power of the GPU, the reconstruction time is practical for clinical application.

  14. High-Throughput Characterization of Porous Materials Using Graphics Processing Units

    SciTech Connect

    Kim, Jihan; Martin, Richard L.; Rübel, Oliver; Haranczyk, Maciej; Smit, Berend

    2012-05-08

    We have developed a high-throughput graphics processing unit (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH4 and CO2) and the material's framework atoms. Using a parallel flood fill CPU algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single-core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than ones considered in earlier studies. For structures selected from such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple grand canonical Monte Carlo simulations concurrently within the GPU.
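
    The energy-grid stage maps naturally onto one GPU thread per grid point, each summing Lennard-Jones and Coulomb terms over the framework atoms. A minimal sketch with our own naming and unit conventions:

        #include <cuda_runtime.h>

        // One thread per grid point: interaction energy of a probe
        // (e.g. a CH4 united atom) with all framework atoms.
        // atoms[k] = (x, y, z, charge); eps/sig are the mixed LJ parameters.
        __global__ void energyGrid(const float4 *atoms, int nAtoms,
                                   const float3 *gridPts, float *energy,
                                   int nPts, float eps, float sig, float qProbe)
        {
            int g = blockIdx.x * blockDim.x + threadIdx.x;
            if (g >= nPts) return;

            float3 p = gridPts[g];
            float e = 0.f;
            const float COULOMB = 332.0636f;  // kcal/mol, charges in e, r in Angstrom

            for (int k = 0; k < nAtoms; ++k) {
                float4 a = atoms[k];
                float dx = p.x - a.x, dy = p.y - a.y, dz = p.z - a.z;
                float r2 = dx*dx + dy*dy + dz*dz;
                float s2 = (sig * sig) / r2;
                float s6 = s2 * s2 * s2;
                e += 4.f * eps * (s6 * s6 - s6)            // Lennard-Jones
                   + COULOMB * qProbe * a.w * rsqrtf(r2);  // Coulomb
            }
            energy[g] = e;
        }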

  15. Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

    PubMed Central

    Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

    2010-01-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792

  16. Space Object Collision Probability via Monte Carlo on the Graphics Processing Unit

    NASA Astrophysics Data System (ADS)

    Vittaldev, Vivek; Russell, Ryan P.

    2017-03-01

    Fast and accurate collision probability computations are essential for protecting space assets. Monte Carlo (MC) simulation is the most accurate but computationally intensive method. A Graphics Processing Unit (GPU) is used to parallelize the computation and reduce the overall runtime. Using MC techniques to compute the collision probability is common in the literature as the benchmark. An optimized implementation on the GPU, however, is a challenging problem and is the main focus of the current work. The MC simulation takes samples from the uncertainty distributions of the Resident Space Objects (RSOs) at any time during a time window of interest and outputs the separations at closest approach. Therefore, any uncertainty propagation method may be used and the collision probability is automatically computed as a function of RSO collision radii. Integration using a fixed time step and a quartic interpolation after every Runge-Kutta step ensures that no close approaches are missed. Two orders of magnitude speedups over a serial CPU implementation are shown, and speedups improve moderately with higher fidelity dynamics. The tool makes the MC approach tractable on a single workstation, and can be used as a final product, or for verifying surrogate and analytical collision probability methods.

  17. Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

    NASA Astrophysics Data System (ADS)

    Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

    2017-07-01

    Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables a significant reduction of computation time at moderate cost, by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie scattering inverse problem solving (up to 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at the data preprocessing stages of the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function; (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available in a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ~400-fold speed-up of calculations at the data preprocessing stages using CUDA code running on the GPU in comparison to a single-thread MATLAB-only code running on the CPU.
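
    Subroutine (ii), the image-to-scattering-diagram conversion, is essentially an azimuthal binning of pixel intensities. A CUDA sketch of that binning follows; it is our own simplification, and the actual PikeReader code differs.

        #include <cuda_runtime.h>

        // Bin image pixels into a 1-D intensity-vs-azimuth histogram.
        // (cx, cy) is the scattering-pattern centre; nBins spans [0, 2*pi).
        // Each thread handles one pixel and atomically accumulates its
        // intensity plus a count for later normalisation.
        __global__ void azimuthalProfile(const float *img, int w, int h,
                                         float cx, float cy, int nBins,
                                         float *sum, unsigned int *count)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= w || y >= h) return;

            float phi = atan2f((float)y - cy, (float)x - cx);  // [-pi, pi]
            int bin = (int)((phi + 3.14159265f) / (2.f * 3.14159265f) * nBins);
            if (bin >= nBins) bin = nBins - 1;

            atomicAdd(&sum[bin], img[y * w + x]);
            atomicAdd(&count[bin], 1u);
        }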

  18. Accelerating the Gillespie Exact Stochastic Simulation Algorithm Using Hybrid Parallel Execution on Graphics Processing Units

    PubMed Central

    Komarov, Ivan; D'Souza, Roshan M.

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques for simulating reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even with very efficient variants of the GSSA, computing parameter sweeps is prohibitively expensive. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models. PMID:23152751
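
    For contrast with the paper's fine-grained warp-level scheme, the baseline coarse-grained approach simply runs one independent trajectory per thread. A toy CUDA sketch of that baseline for a single decay reaction (the model and all names are ours):

        #include <curand_kernel.h>
        #include <math.h>

        // Baseline coarse-grained GSSA: one independent trajectory per
        // thread (the paper goes further and parallelizes inside each
        // realization per warp). Toy model: one first-order decay reaction
        // A -> 0 with rate c, simulated to time tEnd by the direct method.
        __global__ void gillespieDecay(unsigned long long seed, int nTraj,
                                       int a0, float c, float tEnd, int *aOut)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nTraj) return;

            curandState st;
            curand_init(seed, i, 0, &st);

            int a = a0;
            float t = 0.f;
            while (a > 0) {
                float prop = c * a;                      // total propensity
                t += -logf(curand_uniform(&st)) / prop;  // exponential wait
                if (t > tEnd) break;
                a -= 1;                                  // the only reaction fires
            }
            aOut[i] = a;  // ensemble over threads gives the distribution
        }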

  19. Developing a multiscale, multi-resolution agent-based brain tumor model by graphics processing units

    PubMed Central

    2011-01-01

    Multiscale agent-based modeling (MABM) has been widely used to simulate Glioblastoma Multiforme (GBM) and its progression. At the intracellular level, the MABM approach employs a system of ordinary differential equations to describe quantitatively specific intracellular molecular pathways that determine phenotypic switches among cells (e.g. from migration to proliferation and vice versa). At the intercellular level, MABM describes cell-cell interactions by a discrete module. At the tissue level, partial differential equations are employed to model the diffusion of chemoattractants, which are the input factors of the intracellular molecular pathway. Moreover, multiscale analysis makes it possible to explore the molecules that play important roles in determining the cellular phenotypic switches that in turn drive the whole GBM expansion. However, owing to limited computational resources, MABM is currently a theoretical biological model that uses relatively coarse grids to simulate a few cancer cells in a small slice of brain cancer tissue. In order to improve this theoretical model to simulate and predict actual GBM cancer progression in real time, a graphics processing unit (GPU)-based parallel computing algorithm was developed and combined with the multi-resolution design to speed up the MABM. The simulated results demonstrated that the GPU-based, multi-resolution and multiscale approach can accelerate the previous MABM around 30-fold with relatively fine grids in a large extracellular matrix. Therefore, the new model has great potential for simulating and predicting real-time GBM progression, if real experimental data are incorporated. PMID:22176732

  20. Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit

    PubMed Central

    Xu, Liangliang; Xu, Nengxiong

    2017-01-01

    This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the spatial distribution pattern of the data points and achieve more accurate predictions than those of IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version, which does not profit from shared memory, and the tiled version, which takes advantage of it. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, in both single and double precision. We then evaluate the performance of the parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) in single precision the achieved speed-up can be up to 763× (on the GPU M5000), while in double precision the highest obtained speed-up is 197× (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
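
    The inner loop of (A)IDW is a weighted average with weights 1/d^alpha. A minimal CUDA sketch with one thread per prediction point, taking the adaptively chosen alpha as precomputed input (naming is ours; this corresponds to the "naive", non-tiled variant):

        #include <cuda_runtime.h>
        #include <math.h>

        // One thread per prediction point: inverse distance weighting with
        // a per-point power parameter alpha. The "adaptive" part of AIDW is
        // the choice of alpha from the local point density; here we take it
        // as precomputed input.
        __global__ void aidwInterpolate(const float2 *dataXY, const float *dataZ,
                                        int nData, const float2 *predXY,
                                        const float *alpha, float *predZ, int nPred)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nPred) return;

            float2 p = predXY[i];
            float num = 0.f, den = 0.f;

            for (int k = 0; k < nData; ++k) {
                float dx = p.x - dataXY[k].x;
                float dy = p.y - dataXY[k].y;
                float d = sqrtf(dx * dx + dy * dy) + 1e-12f;  // avoid div by zero
                float w = powf(d, -alpha[i]);                 // w = 1 / d^alpha
                num += w * dataZ[k];
                den += w;
            }
            predZ[i] = num / den;
        }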

  1. Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.

    PubMed

    Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V

    2010-06-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone.

  2. Efficient gaussian density formulation of volume and surface areas of macromolecules on graphical processing units.

    PubMed

    Zhang, Baofeng; Kilburg, Denise; Eastman, Peter; Pande, Vijay S; Gallicchio, Emilio

    2017-04-15

    We present an algorithm to efficiently compute accurate volumes and surface areas of macromolecules on graphical processing unit (GPU) devices using an analytic model which represents atomic volumes by continuous Gaussian densities. The volume of the molecule is expressed by means of the inclusion-exclusion formula, which is based on the summation of overlap integrals among multiple atomic densities. The surface area of the molecule is obtained by differentiation of the molecular volume with respect to atomic radii. The many-body nature of the model makes a port to GPU devices challenging. To our knowledge, this is the first reported full implementation of this model on GPU hardware. To accomplish this, we have used recursive strategies to construct the tree of overlaps and to accumulate volumes and their gradients on the tree data structures so as to minimize memory contention. The algorithm is used in the formulation of a surface area-based non-polar implicit solvent model implemented as an open source plug-in (named GaussVol) for the popular OpenMM library for molecular mechanics modeling. GaussVol is 50 to 100 times faster than our best optimized implementation for the CPUs, achieving speeds in excess of 100 ns/day with 1 fs time-step for protein-sized systems on commodity GPUs. © 2017 Wiley Periodicals, Inc.

  3. Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids

    NASA Astrophysics Data System (ADS)

    Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu

    2013-01-01

    Numerical modeling of anisotropic media is a computationally intensive task, since anisotropy brings additional complexity to the field problem in that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today, and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good at handling floating-point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using NVIDIA CUDA (Compute Unified Device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device, which consists of 384 CUDA cores clocked at 1645 MHz, with a standard desktop PC as the host platform. We compare the results from a standard CPU implementation for accuracy and speed and draw implications for simulation using the GPU paradigm.
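
    A minimal CUDA sketch of one explicit finite-difference step with direction-dependent conductivities follows; it is our own simplification, which drops the cross-derivative terms of a fully anisotropic conductivity tensor.

        #include <cuda_runtime.h>

        // One explicit finite-difference step of 2-D heat conduction with
        // direction-dependent diffusivities kx, ky:
        //   T_new = T + dt * (kx * Txx + ky * Tyy)
        // One thread per interior grid point; boundaries handled separately.
        __global__ void heatStep(const float *T, float *Tnew, int nx, int ny,
                                 float kx, float ky, float dt, float dx, float dy)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;

            int id = j * nx + i;
            float txx = (T[id - 1]  - 2.f * T[id] + T[id + 1])  / (dx * dx);
            float tyy = (T[id - nx] - 2.f * T[id] + T[id + nx]) / (dy * dy);
            Tnew[id] = T[id] + dt * (kx * txx + ky * tyy);
        }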

  4. Graphic processing unit accelerated real-time partially coherent beam generator

    NASA Astrophysics Data System (ADS)

    Ni, Xiaolong; Liu, Zhi; Chen, Chunyi; Jiang, Huilin; Fang, Hanhan; Song, Lujun; Zhang, Su

    2016-07-01

    A method of using liquid crystals (LCs) to generate a partially coherent beam in real time is described. An expression for generating a partially coherent beam is given and calculated using a graphics processing unit (GPU), i.e., the GeForce GTX 680. A liquid crystal on silicon (LCOS) device with 256 × 256 pixels is used as the partially coherent beam generator (PCBG). An optimization method based on partition convolution is used to improve the generation speed of our LC PCBG. The total time needed to generate a random phase map with a coherence width ranging from 0.015 mm to 1.5 mm is less than 2.4 ms for calculation and readout with the GPU; adding the time needed for the CPU to read the map and send it to the LCOS, together with the response time of the LC PCBG, the real-time partially coherent beam (PCB) generation frequency of our LC PCBG reaches 312 Hz. To our knowledge, it is the first real-time partially coherent beam generator. A series of experiments based on double-pinhole interference were performed. The results show that to generate a laser beam with a coherence width of 0.9 mm and 1.5 mm, with a mean error of approximately 1%, the required RMS values are 0.021306 and 0.020883 and the required PV values are 0.073576 and 0.072998, respectively.

  5. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    NASA Astrophysics Data System (ADS)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.

  6. Biogeochemical Processes Regulating the Mobility of Uranium in Sediments

    SciTech Connect

    Belli, Keaton M.; Taillefert, Martial

    2016-07-01

    This book chapter reviews the latest knowledge on the biogeochemical processes regulating the mobility of uranium in sediments. It contains both data from the literature and new data from the authors.

  7. In-Situ Statistical Analysis of Autotune Simulation Data using Graphical Processing Units

    SciTech Connect

    Ranjan, Niloo; Sanyal, Jibonananda; New, Joshua Ryan

    2013-08-01

    Developing accurate building energy simulation models to assist energy efficiency at speed and scale is one of the research goals of the Whole-Building and Community Integration group, which is a part of the Building Technologies Research and Integration Center (BTRIC) at Oak Ridge National Laboratory (ORNL). The aim of the Autotune project is to speed up the automated calibration of building energy models to match measured utility or sensor data. The workflow of this project takes input parameters and runs EnergyPlus simulations on Oak Ridge Leadership Computing Facility's (OLCF) computing resources such as Titan, the world's second fastest supercomputer. Multiple simulations run in parallel on nodes having 16 processors each and a Graphics Processing Unit (GPU). Each node produces a 5.7 GB output file comprising 256 files from 64 simulations. Four types of output data, covering monthly, daily, hourly, and 15-minute time steps for each annual simulation, are produced; more than 270 TB of data have been generated in total. In this project, the simulation data is statistically analyzed in situ using GPUs while annual simulations are being computed on the traditional processors. Titan, with its recent addition of 18,688 Compute Unified Device Architecture (CUDA) capable NVIDIA GPUs, has greatly extended its capability for massively parallel data processing. CUDA is used along with C/MPI to calculate statistical metrics such as sum, mean, variance, and standard deviation, leveraging GPU acceleration. The workflow developed in this project produces statistical summaries of the data, which reduces the time and the amount of data that needs to be stored by multiple orders of magnitude. These statistical capabilities are anticipated to be useful for sensitivity analysis of EnergyPlus simulations.
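
    The statistical metrics named above reduce to parallel sums of x and x^2, from which mean and variance follow. A minimal CUDA reduction sketch under our own naming assumptions:

        #include <cuda_runtime.h>

        // Shared-memory tree reduction producing per-block partial sums of
        // x and x^2; mean and variance follow on the host as
        //   mean = S1/n,  var = S2/n - mean^2.
        // Launch with 256 threads per block (the shared arrays are sized 256).
        __global__ void sumAndSumSq(const float *x, int n,
                                    float *blockS1, float *blockS2)
        {
            __shared__ float s1[256], s2[256];
            int tid = threadIdx.x;
            int gid = blockIdx.x * blockDim.x + tid;

            float v = (gid < n) ? x[gid] : 0.f;
            s1[tid] = v;
            s2[tid] = v * v;
            __syncthreads();

            for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
                if (tid < stride) {
                    s1[tid] += s1[tid + stride];
                    s2[tid] += s2[tid + stride];
                }
                __syncthreads();
            }
            if (tid == 0) {
                blockS1[blockIdx.x] = s1[0];  // summed again on the host
                blockS2[blockIdx.x] = s2[0];
            }
        }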

  8. Use of graphical statistical process control tools to monitor and improve outcomes in cardiac surgery.

    PubMed

    Smith, Ian R; Garlick, Bruce; Gardner, Michael A; Brighouse, Russell D; Foster, Kelley A; Rivers, John T

    2013-02-01

    Graphical Statistical Process Control (SPC) tools have been shown to promptly identify significant variations in clinical outcomes in a range of health care settings. We explored the application of these techniques to qualitatively inform the routine cardiac surgical morbidity and mortality (M&M) review process at a single site. Baseline clinical and procedural data relating to 4774 consecutive cardiac surgical procedures, performed between 1 January 2003 and 30 April 2011, were retrospectively evaluated. A range of appropriate performance measures and benchmarks were developed and evaluated using a combination of CUmulative SUM (CUSUM) charts, Exponentially Weighted Moving Average (EWMA) charts and funnel plots. Charts have been discussed at the unit's routine M&M meetings. Risk adjustment (RA) based on EuroSCORE has been incorporated into the charts to improve performance. Discrete and aggregated measures, including Blood Product/Reoperation, major acute post-procedural complications and Length of Stay/Readmission < 28 days, have proved to be usable measures for monitoring outcomes. Monitoring trends in minor morbidities provides a valuable warning of impending changes in significant events. Instances of variation in performance have been examined and could be related to differences in individual operator performance via individual operator curves. SPC tools facilitate near real-time performance monitoring, allowing early detection of and intervention in altered performance. Careful interpretation of charts for group and individual operators has proven helpful in detecting and differentiating systemic vs. individual variation. Copyright © 2012 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.

  9. Full Stokes finite-element modeling of ice sheets using a graphics processing unit

    NASA Astrophysics Data System (ADS)

    Seddik, H.; Greve, R.

    2016-12-01

    Thermo-mechanical simulation of ice sheets is an important approach for understanding and predicting their evolution in a changing climate. For that purpose, higher-order (e.g., ISSM, BISICLES) and full Stokes (e.g., Elmer/Ice, http://elmerice.elmerfem.org) models are increasingly used to model the flow of entire ice sheets more accurately. In parallel to this development, the rapidly improving performance and capabilities of Graphics Processing Units (GPUs) make it possible to efficiently offload more of the calculations of complex and computationally demanding problems onto those devices. Thus, in order to continue the trend of running full Stokes models at greater resolutions, GPUs should be considered when implementing ice sheet models. We developed the GPU-accelerated ice-sheet model Sainō. Sainō is an Elmer (http://www.csc.fi/english/pages/elmer) derivative implemented in Objective-C which solves the full Stokes equations with the finite element method. It uses the standard OpenCL language (http://www.khronos.org/opencl/) to offload the assembly of the finite element matrix onto the GPU. A mesh-coloring scheme is used so that elements of the same color (sharing no nodes) are assembled in parallel on the GPU without the need for synchronization primitives. The current implementation shows that, for the ISMIP-HOM experiment A, during matrix assembly in double precision with 8000, 87,500 and 252,000 brick elements, Sainō is respectively 2x, 10x and 14x faster than Elmer/Ice (when both models are run on a single processing unit). In single precision, Sainō is even 3x, 20x and 25x faster than Elmer/Ice. A detailed description of the comparative results between Sainō and Elmer/Ice will be presented, together with further perspectives on optimization and the limitations of the current implementation.
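
    Sainō performs its assembly in OpenCL; purely to illustrate why coloring removes the need for synchronization primitives, here is a CUDA-flavored sketch in which one kernel launch handles one color (the names and the diagonal-only simplification are ours):

        #include <cuda_runtime.h>

        // Assemble the diagonal (for brevity) of a finite-element matrix for
        // the elements of ONE color. Elements of the same color share no
        // nodes, so the scattered writes below need no atomics; the host
        // loops over colors, launching one kernel per color.
        // elemNodes[e*nodesPerElem + a] is the global node of local node a.
        __global__ void assembleColor(const int *elemIds, int nElemsInColor,
                                      const int *elemNodes, int nodesPerElem,
                                      const float *elemDiag, float *globalDiag)
        {
            int t = blockIdx.x * blockDim.x + threadIdx.x;
            if (t >= nElemsInColor) return;

            int e = elemIds[t];  // element belonging to this color
            for (int a = 0; a < nodesPerElem; ++a) {
                int g = elemNodes[e * nodesPerElem + a];
                globalDiag[g] += elemDiag[e * nodesPerElem + a];  // conflict-free
            }
        }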

  10. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    PubMed

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double- and single-precision arithmetic of hybrid GPU/central processing unit (CPU) and full GPU implementations of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
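
    The SP2 recursion alternates X <- X^2 and X <- 2X - X^2, choosing whichever moves the trace toward the occupation number, so each iteration is dominated by one GEMM. A host-side cuBLAS sketch under our own naming assumptions (not the authors' code):

        #include <cublas_v2.h>
        #include <math.h>

        // One SP2 iteration: X2 = X*X, then X <- X2 or X <- 2X - X2,
        // whichever moves trace(X) closer to the target occupation nOcc.
        // trX and trX2 are host-side traces; a full implementation would
        // recompute and copy them back each iteration.
        void sp2Step(cublasHandle_t h, double *dX, double *dX2, int n,
                     double trX, double trX2, double nOcc)
        {
            const double one = 1.0, zero = 0.0, two = 2.0, minusOne = -1.0;

            // X2 = X * X  (the dominant GEMM the GPU accelerates)
            cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                        &one, dX, n, dX, n, &zero, dX2, n);

            if (fabs(trX2 - nOcc) < fabs(2.0 * trX - trX2 - nOcc)) {
                // X <- X2
                cublasDcopy(h, n * n, dX2, 1, dX, 1);
            } else {
                // X <- 2*X - X2
                cublasDscal(h, n * n, &two, dX, 1);
                cublasDaxpy(h, n * n, &minusOne, dX2, 1, dX, 1);
            }
        }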

  11. Numerical simulation of disperse particle flows on a graphics processing unit

    NASA Astrophysics Data System (ADS)

    Sierakowski, Adam J.

    In both nature and technology, we commonly encounter solid particles being carried within fluid flows, from dust storms to sediment erosion and from food processing to energy generation. The motion of uncountably many particles in highly dynamic flow environments characterizes the tremendous complexity of such phenomena. While methods exist for the full-scale numerical simulation of such systems, current computational capabilities require the simplification of the numerical task with significant approximation using closure models widely recognized as insufficient. There is therefore a fundamental need for the investigation of the underlying physical processes governing these disperse particle flows. In the present work, we develop a new tool based on the Physalis method for the first-principles numerical simulation of thousands of particles (a small fraction of an entire disperse particle flow system) in order to assist in the search for new reduced-order closure models. We discuss numerous enhancements to the efficiency and stability of the Physalis method, which introduces the influence of spherical particles to a fixed-grid incompressible Navier-Stokes flow solver using a local analytic solution to the flow equations. Our first-principles investigation demands the modeling of unresolved length and time scales associated with particle collisions. We introduce a collision model alongside Physalis, incorporating lubrication effects and proposing a new nonlinearly damped Hertzian contact model. By reproducing experimental studies from the literature, we document extensive validation of the methods. We discuss the implementation of our methods for massively parallel computation using a graphics processing unit (GPU). We combine Eulerian grid-based algorithms with Lagrangian particle-based algorithms to achieve computational throughput up to 90 times faster than the legacy implementation of Physalis for a single central processing unit. By avoiding all data

  12. Graphics Processing Unit Acceleration and Parallelization of GENESIS for Large-Scale Molecular Dynamics Simulations.

    PubMed

    Jung, Jaewoon; Naruse, Akira; Kobayashi, Chigusa; Sugita, Yuji

    2016-10-11

    The graphics processing unit (GPU) has become a popular computational platform for molecular dynamics (MD) simulations of biomolecules. A significant speedup in the simulations of small- or medium-size systems using only a few computer nodes with a single or multiple GPUs has been reported. Because of GPU memory limitations and slow communication between GPUs on different computer nodes, it is not straightforward to accelerate MD simulations of large biological systems that contain a few million or more atoms on massively parallel supercomputers with GPUs. In this study, we develop a new scheme in our MD software, GENESIS, to reduce the total computational time on such computers. In this scheme, computationally intensive real-space nonbonded interactions are computed mainly on GPUs, while less intensive bonded interactions and communication-intensive reciprocal-space interactions are performed on CPUs. On the basis of the midpoint cell method as a domain decomposition scheme, we introduce a single-particle interaction list to reduce GPU memory usage. Since the total computational time is limited by the reciprocal-space computation, we utilize the RESPA multiple time-step integration and reduce the CPU resting time by assigning a subset of nonbonded interactions to the CPUs as well as the GPUs when the reciprocal-space computation is skipped. We validated our GPU implementations in GENESIS on BPTI and a membrane protein, porin, by MD simulations, and on an alanine tripeptide by REMD simulations. Benchmark calculations on the TSUBAME supercomputer showed that an MD simulation of a million-atom system was scalable up to 256 computer nodes with GPUs.

  13. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    NASA Astrophysics Data System (ADS)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at the Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify the parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow the applications to use a GPU and to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single-GPU systems to show that real-time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than 40× compared to a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.

  14. Porting ONETEP to graphical processing unit-based coprocessors. 1. FFT box operations.

    PubMed

    Wilkinson, Karl; Skylaris, Chris-Kriton

    2013-10-30

    We present the first graphical processing unit (GPU) coprocessor-enabled version of the Order-N Electronic Total Energy Package (ONETEP) code for linear-scaling first principles quantum mechanical calculations on materials. This work focuses on porting to the GPU the parts of the code that involve atom-localized fast Fourier transform (FFT) operations. These are among the most computationally intensive parts of the code and are used in core algorithms such as the calculation of the charge density, the local potential integrals, the kinetic energy integrals, and the nonorthogonal generalized Wannier function gradient. We have found that direct porting of the isolated FFT operations did not provide any benefit. Instead, it was necessary to tailor the port to each of the aforementioned algorithms to optimize data transfer to and from the GPU. A detailed discussion of the methods used and tests of the resulting performance are presented, which show that individual steps in the relevant algorithms are accelerated by a significant amount. However, the transfer of data between the GPU and host machine is a significant bottleneck in the reported version of the code. In addition, an initial investigation into a dynamic precision scheme for the ONETEP energy calculation has been performed to take advantage of the enhanced single precision capabilities of GPUs. The methods used here result in no disruption to the existing code base. Furthermore, as the developments reported here concern the core algorithms, they will benefit the full range of ONETEP functionality. Our use of a directive-based programming model ensures portability to other forms of coprocessors and will allow this work to form the basis of future developments to the code designed to support emerging high-performance computing platforms. Copyright © 2013 Wiley Periodicals, Inc.
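
    Atom-localized FFT boxes are many small, identically sized transforms, a pattern that suits batched plans. A minimal cuFFT sketch of that pattern follows; the function name and data layout are our own assumptions, not ONETEP's interface.

        #include <cufft.h>

        // Batched 3-D FFTs over many identical "FFT boxes": one plan,
        // one call, many boxes, which amortizes launch overhead and keeps
        // the GPU busy with small transforms.
        cufftResult fftBoxesForward(cufftComplex *dData, int nx, int ny, int nz,
                                    int nBoxes)
        {
            cufftHandle plan;
            int dims[3] = { nx, ny, nz };
            cufftResult r = cufftPlanMany(&plan, 3, dims,
                                          NULL, 1, nx * ny * nz,  // input layout
                                          NULL, 1, nx * ny * nz,  // output layout
                                          CUFFT_C2C, nBoxes);
            if (r != CUFFT_SUCCESS) return r;
            r = cufftExecC2C(plan, dData, dData, CUFFT_FORWARD);  // in place
            cufftDestroy(plan);
            return r;
        }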

  15. A New Method Based on Graphics Processing Units for Fast Near-Infrared Optical Tomography.

    PubMed

    Jiang, Jingjing; Ahnen, Linda; Kalyanov, Alexander; Lindner, Scott; Wolf, Martin; Majos, Salvador Sanchez

    2017-01-01

    The accuracy of images obtained by Diffuse Optical Tomography (DOT) could be substantially increased by the newly developed time-resolved (TR) cameras. These devices result in unprecedented data volumes, which present a challenge to conventional image reconstruction techniques. In addition, many clinical applications require taking photons in air regions like the trachea into account, where the diffusion model fails. Image reconstruction techniques based on photon tracking are mandatory in those cases but have not been implemented so far due to computing demands. We aimed at designing an inversion algorithm which could be implemented on commercial graphics processing units (GPUs) by making use of information obtained with other imaging modalities. The method requires a segmented volume and an approximately uniform value for the reduced scattering coefficient in the volume under study. The complex photon path is reduced to a small number of partial path lengths within each segment, resulting in drastically reduced memory usage and computation time. Our approach takes advantage of wavelength-normalized data, which renders it robust against instrumental biases and skin irregularities; this is critical for realistic clinical applications. The accuracy of this method has been assessed with both simulated and experimental inhomogeneous phantoms, showing good agreement with target values. The simulation study analyzed a phantom containing a tumor next to an air region. For the experimental test, a segmented cuboid phantom was illuminated by a supercontinuum laser and data were gathered by a state-of-the-art TR camera. Reconstructions were obtained on a GPU-installed computer in less than 2 h. To our knowledge, it is the first time Monte Carlo methods have been successfully used for DOT based on TR cameras. This opens the door to applications such as accurate measurements of oxygenation in neck tumors, where the presence of air regions is a problem for conventional approaches.

  16. Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit

    PubMed Central

    Lawrie, David S.

    2017-01-01

    Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processing Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689

  17. Accelerating Wright-Fisher Forward Simulations on the Graphics Processing Unit.

    PubMed

    Lawrie, David S

    2017-09-07

    Forward Wright-Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processing Unit (CPU), thus limiting their usefulness. However, the single-locus Wright-Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called "embarrassingly parallel," consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright-Fisher simulation, or "GO Fish" for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. Copyright © 2017 Lawrie.

  18. Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

    SciTech Connect

    Levine, Benjamin G.; Stone, John E.; Kohlmeyer, Axel

    2011-05-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
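
    The kernel of the computation is an all-pairs distance histogram; the shared-memory trick the authors describe keeps a per-block histogram in fast on-chip memory and merges it into global memory once per block. A minimal CUDA sketch with our own naming:

        #include <cuda_runtime.h>

        // Distance histogram between two atom selections, the rate-limiting
        // step of an RDF. Launch with shared memory of nBins * sizeof(unsigned
        // int); each block accumulates into its shared-memory histogram (fast
        // atomics) and merges into global memory once at the end.
        __global__ void rdfHistogram(const float4 *selA, int nA,
                                     const float4 *selB, int nB,
                                     unsigned int *hist, int nBins, float binWidth)
        {
            extern __shared__ unsigned int sh[];  // nBins counters
            for (int b = threadIdx.x; b < nBins; b += blockDim.x) sh[b] = 0u;
            __syncthreads();

            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < nA) {
                float4 a = selA[i];
                for (int j = 0; j < nB; ++j) {
                    float4 b4 = selB[j];
                    float dx = a.x - b4.x, dy = a.y - b4.y, dz = a.z - b4.z;
                    int bin = (int)(sqrtf(dx*dx + dy*dy + dz*dz) / binWidth);
                    if (bin < nBins) atomicAdd(&sh[bin], 1u);
                }
            }
            __syncthreads();
            for (int b = threadIdx.x; b < nBins; b += blockDim.x)
                atomicAdd(&hist[b], sh[b]);  // merge into global histogram
        }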

  19. Quantum Chemistry for Solvated Molecules on Graphical Processing Units Using Polarizable Continuum Models.

    PubMed

    Liu, Fang; Luehr, Nathan; Kulik, Heather J; Martínez, Todd J

    2015-07-14

    The conductor-like polarization model (C-PCM) with switching/Gaussian smooth discretization is a widely used implicit solvation model in chemical simulations. However, its application in quantum mechanical calculations of large-scale biomolecular systems can be limited by the computational expense of both the gas-phase electronic structure and the solvation interaction. We have previously used graphical processing units (GPUs) to accelerate the first of these steps. Here, we extend the use of GPUs to accelerate electronic structure calculations including C-PCM solvation. Implementation on the GPU leads to significant acceleration of the generation of the required integrals for C-PCM. We further propose two strategies to improve the solution of the required linear equations: a dynamic convergence threshold and a randomized block-Jacobi preconditioner. These strategies are not specific to GPUs and are expected to be beneficial for both CPU and GPU implementations. We benchmark the performance of the new implementation using over 20 small proteins in a solvent environment. Using a single GPU, our method evaluates the C-PCM related integrals and their derivatives more than 10× faster than a conventional CPU-based implementation. Our improvements to the linear solver provide a further 3× acceleration. The overall calculations including C-PCM solvation require, typically, 20-40% more effort than their gas-phase counterparts for a moderate basis set and molecule surface discretization level. The relative cost of the C-PCM solvation correction decreases as the basis sets and/or cavity radii increase. Therefore, description of solvation with this model should be routine. We also discuss applications to the study of the conformational landscape of an amyloid fibril.

  20. Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units-Radial Distribution Function Histogramming.

    PubMed

    Levine, Benjamin G; Stone, John E; Kohlmeyer, Axel

    2011-05-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
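
    The kernel at the heart of this approach can be sketched in a few lines of CUDA: each block accumulates a private histogram in fast shared memory using atomic operations and flushes it to global memory once at the end. The kernel name and launch shape are illustrative, and the shared-memory tiling of the second selection and the periodic-boundary handling used by the real algorithm are omitted for brevity; this is not the VMD implementation.

        // Launch with dynamic shared memory for the block-local histogram:
        //   rdf_hist<<<blocks, 256, nBins * sizeof(unsigned int)>>>(...);
        __global__ void rdf_hist(const float4* selA, int nA,
                                 const float4* selB, int nB,
                                 unsigned int* gHist, int nBins, float binWidth)
        {
            extern __shared__ unsigned int sHist[];
            for (int b = threadIdx.x; b < nBins; b += blockDim.x) sHist[b] = 0;
            __syncthreads();

            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < nA) {
                float4 a = selA[i];
                for (int j = 0; j < nB; ++j) {   // tiling through shared memory omitted
                    float dx = a.x - selB[j].x;
                    float dy = a.y - selB[j].y;
                    float dz = a.z - selB[j].z;
                    int bin = (int)(sqrtf(dx * dx + dy * dy + dz * dz) / binWidth);
                    if (bin < nBins) atomicAdd(&sHist[bin], 1u);  // on-chip atomic
                }
            }
            __syncthreads();
            for (int b = threadIdx.x; b < nBins; b += blockDim.x)
                atomicAdd(&gHist[b], sHist[b]);  // one global flush per block
        }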

  1. FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter

    SciTech Connect

    Charles, J S; Patton, R M; Potok, T E; Cui, X

    2008-01-01

    Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its O(n²) complexity. As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, is highly parallel and has experienced improved performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document flocking implementation to be run on the NVIDIA® GeForce 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3,000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.
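
    As a rough illustration of how the O(n²) neighbor scan maps onto a GPU, the CUDA kernel below assigns one thread per document, computes the cosine similarity of its feature vector against every other document, and steers it toward similar ones. The data layout and names are assumptions, and a faithful flocking model would also include separation and alignment rules; this is a sketch, not the paper's code.

        __global__ void flock_step(const float2* pos, float2* newPos,
                                   const float* feat, int nDocs, int featLen,
                                   float simThresh, float stepSize)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nDocs) return;
            float2 me = pos[i];
            float2 pull = make_float2(0.f, 0.f);
            for (int j = 0; j < nDocs; ++j) {            // the O(n^2) scan
                if (j == i) continue;
                float dotp = 0.f, na = 0.f, nb = 0.f;
                for (int k = 0; k < featLen; ++k) {
                    float a = feat[i * featLen + k], b = feat[j * featLen + k];
                    dotp += a * b; na += a * a; nb += b * b;
                }
                float sim = dotp * rsqrtf(na * nb + 1e-12f);  // cosine similarity
                if (sim > simThresh) {                   // fly toward similar docs
                    pull.x += pos[j].x - me.x;
                    pull.y += pos[j].y - me.y;
                }
            }
            newPos[i] = make_float2(me.x + stepSize * pull.x,
                                    me.y + stepSize * pull.y);
        }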

  2. Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units—Radial Distribution Function Histogramming

    PubMed Central

    Levine, Benjamin G.; Stone, John E.; Kohlmeyer, Axel

    2011-01-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007

  3. Real-time Graphics Processing Unit Based Fourier Domain Optical Coherence Tomography and Surgical Applications

    NASA Astrophysics Data System (ADS)

    Zhang, Kang

    2011-12-01

    In this dissertation, real-time Fourier domain optical coherence tomography (FD-OCT) capable of multi-dimensional micrometer-resolution imaging targeted specifically for microsurgical intervention applications was developed and studied. As part of this work several ultra-high speed real-time FD-OCT imaging and sensing systems were proposed and developed. A real-time 4D (3D+time) OCT system platform using the graphics processing unit (GPU) to accelerate OCT signal processing, image reconstruction, visualization, and volume rendering was developed. Several GPU based algorithms such as non-uniform fast Fourier transform (NUFFT), numerical dispersion compensation, and multi-GPU implementation were developed to improve the impulse response, SNR roll-off and stability of the system. Full-range complex-conjugate-free FD-OCT was also implemented on the GPU architecture to achieve doubled image range and improved SNR. These technologies overcome the image reconstruction and visualization bottlenecks that widely exist in current ultra-high speed FD-OCT systems and open the way to interventional OCT imaging for applications in guided microsurgery. A hand-held common-path optical coherence tomography (CP-OCT) distance-sensor based microsurgical tool was developed and validated. Through real-time signal processing, edge detection and feedback control, the tool was shown to be capable of tracking a target surface and compensating for motion. A micro-incision test on a phantom was performed using a CP-OCT-sensor integrated hand-held tool, which showed an incision error of less than +/-5 microns, compared to errors of more than 100 microns for free-hand incision. The CP-OCT distance sensor has also been utilized to enhance the accuracy and safety of optical nerve stimulation. Finally, several experiments were conducted to validate the system for surgical applications. One of them involved 4D OCT guided micro-manipulation using a phantom. Multiple volume renderings of one 3D data set were

  4. Creating Interactive Graphical Overlays in the Advanced Weather Interactive Processing System (AWIPS) Using Shapefiles and DGM Files

    NASA Technical Reports Server (NTRS)

    Barrett, Joe H., III; Lafosse, Richard; Hood, Doris; Hoeth, Brian

    2007-01-01

    Graphical overlays can be created in real-time in the Advanced Weather Interactive Processing System (AWIPS) using shapefiles or DARE Graphics Metafile (DGM) files. This presentation describes how to create graphical overlays on-the-fly for AWIPS, by using two examples of AWIPS applications that were created by the Applied Meteorology Unit (AMU). The first example is the Anvil Threat Corridor Forecast Tool, which produces a shapefile that depicts a graphical threat corridor of the forecast movement of thunderstorm anvil clouds, based on the observed or forecast upper-level winds. This tool is used by the Spaceflight Meteorology Group (SMG) and 45th Weather Squadron (45 WS) to analyze the threat of natural or space vehicle-triggered lightning over a location. The second example is a launch and landing trajectory tool that produces a DGM file that plots the ground track of space vehicles during launch or landing. The trajectory tool can be used by SMG and the 45 WS forecasters to analyze weather radar imagery along a launch or landing trajectory. Advantages of both file types will be listed.

  5. Large eddy simulations of turbulent flows on graphics processing units: Application to film-cooling flows

    NASA Astrophysics Data System (ADS)

    Shinn, Aaron F.

    Computational Fluid Dynamics (CFD) simulations can be very computationally expensive, especially for Large Eddy Simulations (LES) and Direct Numerical Simulations (DNS) of turbulent flows. In LES the large, energy-containing eddies are resolved by the computational mesh, but the smaller (sub-grid) scales are modeled. In DNS, all scales of turbulence are resolved, including the smallest dissipative (Kolmogorov) scales. Clusters of CPUs have been the standard approach for such simulations, but an emerging approach is the use of Graphics Processing Units (GPUs), which deliver impressive computing performance compared to CPUs. Recently there has been great interest in the scientific computing community to use GPUs for general-purpose computation (such as the numerical solution of PDEs) rather than graphics rendering. To explore the use of GPUs for CFD simulations, an incompressible Navier-Stokes solver was developed for a GPU. This solver is capable of simulating unsteady laminar flows or performing a LES or DNS of turbulent flows. The Navier-Stokes equations are solved via a fractional-step method and are spatially discretized using the finite volume method on a Cartesian mesh. An immersed boundary method based on a ghost cell treatment was developed to handle flow past complex geometries. The implementation of these numerical methods had to suit the architecture of the GPU, which is designed for massive multithreading. The details of this implementation are described, along with strategies for performance optimization. Validation of the GPU-based solver was performed for fundamental benchmark problems, and a performance assessment indicated that the solver was over an order of magnitude faster than a CPU. The GPU-based Navier-Stokes solver was used to study film-cooling flows via Large Eddy Simulation. In modern gas turbine engines, the film-cooling method is used to protect turbine blades from hot combustion gases. Therefore, understanding the physics of
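
    In a fractional-step solver of this kind, the dominant cost is typically the pressure Poisson solve performed every time step. A minimal GPU building block is a single Jacobi sweep on a uniform Cartesian mesh; the 2-D CUDA kernel below is illustrative only (the thesis solver is 3-D, uses an immersed boundary treatment, and a stronger iterative scheme).

        // One Jacobi sweep of  laplacian(p) = rhs  on an nx-by-ny mesh with
        // spacing dx (dx2 = dx*dx). Boundary cells are left untouched.
        __global__ void jacobi_poisson(const float* p, float* pNew,
                                       const float* rhs, int nx, int ny, float dx2)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;
            int id = j * nx + i;
            pNew[id] = 0.25f * (p[id - 1] + p[id + 1] + p[id - nx] + p[id + nx]
                                - dx2 * rhs[id]);
        }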

  6. CAI and Imagery: Interactive Computer Graphics for Teaching About Invisible Process. Technical Report No. 74.

    ERIC Educational Resources Information Center

    Rigney, Joseph W.; Lutz, Kathy A.

    In preparation for a study of using interactive computer graphics for training, some current theorizing about internal and external, digital and analog representational systems is reviewed. The possibility is considered that there are two overlapping, internal, analog representational systems, one for organismic states and the other for external…

  7. Graphic Arts: The Press and Finishing Processes. Fourth Edition. Teacher Edition [and] Student Edition.

    ERIC Educational Resources Information Center

    Farajollahi, Karim; Ogle, Gary; Reed, William; Woodcock, Kenneth

    Part of a series of instructional materials for courses on graphic communication, this packet contains both teacher and student materials for seven units that cover the following topics: (1) offset press systems; (2) offset inks and dampening chemistry; (3) offset press operating procedures; (4) preventive maintenance and troubleshooting; (5) job…

  9. Efficient particle-in-cell simulation of auroral plasma phenomena using a CUDA enabled graphics processing unit

    NASA Astrophysics Data System (ADS)

    Sewell, Stephen

    This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphics Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general-purpose graphics processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphics processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final implementation resulted in simulations that executed about 38 times faster than simulations run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
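
    The grid-interpolation (charge deposition) phase mentioned above is the scatter step of PIC, and it is where particle sorting pays off. A minimal 1-D CUDA sketch with cloud-in-cell weighting follows; the names and the 1-D simplification are assumptions for illustration, not the thesis implementation.

        __global__ void deposit_charge(const float* px, const float* q,
                                       float* rho, int nParticles,
                                       float dx, int nCells)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nParticles) return;
            // Linear (cloud-in-cell) weighting onto the two nearest grid points.
            float xg = px[i] / dx;
            int cell = (int)xg;
            float w = xg - cell;
            if (cell >= 0 && cell + 1 < nCells) {
                // When particles are pre-sorted by cell, neighboring threads
                // target neighboring grid points, so these atomics collide and
                // serialize far less often.
                atomicAdd(&rho[cell],     q[i] * (1.f - w));
                atomicAdd(&rho[cell + 1], q[i] * w);
            }
        }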

  10. Real-time 2D parallel windowed Fourier transform for fringe pattern analysis using Graphics Processing Unit.

    PubMed

    Gao, Wenjing; Huyen, Nguyen Thi Thanh; Loi, Ho Sy; Kemao, Qian

    2009-12-07

    In optical interferometers, fringe projection systems, and synthetic aperture radars, fringe patterns are common outcomes and are usually degraded by unavoidable noise. The presence of noise makes phase extraction and phase unwrapping challenging. Windowed Fourier transform (WFT) based algorithms have proven effective for fringe pattern analysis in various applications. However, WFT-based algorithms are computationally expensive, prohibiting their use in real-time applications. In this paper, we propose a fast parallel WFT-based library using graphics processing units and the Compute Unified Device Architecture (CUDA). Real-time WFT-based processing is achieved at 4 frames per second on 256x256 fringe patterns. A speedup of up to 132x is obtained for WFT-based algorithms on an NVIDIA GTX295 graphics card relative to sequential C code on a quad-core 2.5 GHz Intel Xeon E5420 CPU.

  11. Application of computer generated color graphic techniques to the processing and display of three dimensional fluid dynamic data

    NASA Technical Reports Server (NTRS)

    Anderson, B. H.; Putt, C. W.; Giamati, C. C.

    1981-01-01

    Color coding techniques used in the processing of remote sensing imagery were adapted and applied to the fluid dynamics problems associated with turbofan mixer nozzles. The computer generated color graphics were found to be useful in reconstructing the measured flow field from low resolution experimental data to give more physical meaning to this information and in scanning and interpreting the large volume of computer generated data from the three dimensional viscous computer code used in the analysis.

  12. Compressed sensing reconstruction for whole-heart imaging with 3D radial trajectories: a graphics processing unit implementation.

    PubMed

    Nam, Seunghoon; Akçakaya, Mehmet; Basha, Tamer; Stehning, Christian; Manning, Warren J; Tarokh, Vahid; Nezafat, Reza

    2013-01-01

    A disadvantage of three-dimensional (3D) isotropic acquisition in whole-heart coronary MRI is the prolonged data acquisition time. Isotropic 3D radial trajectories allow undersampling of k-space data in all three spatial dimensions, enabling accelerated acquisition of the volumetric data. Compressed sensing (CS) reconstruction can provide further acceleration in the acquisition by removing the incoherent artifacts due to undersampling and improving the image quality. However, the heavy computational overhead of the CS reconstruction has been a limiting factor for its application. In this article, a parallelized implementation of an iterative CS reconstruction method for 3D radial acquisitions using a commercial graphics processing unit is presented. The execution time of the graphics processing unit-implemented CS reconstruction was compared with that of the C++ implementation, and the efficacy of the undersampled 3D radial acquisition with CS reconstruction was investigated in both phantom and whole-heart coronary data sets. Subsequently, the efficacy of CS in suppressing streaking artifacts in 3D whole-heart coronary MRI with 3D radial imaging and its convergence properties were studied. The CS reconstruction provides improved image quality (in terms of vessel sharpness and suppression of noise-like artifacts) compared with the conventional 3D gridding algorithm, and the graphics processing unit implementation greatly reduces the execution time of CS reconstruction yielding 34-54 times speed-up compared with C++ implementation.
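
    Iterative CS reconstructions of this kind alternate a data-consistency step with a sparsity-promoting shrinkage. One representative GPU-friendly ingredient is the elementwise soft-thresholding of complex coefficients (the proximal operator of the l1 penalty), sketched below; it is a generic illustration, not the authors' reconstruction code.

        __global__ void soft_threshold(float2* coeff, int n, float lambda)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float2 c = coeff[i];
            float mag = sqrtf(c.x * c.x + c.y * c.y);
            // Shrink the magnitude by lambda, or zero the coefficient entirely.
            float scale = (mag > lambda) ? (mag - lambda) / mag : 0.f;
            coeff[i] = make_float2(c.x * scale, c.y * scale);
        }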

  13. Discontinuous Galerkin method with Gaussian artificial viscosity on graphical processing units for nonlinear acoustics

    NASA Astrophysics Data System (ADS)

    Tripathi, Bharat B.; Marchiano, Régis; Baskar, Sambandam; Coulouvrat, François

    2015-10-01

    Propagation of acoustical shock waves in complex geometry is a topic of interest in the field of nonlinear acoustics. For instance, simulation of Buzz Saw Noise requires the treatment of shock waves generated by the turbofan through the engines of aeroplanes with complex geometries and wall liners. Nevertheless, from a numerical point of view it remains a challenge. The two main hurdles are to take into account the complex geometry of the domain and to deal with the spurious oscillations (Gibbs phenomenon) near the discontinuities. In this work, we first derive the conservative hyperbolic system of nonlinear acoustics (up to quadratic nonlinear terms) using the fundamental equations of fluid dynamics. Then, we propose to adapt the classical nodal discontinuous Galerkin method to develop a high-fidelity solver for nonlinear acoustics. The discontinuous Galerkin method is a hybrid of the finite element and finite volume methods and is very versatile in handling complex geometry. In order to obtain better performance, the method is parallelized on Graphical Processing Units. Like other numerical methods, the discontinuous Galerkin method suffers from the Gibbs phenomenon near the shock, which is a numerical artifact. Among the various ways to manage these spurious oscillations, we choose the method of parabolic regularization. Although the introduction of artificial viscosity into the system is a popular way of managing shocks, we propose a new approach of introducing smooth artificial viscosity locally in each element, wherever needed. First, a shock sensor using the linear coefficients of the spectral solution is used to locate the position of the discontinuities. Then, a viscosity coefficient depending on the shock sensor is introduced into the hyperbolic system of equations, only in the elements near the shock. The viscosity is applied as a two-dimensional Gaussian patch with its shape parameters depending on the element dimensions, referred here as Element

  14. A tool for mapping Single Nucleotide Polymorphisms using Graphics Processing Units

    PubMed Central

    2014-01-01

    Background Single Nucleotide Polymorphism (SNP) genotyping analysis is very susceptible to SNP chromosomal position errors. As is known, SNP mapping data are provided along with the SNP arrays without the information necessary to assess their accuracy in advance. Moreover, these mapping data are related to a given build of a genome and need to be updated when a new build is available. As a consequence, researchers often plan to remap SNPs with the aim of obtaining more up-to-date SNP chromosomal positions. In this work, we present G-SNPM, a GPU (Graphics Processing Unit) based tool to map SNPs on a genome. Methods G-SNPM is a tool that maps a short sequence representative of a SNP against a reference DNA sequence in order to find the physical position of the SNP in that sequence. In G-SNPM each SNP is mapped on its related chromosome by means of an automatic three-stage pipeline. In the first stage, G-SNPM uses the GPU-based short-read mapping tool SOAP3-dp to align in parallel on a reference chromosome the sequences representative of each SNP. In the second stage, G-SNPM uses another short-read mapping tool to remap the sequences unaligned or ambiguously aligned by SOAP3-dp (in this stage SHRiMP2 is used, which exploits specialized vector computing hardware to speed up the dynamic programming algorithm of Smith-Waterman). In the last stage, G-SNPM analyzes the alignments obtained by SOAP3-dp and SHRiMP2 to identify the absolute position of each SNP. Results and conclusions To assess G-SNPM, we used it to remap the SNPs of some commercial chips. Experimental results show that G-SNPM was able to remap almost all SNPs without ambiguity. Based on modern GPUs, G-SNPM provides fast mappings without worsening the accuracy of the results. G-SNPM can be used for specialized Genome Wide Association Studies (GWAS), as well as in annotation tasks that require updating the SNP mapping probes. PMID:24564714

  15. SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).

    PubMed

    Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J

    2012-06-01

    To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution. The expense, however, is an increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processing Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to perform CPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The times required to complete the full algorithm on the CPU and GPU were benchmarked, and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to-GPU time was greater than 1.0 for all images, which indicates the GPU performed the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in the computational time associated with an FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability. © 2012 American Association of Physicists in Medicine.
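
    For the translational part of an FT-based registration, the standard construction is phase correlation: multiply one spectrum by the conjugate of the other, normalize to unit magnitude, and inverse-transform; the location of the resulting peak is the shift. A compact CUDA/cuFFT sketch follows with illustrative names; the abstract's IDL implementation and its handling of rotations are not reproduced here.

        #include <cufft.h>

        // Normalized cross-power spectrum: C = A .* conj(B) / |A .* conj(B)|.
        __global__ void cross_power(const cufftComplex* A, const cufftComplex* B,
                                    cufftComplex* C, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float re = A[i].x * B[i].x + A[i].y * B[i].y;   // Re(A * conj(B))
            float im = A[i].y * B[i].x - A[i].x * B[i].y;   // Im(A * conj(B))
            float mag = sqrtf(re * re + im * im) + 1e-12f;
            C[i].x = re / mag;
            C[i].y = im / mag;
        }

        // dA and dB hold the two images as complex device arrays (imag = 0).
        void phase_correlate(cufftComplex* dA, cufftComplex* dB,
                             cufftComplex* dC, int rows, int cols)
        {
            cufftHandle plan;
            cufftPlan2d(&plan, rows, cols, CUFFT_C2C);
            cufftExecC2C(plan, dA, dA, CUFFT_FORWARD);
            cufftExecC2C(plan, dB, dB, CUFFT_FORWARD);
            int n = rows * cols;
            cross_power<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
            cufftExecC2C(plan, dC, dC, CUFFT_INVERSE);
            // The (row, col) of the magnitude peak in dC is the integer shift;
            // enlarging (zero-padding) the inputs first yields sub-pixel shifts.
            cufftDestroy(plan);
        }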

  16. Computer Graphics.

    ERIC Educational Resources Information Center

    Halpern, Jeanne W.

    1970-01-01

    Computer graphics have been called the most exciting development in computer technology. At the University of Michigan, three kinds of graphics output equipment are now being used: symbolic printers, line plotters or drafting devices, and cathode-ray tubes (CRT). Six examples are given that demonstrate the range of graphics use at the University.…

  17. Interpretation of Medical Imaging Data with a Mobile Application: A Mobile Digital Imaging Processing Environment

    PubMed Central

    Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J.; Ullmann, Jeremy F. P.; Janke, Andrew L.

    2013-01-01

    Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, and pre-tiling of brain images from Medical Imaging NetCDF (MINC) and Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurements; and (3) display high-level images in three dimensions in real-world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation, using web-based protocols. M-DIP implements a three-level architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic layer realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real-world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed from any network environment, such as a portable mobile or tablet device. In addition, this system, in combination with mobile applications, establishes a virtualization tool in the neuroinformatics field to speed up interpretation services. PMID:23847587

  18. Interpretation of medical imaging data with a mobile application: a mobile digital imaging processing environment.

    PubMed

    Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J; Ullmann, Jeremy F P; Janke, Andrew L

    2013-01-01

    Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, and pre-tiling of brain images from Medical Imaging NetCDF (MINC) and Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurements; and (3) display high-level images in three dimensions in real-world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation, using web-based protocols. M-DIP implements a three-level architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic layer realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real-world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed from any network environment, such as a portable mobile or tablet device. In addition, this system, in combination with mobile applications, establishes a virtualization tool in the neuroinformatics field to speed up interpretation services.

  19. Accelerating quantum chemistry calculations with graphical processing units - toward in high-density (HD) silico drug discovery.

    PubMed

    Hagiwara, Yohsuke; Ohno, Kazuki; Orita, Masaya; Koga, Ryota; Endo, Toshio; Akiyama, Yutaka; Sekijima, Masakazu

    2013-09-01

    The growing power of central processing units (CPUs) has made it possible to use quantum mechanical (QM) calculations for in silico drug discovery. However, limited CPU power makes large-scale in silico screening, such as virtual screening with QM calculations, a challenge. Recently, general-purpose computing on graphics processing units (GPGPU) has offered an alternative because of its significantly faster computation compared with CPUs. Here, we review a GPGPU-based supercomputer, TSUBAME2.0, and its promise for next-generation, high-density (HD) in silico drug discovery.

  20. Ab initio nonadiabatic dynamics of multichromophore complexes: a scalable graphical-processing-unit-accelerated exciton framework.

    PubMed

    Sisto, Aaron; Glowacki, David R; Martinez, Todd J

    2014-09-16

    ("fragmenting") a molecular system and then stitching it back together. In this Account, we address both of these problems, the first by using graphical processing units (GPUs) and electronic structure algorithms tuned for these architectures and the second by using an exciton model as a framework in which to stitch together the solutions of the smaller problems. The multitiered parallel framework outlined here is aimed at nonadiabatic dynamics simulations on large supramolecular multichromophoric complexes in full atomistic detail. In this framework, the lowest tier of parallelism involves GPU-accelerated electronic structure theory calculations, for which we summarize recent progress in parallelizing the computation and use of electron repulsion integrals (ERIs), which are the major computational bottleneck in both density functional theory (DFT) and time-dependent density functional theory (TDDFT). The topmost tier of parallelism relies on a distributed memory framework, in which we build an exciton model that couples chromophoric units. Combining these multiple levels of parallelism allows access to ground and excited state dynamics for large multichromophoric assemblies. The parallel excitonic framework is in good agreement with much more computationally demanding TDDFT calculations of the full assembly.

  1. FamSeq: a variant calling program for family-based sequencing data using graphics processing units.

    PubMed

    Peng, Gang; Fan, Yu; Wang, Wenyi

    2014-10-01

    Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.

  2. [Use of spreadsheet for statistical and graphical processing of records from the ambulatory blood pressure monitor Spacelabs 90207].

    PubMed

    Borges, N; Polónia, J

    1993-04-01

    The introduction of portable devices for non-invasive ambulatory blood-pressure measurement is recognized as an advance in the study of human arterial hypertension, allowing a significant improvement in the selection of hypertensive patients as well as in the analysis of the effects of antihypertensive drugs during clinical trials. The Spacelabs 90207 is a recent example of this kind of apparatus, being highly portable and highly rated in validation studies. Nevertheless, the software of this apparatus (like that of other similar devices) has severe limitations concerning the calculation of the area under the blood-pressure curve over the measurement period, as well as the possibility of grouping several records in a database for easy statistical and graphical analysis of different groups of records. To overcome these difficulties, the authors describe the development of a group of programs, using Microsoft Excel v3.0 spreadsheets and macros, that allow direct import of individual files from the Spacelabs software into a spreadsheet and their further processing in three phases. These three phases, which we have designated "conversion", "export to database" and "statistical and graphical analysis", permit easy and fast statistical and graphical analysis of selected groups of records.

  3. Using wesBench to Study the Rendering Performance of Graphics Processing Units

    SciTech Connect

    Bethel, Edward W

    2010-01-08

    Graphics operations fall into two broad classes. The first, which we refer to here as vertex operations, consists of transformation, lighting, primitive assembly, and so forth. The second, which we refer to as pixel or fragment operations, consists of rasterization, texturing, scissoring, blending, and fill. Overall GPU rendering performance is a function of the throughput of both these interdependent stages: if one stage is slower than the other, the faster stage will be forced to run more slowly and overall rendering performance will be adversely affected. This relationship works in both directions: if the later stage has a greater workload than the earlier stage, the earlier stage will be forced to slow down. For example, a large triangle that covers many screen pixels incurs a very small amount of work in the vertex stage while at the same time incurring a relatively large amount of work in the fragment stage. Rendering performance of a scene consisting of many large-area triangles will be limited by the throughput of the fragment stage, which has relatively more work than the vertex stage. There are two main objectives for this document. First, we introduce a new graphics benchmark, wesBench, which is useful for measuring the performance of both stages of the rendering pipeline under varying conditions. Second, we present its methodology for measuring performance and show results of several performance measurement studies aimed at producing a better understanding of GPU rendering performance characteristics and limits under varying configurations. First, in Section 2, we explore the 'crossover' point between geometry and rasterization. Second, in Section 3, we explore additional performance characteristics, some of which are poorly documented or undocumented. Lastly, several appendices provide additional material concerning problems with the gfxbench benchmark and details about the new wesBench graphics benchmark.

  4. High mobility, printable, and solution-processed graphene electronics.

    PubMed

    Wang, Shuai; Ang, Priscilla Kailian; Wang, Ziqian; Tang, Ai Ling Lena; Thong, John T L; Loh, Kian Ping

    2010-01-01

    The ability to print graphene sheets onto large-scale, flexible substrates holds promise for large-scale, transparent electronics on flexible substrates. Solution-processable graphene sheets derived from graphite can form stable dispersions in solution and are amenable to bulk-scale processing and ink-jet printing. However, the electrical conductivity and carrier mobilities of this material are usually reported to be orders of magnitude poorer than those of its mechanically cleaved counterpart due to a higher density of defects, which restricts its use in electronics. Here, we show that by optimizing several key processing factors, we are able to fabricate high-mobility graphene films derived from large graphene oxide sheets, paving the way for all-carbon post-CMOS electronics. All-carbon source-drain channel electronics fabricated from such films exhibit significantly improved transport characteristics, with carrier mobilities of 365 cm²/(V·s) for holes and 281 cm²/(V·s) for electrons, measured in air at room temperature. In particular, intrinsic mobilities as high as 5000 cm²/(V·s) can be obtained from such solution-processed graphene films when ionic screening is applied to nullify Coulombic scattering by charged impurities.

  5. Business process analysis of a foodborne outbreak investigation mobile system

    NASA Astrophysics Data System (ADS)

    Nowicki, T.; Waszkowski, R.; Saniuk, A.

    2016-08-01

    An epidemiological investigation during an outbreak of food-borne disease requires a number of activities to be carried out in the field. This restricts access to current data about the epidemic and reduces the possibility of transferring information from the field to headquarters. The problem can be solved by using an appropriate system of mobile devices. The purpose of this paper is to present an IT solution based on a central repository for epidemiological investigations and mobile devices designed for use in the field. Based on such a solution, business processes can be properly rebuilt to achieve better results in the activities of health inspectors.

  6. [Dynamic Pulse Signal Processing and Analyzing in Mobile System].

    PubMed

    Chou, Yongxin; Zhang, Aihua; Ou, Jiqing; Qi, Yusheng

    2015-09-01

    In order to derive the dynamic pulse rate variability (DPRV) signal from the dynamic pulse signal in real time, a method for extracting the DPRV signal was proposed and a portable mobile monitoring system was designed. The system consists of a front end for collecting and wirelessly sending the pulse signal and a mobile terminal. The proposed method is employed to extract the DPRV signal from the dynamic pulse signal in the mobile terminal, and the DPRV signal is analyzed in the time domain, in the frequency domain, and with non-linear methods in real time. The results show that the proposed method can accurately derive the DPRV signal in real time and that the system can be used for processing and analyzing DPRV signals in real time.

  7. NATURAL graphics

    NASA Technical Reports Server (NTRS)

    Jones, R. H.

    1984-01-01

    The hardware and software developments in computer graphics are discussed. Major topics include: system capabilities, hardware design, system compatibility, and software interface with the data base management system.

  8. Image processing for navigation on a mobile embedded platform: design of an autonomous mobile robot

    NASA Astrophysics Data System (ADS)

    Loose, Harald; Lemke, Christiane; Papazov, Chavdar

    2006-02-01

    This paper deals with intelligent mobile platforms connected to a camera controlled by a small hardware platform called RCUBE. This platform is able to provide the features of a typical actuator-sensor board, with various inputs and outputs, as well as computing power and image recognition capabilities. Several intelligent autonomous RCUBE devices can be equipped and programmed to participate in the BOSPORUS network. These components form an intelligent network for gathering sensor and image data, sensor data fusion, navigation, and control of mobile platforms. The RCUBE platform provides a standalone solution for image processing, which will be explained and presented. It plays a major role for several components in a reference implementation of the BOSPORUS system. On the one hand, intelligent cameras will be positioned in the environment, analyzing events from a fixed point of view and sharing their perceptions with other components in the system. On the other hand, image processing results will contribute to reliable navigation of a mobile system, which is crucially important. Fixed landmarks and other objects appropriate for determining the position of a mobile system can be recognized. For navigation, other methods are added, e.g., GPS calculations and odometers.

  9. Adaptive step ODE algorithms for the 3D simulation of electric heart activity with graphics processing units.

    PubMed

    Garcia-Molla, V M; Liberos, A; Vidal, A; Guillem, M S; Millet, J; Gonzalez, A; Martinez-Zaldivar, F J; Climent, A M

    2014-01-01

    In this paper we studied the implementation and performance of adaptive step methods for large systems of ordinary differential equations in graphics processing units, focusing on the simulation of three-dimensional electric cardiac activity. The Rush-Larsen method was applied in all the implemented solvers to improve efficiency. We compared the adaptive methods with fixed step methods, and we found that the fixed step methods can be faster, while the adaptive step methods are better in terms of accuracy and robustness.
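
    The Rush-Larsen scheme referenced above integrates each Hodgkin-Huxley-type gating variable exactly over a step by freezing its voltage-dependent rates. A minimal CUDA kernel for one gating variable is sketched below, using the classic squid-axon sodium-activation rates as stand-ins for a cardiac model's rate functions (the removable singularity at v = -40 mV is ignored in this sketch).

        __device__ float alpha_m(float v)
        { return 0.1f * (v + 40.f) / (1.f - expf(-(v + 40.f) / 10.f)); }
        __device__ float beta_m(float v)
        { return 4.f * expf(-(v + 65.f) / 18.f); }

        __global__ void rush_larsen_m(float* m, const float* v, int n, float dt)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float a = alpha_m(v[i]), b = beta_m(v[i]);
            float m_inf = a / (a + b);        // steady-state value
            float tau   = 1.f / (a + b);      // time constant
            // Exact solution of dm/dt = (m_inf - m)/tau over a step dt:
            m[i] = m_inf + (m[i] - m_inf) * expf(-dt / tau);
        }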

  10. An atomic orbital-based formulation of the complete active space self-consistent field method on graphical processing units

    SciTech Connect

    Hohenstein, Edward G.; Luehr, Nathan; Ufimtsev, Ivan S.; Martínez, Todd J.

    2015-06-14

    Despite its importance, state-of-the-art algorithms for performing complete active space self-consistent field (CASSCF) computations have lagged far behind those for single reference methods. We develop an algorithm for the CASSCF orbital optimization that uses sparsity in the atomic orbital (AO) basis set to increase the applicability of CASSCF. Our implementation of this algorithm uses graphical processing units (GPUs) and has allowed us to perform CASSCF computations on molecular systems containing more than one thousand atoms. Additionally, we have implemented analytic gradients of the CASSCF energy; the gradients also benefit from GPU acceleration as well as sparsity in the AO basis.

  11. Visualization of complex processes in lipid systems using computer simulations and molecular graphics.

    PubMed

    Telenius, Jelena; Vattulainen, Ilpo; Monticelli, Luca

    2009-01-01

    Computer simulation has become an increasingly popular tool in the study of lipid membranes, complementing experimental techniques by providing information on structure and dynamics at high spatial and temporal resolution. Molecular visualization is the most powerful way to represent the results of molecular simulations, and can be used to illustrate complex transformations of lipid aggregates more easily and more effectively than written text. In this chapter, we review some basic aspects of simulation methodologies commonly employed in the study of lipid membranes and we describe a few examples of complex phenomena that have been recently investigated using molecular simulations. We then explain how molecular visualization provides added value to computational work in the field of biological membranes, and we conclude by listing a few molecular graphics packages widely used in scientific publications.

  12. Graphics processing unit accelerated intensity-based optical coherence tomography angiography using differential frames with real-time motion correction.

    PubMed

    Watanabe, Yuuki; Takahashi, Yuhei; Numazawa, Hiroshi

    2014-02-01

    We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel-shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.
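
    The angiographic contrast described above is the squared difference of two sequential frames, evaluated after shifting one frame by a candidate bulk-tissue motion. A per-pixel CUDA sketch follows; on the host one would launch it for each candidate (lateral, axial) shift and keep the image whose pixel sum is minimal, mirroring the selection among 25 candidates described in the abstract. Names and the 2-D layout are assumptions.

        __global__ void sq_diff_angio(const float* frameA, const float* frameB,
                                      float* angio, int w, int h,
                                      int shiftX, int shiftZ)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int z = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= w || z >= h) return;
            int xs = x + shiftX, zs = z + shiftZ;  // motion-corrected lookup
            float d = 0.f;
            if (xs >= 0 && xs < w && zs >= 0 && zs < h)
                d = frameA[z * w + x] - frameB[zs * w + xs];
            angio[z * w + x] = d * d;              // squared frame difference
        }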

  13. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units.

    PubMed

    Liu, Yongchao; Maskell, Douglas L; Schmidt, Bertil

    2009-05-06

    The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
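
    For orientation, the simplest GPU mapping of Smith-Waterman assigns one thread to each database sequence and rolls the dynamic-programming matrix two rows at a time. The sketch below uses a linear gap penalty and per-thread arrays, which is far simpler (and slower) than CUDASW++ itself; the real tool uses affine gaps, substitution profiles, and careful register and shared-memory layouts, so treat this as illustrative only.

        #define MAX_QUERY 1024  // illustrative cap on query length

        __global__ void sw_score(const char* query, int qLen,
                                 const char* db, const int* offsets,
                                 const int* lens, int nSeqs, int* bestScore,
                                 int match, int mismatch, int gap)
        {
            int s = blockIdx.x * blockDim.x + threadIdx.x;
            if (s >= nSeqs) return;
            const char* seq = db + offsets[s];
            int len = lens[s];
            int prev[MAX_QUERY + 1], curr[MAX_QUERY + 1];  // two rolling DP rows
            for (int j = 0; j <= qLen; ++j) prev[j] = 0;
            int best = 0;
            for (int i = 1; i <= len; ++i) {
                curr[0] = 0;
                for (int j = 1; j <= qLen; ++j) {
                    int sub = (seq[i - 1] == query[j - 1]) ? match : mismatch;
                    int h = prev[j - 1] + sub;          // diagonal move
                    h = max(h, prev[j] - gap);          // gap in query
                    h = max(h, curr[j - 1] - gap);      // gap in subject
                    h = max(h, 0);                      // local-alignment floor
                    curr[j] = h;
                    best = max(best, h);
                }
                for (int j = 0; j <= qLen; ++j) prev[j] = curr[j];
            }
            bestScore[s] = best;
        }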

  14. A real-time GNSS-R system based on software-defined radio and graphics processing units

    NASA Astrophysics Data System (ADS)

    Hobiger, Thomas; Amagai, Jun; Aida, Masanori; Narita, Hideki

    2012-04-01

    Reflected signals of the Global Navigation Satellite System (GNSS) from the sea or land surface can be utilized to deduce and monitor physical and geophysical parameters of the reflecting area. Unlike most other remote sensing techniques, GNSS-Reflectometry (GNSS-R) operates as a passive radar that takes advantage of the increasing number of navigation satellites that broadcast their L-band signals. To date, most GNSS-R receiver architectures have been based on dedicated hardware solutions. Software-defined radio (SDR) technology has advanced in recent years and enabled signal processing in real time, which makes it an ideal candidate for the realization of a flexible GNSS-R system. Additionally, modern commodity graphics cards, which offer massive parallel computing performance, make it possible to handle the whole signal processing chain without interfering with the PC's CPU. Thus, this paper describes a GNSS-R system developed on the principles of software-defined radio supported by General Purpose Graphics Processing Units (GPGPUs), and presents results from initial field tests which confirm the anticipated capability of the system.
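
    At the core of such a GNSS-R processing chain is a delay correlator: the received complex samples are multiplied by a delayed replica of the satellite's PRN code and accumulated. The kernel below computes correlation power over a range of lags, one thread per lag, and assumes carrier/Doppler wipe-off has already been applied; it is a generic sketch, not the system described in the paper.

        __global__ void correlate(const float2* rx, const float* code,
                                  int nSamples, int nLags, float* power)
        {
            int lag = blockIdx.x * blockDim.x + threadIdx.x;
            if (lag >= nLags) return;
            float re = 0.f, im = 0.f;
            for (int k = 0; k < nSamples - lag; ++k) {
                // Mix the received signal with the delayed code replica.
                re += rx[k + lag].x * code[k];
                im += rx[k + lag].y * code[k];
            }
            power[lag] = re * re + im * im;  // correlation power at this delay
        }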

  15. Graphic Storytelling

    ERIC Educational Resources Information Center

    Thompson, John

    2009-01-01

    Graphic storytelling is a medium that allows students to make and share stories, while developing their art communication skills. American comics today are more varied in genre, approach, and audience than ever before. When considering the impact of Japanese manga on the youth, graphic storytelling emerges as a powerful player in pop culture. In…

  16. Business Graphics

    NASA Technical Reports Server (NTRS)

    1987-01-01

    Genigraphics Corporation's Masterpiece 8770 FilmRecorder is an advanced high-resolution system designed to improve and expand a company's in-house graphics production. The GRAFTIME software package was designed to allow office personnel with minimal training to produce professional-level graphics for business communications and presentations. The products are no longer being manufactured.

  18. Accelerated Multi-Dimensional RF Pulse Design for Parallel Transmission Using Concurrent Computation on Multiple Graphics Processing Units

    PubMed Central

    Deng, Weiran; Yang, Cungeng; Stenger, V. Andrew

    2010-01-01

    Multi-dimensional RF pulses are of current interest due to their promise for improving high field imaging as well as for optimizing parallel transmission methods. One major drawback is that the computation time of numerically designed multi-dimensional RF pulses increases rapidly with their resolution and number of transmitters. This is critical because the construction of multi-dimensional RF pulses often needs to be in real time. The use of graphics processing units for computations is a recent approach for accelerating image reconstruction applications. We propose the use of graphics processing units for the design of multi-dimensional RF pulses including the utilization of parallel transmitters. Using a desktop computer with four NVIDIA Tesla C1060 computing processors, we found acceleration factors on the order of twenty for standard eight-transmitter 2D spiral RF pulses with a 64 × 64 excitation resolution and a ten-microsecond dwell time. We also show that even greater acceleration factors can be achieved for more complex RF pulses. PMID:21264929

  19. High mobility solution processed organic thin film transistors

    NASA Astrophysics Data System (ADS)

    Park, Sung Kyu

    To date, most high mobility organic thin film transistors (OTFTs) have used vapor-deposited organic semiconductors as the active material. The OTFT fabrication processes for vapor-deposited organic materials are not so different from conventional inorganic TFT fabrication. Therefore they are constrained by similar production costs, with some savings related to reduced processing temperatures and low cost substrates. Solution-processed OTFTs are of interest because of their compatibility with roll-to-roll processing, which may allow simplified device fabrication and further reduced processing costs. In the project outlined for this thesis, high performance solution processed small molecule OTFTs were developed for application in integrated circuits and flat panel displays. 6,13-bis(triisopropylsilylethynyl) pentacene (TIPS-pentacene) and fluorinated 5,11-bis(triethylsilylethynyl) anthradithiophene (F-TES ADT) were used as high performance solution-processible small molecules. Using TIPS-pentacene and F-TES ADT in combination with a variety of device fabrication techniques, mobilities greater than 1.2 cm²/V·s and 3 cm²/V·s, respectively, have been obtained. These devices were made using a drop casting process and represent the best mobility for solution processed OTFTs to date. Additionally, using F-TES ADT, spin-cast OTFTs showing mobilities greater than 1.0 cm²/V·s with relatively good film uniformity have been obtained. Film growth was observed to be considerably more ordered on pentafluorobenzenethiol (PFBT) treated Au surfaces, and on samples with patterned PFBT-Au structures, grains appeared to grow out from the PFBT-Au areas into the oxide areas. This results in a substantial variation in field-effect mobility with gate length as grains growing from the source and drain electrodes meet and overlap. Spun F-TES ADT OTFTs fabricated with films deposited on PFBT-treated Au electrodes show mobilities of 0.1--0.5 cm²/V·s from a toluene solution and 0

  20. Real-time reconstruction of sensitivity encoded radial magnetic resonance imaging using a graphics processing unit.

    PubMed

    Sørensen, Thomas Sangild; Atkinson, David; Schaeffter, Tobias; Hansen, Michael Schacht

    2009-12-01

    A barrier to the adoption of non-Cartesian parallel magnetic resonance imaging for real-time applications has been the time required for the image reconstructions. These times have exceeded the underlying acquisition time, thus preventing real-time display of the acquired images. We present a reconstruction algorithm for commodity graphics hardware (GPUs) to enable real-time reconstruction of sensitivity encoded radial imaging (radial SENSE). We demonstrate that a radial profile order based on the golden ratio facilitates reconstruction from an arbitrary number of profiles. This allows the temporal resolution to be adjusted on the fly. A user-adaptable regularization term is also included and, particularly for highly undersampled data, used to interactively improve the reconstruction quality. Each reconstruction is fully self-contained from the profile stream, i.e., the required coil sensitivity profiles, sampling density compensation weights, regularization terms, and noise estimates are computed in real time from the acquisition data itself. The reconstruction implementation is verified using a steady state free precession (SSFP) pulse sequence and quantitatively evaluated. Three applications are demonstrated: 1) real-time imaging with real-time SENSE reconstruction, 2) real-time imaging with k-t SENSE reconstruction, and 3) offline reconstruction with interactive adjustment of reconstruction settings.
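
    The golden-ratio profile order mentioned above advances the projection angle by 180°/φ ≈ 111.246° between successive profiles, so any contiguous window of profiles samples k-space nearly uniformly and the reconstruction window length can be changed on the fly. A few lines of host code (illustrative only) make the pattern concrete:

        #include <cmath>
        #include <cstdio>

        int main() {
            const double phi = (1.0 + std::sqrt(5.0)) / 2.0;  // golden ratio
            const double goldenAngle = 180.0 / phi;           // ~111.246 degrees
            for (int n = 0; n < 8; ++n)                       // first few profiles
                std::printf("profile %d: %8.3f deg\n",
                            n, std::fmod(n * goldenAngle, 180.0));
            return 0;
        }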

  1. Graphical Man/Machine Communications

    DTIC Science & Technology

    Progress is reported concerning the use of computer controlled graphical displays in the areas of radiation diffusion and hydrodynamics, general...ventricular dynamics. Progress is continuing on the use of computer graphics in architecture. Some progress in halftone graphics is reported with no basic...developments presented. Colored halftone perspective pictures are being used to represent multivariable situations. Nonlinear waveform processing is

  2. Graphic pathogeographies.

    PubMed

    Donovan, Courtney

    2014-09-01

    This paper focuses on the graphic pathogeographies in David B.'s Epileptic and David Small's Stitches: A Memoir to highlight the significance of geographic concepts in graphic novels of health and disease. Despite its importance in such works, few scholars have examined the role of geography in their narrative and structure. I examine the role of place in Epileptic and Stitches to extend the academic discussion on graphic novels of health and disease and identify how such works bring attention to the role of geography in the individual's engagement with health, disease, and related settings.

  3. Graphics Processing Unit-Accelerated Code for Computing Second-Order Wiener Kernels and Spike-Triggered Covariance.

    PubMed

    Mano, Omer; Clark, Damon A

    2017-01-01

    Sensory neuroscience seeks to understand and predict how sensory neurons respond to stimuli. Nonlinear components of neural responses are frequently characterized by the second-order Wiener kernel and the closely-related spike-triggered covariance (STC). Recent advances in data acquisition have made it increasingly common and computationally intensive to compute second-order Wiener kernels/STC matrices. In order to speed up this sort of analysis, we developed a graphics processing unit (GPU)-accelerated module that computes the second-order Wiener kernel of a system's response to a stimulus. The generated kernel can be easily transformed for use in standard STC analyses. Our code speeds up such analyses by factors of over 100 relative to current methods that utilize central processing units (CPUs). It works on any modern GPU and may be integrated into many data analysis workflows. This module accelerates data analysis so that more time can be spent exploring parameter space and interpreting data.
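
    The computation being accelerated here is conceptually compact. A minimal NumPy sketch of a spike-triggered covariance analysis follows (an illustration of the standard technique, not the authors' GPU module; the toy rectified-linear neuron is hypothetical):

      import numpy as np

      # Spike-triggered covariance (STC): collect the stimulus history before
      # each spike and compare the covariance of that spike-triggered ensemble
      # against the raw stimulus covariance.
      def spike_triggered_covariance(stimulus, spikes, window):
          """stimulus: 1-D array; spikes: counts per time bin; window: history length."""
          t = np.arange(window, len(stimulus))
          # One row per time bin: the stimulus segment preceding that bin.
          segments = np.stack([stimulus[i - window:i] for i in t])
          weights = spikes[window:]
          n_spikes = weights.sum()
          sta = (weights[:, None] * segments).sum(axis=0) / n_spikes  # spike-triggered average
          centered = segments - sta
          stc = (weights[:, None] * centered).T @ centered / (n_spikes - 1)
          prior = np.cov(segments, rowvar=False)
          # Eigenvectors of this difference give the relevant stimulus axes.
          return stc - prior

      rng = np.random.default_rng(0)
      stim = rng.standard_normal(20000)
      rate = np.maximum(0, stim) * 0.1          # toy half-wave-rectified neuron
      spk = rng.poisson(rate)
      dstc = spike_triggered_covariance(stim, spk, window=20)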

  4. GPU MrBayes V3.1: MrBayes on Graphics Processing Units for Protein Sequence Data.

    PubMed

    Pang, Shuai; Stones, Rebecca J; Ren, Ming-Ming; Liu, Xiao-Guang; Wang, Gang; Xia, Hong-ju; Wu, Hao-Yang; Liu, Yang; Xie, Qiang

    2015-09-01

    We present a modified GPU (graphics processing unit) version of MrBayes, called ta(MC)(3) (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein data sets. Our main contributions are 1) utilizing 64-bit variables, thereby enabling ta(MC)(3) to process larger data sets than MrBayes; and 2) using Kahan summation to improve accuracy, convergence rates, and consequently runtime. Versus the current fastest software, we achieve a speedup of up to around 2.5 (and up to around 90 vs. serial MrBayes), and more on multi-GPU hardware. GPU MrBayes V3.1 is available from http://sourceforge.net/projects/mrbayes-gpu/.
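
    Kahan (compensated) summation, credited above for the accuracy and convergence gains, can be stated in a few lines. This is a generic sketch of the technique, not the MrBayes source:

      # Kahan summation carries a running compensation term that recovers the
      # low-order bits lost when a small addend meets a large running total.
      def kahan_sum(values):
          total = 0.0
          c = 0.0              # compensation for lost low-order bits
          for v in values:
              y = v - c        # subtract the error carried from the previous step
              t = total + y    # low-order digits of y may be lost here...
              c = (t - total) - y   # ...and are recovered into c
              total = t
          return total

      vals = [1.0] + [1e-16] * 1_000_000
      # Naive summation stays at 1.0 (each 1e-16 is rounded away);
      # Kahan recovers the 1e-10 tail.
      print(sum(vals), kahan_sum(vals))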

  5. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  6. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  7. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    NASA Astrophysics Data System (ADS)

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
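
    The task-assignment idea, splitting a batch of A-scans into per-device chunks that are processed concurrently, can be sketched as follows. Here a multiprocessing pool stands in for the pool of GPUs, and the per-chunk FFT is a placeholder for the real GD-OCM processing chain (none of this is the authors' code):

      import numpy as np
      from multiprocessing import Pool

      def process_chunk(chunk):
          """Stand-in per-device pipeline: FFT each A-scan and take the magnitude."""
          return np.abs(np.fft.fft(chunk, axis=1))

      def process_bscan(ascans, n_devices=4):
          # Assign one contiguous block of A-scans to each worker/device,
          # mirroring the chunked task assignment described above.
          chunks = np.array_split(ascans, n_devices, axis=0)
          with Pool(n_devices) as pool:
              results = pool.map(process_chunk, chunks)
          return np.concatenate(results, axis=0)

      if __name__ == "__main__":
          data = np.random.default_rng(0).standard_normal((1000, 1024))
          print(process_bscan(data).shape)   # (1000, 1024)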

  8. The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

    SciTech Connect

    Hall, Clifford; Ji, Weixiao; Blaisten-Barojas, Estela

    2014-02-01

    We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. Highlights: • We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU–GPU duet. • The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU–GPU implementation. • Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. • The testbed involves a polymeric system of oligopyrroles in the condensed phase. • The CPU–GPU parallelization includes dipole–dipole and Mie–Jones classic potentials.
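
    The Metropolis acceptance rule at the heart of MMC is worth seeing in isolation. The sketch below samples a one-dimensional harmonic well, a toy stand-in for the molecular engine described above (illustrative parameters throughout, not the paper's code):

      import numpy as np

      # Metropolis Monte Carlo: accept a trial move with probability
      # min(1, exp(-dE/kT)), which generates a Markov chain whose stationary
      # distribution is the Boltzmann distribution exp(-E/kT).
      def metropolis(energy, x0, n_steps, step=0.5, kT=1.0, rng=None):
          if rng is None:
              rng = np.random.default_rng()
          x, e = x0, energy(x0)
          samples = []
          for _ in range(n_steps):
              x_new = x + rng.uniform(-step, step)
              e_new = energy(x_new)
              if e_new <= e or rng.random() < np.exp(-(e_new - e) / kT):
                  x, e = x_new, e_new       # accept the trial move
              samples.append(x)             # a rejected move re-counts the old state
          return np.array(samples)

      samples = metropolis(lambda x: 0.5 * x**2, x0=0.0, n_steps=10000)
      print(samples.mean(), samples.var())  # harmonic well at kT=1: mean ~0, var ~1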

  9. Enabling customer self service through image processing on mobile devices

    NASA Astrophysics Data System (ADS)

    Kliche, Ingmar; Hellmann, Sascha; Kreutel, Jörn

    2013-03-01

    Our paper will outline the results of a research project that employs image processing for the automatic diagnosis of technical devices whose internal state is communicated through visual displays. In particular, we developed a method for detecting exceptional states of retail wireless routers, analysing the state and blinking behaviour of the LEDs that make up most routers' user interface. The method was made configurable by means of abstracting away from a particular device's display properties, thus being able to analyse a whole range of different devices whose displays are covered by our abstraction. The method of analysis and its configuration mechanism were implemented as a native mobile application for the Android Platform. It employs the local camera of mobile devices for capturing a router's state, and uses overlaid visual hints for guiding the user toward that perspective from where an analysis is possible.
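
    The blink analysis reduces to classifying a time series of LED brightness samples. A toy version follows, with hypothetical thresholds in place of the app's per-device configuration (this is an illustration of the idea, not the Android implementation):

      import numpy as np

      # Classify one LED region as off, steadily on, or blinking, given one
      # brightness sample per camera frame, and estimate the blink frequency.
      def classify_led(brightness, fps, on_threshold=0.5):
          on = brightness > on_threshold
          duty = on.mean()
          if duty < 0.05:
              return "off", 0.0
          if duty > 0.95:
              return "on", 0.0
          toggles = np.count_nonzero(np.diff(on.astype(int)))
          freq_hz = toggles / 2.0 / (len(on) / fps)  # two toggles per blink cycle
          return "blinking", freq_hz

      t = np.arange(0, 3.0, 1 / 30.0)                          # 3 s of video at 30 fps
      blinking = (np.sin(2 * np.pi * 2.0 * t) > 0).astype(float)  # 2 Hz blink
      print(classify_led(blinking, fps=30))                    # -> ('blinking', ~2.0)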

  10. [Communicative process in the mobile emergency service (SAMU/192)].

    PubMed

    dos Santos, Maria Claudia; Bernardes, Andrea; Gabriel, Carmen Silvia; Evora, Yolanda Dora Martinez; Rocha, Fernanda Ludmilla Rossi

    2012-03-01

    This study aims to characterize the communication process among nursing assistants who work in vehicles of the basic life support of the mobile emergency service, in the coordination of this service, and in the unified medical regulation service in a city of the state of São Paulo, Brazil. This descriptive and qualitative research used thematic content analysis for data analysis. Semi-structured interviews were used for the data collection, which was held in January 2010. Results show difficulties in communication with both the medical regulation service and the coordination. Among the most highlighted aspects are failures during the radio transmission, lack of qualified radio operators, difficult access to the coordination and lack of supervision by nurses. However, it was possible to detect solutions that aim to improve the communication and, consequently, the service offered by the mobile emergency service.

  11. Processing and rendering of Fourier domain optical coherence tomography images at a line rate over 524 kHz using a graphics processing unit.

    PubMed

    Rasakanthan, Janarthanan; Sugden, Kate; Tomlins, Peter H

    2011-02-01

    In Fourier domain optical coherence tomography (FD-OCT), a large amount of interference data needs to be resampled from the wavelength domain to the wavenumber domain prior to Fourier transformation. We present an approach to optimize this data processing, using a graphics processing unit (GPU) and parallel processing algorithms. We demonstrate an increased processing and rendering rate over that previously reported by using GPU paged memory to render data in the GPU rather than copying back to the CPU. This avoids unnecessary and slow data transfer, enabling a processing and display rate of well over 524,000 A-scans/s for a single frame. To the best of our knowledge this is the fastest processing demonstrated to date and the first time that FD-OCT processing and rendering has been demonstrated entirely on a GPU.
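
    The resampling step that dominates this workload is a one-dimensional interpolation from a wavelength grid onto a uniform wavenumber grid, followed by an FFT. A minimal NumPy sketch follows (not the authors' GPU kernel; the source bandwidth and reflector depth below are made up):

      import numpy as np

      # Spectrometer samples are uniform in wavelength, but the FFT needs
      # samples uniform in wavenumber k = 2*pi/lambda, so each spectrum is
      # interpolated onto a uniform k grid before the transform.
      def process_spectrum(spectrum, wavelengths_nm):
          k = 2 * np.pi / wavelengths_nm                # nonuniform, descending in k
          k_uniform = np.linspace(k.min(), k.max(), k.size)
          # np.interp needs ascending sample points, hence the [::-1] reversals.
          resampled = np.interp(k_uniform, k[::-1], spectrum[::-1])
          resampled -= resampled.mean()                 # drop DC so the reflector peak dominates
          return np.abs(np.fft.fft(resampled))[: k.size // 2]

      lam = np.linspace(800.0, 880.0, 1024)               # hypothetical source bandwidth (nm)
      fringe = 1 + np.cos(2 * (2 * np.pi / lam) * 150e3)  # toy reflector at 150 um depth
      print(process_spectrum(fringe, lam).argmax())       # depth bin of the reflector peak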

  12. IGIS (Interactive Geologic Interpretation System) computer-aided photogeologic mapping with image processing, graphics and CAD/CAM capabilities

    SciTech Connect

    McGuffie, B.A.; Johnson, L.F.; Alley, R.E.; Lang, H.R.

    1989-10-01

    Advances in computer technology are changing the way geologists integrate and use data. Although many geoscience disciplines are absolutely dependent upon computer processing, photogeological and map interpretation computer procedures are just now being developed. Historically, geologists collected data in the field and mapped manually on a topographic map or aerial photographic base. New software called the Interactive Geologic Interpretation System (IGIS) is being developed at the Jet Propulsion Laboratory (JPL) within the National Aeronautics and Space Administration (NASA)-funded Multispectral Analysis of Sedimentary Basins Project. To complement conventional geological mapping techniques, Landsat Thematic Mapper (TM) or other digital remote sensing image data and co-registered digital elevation data are combined using computer imaging, graphics, and CAD/CAM techniques to provide tools for photogeologic interpretation, strike/dip determination, cross section construction, stratigraphic section measurement, topographic slope measurement, terrain profile generation, rotatable 3-D block diagram generation, and seismic analysis.

  13. Calculation method for computer-generated holograms with cylindrical basic object light by using a graphics processing unit.

    PubMed

    Sakata, Hironobu; Hosoyachi, Kouhei; Yang, Chan-Young; Sakamoto, Yuji

    2011-12-01

    It takes an enormous amount of time to calculate a computer-generated hologram (CGH). A fast calculation method for a CGH using precalculated object light has been proposed in which the light waves of an arbitrary object are calculated using transform calculations of the precalculated object light. However, this method requires a huge amount of memory. This paper proposes the use of a method that uses a cylindrical basic object light to reduce the memory requirement. Furthermore, it is accelerated by using a graphics processing unit (GPU). Experimental results show that the calculation speed on a GPU is about 65 times faster than that on a CPU. © 2011 Optical Society of America

  14. Arbitrary Angular Momentum Electron Repulsion Integrals with Graphical Processing Units: Application to the Resolution of Identity Hartree-Fock Method.

    PubMed

    Kalinowski, Jaroslaw; Wennmohs, Frank; Neese, Frank

    2017-07-11

    A resolution of identity based implementation of the Hartree-Fock method on graphical processing units (GPUs) is presented that is capable of handling basis functions with arbitrary angular momentum. For practical reasons, only functions up to (ff|f) angular momentum are presently calculated on the GPU, thus leaving the calculation of higher angular momenta integrals on the CPU of the hybrid CPU-GPU environment. Speedups of up to a factor of 30 are demonstrated relative to state-of-the-art serial and parallel CPU implementations. Benchmark calculations with over 3500 contracted basis functions (def2-SVP or def2-TZVP basis sets) are reported. The presented implementation supports all devices with OpenCL support and is capable of utilizing multiple GPU cards over either MPI or OpenCL itself.

  15. Analytical gradients for tensor hyper-contracted MP2 and SOS-MP2 on graphical processing units

    NASA Astrophysics Data System (ADS)

    Song, Chenchen; Martínez, Todd J.

    2017-10-01

    Analytic energy gradients for tensor hyper-contraction (THC) are derived and implemented for second-order Møller-Plesset perturbation theory (MP2), with and without the scaled-opposite-spin (SOS)-MP2 approximation. By exploiting the THC factorization, the formal scaling of MP2 and SOS-MP2 gradient calculations with respect to system size is reduced to quartic and cubic, respectively. An efficient implementation has been developed that utilizes both graphics processing units and sparse tensor techniques exploiting spatial sparsity of the atomic orbitals. THC-MP2 has been applied to both geometry optimization and ab initio molecular dynamics (AIMD) simulations. The resulting energy conservation in micro-canonical AIMD demonstrates that the implementation provides accurate nuclear gradients with respect to the THC-MP2 potential energy surfaces.

  16. Distributed cooperating processes in a mobile robot control system

    NASA Technical Reports Server (NTRS)

    Skillman, Thomas L., Jr.

    1988-01-01

    A mobile inspection robot has been proposed for the NASA Space Station. It will be a free flying autonomous vehicle that will leave a berthing unit to accomplish a variety of inspection tasks around the Space Station, and then return to its berth to recharge, refuel, and transfer information. The Flying Eye robot will receive voice communication to change its attitude, move at a constant velocity, and move to a predefined location along a self generated path. This mobile robot control system requires integration of traditional command and control techniques with a number of AI technologies. Speech recognition, natural language understanding, task and path planning, sensory abstraction and pattern recognition are all required for successful implementation. The interface between the traditional numeric control techniques and the symbolic processing to the AI technologies must be developed, and a distributed computing approach will be needed to meet the real time computing requirements. To study the integration of the elements of this project, a novel mobile robot control architecture and simulation based on the blackboard architecture was developed. The control system operation and structure is discussed.

  17. Robot graphic simulation testbed

    NASA Technical Reports Server (NTRS)

    Cook, George E.; Sztipanovits, Janos; Biegl, Csaba; Karsai, Gabor; Springfield, James F.

    1991-01-01

    The objective of this research was twofold. First, the basic capabilities of ROBOSIM (graphical simulation system) were improved and extended by taking advantage of advanced graphic workstation technology and artificial intelligence programming techniques. Second, the scope of the graphic simulation testbed was extended to include general problems of Space Station automation. Hardware support for 3-D graphics and high processing performance make high resolution solid modeling, collision detection, and simulation of structural dynamics computationally feasible. The Space Station is a complex system with many interacting subsystems. Design and testing of automation concepts demand modeling of the affected processes, their interactions, and that of the proposed control systems. The automation testbed was designed to facilitate studies in Space Station automation concepts.

  18. Large-scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU).

    PubMed

    Shi, Yulin; Veidenbaum, Alexander V; Nicolau, Alex; Xu, Xiangmin

    2015-01-15

    Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post hoc processing and analysis. Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22× speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Large scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU)

    PubMed Central

    Shi, Yulin; Veidenbaum, Alexander V.; Nicolau, Alex; Xu, Xiangmin

    2014-01-01

    Background Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post-hoc processing and analysis. New Method Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. Results We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22x speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. Comparison with Existing Method(s) To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Conclusions Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. PMID:25277633

  1. A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci.

    PubMed

    Gill, Peter; Curran, James; Elliot, Keith

    2005-01-01

    The use of expert systems to interpret short tandem repeat DNA profiles in forensic, medical and ancient DNA applications is becoming increasingly prevalent as high-throughput analytical systems generate large amounts of data that are time-consuming to process. With special reference to low copy number (LCN) applications, we use a graphical model to simulate stochastic variation associated with the entire DNA process starting with extraction of sample, followed by the processing associated with the preparation of a PCR reaction mixture and PCR itself. Each part of the process is modelled with input efficiency parameters. Then, the key output parameters that define the characteristics of a DNA profile are derived, namely heterozygote balance (Hb) and the probability of allelic drop-out p(D). The model can be used to estimate the unknown efficiency parameters, such as πextraction. 'What-if' scenarios can be used to improve and optimize the entire process, e.g. by increasing the aliquot forwarded to PCR, the improvement expected to a given DNA profile can be reliably predicted. We demonstrate that Hb and drop-out are mainly a function of stochastic effect of pre-PCR molecular selection. Whole genome amplification is unlikely to give any benefit over conventional PCR for LCN.

  2. A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci

    PubMed Central

    Gill, Peter; Curran, James; Elliot, Keith

    2005-01-01

    The use of expert systems to interpret short tandem repeat DNA profiles in forensic, medical and ancient DNA applications is becoming increasingly prevalent as high-throughput analytical systems generate large amounts of data that are time-consuming to process. With special reference to low copy number (LCN) applications, we use a graphical model to simulate stochastic variation associated with the entire DNA process starting with extraction of sample, followed by the processing associated with the preparation of a PCR reaction mixture and PCR itself. Each part of the process is modelled with input efficiency parameters. Then, the key output parameters that define the characteristics of a DNA profile are derived, namely heterozygote balance (Hb) and the probability of allelic drop-out p(D). The model can be used to estimate the unknown efficiency parameters, such as πextraction. ‘What-if’ scenarios can be used to improve and optimize the entire process, e.g. by increasing the aliquot forwarded to PCR, the improvement expected to a given DNA profile can be reliably predicted. We demonstrate that Hb and drop-out are mainly a function of stochastic effect of pre-PCR molecular selection. Whole genome amplification is unlikely to give any benefit over conventional PCR for LCN. PMID:15681615
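
    The structure of such a simulation, binomial thinning at each pre-PCR stage followed by stochastic PCR doubling, can be sketched briefly. The efficiency values and detection limit below are placeholders, not the paper's fitted parameters:

      import numpy as np

      rng = np.random.default_rng(1)

      # Each pre-PCR stage is a binomial selection of template molecules;
      # PCR duplicates each surviving molecule per cycle with some efficiency.
      def simulate_allele(n_start, p_extract=0.6, p_aliquot=0.2, p_pcr=0.8, cycles=28):
          n = rng.binomial(n_start, p_extract)   # extraction losses
          n = rng.binomial(n, p_aliquot)         # fraction forwarded to PCR
          for _ in range(cycles):                # stochastic PCR amplification
              n += rng.binomial(n, p_pcr)
          return n

      def hb_and_dropout(n_template=25, trials=2000, detection_limit=1e7):
          hb, dropouts = [], 0
          for _ in range(trials):
              a, b = simulate_allele(n_template), simulate_allele(n_template)
              if a < detection_limit or b < detection_limit:
                  dropouts += 1                       # at least one allele not detected
              else:
                  hb.append(min(a, b) / max(a, b))    # heterozygote balance Hb
          return np.median(hb), dropouts / trials

      print(hb_and_dropout())   # (median Hb, estimated p(D)) for these toy parameters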

  3. Real-time acquisition and display of flow contrast using speckle variance optical coherence tomography in a graphics processing unit.

    PubMed

    Xu, Jing; Wong, Kevin; Jian, Yifan; Sarunic, Marinko V

    2014-02-01

    In this report, we describe a graphics processing unit (GPU)-accelerated processing platform for real-time acquisition and display of flow contrast images with Fourier domain optical coherence tomography (FDOCT) in mouse and human eyes in vivo. Motion contrast from blood flow is processed using the speckle variance OCT (svOCT) technique, which relies on the acquisition of multiple B-scan frames at the same location and tracking the change of the speckle pattern. Real-time mouse and human retinal imaging using two different custom-built OCT systems with processing and display performed on GPU are presented with an in-depth analysis of performance metrics. The display output included structural OCT data, en face projections of the intensity data, and the svOCT en face projections of retinal microvasculature; these results compare projections with and without speckle variance in the different retinal layers to reveal significant contrast improvements. As a demonstration, videos of real-time svOCT for in vivo human and mouse retinal imaging are included in our results. The capability of performing real-time svOCT imaging of the retinal vasculature may be a useful tool in a clinical environment for monitoring disease-related pathological changes in the microcirculation such as diabetic retinopathy.
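
    The speckle variance computation itself is simply a per-pixel variance across repeated B-scans. A minimal sketch follows (the GPU implementation adds acquisition, masking, and display, none of which is shown here):

      import numpy as np

      # Static tissue gives a stable speckle pattern (low variance across
      # frames); flowing blood decorrelates the speckle (high variance).
      def speckle_variance(bscans):
          """bscans: array of shape (N_frames, Z, X) of OCT intensities."""
          return bscans.var(axis=0)

      rng = np.random.default_rng(0)
      static = np.tile(rng.rayleigh(size=(1, 64, 64)), (8, 1, 1))   # frozen speckle
      flow = rng.rayleigh(size=(8, 64, 64))                         # decorrelated speckle
      frames = np.concatenate([static[:, :, :32], flow[:, :, 32:]], axis=2)
      sv = speckle_variance(frames)
      print(sv[:, :32].mean(), sv[:, 32:].mean())   # ~0 in static half, >0 in flow half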

  4. Perception in statistical graphics

    NASA Astrophysics Data System (ADS)

    VanderPlas, Susan Ruth

    There has been quite a bit of research on statistical graphics and visualization, generally focused on new types of graphics, new software to create graphics, interactivity, and usability studies. Our ability to interpret and use statistical graphics hinges on the interface between the graph itself and the brain that perceives and interprets it, and there is substantially less research on the interplay between graph, eye, brain, and mind than is sufficient to understand the nature of these relationships. The goal of the work presented here is to further explore the interplay between a static graph, the translation of that graph from paper to mental representation (the journey from eye to brain), and the mental processes that operate on that graph once it is transferred into memory (mind). Understanding the perception of statistical graphics should allow researchers to create more effective graphs which produce fewer distortions and viewer errors while reducing the cognitive load necessary to understand the information presented in the graph. Taken together, these experiments should lay a foundation for exploring the perception of statistical graphics. There has been considerable research into the accuracy of numerical judgments viewers make from graphs, and these studies are useful, but it is more effective to understand how errors in these judgments occur so that the root cause of the error can be addressed directly. Understanding how visual reasoning relates to the ability to make judgments from graphs allows us to tailor graphics to particular target audiences. In addition, understanding the hierarchy of salient features in statistical graphics allows us to clearly communicate the important message from data or statistical models by constructing graphics which are designed specifically for the perceptual system.

  5. Second and Fourth Graders' Copying Ability: From Graphical to Linguistic Processing

    ERIC Educational Resources Information Center

    Grabowski, Joachim; Weinzierl, Christian; Schmitt, Markus

    2010-01-01

    Particularly in primary school, good performance on copy tasks is an important working technique. With respect to writing skills, copying is a very basic process on which more complex writing abilities are based. We studied the copying ability of second and fourth graders across four types of symbols which vary with respect to their semantic and…

  6. Graphic Novels, Web Comics, and Creator Blogs: Examining Product and Process

    ERIC Educational Resources Information Center

    Carter, James Bucky

    2011-01-01

    Young adult literature (YAL) of the late 20th and early 21st century is exploring hybrid forms with growing regularity by embracing textual conventions from sequential art, video games, film, and more. As well, Web-based technologies have given those who consume YAL more immediate access to authors, their metacognitive creative processes, and…

  7. Programmer's Guide for FFORM. Physical Processes in Terrestrial and Aquatic Ecosystems, Computer Programs and Graphics Capabilities.

    ERIC Educational Resources Information Center

    Anderson, Lougenia; Gales, Larry

    This module is part of a series designed to be used by life science students for instruction in the application of physical theory to ecosystem operation. Most modules contain computer programs which are built around a particular application of a physical process. FFORM is a portable format-free input subroutine package written in ANSI Fortran IV…

  8. Conceptual Learning with Multiple Graphical Representations: Intelligent Tutoring Systems Support for Sense-Making and Fluency-Building Processes

    ERIC Educational Resources Information Center

    Rau, Martina A.

    2013-01-01

    Most learning environments in the STEM disciplines use multiple graphical representations along with textual descriptions and symbolic representations. Multiple graphical representations are powerful learning tools because they can emphasize complementary aspects of complex learning contents. However, to benefit from multiple graphical…

  9. Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system.

    PubMed

    Zhang, Kang; Kang, Jin U

    2010-05-24

    We realized graphics processing unit (GPU) based real-time 4D (3D+time) signal processing and visualization on a regular Fourier-domain optical coherence tomography (FD-OCT) system with a nonlinear k-space spectrometer. An ultra-high speed linear spline interpolation (LSI) method for lambda-to-k spectral re-sampling is implemented in the GPU architecture, which gives average interpolation speeds of >3,000,000 line/s for 1024-pixel OCT (1024-OCT) and >1,400,000 line/s for 2048-pixel OCT (2048-OCT). The complete FD-OCT signal processing including lambda-to-k spectral re-sampling, fast Fourier transform (FFT) and post-FFT processing have all been implemented on a GPU. The maximum complete A-scan processing speeds are investigated to be 680,000 line/s for 1024-OCT and 320,000 line/s for 2048-OCT, which correspond to 1 GByte processing bandwidth. In our experiment, a 2048-pixel CMOS camera running up to 70 kHz is used as an acquisition device. Therefore the actual imaging speed is camera-limited to 128,000 line/s for 1024-OCT or 70,000 line/s for 2048-OCT. 3D Data sets are continuously acquired in real time at 1024-OCT mode, immediately processed and visualized as high as 10 volumes/second (12,500 A-scans/volume) by either en face slice extraction or ray-casting based volume rendering from 3D texture mapped in graphics memory. For standard FD-OCT systems, a GPU is the only additional hardware needed to realize this improvement and no optical modification is needed. This technique is highly cost-effective and can be easily integrated into most ultrahigh speed FD-OCT systems to overcome the 3D data processing and visualization bottlenecks.

  10. Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system

    PubMed Central

    Zhang, Kang; Kang, Jin U.

    2010-01-01

    We realized graphics processing unit (GPU) based real-time 4D (3D + time) signal processing and visualization on a regular Fourier-domain optical coherence tomography (FD-OCT) system with a nonlinear k-space spectrometer. An ultra-high speed linear spline interpolation (LSI) method for λ-to-k spectral re-sampling is implemented in the GPU architecture, which gives average interpolation speeds of >3,000,000 line/s for 1024-pixel OCT (1024-OCT) and >1,400,000 line/s for 2048-pixel OCT (2048-OCT). The complete FD-OCT signal processing including λ-to-k spectral re-sampling, fast Fourier transform (FFT) and post-FFT processing have all been implemented on a GPU. The maximum complete A-scan processing speeds are investigated to be 680,000 line/s for 1024-OCT and 320,000 line/s for 2048-OCT, which correspond to 1GByte processing bandwidth. In our experiment, a 2048-pixel CMOS camera running up to 70 kHz is used as an acquisition device. Therefore the actual imaging speed is camera-limited to 128,000 line/s for 1024-OCT or 70,000 line/s for 2048-OCT. 3D Data sets are continuously acquired in real time at 1024-OCT mode, immediately processed and visualized as high as 10 volumes/second (12,500 A-scans/volume) by either en face slice extraction or ray-casting based volume rendering from 3D texture mapped in graphics memory. For standard FD-OCT systems, a GPU is the only additional hardware needed to realize this improvement and no optical modification is needed. This technique is highly cost-effective and can be easily integrated into most ultrahigh speed FD-OCT systems to overcome the 3D data processing and visualization bottlenecks. PMID:20589038

  11. The LHEA PDP 11/70 graphics processing facility users guide

    NASA Technical Reports Server (NTRS)

    1978-01-01

    A compilation of all necessary and useful information needed to allow the inexperienced user to program on the PDP 11/70. Information regarding the use of editing and file manipulation utilities as well as operational procedures are included. The inexperienced user is taken through the process of creating, editing, compiling, task building and debugging his/her FORTRAN program. Also, documentation on additional software is included.

  12. Real-time 3D and 4D Fourier domain Doppler optical coherence tomography based on dual graphics processing units.

    PubMed

    Huang, Yong; Liu, Xuan; Kang, Jin U

    2012-09-01

    We present real-time 3D (2D cross-sectional image plus time) and 4D (3D volume plus time) phase-resolved Doppler OCT (PRDOCT) imaging based on configuration of dual graphics processing units (GPU). A GPU-accelerated phase-resolving processing algorithm was developed and implemented. We combined a structural image intensity-based thresholding mask and average window method to improve the signal-to-noise ratio of the Doppler phase image. A 2D simultaneous display of the structure and Doppler flow images was presented at a frame rate of 70 fps with an image size of 1000 × 1024 (X × Z) pixels. A 3D volume rendering of tissue structure and flow images-each with a size of 512 × 512 pixels-was presented 64.9 milliseconds after every volume scanning cycle with a volume size of 500 × 256 × 512 (X × Y × Z) voxels, with an acquisition time window of only 3.7 seconds. To the best of our knowledge, this is the first time that an online, simultaneous structure and Doppler flow volume visualization has been achieved. Maximum system processing speed was measured to be 249,000 A-scans per second with each A-scan size of 2048 pixels.

  13. Real-time 3D and 4D Fourier domain Doppler optical coherence tomography based on dual graphics processing units

    PubMed Central

    Huang, Yong; Liu, Xuan; Kang, Jin U.

    2012-01-01

    We present real-time 3D (2D cross-sectional image plus time) and 4D (3D volume plus time) phase-resolved Doppler OCT (PRDOCT) imaging based on configuration of dual graphics processing units (GPU). A GPU-accelerated phase-resolving processing algorithm was developed and implemented. We combined a structural image intensity-based thresholding mask and average window method to improve the signal-to-noise ratio of the Doppler phase image. A 2D simultaneous display of the structure and Doppler flow images was presented at a frame rate of 70 fps with an image size of 1000 × 1024 (X × Z) pixels. A 3D volume rendering of tissue structure and flow images—each with a size of 512 × 512 pixels—was presented 64.9 milliseconds after every volume scanning cycle with a volume size of 500 × 256 × 512 (X × Y × Z) voxels, with an acquisition time window of only 3.7 seconds. To the best of our knowledge, this is the first time that an online, simultaneous structure and Doppler flow volume visualization has been achieved. Maximum system processing speed was measured to be 249,000 A-scans per second with each A-scan size of 2048 pixels. PMID:23024910
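
    The phase-resolving step of PRDOCT can be sketched compactly: the Doppler shift is the angle of the complex correlation between adjacent A-scans, stabilized by the intensity mask and window averaging mentioned above. This is an illustration of the standard technique, not the dual-GPU kernel:

      import numpy as np

      def doppler_phase(ascans, intensity_threshold=0.1, window=3):
          """ascans: complex array (X, Z); returns masked phase shift, shape (X-1, Z)."""
          cross = ascans[1:] * np.conj(ascans[:-1])     # adjacent A-scan correlation
          # Average the complex correlation over a small window before taking
          # the angle (averaging phases directly would wrap incorrectly).
          kernel = np.ones(window) / window
          cross = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, cross)
          phase = np.angle(cross)
          mask = np.abs(ascans[1:]) > intensity_threshold   # structural intensity mask
          return np.where(mask, phase, 0.0)

      rng = np.random.default_rng(0)
      speckle = rng.standard_normal(256) + 1j * rng.standard_normal(256)
      # Static structure with a uniform flow-induced phase ramp of 0.4 rad/A-scan:
      field = np.tile(speckle, (100, 1)) * np.exp(1j * 0.4 * np.arange(100))[:, None]
      print(np.median(doppler_phase(field)))   # ~0.4 rad per A-scan pair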

  14. Genetic algorithm supported by graphical processing unit improves the exploration of effective connectivity in functional brain imaging.

    PubMed

    Chan, Lawrence Wing Chi; Pang, Bin; Shyu, Chi-Ren; Chan, Tao; Khong, Pek-Lan

    2015-01-01

    Brain regions of human subjects exhibit certain levels of associated activation upon specific environmental stimuli. Functional Magnetic Resonance Imaging (fMRI) detects regional signals, based on which we could infer the direct or indirect neuronal connectivity between the regions. Structural Equation Modeling (SEM) is an appropriate mathematical approach for analyzing the effective connectivity using fMRI data. A maximum likelihood (ML) discrepancy function is minimized against some constrained coefficients of a path model. The minimization is an iterative process, and the computing time is very long as the number of iterations increases geometrically with the number of path coefficients. Using a regular quad-core Central Processing Unit (CPU) platform, a duration of up to 3 months is required for the iterations from 0 to 30 path coefficients. This study demonstrates the application of a Graphical Processing Unit (GPU) with a parallel Genetic Algorithm (GA) that replaces the Powell minimization in the standard program code of the analysis software package. It was found in the same example that the GA under the GPU reduced the duration to 20 h and provided a more accurate solution when compared with the standard program code under the CPU.
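
    A compact real-coded genetic algorithm of the kind that replaced the Powell minimizer is sketched below; the population settings and the quadratic test discrepancy are illustrative, not the paper's SEM configuration. In the GPU setting, each candidate's discrepancy evaluation is what runs in parallel:

      import numpy as np

      def genetic_minimize(f, dim, pop=64, gens=200, mut=0.1, rng=None):
          if rng is None:
              rng = np.random.default_rng()
          x = rng.uniform(-1, 1, size=(pop, dim))          # initial population
          for _ in range(gens):
              fitness = np.apply_along_axis(f, 1, x)       # evaluate all candidates
              order = np.argsort(fitness)
              parents = x[order[: pop // 2]]               # truncation selection
              i = rng.integers(0, pop // 2, size=(pop, 2))
              w = rng.random((pop, dim))
              x = w * parents[i[:, 0]] + (1 - w) * parents[i[:, 1]]  # blend crossover
              x += rng.normal(scale=mut, size=x.shape)     # Gaussian mutation
              x[0] = parents[0]                            # elitism: keep the best
          return x[0], f(x[0])

      target = np.array([0.3, -0.7, 0.5])
      best, val = genetic_minimize(lambda p: ((p - target) ** 2).sum(), dim=3)
      print(best.round(2), val)   # converges near the target up to the mutation scale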

  15. Consent Processes for Mobile App Mediated Research: Systematic Review.

    PubMed

    Moore, Sarah; Tassé, Anne-Marie; Thorogood, Adrian; Winship, Ingrid; Zawati, Ma'n; Doerr, Megan

    2017-08-30

    Since the launch of ResearchKit on the iOS platform in March 2015 and ResearchStack on the Android platform in June 2016, many academic and commercial institutions around the world have adapted these frameworks to develop mobile app-based research studies. These studies cover a wide variety of subject areas including melanoma, cardiomyopathy, and autism. Additionally, these app-based studies target a variety of participant populations, including children and pregnant women. The aim of this review was to document the variety of self-administered remote informed consent processes used in app-based research studies available between May and September 2016. Remote consent is defined as any consenting process with zero in-person steps, whereby a participant is able to join a study without ever seeing a member of the research team. This type of review has not been previously conducted. The research community would benefit from a rigorous interrogation of the types of consent taken as part of the seismic shift to entirely mobile-mediated research studies. This review examines both the process of information giving and the specific content shared, with special attention to data privacy, aggregation, and sharing. Consistency across some elements of the app-based consent processes was found; for example, informing participants about how data will be curated from the phone. Variations in other elements were identified; for example, where specific information is shared and the level of detail disclosed. Additionally, several novel elements present in eConsent not typically seen in traditional consent for research were highlighted. This review advocates the importance of participant informedness in a novel and largely unregulated research setting.

  16. Graphic Arts.

    ERIC Educational Resources Information Center

    Towler, Alan L.

    This guide to teaching graphic arts, one in a series of instructional materials for junior high industrial arts education, is designed to assist teachers as they plan and implement new courses of study and as they make revisions and improvements in existing courses in order to integrate classroom learning with real-life experiences. This graphic…

  17. A Simple Graphical Method for Quantification of Disaster Management Surge Capacity Using Computer Simulation and Process-control Tools.

    PubMed

    Franc, Jeffrey Michael; Ingrassia, Pier Luigi; Verde, Manuela; Colombo, Davide; Della Corte, Francesco

    2015-02-01

    Surge capacity, or the ability to manage an extraordinary volume of patients, is fundamental for hospital management of mass-casualty incidents. However, quantification of surge capacity is difficult and no universal standard for its measurement has emerged, nor has a standardized statistical method been advocated. As mass-casualty incidents are rare, simulation may represent a viable alternative to measure surge capacity. Hypothesis/Problem The objective of the current study was to develop a statistical method for the quantification of surge capacity using a combination of computer simulation and simple process-control statistical tools. Length-of-stay (LOS) and patient volume (PV) were used as metrics. The use of this method was then demonstrated on a subsequent computer simulation of an emergency department (ED) response to a mass-casualty incident. In the derivation phase, 357 participants in five countries performed 62 computer simulations of an ED response to a mass-casualty incident. Benchmarks for ED response were derived from these simulations, including LOS and PV metrics for triage, bed assignment, physician assessment, and disposition. In the application phase, 13 students of the European Master in Disaster Medicine (EMDM) program completed the same simulation scenario, and the results were compared to the standards obtained in the derivation phase. Patient-volume metrics included number of patients to be triaged, assigned to rooms, assessed by a physician, and disposed. Length-of-stay metrics included median time to triage, room assignment, physician assessment, and disposition. Simple graphical methods were used to compare the application phase group to the derived benchmarks using process-control statistical tools. The group in the application phase failed to meet the indicated standard for LOS from admission to disposition decision. This study demonstrates how simulation software can be used to derive values for objective benchmarks of ED surge

  1. Solution of the direct problem in turbid media with inclusions using Monte Carlo simulations implemented in graphics processing units: new criterion for processing transmittance data.

    PubMed

    Carbone, Nicolas; Di Rocco, Hector; Iriarte, Daniela I; Pomarico, Juan A

    2010-01-01

    The study of light propagation in diffusive media requires solving the radiative transfer equation, or eventually, the diffusion approximation. Except for some cases involving simple geometries, the problem with immersed inclusions has not been solved. Also, Monte Carlo (MC) calculations have become a gold standard for simulating photon migration in turbid media, although they have the drawback of large processing times. The purpose of this work is two-fold: first, we introduce a new processing criterion to retrieve information about the location and shape of absorbing inclusions based on normalization to the background intensity, when no inhomogeneities are present. Second, we demonstrate the feasibility of including inhomogeneities in MC simulations implemented in graphics processing units, achieving large acceleration factors (approximately 10^3), thus providing an important tool for iteratively solving the forward problem to retrieve the optical properties of the inclusion. Results using a cw source are compared with MC outcomes showing very good agreement.
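
    The per-photon kernel that makes MC photon migration so GPU-friendly is short. Below is a bare-bones random walk in an infinite homogeneous medium with isotropic scattering (illustrative coefficients; a real code such as the one above additionally handles boundaries, anisotropy, and inclusions):

      import numpy as np

      # One photon: exponential free paths, isotropic redirection at each
      # scattering event, absorption modelled as weight attenuation along the path.
      def random_walk_photon(mu_s=10.0, mu_a=0.1, n_steps=200, rng=None):
          if rng is None:
              rng = np.random.default_rng()
          pos = np.zeros(3)
          weight = 1.0
          for _ in range(n_steps):
              step = rng.exponential(1.0 / mu_s)        # free path length
              cos_t = rng.uniform(-1, 1)                # isotropic scattering
              phi = rng.uniform(0, 2 * np.pi)
              sin_t = np.sqrt(1 - cos_t**2)
              direction = np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
              pos += step * direction
              weight *= np.exp(-mu_a * step)            # absorption along the step
          return pos, weight

      paths = [random_walk_photon() for _ in range(1000)]
      r = np.array([np.linalg.norm(p) for p, _ in paths])
      print(r.mean())   # mean displacement after 200 scattering events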

  2. Analysis and simulation of industrial distillation processes using a graphical system design model

    NASA Astrophysics Data System (ADS)

    Boca, Maria Loredana; Dobra, Remus; Dragos, Pasculescu; Ahmad, Mohammad Ayaz

    2016-12-01

    The separation column of the experimental installation can be configured in two ways: first, as a cascade of two columns of different diameters placed one in the extension of the other; and second, as a single column with a set diameter [1], [2]. The column separates the carbon isotopes based on the cryogenic distillation of pure carbon monoxide, which is fed at a constant flow rate as a gas through the feeding system [1],[2]. Based on numerical control systems used in virtual instrumentation, simulations of the distillation process were performed in order to obtain the isotope 13C at high concentrations. It is proposed that this installation be controlled and monitored using a data acquisition tool and professional software that processes information from the isotopic column based on a dedicated logic algorithm. The classical isotopic column will be controlled automatically, and information about the main parameters will be monitored and properly displayed by one program. Taking into consideration the very low operating temperature, an efficient thermal insulation vacuum jacket is necessary. Since the "elementary separation ratio" [2] is very close to unity, in order to raise the 13C isotope concentration up to a desired level a permanent countercurrent of the liquid and gaseous phases of the carbon monoxide is created by the main elements of the equipment: the boiler in the bottom side of the column and the condenser in the top side.

  3. Real time decision support system for diagnosis of rare cancers, trained in parallel, on a graphics processing unit.

    PubMed

    Sidiropoulos, Konstantinos; Glotsos, Dimitrios; Kostopoulos, Spiros; Ravazoula, Panagiota; Kalatzis, Ioannis; Cavouras, Dionisis; Stonham, John

    2012-04-01

    In the present study a new strategy is introduced for the design and development of an efficient dynamic Decision Support System (DSS) for supporting rare cancers decision making. The proposed DSS operates on a Graphics Processing Unit (GPU) and is capable of adjusting its design in real time based on user-defined clinical questions, in contrast to standard CPU implementations that are limited by processing and memory constraints. The core of the proposed DSS was a Probabilistic Neural Network classifier, and it was evaluated on 140 rare brain cancer cases regarding its ability to predict tumors' malignancy, using a panel of 20 morphological and textural features. Generalization was estimated using an external 10-fold cross-validation. The proposed GPU-based DSS achieved significantly higher training speed, outperforming the CPU-based system by a factor that ranged from 267 to 288 times. System design was optimized using a combination of 4 textural and morphological features with 78.6% overall accuracy, whereas system generalization was 73.8%±3.2%. By exploiting the inherently parallel architecture of a consumer-level GPU, the proposed approach enables real-time, optimal design of a DSS for any user-defined clinical question, improving diagnostic assessments, prognostic relevance and concordance rates for rare cancers in clinical practice. Copyright © 2011 Elsevier Ltd. All rights reserved.
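
    A Probabilistic Neural Network is essentially a Parzen kernel-density estimate per class, which is why it both trains instantly and parallelizes well: the per-exemplar density terms are independent. A minimal sketch with a made-up smoothing parameter and toy feature vectors follows (not the authors' GPU code):

      import numpy as np

      # Each class score is the mean Gaussian kernel density of the test vector
      # over that class's training exemplars; the predicted class maximizes it.
      def pnn_predict(X_train, y_train, X_test, sigma=0.3):
          classes = np.unique(y_train)
          scores = []
          for c in classes:
              Xc = X_train[y_train == c]
              d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
              scores.append(np.exp(-d2 / (2 * sigma**2)).mean(axis=1))
          return classes[np.argmax(np.stack(scores), axis=0)]

      rng = np.random.default_rng(0)
      low = rng.normal(0.0, 0.3, size=(40, 4))    # toy "low-grade" feature vectors
      high = rng.normal(1.0, 0.3, size=(40, 4))   # toy "high-grade" feature vectors
      X = np.vstack([low, high])
      y = np.array([0] * 40 + [1] * 40)
      print(pnn_predict(X, y, rng.normal(1.0, 0.3, size=(5, 4))))  # -> mostly class 1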

  4. Structural Determination of (Al2O3)(n) (n = 1-15) Clusters Based on Graphic Processing Unit.

    PubMed

    Zhang, Qiyao; Cheng, Longjiu

    2015-05-26

    Global optimization algorithms have been widely used in the field of chemistry to search for the global minimum structures of molecular and atomic clusters, a nondeterministic polynomial (NP) problem as cluster size increases. Considering that the computational ability of a graphics processing unit (GPU) is much better than that of a central processing unit (CPU), we developed a GPU-based genetic algorithm for structural prediction of clusters and achieved a high acceleration ratio compared to a CPU. On the one-dimensional (1D) operation of a GPU, taking (Al2O3)n clusters as test cases, the peak acceleration ratio on the GPU is about 220 times that on a CPU in single precision, and the value is 103 for double precision in calculation of the analytical interatomic potential. The peak acceleration ratio is about 240 and 107 on the block operation, and about 77 and 35 on the 2D operation, compared to a CPU in single precision and double precision, respectively. The peak acceleration ratio of the whole genetic algorithm program is about 35 compared to a CPU at double precision. Structures of (Al2O3)n clusters at n = 1-10 reported in previous works are successfully located, and their low-lying structures at n = 11-15 are predicted.

  5. Performance evaluation for volumetric segmentation of multiple sclerosis lesions using MATLAB and computing engine in the graphical processing unit (GPU)

    NASA Astrophysics Data System (ADS)

    Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.

    2010-03-01

    Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To solve the issue of inconsistency and user-dependency in manual lesion measurement of MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes the K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized image-processing algorithm used in CAD development, MS CAD integration and evaluation in the clinical workflow is technically challenging due to the requirement of high computation rates and memory bandwidth in the recursive nature of the algorithm. In this paper, we present the development and evaluation of a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to rapidly integrate in an electronic patient record or any disease-centric health care system.
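
    The per-voxel KNN probability at the core of such a CAD is the fraction of a voxel's k nearest training voxels that are labelled as lesion; the brute-force distance computation is exactly what GPUs accelerate. A sketch with illustrative features and k (the CAD itself is MATLAB with GPU kernels; this is not its code):

      import numpy as np

      def knn_lesion_probability(train_features, train_labels, voxel_features, k=15):
          # Squared distances from every query voxel to every training voxel.
          d2 = ((voxel_features[:, None, :] - train_features[None, :, :]) ** 2).sum(axis=2)
          nearest = np.argpartition(d2, k, axis=1)[:, :k]   # indices of k nearest
          return train_labels[nearest].mean(axis=1)         # fraction labelled lesion

      rng = np.random.default_rng(0)
      train = rng.random((5000, 3))                 # e.g. intensity plus two context features
      labels = (train[:, 0] > 0.7).astype(float)    # toy rule: bright voxels are lesion
      voxels = rng.random((10, 3))
      print(knn_lesion_probability(train, labels, voxels).round(2))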

  6. Graphics Processing Unit-Accelerated Code for Computing Second-Order Wiener Kernels and Spike-Triggered Covariance

    PubMed Central

    Mano, Omer

    2017-01-01

    Sensory neuroscience seeks to understand and predict how sensory neurons respond to stimuli. Nonlinear components of neural responses are frequently characterized by the second-order Wiener kernel and the closely-related spike-triggered covariance (STC). Recent advances in data acquisition have made it increasingly common and computationally intensive to compute second-order Wiener kernels/STC matrices. In order to speed up this sort of analysis, we developed a graphics processing unit (GPU)-accelerated module that computes the second-order Wiener kernel of a system’s response to a stimulus. The generated kernel can be easily transformed for use in standard STC analyses. Our code speeds up such analyses by factors of over 100 relative to current methods that utilize central processing units (CPUs). It works on any modern GPU and may be integrated into many data analysis workflows. This module accelerates data analysis so that more time can be spent exploring parameter space and interpreting data. PMID:28068420

  7. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

    PubMed

    Badal, Andreu; Badano, Aldo

    2009-11-01

    Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: the use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed-up was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
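
    The photon-level independence that makes such simulations GPU-friendly is easy to see in a toy random walk. The sketch below uses exponentially sampled free paths and a per-voxel absorption probability; it is not PENELOPE's physics, and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def transport_photons(mu, n_photons=2000, voxel=1.0, mfp=1.0):
    # Each photon random-walks independently: exponential free paths,
    # isotropic scattering, absorption with per-voxel probability mu.
    # This independence is what maps one photon (or batch) per GPU thread.
    shape = np.array(mu.shape)
    dose = np.zeros_like(mu)
    for _ in range(n_photons):
        pos = shape * voxel / 2.0                  # start at the center
        for _ in range(100):                       # cap interactions
            d = rng.normal(size=3)
            pos = pos + d / np.linalg.norm(d) * rng.exponential(mfp)
            idx = tuple((pos // voxel).astype(int))
            if min(idx) < 0 or any(i >= s for i, s in zip(idx, shape)):
                break                              # photon escaped the volume
            dose[idx] += 1.0                       # tally an interaction
            if rng.random() < mu[idx]:
                break                              # photon absorbed
    return dose

mu = np.full((16, 16, 16), 0.2)                    # uniform toy absorber
print(transport_photons(mu).sum())
```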

  8. Graphics processing unit accelerated three-dimensional model for the simulation of pulsed low-temperature plasmas

    SciTech Connect

    Fierro, Andrew; Dickens, James; Neuber, Andreas

    2014-12-15

    A 3-dimensional particle-in-cell/Monte Carlo collision simulation that is fully implemented on a graphics processing unit (GPU) is described and used to determine low-temperature plasma characteristics at high reduced electric field, E/n, in nitrogen gas. Details of implementation on the GPU using the NVIDIA Compute Unified Device Architecture framework are discussed with respect to efficient code execution. The software is capable of tracking around 10 × 10⁶ particles with dynamic weighting and a total mesh size larger than 10⁸ cells. Verification of the simulation is performed by comparing the electron energy distribution function and plasma transport parameters to known Boltzmann Equation (BE) solvers. Under the assumption of a uniform electric field and neglecting the build-up of positive ion space charge, the simulation agrees well with the BE solvers. The model is utilized to calculate plasma characteristics of a pulsed, parallel plate discharge. A photoionization model provides the simulation with additional electrons after the initial seeded electron density has drifted towards the anode. The performance of the GPU implementation is compared with a CPU implementation: a speed-up factor of 13 is obtained for a 3D relaxation Poisson solver, and a factor of 60 speed-up is realized for parallelization of the electron processes.

  9. Grid-based algorithm to search critical points, in the electron density, accelerated by graphics processing units.

    PubMed

    Hernández-Esparza, Raymundo; Mejía-Chica, Sol-Milena; Zapata-Escobar, Andy D; Guevara-García, Alfredo; Martínez-Melchor, Apolinar; Hernández-Pérez, Julio-M; Vargas, Rubicelia; Garza, Jorge

    2014-12-05

    Using a grid-based method to search for the critical points in the electron density, we show how to accelerate such a method with graphics processing units (GPUs). When the GPU implementation is contrasted with that used on central processing units (CPUs), we found a large difference in the time elapsed by the two implementations: the smallest time is observed when GPUs are used. We tested two GPUs, one designed for video games and the other for high-performance computing (HPC). On the CPU side, two processors were tested, one used in common personal computers and the other in HPC systems, both of the latest generation. Although our parallel algorithm scales quite well on CPUs, the same implementation on GPUs runs around 10× faster than 16 CPUs, for any of the tested GPUs and CPUs. We found that a GPU designed for video games can be used without any problem for our application, delivering remarkable performance; in fact, this GPU competes with the HPC GPU, in particular when single precision is used.
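
    The per-grid-point test that parallelizes so well can be sketched with finite differences: flag points where the density gradient nearly vanishes (and the density is non-negligible), then refine each candidate. The thresholds and the two-Gaussian toy density below are illustrative, not the paper's algorithm:

```python
import numpy as np

def critical_point_candidates(rho, h, tol=0.08, rho_min=0.05):
    # Flag grid points where the finite-difference gradient norm of the
    # density nearly vanishes and the density is non-negligible. This
    # per-point screen is the embarrassingly parallel step a GPU spreads
    # over threads; real codes then refine candidates, e.g. Newton steps.
    gx, gy, gz = np.gradient(rho, h)
    gnorm = np.sqrt(gx**2 + gy**2 + gz**2)
    return np.argwhere((gnorm < tol) & (rho > rho_min))

# Toy density: two Gaussian "atoms" on the x axis. Candidates appear near
# the two maxima (x indices 30 and 50) and the midpoint saddle (index 40).
g = np.linspace(-4, 4, 81)
x, y, z = np.meshgrid(g, g, g, indexing='ij')
rho = (np.exp(-((x - 1)**2 + y**2 + z**2))
       + np.exp(-((x + 1)**2 + y**2 + z**2)))
print(critical_point_candidates(rho, h=g[1] - g[0]))
```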

  10. Real-time generation of high-definition resolution digital holograms by using multiple graphic processing units

    NASA Astrophysics Data System (ADS)

    Song, Joongseok; Park, Jungsik; Park, Hanhoon; Park, Jong-Il

    2013-01-01

    We describe and evaluate a practical approach for implementing computer-generated holography (CGH) using multiple graphics processing units (GPUs). The proposed method can generate high-definition (HD) resolution (1920×1080) digital holograms in real time. To demonstrate the plausibility of our method, some experimental results are given. First, we discuss the advantage of GPUs over central processing units (CPUs) for CGH by comparing the performance of both. Our results show that use of GPUs can shorten CGH computation time by a factor of 2791. Then, we discuss the potential of multiple GPUs for generating HD-resolution digital holograms in real time by measuring and analyzing the CGH computation time as a function of the number of GPUs. Our results show that the CGH computation time decreases nonlinearly, with a logarithmic-like curve, as the number of GPUs increases; we can therefore determine the number of GPUs that maximizes efficiency. Consequently, our implementation can generate HD-resolution digital holograms at a rate of more than 66 hps (holograms per second) using two NVIDIA GTX 590 cards.
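
    The core CGH computation is a sum of spherical-wave contributions from every object point at every hologram pixel, which is why it parallelizes so naturally across GPUs. A small point-source sketch follows; the reduced resolution, wavelength, pitch, and toy object points are illustrative, not the paper's implementation:

```python
import numpy as np

# Each hologram pixel accumulates the phase contribution of every 3-D
# object point; this O(pixels x points) sum is the work the paper
# distributes across multiple GPUs.
wavelength = 532e-9
k = 2 * np.pi / wavelength
pitch = 8e-6                                  # hologram pixel pitch (m)
H, W = 270, 480                               # small stand-in for 1920x1080

ys, xs = np.mgrid[0:H, 0:W]
px = (xs - W / 2) * pitch                     # pixel coordinates (m)
py = (ys - H / 2) * pitch

rng = np.random.default_rng(4)
points = rng.uniform([-1e-3, -1e-3, 0.1], [1e-3, 1e-3, 0.2], size=(50, 3))
amps = rng.uniform(0.5, 1.0, size=50)

field = np.zeros((H, W), dtype=complex)
for (ox, oy, oz), a in zip(points, amps):
    r = np.sqrt((px - ox)**2 + (py - oy)**2 + oz**2)
    field += a * np.exp(1j * k * r) / r       # spherical wave per point

hologram = np.angle(field)                    # phase-only hologram
print(hologram.shape, hologram.min(), hologram.max())
```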

  11. Optimization of Parallel Legendre Transform using Graphics Processing Unit (GPU) for a Geodynamo Code

    NASA Astrophysics Data System (ADS)

    Lokavarapu, H. V.; Matsui, H.

    2015-12-01

    Convection and the magnetic field of the Earth's outer core are expected to have vast length scales. To resolve these flows, high-performance computing is required for geodynamo simulations using the spherical harmonic transform (SHT), in which a significant portion of the execution time is spent on the Legendre transform. Calypso is a geodynamo code designed to model magnetohydrodynamics of a Boussinesq fluid in a rotating spherical shell, such as the outer core of the Earth. The code has been shown to scale well on computer clusters on the order of 10⁵ cores using Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) parallelization for CPUs. To optimize further, we investigate three different algorithms for the SHT using GPUs. The first is to precompute the Legendre polynomials on the CPU before executing the SHT on the GPU within the time integration loop. In the second approach, both the Legendre polynomials and the SHT are computed on the GPU. In the third approach, we initially partition the radial grid for the forward transform and the harmonic order for the backward transform between the CPU and GPU; thereafter, the partitioned work is computed simultaneously within the time integration loop. We examine the trade-offs between space and time, memory bandwidth, and GPU computation on Maverick, a Texas Advanced Computing Center (TACC) supercomputer. We have observed improved performance using a GPU-enabled Legendre transform, and we will compare and contrast the different algorithms in the context of GPUs.
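
    Stripped of the m-dependence of the associated Legendre functions, the Legendre transform pair reduces to dense matrix products over Gauss-Legendre nodes, which is exactly the shape of work that maps onto GPU GEMM kernels. A simplified round-trip check (the truncation degree and test field are illustrative):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

L = 64                                       # truncation degree
x, w = leggauss(L)                           # Gauss-Legendre nodes/weights
P = legvander(x, L - 1)                      # P[i, l] = P_l(x_i)
norm = (2 * np.arange(L) + 1) / 2            # orthonormalization factors

f = np.cos(3 * np.arccos(np.clip(x, -1, 1))) # degree-3 test field on nodes
coeffs = norm * (P.T @ (w * f))              # forward Legendre transform
f_back = P @ coeffs                          # backward Legendre transform
print(np.max(np.abs(f - f_back)))            # round trip exact to ~1e-15
```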

  12. Computer graphics and the graphic artist

    NASA Technical Reports Server (NTRS)

    Taylor, N. L.; Fedors, E. G.; Pinelli, T. E.

    1985-01-01

    A centralized computer graphics system is being developed at the NASA Langley Research Center. This system was required to satisfy multiuser needs, ranging from presentation quality graphics prepared by a graphic artist to 16-mm movie simulations generated by engineers and scientists. While the major thrust of the central graphics system was directed toward engineering and scientific applications, hardware and software capabilities to support the graphic artists were integrated into the design. This paper briefly discusses the importance of computer graphics in research; the central graphics system in terms of systems, software, and hardware requirements; the application of computer graphics to graphic arts, discussed in terms of the requirements for a graphic arts workstation; and the problems encountered in applying computer graphics to the graphic arts. The paper concludes by presenting the status of the central graphics system.

  14. A Physics-Based Modeling and Real-Time Simulation of Biomechanical Diffusion Process Through Optical Imaged Alveolar Tissues on Graphical Processing Units

    NASA Astrophysics Data System (ADS)

    Kaya, Ilhan; Santhanam, Anand P.; Lee, Kye-Sung; Meemon, Panomsak; Papp, Nicolene; Rolland, Jannick P.

    Tissue engineering has broad applications from creating the much-needed engineered tissue and organ structures for regenerative medicine to providing in vitro testbeds for drug testing. In the latter application domain, creating alveolar lung tissue and simulating the diffusion process of oxygen and other possible agents from the air into the blood stream, as well as modeling the removal of carbon dioxide and other possible entities from the blood stream, are of critical importance to simulating lung functions in various environments. In this chapter, we propose a physics-based model to simulate the alveolar gas exchange and the alveolar diffusion process. Tissue engineers, for the first time, may utilize these simulation results to better understand the underlying gas exchange process and properly adjust the tissue growing cycles. In this work, alveolar tissues are imaged by means of an optical coherence microscopy (OCM) system developed in our laboratory. As a consequence, 3D alveoli tissue data with its inherent complex boundary is taken as input to the simulation system, which is based on computational fluid mechanics in simulating the alveolar gas exchange. The visualization and the simulation of diffusion of the air into the blood through the alveoli tissue are performed using a state-of-the-art graphics processing unit (GPU). Results show the real-time simulation of the gas exchange through the 2D alveoli tissue.
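
    The diffusion component of such a model is commonly advanced with an explicit finite-difference step in which every cell updates independently from its neighbors, which is what makes a one-thread-per-cell GPU mapping natural. A 2-D constant-coefficient sketch with illustrative values, not the chapter's CFD model:

```python
import numpy as np

def diffuse(c, D, dt, dx):
    # Five-point Laplacian with periodic boundaries; each cell's update
    # depends only on its neighbors, so cells can update in parallel.
    lap = (np.roll(c, 1, 0) + np.roll(c, -1, 0) +
           np.roll(c, 1, 1) + np.roll(c, -1, 1) - 4 * c) / dx**2
    return c + D * dt * lap

c = np.zeros((128, 128))
c[60:68, 60:68] = 1.0                    # initial oxygen "pocket"
D, dx = 1e-9, 1e-6                       # m^2/s, m -- illustrative values
dt = 0.2 * dx**2 / D                     # satisfies explicit stability limit
for _ in range(100):
    c = diffuse(c, D, dt, dx)
print(c.max(), c.sum())                  # mass conserved under periodic BCs
```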

  15. PO*WW*ER mobile treatment unit process hazards analysis

    SciTech Connect

    Richardson, R.B.

    1996-06-01

    The objective of this report is to demonstrate that a thorough assessment of the risks associated with the operation of the Rust Geotech patented PO*WW*ER mobile treatment unit (MTU) has been performed and documented. The MTU was developed to treat aqueous mixed wastes at the US Department of Energy (DOE) Albuquerque Operations Office sites. The MTU uses evaporation to separate organics and water from radionuclides and solids, and catalytic oxidation to convert the hazardous constituents into byproducts. This process hazards analysis evaluated a number of accident scenarios not directly related to the operation of the MTU, such as natural phenomena damage and mishandling of chemical containers. Worst-case accident scenarios were further evaluated to determine the risk potential to the MTU and to workers, the public, and the environment. The overall risk to any group from operation of the MTU was determined to be very low; the MTU is classified as a Radiological Facility with low hazards.

  16. Digital image processing for the acquisition of graphic similarity of the distributional patterns between cutaneous lesions of linear scleroderma and Blaschko's lines.

    PubMed

    Jue, Mihn Sook; Kim, Moon Hwan; Ko, Joo Yeon; Lee, Chang Woo

    2011-08-01

    The aim of this study is to objectively evaluate whether linear scleroderma (LS) follows Blaschko's lines (BL) in Korean patients using digital image processing. Thirty-two patients with LS were examined. According to the patients' clinical photographs, their skin lesions were plotted on the head and body charts. With the aid of graphics software, a digital image was produced that included an overlay of all the individual lesions and was used to compare the graphics with the published BL. To investigate the image similarity between the graphic patterns of the LS and BL, each case was analyzed by means of Hough transformations and Czekanowski's methods. The comparative investigation of the graphic similarity of distributional patterns between the LS and BL showed that Czekanowski's similarity index was 0.947 on average. In conclusion, our objective results suggest that the graphic patterns of the distribution of the LS skin lesions showed a high degree of similarity and in fact were almost identical to that of BL which may be the lines of embryonic development of the skin. This finding may suggest that some developmental factors during the embryological age could constitute the cause of LS. © 2011 Japanese Dermatological Association.

  17. Integrative Processing of Verbal and Graphical Information during Re-Reading Predicts Learning from Illustrated Text: An Eye-Movement Study

    ERIC Educational Resources Information Center

    Mason, Lucia; Tornatora, Maria Caterina; Pluchino, Patrik

    2015-01-01

    Printed or digital textbooks contain texts accompanied by various kinds of visualisation. Successful comprehension of these materials requires integrating verbal and graphical information. This study investigates the time course of processing an illustrated text through eye-tracking methodology in the school context. The aims were to identify…

  19. Optimization of image processing algorithms on mobile platforms

    NASA Astrophysics Data System (ADS)

    Poudel, Pramod; Shirvaikar, Mukul

    2011-03-01

    This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, netbooks, and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND flash and 256 MB SDRAM memory. The basic image correlation algorithm is chosen for benchmarking as it finds widespread application for various template matching tasks such as face recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images and performance results are presented measuring the speedup obtained due to dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
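
    The DFT-based correlation that a DSP core handles efficiently can be sketched with FFTs: correlate the mean-subtracted template against the image and take the argmax. This is a CPU reference of the general technique, not the OMAP/DSP code:

```python
import numpy as np

def correlate_fft(image, template):
    # Cross-correlation via the DFT: corr = IFFT(F(image) * conj(F(template))),
    # with the template zero-padded to the image size. The peak location
    # is the offset where the template best matches.
    ih, iw = image.shape
    t = template - template.mean()           # suppress bright-region bias
    F = np.fft.rfft2(image, s=(ih, iw))
    T = np.fft.rfft2(t, s=(ih, iw))
    corr = np.fft.irfft2(F * np.conj(T), s=(ih, iw))
    return np.unravel_index(np.argmax(corr), corr.shape)

rng = np.random.default_rng(5)
img = rng.normal(size=(240, 320))
tpl = img[100:132, 200:232].copy()           # template cut from (100, 200)
print(correlate_fft(img, tpl))               # expect (100, 200)
```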

  20. Monte Carlo-based fluorescence molecular tomography reconstruction method accelerated by a cluster of graphic processing units

    NASA Astrophysics Data System (ADS)

    Quan, Guotao; Gong, Hui; Deng, Yong; Fu, Jianwei; Luo, Qingming

    2011-02-01

    High-speed fluorescence molecular tomography (FMT) reconstruction for 3-D heterogeneous media is still one of the most challenging problems in diffusive optical fluorescence imaging. In this paper, we propose a fast FMT reconstruction method that is based on Monte Carlo (MC) simulation and accelerated by a cluster of graphics processing units (GPUs). Based on the Message Passing Interface standard, we modified the MC code for fast FMT reconstruction, and different Green's functions representing the flux distribution in media are calculated simultaneously by different GPUs in the cluster. A load-balancing method was also developed to increase the computational efficiency. By applying the Fréchet derivative, a Jacobian matrix is formed to reconstruct the distribution of the fluorochromes using the calculated Green's functions. Phantom experiments have shown that only 10 min are required to get reconstruction results with a cluster of 6 GPUs, rather than 6 h with a cluster of multiple dual opteron CPU nodes. Because of the advantages of high accuracy and suitability for 3-D heterogeneity media with refractive-index-unmatched boundaries from the MC simulation, the GPU cluster-accelerated method provides a reliable approach to high-speed reconstruction for FMT imaging.

  1. Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units.

    PubMed

    Maurer, S A; Kussmann, J; Ochsenfeld, C

    2014-08-07

    We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows us to replace the rate-determining contraction step with a modified J-engine algorithm, which has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.
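
    The reformulation hinges on the Laplace identity 1/x = ∫₀^∞ exp(-xt) dt, which replaces the orbital-energy denominator of MP2 by a short sum of exponentials and thereby decouples the energy expression. A crude numeric check with Gauss-Laguerre quadrature (production codes use optimized minimax quadrature points instead; this is only an illustration of the identity):

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

# Approximate 1/x = int_0^inf exp(-x t) dt by a short exponential sum:
#   1/x  ~=  sum_a w_a exp(-(x - 1) t_a)
# using Gauss-Laguerre nodes/weights for int_0^inf e^{-t} f(t) dt.
t, w = laggauss(8)

def inv_laplace(x):
    return np.sum(w * np.exp(-(x - 1.0) * t))

for x in (0.5, 1.0, 2.0, 5.0):
    print(x, inv_laplace(x), 1.0 / x)        # approximation vs exact
```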

  2. Fluorescence molecular tomography using a two-step three-dimensional shape-based reconstruction with graphics processing unit acceleration.

    PubMed

    Wang, Daifa; Qiao, Huiting; Song, Xiaolei; Fan, Yubo; Li, Deyu

    2012-12-20

    In fluorescence molecular tomography, the accurate and stable reconstruction of fluorescence-labeled targets remains a challenge for wide application of this imaging modality. Here we propose a two-step three-dimensional shape-based reconstruction method using graphics processing unit (GPU) acceleration. In this method, the fluorophore distribution is assumed as the sum of ellipsoids with piecewise-constant fluorescence intensities. The inverse problem is formulated as a constrained nonlinear least-squares problem with respect to shape parameters, leading to much less ill-posedness as the number of unknowns is greatly reduced. Considering that various shape parameters contribute differently to the boundary measurements, we use a two-step optimization algorithm to handle them in a distinctive way and also stabilize the reconstruction. Additionally, the GPU acceleration is employed for finite-element-method-based calculation of the objective function value and the Jacobian matrix, which reduces the total optimization time from around 10 min to less than 1 min. The numerical simulations show that our method can accurately reconstruct multiple targets of various shapes while the conventional voxel-based reconstruction cannot separate the nearby targets. Moreover, the two-step optimization can tolerate different initial values in the existence of noises, even when the number of targets is not known a priori. A physical phantom experiment further demonstrates the method's potential in practical applications.

  3. GPUDePiCt: A Parallel Implementation of a Clustering Algorithm for Computing Degenerate Primers on Graphics Processing Units.

    PubMed

    Cickovski, Trevor; Flor, Tiffany; Irving-Sachs, Galen; Novikov, Philip; Parda, James; Narasimhan, Giri

    2015-01-01

    In order to make multiple copies of a target sequence in the laboratory, the technique of Polymerase Chain Reaction (PCR) requires the design of "primers", which are short fragments of nucleotides complementary to the flanking regions of the target sequence. If the same primer is to amplify multiple closely related target sequences, then it is necessary to make the primers "degenerate", which would allow it to hybridize to target sequences with a limited amount of variability that may have been caused by mutations. However, the PCR technique can only allow a limited amount of degeneracy, and therefore the design of degenerate primers requires the identification of reasonably well-conserved regions in the input sequences. We take an existing algorithm for designing degenerate primers that is based on clustering and parallelize it in a web-accessible software package GPUDePiCt, using a shared memory model and the computing power of Graphics Processing Units (GPUs). We test our implementation on large sets of aligned sequences from the human genome and show a multi-fold speedup for clustering using our hybrid GPU/CPU implementation over a pure CPU approach for these sequences, which consist of more than 7,500 nucleotides. We also demonstrate that this speedup is consistent over larger numbers and longer lengths of aligned sequences.
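
    The object being optimized is easy to sketch: a set of aligned sequences collapses column-by-column into one IUPAC degenerate primer whose degeneracy is the product of per-column base counts, the quantity that must stay below the PCR limit. This sketch shows only that bookkeeping, not GPUDePiCt's clustering code:

```python
IUPAC = {frozenset('A'): 'A', frozenset('C'): 'C', frozenset('G'): 'G',
         frozenset('T'): 'T', frozenset('AG'): 'R', frozenset('CT'): 'Y',
         frozenset('CG'): 'S', frozenset('AT'): 'W', frozenset('GT'): 'K',
         frozenset('AC'): 'M', frozenset('CGT'): 'B', frozenset('AGT'): 'D',
         frozenset('ACT'): 'H', frozenset('ACG'): 'V', frozenset('ACGT'): 'N'}

def degenerate_primer(aligned):
    # Collapse aligned sequences into one IUPAC degenerate primer and
    # report its degeneracy (product of per-column distinct-base counts).
    primer, degeneracy = [], 1
    for col in zip(*aligned):
        bases = frozenset(col)
        primer.append(IUPAC[bases])
        degeneracy *= len(bases)
    return ''.join(primer), degeneracy

seqs = ["ACGTAC", "ACGTTC", "ACATAC"]
print(degenerate_primer(seqs))     # ('ACRTWC', 4)
```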

  4. Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization.

    PubMed

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L; Wang, Xueding; Liu, Xiaojun

    2013-08-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state-of-the-art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.
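
    In its simplest delay-and-sum form, back-projection assigns each image pixel the sum of the transducer samples at the acoustic time-of-flight, one independent pixel per GPU thread. A toy ring-array sketch; the geometry and random data are illustrative, not the reported system:

```python
import numpy as np

def delay_and_sum(signals, sensor_xy, grid_xy, fs, c=1540.0):
    # signals: (n_sensors, n_samples); sensor_xy, grid_xy in meters.
    # Every pixel sums each sensor's sample at its time-of-flight index.
    n_sensors, n_samples = signals.shape
    img = np.zeros(len(grid_xy))
    for s in range(n_sensors):
        dist = np.linalg.norm(grid_xy - sensor_xy[s], axis=1)
        idx = np.clip((dist / c * fs).astype(int), 0, n_samples - 1)
        img += signals[s, idx]
    return img

# Toy geometry: 32-element ring around a 64x64 image grid.
fs = 20e6
angles = np.linspace(0, 2 * np.pi, 32, endpoint=False)
sensors = 0.02 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
g = np.linspace(-0.01, 0.01, 64)
gx, gy = np.meshgrid(g, g)
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
sig = np.random.default_rng(6).normal(size=(32, 1024))
print(delay_and_sum(sig, sensors, grid, fs).reshape(64, 64).shape)
```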

  5. High-performance iterative electron tomography reconstruction with long-object compensation using graphics processing units (GPUs).

    PubMed

    Xu, Wei; Xu, Fang; Jones, Mel; Keszthelyi, Bettina; Sedat, John; Agard, David; Mueller, Klaus

    2010-08-01

    Iterative reconstruction algorithms pose tremendous computational challenges for 3D Electron Tomography (ET). Similar to X-ray Computed Tomography (CT), graphics processing units (GPUs) offer an affordable platform to meet these demands. In this paper, we outline a CT reconstruction approach for ET that is optimized for the special demands and application setting of ET. It exploits the fact that ET is typically cast as a parallel-beam configuration, which allows the design of an efficient data management scheme, using a holistic sinogram-based representation. Our method produces speedups of about an order of magnitude over a previously proposed GPU-based ET implementation, on similar hardware, and completes an iterative 3D reconstruction of practical problem size within minutes. We also describe a novel GPU-amenable approach that effectively compensates for reconstruction errors resulting from the TEM data acquisition on (long) samples which extend the width of the parallel TEM beam. We show that the vignetting artifacts typically arising at the periphery of non-compensated ET reconstructions are completely eliminated when our method is employed. Copyright 2010 Elsevier Inc. All rights reserved.

  6. Monte Carlo-based fluorescence molecular tomography reconstruction method accelerated by a cluster of graphic processing units.

    PubMed

    Quan, Guotao; Gong, Hui; Deng, Yong; Fu, Jianwei; Luo, Qingming

    2011-02-01

    High-speed fluorescence molecular tomography (FMT) reconstruction for 3-D heterogeneous media is still one of the most challenging problems in diffusive optical fluorescence imaging. In this paper, we propose a fast FMT reconstruction method that is based on Monte Carlo (MC) simulation and accelerated by a cluster of graphics processing units (GPUs). Based on the Message Passing Interface standard, we modified the MC code for fast FMT reconstruction, and different Green's functions representing the flux distribution in media are calculated simultaneously by different GPUs in the cluster. A load-balancing method was also developed to increase the computational efficiency. By applying the Fréchet derivative, a Jacobian matrix is formed to reconstruct the distribution of the fluorochromes using the calculated Green's functions. Phantom experiments have shown that only 10 min are required to get reconstruction results with a cluster of 6 GPUs, rather than 6 h with a cluster of multiple dual opteron CPU nodes. Because of the advantages of high accuracy and suitability for 3-D heterogeneity media with refractive-index-unmatched boundaries from the MC simulation, the GPU cluster-accelerated method provides a reliable approach to high-speed reconstruction for FMT imaging.

  7. Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units

    SciTech Connect

    Maurer, S. A.; Kussmann, J.; Ochsenfeld, C.

    2014-08-07

    We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows us to replace the rate-determining contraction step with a modified J-engine algorithm, which has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.

  8. Efficient localization and spectral estimation of an unknown number of ocean acoustic sources using a graphics processing unit.

    PubMed

    Dosso, Stan E; Dettmer, Jan; Wilmut, Michael J

    2015-11-01

    This paper develops a matched-field approach to localization and spectral estimation of an unknown number of ocean acoustic sources employing massively parallel implementation on a graphics processing unit (GPU) for real-time efficiency. A Bayesian formulation is developed in which the locations and complex spectra of multiple sources and noise variances are considered unknown random variables, and the Bayesian information criterion is minimized to estimate these parameters, as well as the number of sources present. Optimization is carried out using simulated annealing and includes steps that attempt to add/delete sources to/from the model. Closed-form maximum-likelihood (ML) solutions for source spectra and noise variances in terms of the source locations allow these parameters to be sampled implicitly, substantially reducing the dimensionality of the inversion. Source sampling, addition, and deletion are based on joint conditional probability distributions for source range and depth, which incorporate the ML spectral estimates. Computing these conditionals requires solving a very large number of systems of equations, which is carried out in parallel on a GPU, improving efficiency by 2 orders of magnitude. Simulated examples illustrate localizations and spectral estimation for a large number of sources (up to eight), and investigate mitigation of environmental mismatch via efficient multiple-frequency inversion.

  9. Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization

    NASA Astrophysics Data System (ADS)

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L.; Wang, Xueding; Liu, Xiaojun

    2013-08-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state-of-the-art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.

  10. High-Performance Iterative Electron Tomography Reconstruction with Long-Object Compensation using Graphics Processing Units (GPUs)

    PubMed Central

    Xu, Wei; Xu, Fang; Jones, Mel; Keszthelyi, Bettina; Sedat, John; Agard, David; Mueller, Klaus

    2010-01-01

    Iterative reconstruction algorithms pose tremendous computational challenges for 3D Electron Tomography (ET). Similar to X-ray Computed Tomography (CT), graphics processing units (GPUs) offer an affordable platform to meet these demands. In this paper, we outline a CT reconstruction approach for ET that is optimized for the special demands and application setting of ET. It exploits the fact that ET is typically cast as a parallel-beam configuration, which allows the design of an efficient data management scheme, using a holistic sinogram-based representation. Our method produces speedups of about an order of magnitude over a previously proposed GPU-based ET implementation, on similar hardware, and completes an iterative 3D reconstruction of practical problem size within minutes. We also describe a novel GPU-amenable approach that effectively compensates for reconstruction errors resulting from the TEM data acquisition on (long) samples which extend the width of the parallel TEM beam. We show that the vignetting artifacts typically arising at the periphery of non-compensated ET reconstructions are completely eliminated when our method is employed. PMID:20371381

  11. Graphical processing unit-based machine vision system for simultaneous measurement of shrinkage and soil release in fabrics

    NASA Astrophysics Data System (ADS)

    Kamalakannan, Sridharan; Gururajan, Arunkumar; Hill, Matthew; Shahriar, Muneem; Sari-Sarraf, Hamed; Hequet, Eric F.

    2010-04-01

    We present a machine vision system for simultaneous and objective evaluation of two important functional attributes of a fabric, namely, soil release and shrinkage. Soil release corresponds to the efficacy of the fabric in releasing stains after laundering and shrinkage essentially quantifies the dimensional changes in the fabric postlaundering. Within the framework of the proposed machine vision scheme, the samples are prepared using a prescribed procedure and subsequently digitized using a commercially available off-the-shelf scanner. Shrinkage measurements in the lengthwise and widthwise directions are obtained by detecting and measuring the distance between two pairs of appropriately placed markers. In addition, these shrinkage markers help in producing estimates of the location of the center of the stain on the fabric image. Using this information, a customized adaptive statistical snake is initialized, which evolves based on region statistics to segment the stain. Once the stain is localized, appropriate measurements can be extracted from the stain and the background image that can help in objectively quantifying stain release. In addition, the statistical snakes algorithm has been parallelized on a graphical processing unit, which allows for rapid evolution of multiple snakes. This, in turn, translates to the fact that multiple stains can be detected and segmented in a computationally efficient fashion. Finally, the aforementioned scheme is validated on a sizeable set of fabric images and the promising nature of the results help in establishing the efficacy of the proposed approach.

  12. Performance of heterogeneous computing with graphics processing unit and many integrated core for hartree potential calculations on a numerical grid.

    PubMed

    Choi, Sunghwan; Kwon, Oh-Kyoung; Kim, Jaewook; Kim, Woo Youn

    2016-09-15

    We investigated the performance of heterogeneous computing with graphics processing units (GPUs) and many integrated core (MIC) with 20 CPU cores (20×CPU). As a practical example toward large scale electronic structure calculations using grid-based methods, we evaluated the Hartree potentials of silver nanoparticles with various sizes (3.1, 3.7, 4.9, 6.1, and 6.9 nm) via a direct integral method supported by the sinc basis set. The so-called work stealing scheduler was used for efficient heterogeneous computing via the balanced dynamic distribution of workloads between all processors on a given architecture without any prior information on their individual performances. 20×CPU + 1GPU was up to ∼1.5 and ∼3.1 times faster than 1GPU and 20×CPU, respectively. 20×CPU + 2GPU was ∼4.3 times faster than 20×CPU. The performance enhancement by CPU + MIC was considerably lower than expected because of the large initialization overhead of MIC, although its theoretical performance is similar to that of CPU + GPU. © 2016 Wiley Periodicals, Inc.
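
    The scheduling idea can be sketched with a shared task queue from which heterogeneous workers pull chunks as they finish, so faster devices automatically claim more work without any prior performance model. This is a simplified stand-in for a true work-stealing scheduler (which uses per-worker deques); the worker names and speeds are illustrative:

```python
import queue
import threading
import time

# Heterogeneous workers (stand-ins for CPU cores, GPUs, MIC) drain a
# shared queue of grid chunks; the faster device naturally takes more.
tasks = queue.Queue()
for chunk in range(100):
    tasks.put(chunk)

done = {}

def worker(name, speed):
    count = 0
    while True:
        try:
            tasks.get_nowait()
        except queue.Empty:
            break
        time.sleep(0.001 / speed)        # pretend to integrate one chunk
        count += 1
    done[name] = count

threads = [threading.Thread(target=worker, args=("gpu", 10.0)),
           threading.Thread(target=worker, args=("cpu", 1.0))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(done)                              # fast device claims most chunks
```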

  13. A graphics processing unit accelerated motion correction algorithm and modular system for real-time fMRI.

    PubMed

    Scheinost, Dustin; Hampson, Michelle; Qiu, Maolin; Bhawnani, Jitendra; Constable, R Todd; Papademetris, Xenophon

    2013-07-01

    Real-time functional magnetic resonance imaging (rt-fMRI) has recently gained interest as a possible means to facilitate the learning of certain behaviors. However, rt-fMRI is limited by processing speed and available software, and continued development is needed for rt-fMRI to progress further and become feasible for clinical use. In this work, we present an open-source rt-fMRI system for biofeedback powered by a novel Graphics Processing Unit (GPU) accelerated motion correction strategy as part of the BioImage Suite project (www.bioimagesuite.org). Our system contributes to the development of rt-fMRI by presenting a motion correction algorithm that provides an estimate of motion with essentially no processing delay as well as a modular rt-fMRI system design. Using empirical data from rt-fMRI scans, we assessed the quality of motion correction in this new system. The present algorithm performed comparably to standard (non real-time) offline methods and outperformed other real-time methods based on zero order interpolation of motion parameters. The modular approach to the rt-fMRI system allows the system to be flexible to the experiment and feedback design, a valuable feature for many applications. We illustrate the flexibility of the system by describing several of our ongoing studies. Our hope is that continuing development of open-source rt-fMRI algorithms and software will make this new technology more accessible and adaptable, and will thereby accelerate its application in the clinical and cognitive neurosciences.

  14. Simultaneous reconstruction of multiple depth images without off-focus points in integral imaging using a graphics processing unit.

    PubMed

    Yi, Faliu; Lee, Jieun; Moon, Inkyu

    2014-05-01

    The reconstruction of multiple depth images with a ray back-propagation algorithm in three-dimensional (3D) computational integral imaging is computationally burdensome. Further, a reconstructed depth image consists of a focus and an off-focus area. Focus areas are 3D points on the surface of an object that are located at the reconstructed depth, while off-focus areas include 3D points in free-space that do not belong to any object surface in 3D space. Generally, without being removed, the presence of an off-focus area would adversely affect the high-level analysis of a 3D object, including its classification, recognition, and tracking. Here, we use a graphics processing unit (GPU) that supports parallel processing with multiple processors to simultaneously reconstruct multiple depth images using a lookup table containing the shifted values along the x and y directions for each elemental image in a given depth range. Moreover, each 3D point on a depth image can be measured by analyzing its statistical variance with its corresponding samples, which are captured by the two-dimensional (2D) elemental images. These statistical variances can be used to classify depth image pixels as either focus or off-focus points. At this stage, the measurement of focus and off-focus points in multiple depth images is also implemented in parallel on a GPU. Our proposed method is conducted based on the assumption that there is no occlusion of the 3D object during the capture stage of the integral imaging process. Experimental results have demonstrated that this method is capable of removing off-focus points in the reconstructed depth image. The results also showed that using a GPU to remove the off-focus points could greatly improve the overall computational speed compared with using a CPU.
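
    The two ingredients, shift-and-sum reconstruction from a lookup table of per-depth shifts and a per-pixel variance test for focus, can be sketched as follows. The elemental images and shift table below are toy data that only exercise the shapes, not the paper's capture geometry:

```python
import numpy as np

def depth_slice_with_focus_mask(elemental, shifts, var_thresh):
    # Shift-and-sum reconstruction of one depth plane plus the per-pixel
    # variance test separating focus from off-focus points: pixels whose
    # samples agree across elemental images (low variance) are in focus.
    # elemental: (n_images, H, W); shifts: per-image integer pixel shifts
    # for this depth (the lookup-table entries).
    stack = np.stack([np.roll(img, (dy, dx), axis=(0, 1))
                      for img, (dy, dx) in zip(elemental, shifts)])
    mean = stack.mean(axis=0)
    var = stack.var(axis=0)
    return mean, var < var_thresh

rng = np.random.default_rng(7)
ims = rng.random((9, 64, 64))
shifts = [(i % 3 - 1, i // 3 - 1) for i in range(9)]
recon, focus = depth_slice_with_focus_mask(ims, shifts, var_thresh=0.05)
print(recon.shape, focus.mean())
```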

  15. A Graphics Processing Unit Accelerated Motion Correction Algorithm and Modular System for Real-time fMRI

    PubMed Central

    Scheinost, Dustin; Hampson, Michelle; Qiu, Maolin; Bhawnani, Jitendra; Constable, R. Todd; Papademetris, Xenophon

    2013-01-01

    Real-time functional magnetic resonance imaging (rt-fMRI) has recently gained interest as a possible means to facilitate the learning of certain behaviors. However, rt-fMRI is limited by processing speed and available software, and continued development is needed for rt-fMRI to progress further and become feasible for clinical use. In this work, we present an open-source rt-fMRI system for biofeedback powered by a novel Graphics Processing Unit (GPU) accelerated motion correction strategy as part of the BioImage Suite project (www.bioimagesuite.org). Our system contributes to the development of rt-fMRI by presenting a motion correction algorithm that provides an estimate of motion with essentially no processing delay as well as a modular rt-fMRI system design. Using empirical data from rt-fMRI scans, we assessed the quality of motion correction in this new system. The present algorithm performed comparably to standard (non real-time) offline methods and outperformed other real-time methods based on zero order interpolation of motion parameters. The modular approach to the rt-fMRI system allows the system to be flexible to the experiment and feedback design, a valuable feature for many applications. We illustrate the flexibility of the system by describing several of our ongoing studies. Our hope is that continuing development of open-source rt-fMRI algorithms and software will make this new technology more accessible and adaptable, and will thereby accelerate its application in the clinical and cognitive neurosciences. PMID:23319241

  16. Monte Carlo standardless approach for laser induced breakdown spectroscopy based on massive parallel graphic processing unit computing

    NASA Astrophysics Data System (ADS)

    Demidov, A.; Eschlböck-Fuchs, S.; Kazakov, A. Ya.; Gornushkin, I. B.; Kolmhofer, P. J.; Pedarnig, J. D.; Huber, N.; Heitz, J.; Schmid, T.; Rössler, R.; Panne, U.

    2016-11-01

    An improved Monte Carlo (MC) method for standardless analysis in laser-induced breakdown spectroscopy (LIBS) is presented. Concentrations in MC LIBS are found by fitting model-generated synthetic spectra to experimental spectra. The current version of MC LIBS is based on graphics processing unit (GPU) computation and reduces the analysis time to several seconds per spectrum/sample. The previous version of MC LIBS, based on central processing unit (CPU) computation, required unacceptably long analysis times of tens of minutes per spectrum/sample. The reduction of the computational time is achieved through massively parallel computing on the GPU, which embeds thousands of co-processors. It is shown that the number of iterations on the GPU exceeds that on the CPU by a factor > 1000 for the 5-dimensional parameter space and yet requires a > 10-fold shorter computational time. The improved GPU MC LIBS outperforms the CPU MC LIBS in terms of accuracy, precision, and analysis time. The performance is tested on LIBS spectra obtained from pelletized powders of metal oxides consisting of CaO, Fe2O3, MgO, and TiO2 that simulate by-products of the steel industry, steel slags. It is demonstrated that GPU-based MC LIBS is capable of rapid multi-element analysis with relative errors between 1 and tens of percent, which is sufficient for industrial applications (e.g., steel slag analysis). The results of the improved GPU-based MC LIBS compare favorably with those of the CPU-based MC LIBS as well as with the results of standard calibration-free (CF) LIBS based on the Boltzmann plot method.

  17. Designing and Implementing an OVERFLOW Reader for ParaView and Comparing Performance Between Central Processing Units and Graphical Processing Units

    NASA Technical Reports Server (NTRS)

    Chawner, David M.; Gomez, Ray J.

    2010-01-01

    In the Applied Aerosciences and CFD branch at Johnson Space Center, computational simulations face many challenges, two of which are the need to customize software for specialized tasks and the need to run simulations as fast as possible. Many different tools are used for running these simulations, each with its own pros and cons. Once the simulations are run, software is needed that can visualize the results in an appealing manner. Some of this software is open source, meaning that anyone can edit the source code and distribute the modifications to all other users in a future release. This is very useful, especially in this branch, where many different tools are in use. File readers can be written to load any file format into a program, easing the bridge from one tool to another. Programming such a reader requires knowledge of the file format being read as well as the equations necessary to obtain the derived values after loading. When these CFD simulations run, extremely large files are loaded and derived values calculated; the simulations usually take a few hours to complete, even on the fastest machines. Graphics processing units (GPUs) are normally used to render graphics; in recent years, however, they have been applied to more generic computations because of their speed. Applications run on GPUs have been known to run up to forty times faster than they would on normal central processing units (CPUs). If these CFD programs are extended to run on GPUs, they would require much less time, allowing more simulations to be run in the same amount of time and possibly enabling more complex computations.

  18. Computer-Aided Process and Tools for Mobile Software Acquisition

    DTIC Science & Technology

    2013-07-30

    The automatically generated statechart code is verified against the execution trace of the mobile apps using logfile-based runtime verification, and a case study of formally specifying, validating, and verifying is presented.

  19. Computer-assisted information graphics from the graphic design perspective

    SciTech Connect

    Marcus, A.

    1983-11-01

    Computer-assisted information graphics can benefit by adopting some of the working processes, principles, and areas of concern typical of information-oriented graphic designers. A review of some basic design considerations is followed by a discussion of the creation and design of a prototype nonverbal narrative which combines symbols, charts, maps, and diagrams.

  20. Graphic Communications--Graphic Arts. Ohio's Competency Analysis Profile.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Vocational Instructional Materials Lab.

    This Ohio Competency Analysis Profile (OCAP), derived from a modified Developing a Curriculum (DACUM) process, is a current comprehensive and verified employer competency program list for graphic communications--graphic arts. Each unit (with or without subunits) contains competencies and competency builders that identify the occupational,…

  1. Runoff and solute mobilization processes in a semiarid headwater catchment

    NASA Astrophysics Data System (ADS)

    Hughes, Justin D.; Khan, Shahbaz; Crosbie, Russell S.; Helliwell, Stuart; Michalk, David L.

    2007-09-01

    Runoff and solute transport processes contributing to streamflow were determined in a small headwater catchment in the eastern Murray-Darling Basin of Australia using hydrometric and tracer methods. Streamflow and electrical conductivity were monitored from two gauges draining a portion of the upper catchment area (UCA) and a saline scalded area, respectively. Runoff in the UCA was related to the formation of a seasonally perched aquifer in the near-surface zone (0-0.4 m). A similar process was responsible for runoff generation in the saline scalded area. However, saturation in the scald area was related to the proximity of groundwater rather than low subsurface hydraulic conductivity. Because of higher antecedent water content, runoff commenced earlier in winter from the scald than did the UCA. Additionally, areal runoff from the scald was far greater than from the UCA. Total runoff from the UCA was higher than the scald (15.7 versus 3.5 ML), but salt export was far lower (0.6 and 5.4 t for the UCA and scald area, respectively) since salinity of the scald runoff was far higher than that from the UCA, indicating the potential impact of saline scalded areas at the catchment scale. End-member mixing analysis modeling using six solutes indicated that most runoff produced from the scald was "new" (40-71%) despite the proximity of the groundwater surface and the high antecedent moisture levels. This is a reflection of the very low hydraulic conductivity of soils in the study area. Nearly all chloride exported to the stream from the scald emanated from the near-surface zone (77-87%). Runoff and solute mobilization processes depend upon seasonal saturation occurring in the near-surface zone during periods of low evaporative demand and generation of saturated overland flow.
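
    The new-water fractions quoted above come from conservative-tracer mass balance; for two end members the calculation is a single line. The tracer values in this sketch are illustrative, not the study's data:

```python
# Two-component end-member mixing: the fraction of "new" (event) water in
# streamflow from one conservative tracer,
#   f_new = (C_stream - C_old) / (C_new - C_old).
def new_water_fraction(c_stream, c_old, c_new):
    return (c_stream - c_old) / (c_new - c_old)

# e.g. chloride-like tracer concentrations in mg/L (illustrative)
print(new_water_fraction(c_stream=120.0, c_old=300.0, c_new=20.0))  # ~0.64
```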

  2. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

    NASA Astrophysics Data System (ADS)

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.

    2011-07-01

    We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphical processing units (GPUs) using the compute unified device architecture computing engine. The implemented block-matching algorithm uses summed absolute difference error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation, we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and noninteger search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a noninteger search grid. The additional speedup for a noninteger search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized nonfull grid search CPU-based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and simplified unsymmetrical multi-hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720 × 480 pixels in resolution commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
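
    The kernel being parallelized is compact: for every candidate displacement on the search grid, accumulate the sum of absolute differences (SAD) and keep the minimum. A CPU reference of the integer-grid full search (the GPU version assigns candidate displacements to threads); the frames and block location are toy data:

```python
import numpy as np

def full_search_sad(ref_block, target, center, radius):
    # Full-grid-search block matching with the SAD criterion: returns the
    # displacement (dy, dx) minimizing the SAD over the search window.
    bh, bw = ref_block.shape
    cy, cx = center
    best, best_dxy = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            cand = target[y:y + bh, x:x + bw]
            sad = np.abs(cand.astype(float) - ref_block).sum()
            if sad < best:
                best, best_dxy = sad, (dy, dx)
    return best_dxy

rng = np.random.default_rng(8)
frame0 = rng.integers(0, 255, size=(120, 160))
frame1 = np.roll(frame0, (3, -2), axis=(0, 1))     # known motion (3, -2)
block = frame0[40:56, 60:76].astype(float)
print(full_search_sad(block, frame1, center=(40, 60), radius=8))
```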

  3. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    PubMed

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.

  4. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

    PubMed Central

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.

    2012-01-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards. PMID:22347787

  5. Computer Graphics Verification

    NASA Technical Reports Server (NTRS)

    1992-01-01

    Video processing creates technical animation sequences using studio-quality equipment to realistically represent fluid flow over space shuttle surfaces, helicopter rotors, and turbine blades. Computer systems co-op Tim Weatherford is shown performing computer graphics verification; part of a co-op brochure.

  6. A Physics-Based Modeling and Real-Time Simulation of Biomechanical Diffusion Process Through Optical Imaged Alveolar Tissues on Graphical Processing Units

    NASA Astrophysics Data System (ADS)

    Kaya, Ilhan; Santhanam, Anand P.; Lee, Kye-Sung; Meemon, Panomsak; Papp, Nicolene; Rolland, Jannick P.

    Tissue engineering has broad applications, from creating much-needed engineered tissue and organ structures for regenerative medicine to providing in vitro testbeds for drug testing. In the latter application domain, creating alveolar lung tissue and simulating the diffusion process of oxygen and other possible agents from the air into the blood stream, as well as modeling the removal of carbon dioxide and other possible entities from the blood stream, are of critical importance to simulating lung functions in various environments. In this chapter, we propose a physics-based model to simulate the alveolar gas exchange and the alveolar diffusion process. Tissue engineers, for the first time, may utilize these simulation results to better understand the underlying gas exchange process and properly adjust the tissue growing cycles. In this work, alveolar tissues are imaged by means of an optical coherence microscopy (OCM) system developed in our laboratory. As a consequence, 3D alveolar tissue data with its inherent complex boundary is taken as input to the system, which is based on computational fluid mechanics in simulating the alveolar gas exchange. The visualization and the simulation of the diffusion of air into the blood through the alveolar tissue are performed using a state-of-the-art graphics processing unit (GPU). Results show the real-time simulation of the gas exchange through the 2D alveolar tissue.
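
    In its simplest form, the diffusion part of such a model reduces to integrating Fick's second law over the imaged tissue geometry. The sketch below is a minimal explicit finite-difference step on a 2D concentration grid restricted to a tissue mask; the diffusion coefficient, grid spacing, and time step are placeholder values, and the chapter's actual solver is a GPU-based computational-fluid-mechanics code rather than this toy.

        import numpy as np

        def diffusion_step(c, mask, D=1e-5, dx=1e-6, dt=1e-8):
            """One explicit step of Fick's second law, dc/dt = D * laplacian(c),
            on a 2D concentration grid; cells outside the tissue mask are held
            fixed. Stability of the explicit scheme requires dt <= dx**2 / (4*D)."""
            lap = (np.roll(c, 1, 0) + np.roll(c, -1, 0) +
                   np.roll(c, 1, 1) + np.roll(c, -1, 1) - 4.0 * c) / dx**2
            return np.where(mask, c + dt * D * lap, c)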

  7. Executive function processes predict mobility outcomes in older adults.

    PubMed

    Gothe, Neha P; Fanning, Jason; Awick, Elizabeth; Chung, David; Wójcicki, Thomas R; Olson, Erin A; Mullen, Sean P; Voss, Michelle; Erickson, Kirk I; Kramer, Arthur F; McAuley, Edward

    2014-02-01

    To examine the relationship between performance on executive function measures and subsequent mobility outcomes in community-dwelling older adults. Randomized controlled clinical trial. Champaign-Urbana, Illinois. Community-dwelling older adults (N = 179; mean age 66.4). A 12-month exercise trial with two arms: an aerobic exercise group and a stretching and strengthening group. Established cognitive tests of executive function (flanker task, task switching, and a dual-task paradigm) and the Wisconsin card sort test. Mobility was assessed using the timed 8-foot up-and-go test and times to climb up and down a flight of stairs. Participants completed the cognitive tests at baseline and the mobility measures at baseline and after 12 months of the intervention. Multiple regression analyses were conducted to determine whether baseline executive function predicted postintervention functional performance after controlling for age, sex, education, cardiorespiratory fitness, and baseline mobility levels. Selective baseline executive function measurements, particularly performance on the flanker task (β = 0.15-0.17) and the Wisconsin card sort test (β = 0.11-0.16), consistently predicted mobility outcomes at 12 months. The estimates were in the expected direction, such that better baseline performance on the executive function measures predicted better performance on the timed mobility tests independent of intervention. Executive functions of inhibitory control, mental set shifting, and attentional flexibility were predictive of functional mobility. Given the literature associating mobility limitations with disability, morbidity, and mortality, these results are important for understanding the antecedents of poor mobility function that well-designed interventions to improve cognitive performance can attenuate.

  8. Measuring Cognitive Load in Test Items: Static Graphics versus Animated Graphics

    ERIC Educational Resources Information Center

    Dindar, M.; Kabakçi Yurdakul, I.; Inan Dönmez, F.

    2015-01-01

    The majority of multimedia learning studies focus on the use of graphics in learning process but very few of them examine the role of graphics in testing students' knowledge. This study investigates the use of static graphics versus animated graphics in a computer-based English achievement test from a cognitive load theory perspective. Three…

  10. Developing Online Multimodal Verbal Communication to Enhance the Writing Process in an Audio-Graphic Conferencing Environment

    ERIC Educational Resources Information Center

    Ciekanski, Maud; Chanier, Thierry

    2008-01-01

    Over the last decade, most studies in Computer-Mediated Communication (CMC) have highlighted how online synchronous learning environments implement a new literacy related to multimodal communication. The environment used in our experiment is based on a synchronous audio-graphic conferencing tool. This study concerns false beginners in an English…

  11. Real time processing of Fourier domain optical coherence tomography with fixed-pattern noise removal by partial median subtraction using a graphics processing unit.

    PubMed

    Watanabe, Yuuki

    2012-05-01

    The author presents graphics processing unit (GPU) programming for real-time Fourier domain optical coherence tomography (FD-OCT) with fixed-pattern noise removal by subtraction of means and medians. In general, fixed-pattern noise can be removed by subtracting a spectrum averaged over the many spectra of an actual measurement. However, a mean spectrum results in artifacts appearing as residual lateral lines caused by a small number of highly reflective points on the sample surface. These artifacts can be eliminated from OCT images by using medians instead of means. However, median calculations based on a sorting algorithm can require a large amount of computation time. With the developed GPU program, highly reflective surface regions were identified by calculating the standard deviation of the Fourier-transformed data in the lateral direction. Medians were then subtracted at those regions, and means at the other regions, such as backgrounds. When the median calculation covered fewer than 256 positions out of a total of 512 depths in an OCT image with 1024 A-lines, the GPU processing rate was faster than that of the line scan camera (46.9 kHz). Therefore, processed OCT images can be displayed in real time using partial medians.
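
    A minimal NumPy sketch of the partial-median idea described above: rows (depth positions) whose lateral standard deviation marks them as containing strong reflectors get the expensive median subtraction, while the remaining rows get the cheap mean subtraction. The thresholding rule here is an illustrative assumption, not the paper's exact criterion, and a float-valued B-scan is assumed.

        import numpy as np

        def remove_fixed_pattern(bscan, k=2.0):
            """Fixed-pattern noise removal for an FD-OCT B-scan laid out as
            (depth, A-lines): subtract a per-depth median across the lateral
            direction at high-variance rows, and a per-depth mean elsewhere."""
            std = bscan.std(axis=1)
            hot = std > std.mean() + k * std.std()   # heuristic reflector detector
            out = np.empty_like(bscan)
            out[hot] = bscan[hot] - np.median(bscan[hot], axis=1, keepdims=True)
            out[~hot] = bscan[~hot] - bscan[~hot].mean(axis=1, keepdims=True)
            return out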

  12. Interoperability framework for communication between processes running on different mobile operating systems

    NASA Astrophysics Data System (ADS)

    Gal, A.; Filip, I.; Dragan, F.

    2016-02-01

    In an era where mobile communication is everywhere around us, the need to communicate between the wide variety of available devices becomes ever more pressing. The major impediment to achieving communication between these devices is the incompatibility between the operating systems running on them. In the present paper we propose a framework that makes it possible to interoperate between processes running on different mobile operating systems. The interoperability process makes use of any communication environment made available by the mobile devices on which the processes are installed. The communication environment is chosen so that the transfer of data between the mobile devices is optimal. The paper defines the architecture of the framework, expanding on the functionality of, and interrelations between, the modules that make it up. For the proof of concept, we propose to use three different mobile operating systems installed on three different types of mobile devices. Depending on various factors related to the structure of the mobile devices and the type of data to be transferred, the framework establishes the data transfer protocol to be used. The framework automates the interoperability process, with user intervention limited to a simple selection from the options that the framework suggests based on a full analysis of the structural and functional elements of the mobile devices involved.

  13. HLYWD: a program for post-processing data files to generate selected plots or time-lapse graphics

    SciTech Connect

    Munro, J.K. Jr.

    1980-05-01

    The program HLYWD is a post-processor of output files generated by large plasma simulation computations or of data files containing a time sequence of plasma diagnostics. It is intended to be used in a production mode for either type of application; i.e., it allows one to generate, along with the graphics sequence, segments containing a title, credits to those who performed the work, text describing the graphics, and an acknowledgement of the funding agency. The current version is designed to generate 3D plots and allows one to select the type of display (linear or semi-log scales), the normalization of function values for display purposes, and the viewing perspective, with an option for continuous rotation of surfaces. This program was developed with the intention of being relatively easy to use, reasonably flexible, and requiring a minimum investment of the user's time. It uses the TV80 library of graphics software and ORDERLIB system software on the CDC 7600 at the National Magnetic Fusion Energy Computing Center at Lawrence Livermore Laboratory in California.

  14. Design Graphics

    NASA Technical Reports Server (NTRS)

    1990-01-01

    A mathematician, David R. Hedgley, Jr., developed a computer program that considers whether a line in a graphic model of a three-dimensional object should or should not be visible. Known as the Hidden Line Computer Code, the program automatically removes superfluous lines and displays an object from a specific viewpoint, just as the human eye would see it. An example of how one company uses the program is the experience of Birdair, which specializes in the production of fabric skylights and stadium covers. The fabric, called SHEERFILL, is a Teflon-coated fiberglass material developed in cooperation with the DuPont Company. SHEERFILL glazed structures are either tension structures or air-supported tension structures. Both are formed by patterned fabric sheets supported by a steel or aluminum frame or cable network. Birdair uses the Hidden Line Computer Code to illustrate a prospective structure to an architect or owner. The program generates a three-dimensional perspective with the hidden lines removed. This program is still used by Birdair and continues to be commercially available to the public.

  15. Dynamic stepping information process method in mobile bio-sensing computing environments.

    PubMed

    Lee, Tae-Gyu; Lee, Seong-Hoon

    2014-01-01

    Recently, interest in human longevity free from disease has been converging into a single system framework, driven by the development of mobile computing environments, the diversification of remote medical systems, and an aging society. Such a converged system enables the implementation of a bioinformatics system that provides various supplementary information services by sensing and gathering the health conditions and other bio-information of mobile users to set up medical information. The existing bio-information system performs a static, unchanging process: once the bio-information process defined at initial system configuration is executed, it does not change. However, such a static process is ineffective in a mobile bio-information system performing mobile computing. In particular, any change to the process configuration or method carries the inconvenient duty of defining and executing it anew. This study proposes a dynamic process design and execution method to overcome such inefficiency.

  16. Graphic engine resource management

    NASA Astrophysics Data System (ADS)

    Bautin, Mikhail; Dwarakinath, Ashok; Chiueh, Tzi-cker

    2008-01-01

    Modern consumer-grade 3D graphics cards boast computation and memory resources that can easily rival or even exceed those of standard desktop PCs. Although these cards are mainly designed for 3D gaming applications, their enormous computational power has attracted developers to port an increasing number of scientific computation programs to them, including matrix computation, collision detection, cryptography, database sorting, etc. As more and more applications run on 3D graphics cards, there is a need to allocate the computation/memory resources on these cards among the sharing applications more fairly and efficiently. In this paper, we describe the design, implementation and evaluation of a Graphics Processing Unit (GPU) scheduler based on Deficit Round Robin scheduling that successfully allocates to every process an equal share of the GPU time regardless of its demand. This scheduler, called GERM, estimates the execution time of each GPU command group based on dynamically collected statistics, and controls each process's GPU command production rate through its CPU scheduling priority. Measurements on the first GERM prototype show that this approach can keep the maximal GPU-time consumption difference among concurrent GPU processes consistently below 5% for a variety of application mixes.
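
    Deficit Round Robin is simple enough to sketch. Below is a schematic Python version of the allocation idea the abstract describes: each process owns a queue of GPU command groups tagged with estimated costs, and a per-process deficit counter decides how much work may be issued each round. The names, quantum, and cost units are illustrative, not GERM's actual internals.

        from collections import deque

        def drr_schedule(queues, quantum=1000, rounds=10):
            """Deficit round robin over per-process command queues. Each entry
            is (label, estimated_gpu_cost); a queue may issue commands only
            while its deficit counter covers their cost, equalizing GPU time."""
            deficits = [0] * len(queues)
            issued = []
            for _ in range(rounds):
                for i, q in enumerate(queues):
                    if not q:
                        continue
                    deficits[i] += quantum
                    while q and q[0][1] <= deficits[i]:
                        label, cost = q.popleft()
                        deficits[i] -= cost
                        issued.append((i, label))
            return issued

        # Two processes with unequal per-command costs still get equal GPU time:
        # drr_schedule([deque([("draw", 400)] * 20), deque([("compute", 1600)] * 20)])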

  17. Graphic Design Is Not a Medium.

    ERIC Educational Resources Information Center

    Gruber, John Edward, Jr.

    2001-01-01

    Discusses graphic design and reviews its development from analog processes to a digital tool with the use of computers. Topics include graphical user interfaces; the need for visual communication concepts; transmedia as opposed to repurposing; and graphic design instruction in higher education. (LRW)

  18. Mathematical Creative Activity and the Graphic Calculator

    ERIC Educational Resources Information Center

    Duda, Janina

    2011-01-01

    Teaching mathematics using graphic calculators has been an issue of didactic discussions for years. Finding ways in which graphic calculators can enrich the development process of creative activity in mathematically gifted students between the ages of 16-17 is the focus of this article. Research was conducted using graphic calculators with…

  1. The Longitudinal Impact of Cognitive Speed of Processing Training on Driving Mobility

    ERIC Educational Resources Information Center

    Edwards, Jerri D.; Myers, Charlsie; Ross, Lesley A.; Roenker, Daniel L.; Cissell, Gayla M.; McLaughlin, Alexis M.; Ball, Karlene K.

    2009-01-01

    Purpose: To examine how cognitive speed of processing training affects driving mobility across a 3-year period among older drivers. Design and Methods: Older drivers with poor Useful Field of View (UFOV) test performance (indicating greater risk for subsequent at-fault crashes and mobility declines) were randomly assigned to either a speed of…

  2. Forming Professional Mobility in the Process of Future Master Philologists' Training in Ukraine and Abroad

    ERIC Educational Resources Information Center

    Semenog, Olena

    2016-01-01

    On the basis of scientific research, the experience of higher education institutions in Ukraine and abroad (the USA, the Swiss Confederation) in forming future philologists' professional mobility in the process of Master's training has been generalized. It has been noted that professional mobility is an essential indicator of…

  4. Mobile Technology and CAD Technology Integration in Teaching Architectural Design Process for Producing Creative Product

    ERIC Educational Resources Information Center

    Bin Hassan, Isham Shah; Ismail, Mohd Arif; Mustafa, Ramlee

    2011-01-01

    The purpose of this research is to examine the effect of integrating mobile and CAD technology into the teaching of the architectural design process for Malaysian polytechnic architecture students in producing a creative product. The website is set up based on Carroll's minimalist theory, while the mobile and CAD technology integration is based on Brown and…

  5. Mobilization

    DTIC Science & Technology

    1987-01-01

    …istic and romantic emotionalism that typifies this genre. Longino, James C., et al. "A Study of World War Procurement and Industrial Mobilization…" States. Harrisburg, PA: Military Service Publishing Co., 1941. CARL 355.22 J72b. Written in rough prose, this World War II era document explains the…

  6. Space Spurred Computer Graphics

    NASA Technical Reports Server (NTRS)

    1983-01-01

    Dicomed Corporation was asked by NASA in the early 1970s to develop processing capabilities for recording images sent from Mars by Viking spacecraft. The company produced a film recorder which increased the intensity levels and the capability for color recording. This development led to a strong technology base resulting in sophisticated computer graphics equipment. Dicomed systems are used to record output from CAD (computer-aided design) and CAM (computer-aided manufacturing) equipment, to update maps, and to produce computer-generated animation.

  7. Space-Time Processing for Tactical Mobile Ad Hoc Networks

    DTIC Science & Technology

    2009-08-01

    The overall performance attainable by a mobile ad hoc network depends fundamentally on the MIMO channel characteristics. During the past year the… propagation environment and an aperture within which the antennas must reside [1]. However, achieving these characteristics is difficult if not… determine the current distribution of Aperture 1 using the Covariance Method and compute the diversity gain obtained using the radiation characteristics.

  8. Computer-Aided Process and Tools for Mobile Software Acquisition

    DTIC Science & Technology

    2013-04-01

    …file-based runtime verification. A case study of formally specifying, validating, and verifying a set of requirements for an iPhone application that… Center for Strategic and International Studies; The Making of a DoD Acquisition Lead System Integrator (LSI), Paul Montgomery, Ron Carlson, and John… code against the execution trace of the mobile apps using log file-based runtime verification. A case study of formally specifying, validating, and…

  9. Finite difference calculation of acoustic streaming including the boundary layer phenomena in an ultrasonic air pump on graphics processing unit array

    NASA Astrophysics Data System (ADS)

    Wada, Yuji; Koyama, Daisuke; Nakamura, Kentaro

    2012-09-01

    Direct finite-difference fluid simulation of acoustic streaming on a fine-meshed three-dimensional model using a graphics processing unit (GPU)-oriented calculation array is discussed. Airflows due to the acoustic traveling wave are induced when an intense sound field is generated in the gap between a bending transducer and a reflector. Calculation results showed good agreement with measurements of the pressure distribution. In addition, several flow vortices were observed near the boundaries of the reflector and the transducer; such vortices have often been discussed for acoustic tubes near the boundary, but had not previously been observed in calculations for an ultrasonic air pump of this type.
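
    For orientation, the innermost kernel of such a direct scheme is a staggered-grid update of the acoustic field; the linearized 2D sketch below shows the structure (one array update per field component per step) that makes the method map well onto a GPU calculation array. It omits the nonlinear convective terms and boundary-layer resolution that actually produce streaming, and all parameter values are illustrative.

        import numpy as np

        def acoustic_step(p, vx, vy, rho=1.2, c=343.0, dx=1e-4, dt=1e-7):
            """One leapfrog step of linear 2D acoustics on a staggered grid:
            p has shape (ny, nx); vx lives on vertical cell faces (ny, nx-1),
            vy on horizontal faces (ny-1, nx). Stability: dt < dx / (c*sqrt(2))."""
            vx -= dt / (rho * dx) * (p[:, 1:] - p[:, :-1])   # velocity from pressure gradient
            vy -= dt / (rho * dx) * (p[1:, :] - p[:-1, :])
            p[:, 1:-1] -= rho * c**2 * dt / dx * (vx[:, 1:] - vx[:, :-1])  # pressure from divergence
            p[1:-1, :] -= rho * c**2 * dt / dx * (vy[1:, :] - vy[:-1, :])
            return p, vx, vy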

  10. Fast point-based method of a computer-generated hologram for a triangle-patch model by using a graphics processing unit.

    PubMed

    Sugawara, Takuya; Ogihara, Yuki; Sakamoto, Yuji

    2016-01-20

    The point-based method and the fast-Fourier-transform-based method are commonly used calculation methods for computer-generated holograms. This paper proposes a novel fast calculation method for a patch model which uses the point-based method. The method provides a calculation time that is proportional to the number of patches, not to the number of point light sources. This means that the method is suitable for quickly calculating a wide area covered by patches. Experiments using a graphics processing unit indicated that the proposed method is about 8 times or more faster than the ordinary point-based method.
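
    As a baseline for what the paper accelerates, the ordinary point-based method accumulates one spherical wave per point light source over the hologram plane. A minimal NumPy sketch with illustrative sampling parameters follows; the paper's patch-based reformulation is precisely what avoids this per-point cost.

        import numpy as np

        def point_hologram(points, amps, nx=512, ny=512, pitch=8e-6, wl=633e-9):
            """Accumulate the object wave of a point-based computer-generated
            hologram: each point source at (px, py, pz) contributes a spherical
            wave exp(i*k*r)/r sampled on the hologram plane z = 0."""
            k = 2.0 * np.pi / wl
            xs = (np.arange(nx) - nx / 2) * pitch
            ys = (np.arange(ny) - ny / 2) * pitch
            X, Y = np.meshgrid(xs, ys)
            field = np.zeros((ny, nx), dtype=np.complex128)
            for (px, py, pz), a in zip(points, amps):
                r = np.sqrt((X - px) ** 2 + (Y - py) ** 2 + pz ** 2)
                field += a * np.exp(1j * k * r) / r
            return field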

  11. Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics.

    PubMed

    Ufimtsev, Ivan S; Martinez, Todd J

    2009-10-13

    We demonstrate that a video gaming machine containing two consumer graphical cards can outpace a state-of-the-art quad-core processor workstation by a factor of more than 180× in Hartree-Fock energy + gradient calculations. Such performance makes it possible to run large scale Hartree-Fock and Density Functional Theory calculations, which typically require hundreds of traditional processor cores, on a single workstation. Benchmark Born-Oppenheimer molecular dynamics simulations are performed on two molecular systems using the 3-21G basis set - a hydronium ion solvated by 30 waters (94 atoms, 405 basis functions) and an aspartic acid molecule solvated by 147 waters (457 atoms, 2014 basis functions). Our GPU implementation can perform 27 ps/day and 0.7 ps/day of ab initio molecular dynamics simulation on a single desktop computer for these systems.

  12. Gasoline from coal in the state of Illinois: feasibility study. Volume I. Design. [KBW gasification process, ICI low-pressure methanol process and Mobil M-gasoline process

    SciTech Connect

    Not Available

    1980-01-01

    Volume 1 describes the proposed plant: the KBW gasification process, the ICI low-pressure methanol process and the Mobil M-gasoline process, together with ancillary processes such as the oxygen plant, the shift process, the RECTISOL purification process, sulfur recovery equipment and pollution control equipment. Numerous engineering diagrams are included. (LTN)

  13. Weather information network including graphical display

    NASA Technical Reports Server (NTRS)

    Leger, Daniel R. (Inventor); Burdon, David (Inventor); Son, Robert S. (Inventor); Martin, Kevin D. (Inventor); Harrison, John (Inventor); Hughes, Keith R. (Inventor)

    2006-01-01

    An apparatus for providing weather information onboard an aircraft includes a processor unit and a graphical user interface. The processor unit processes weather information after it is received onboard the aircraft from a ground-based source, and the graphical user interface provides a graphical presentation of the weather information to a user onboard the aircraft. Preferably, the graphical user interface includes one or more user-selectable options for graphically displaying at least one of convection information, turbulence information, icing information, weather satellite information, SIGMET information, significant weather prognosis information, and winds aloft information.

  14. Mobile Phone Service Process Hiccups at Cellular Inc.

    ERIC Educational Resources Information Center

    Edgington, Theresa M.

    2010-01-01

    This teaching case documents an actual case of process execution and failure. The case is useful in MIS introductory courses seeking to demonstrate the interdependencies within a business process, and the concept of cascading failure at the process level. This case demonstrates benefits and potential problems with information technology systems,…

  16. Graphical fiber shaping control interface

    NASA Astrophysics Data System (ADS)

    Basso, Eric T.; Ninomiya, Yasuyuki

    2016-03-01

    In this paper, we present an improved graphical user interface for defining single-pass novel shaping techniques on glass processing machines that allows for streamlined process development. This approach offers unique modularity and debugging capability to researchers during the process development phase not usually afforded with similar scripting languages.

  17. The use of mobile computational technology in the nursing process: a new challenge for Brazilian nurses.

    PubMed

    Sperandio, Dircelene Jussara; Evora, Yolanda Dora Martinez

    2009-01-01

    The purpose of this study was to analyze the use of a handheld mobile device with an integrated wireless network interface to help nurses document the nursing process. The system is structured in five modules, allowing nurses to access and document data related to vital signs, hydroelectrolytic balance, assessment and nursing prescription at the point of care, with transmission of data in real time. The results demonstrated that the mobile computer technology provided mobility for nurses and facilitated communication and documentation of care. In addition, real-time documentation proved to be more efficient than a manual documentation system.

  18. A Meta-model Describing the Development Process of Mobile Learning

    NASA Astrophysics Data System (ADS)

    Wingkvist, Anna; Ericsson, Morgan

    This paper presents a meta-model to describe the development process of mobile learning initiatives. These initiatives are often small-scale trials that are not integrated into the intended setting but carried out outside of it. This results in sustainability issues, i.e., problems integrating the results of the initiative as learning aids. In order to address the sustainability issues, and in turn help in understanding the scaling process, a meta-model is introduced. This meta-model divides the development into four areas of concern and the life cycle of any mobile learning initiative into four stages. The meta-model was developed by analyzing and describing how a podcasting initiative evolved, and it is currently being evaluated as a tool to both describe and evaluate mobile learning initiatives. The meta-model was developed on the basis of one mobile learning initiative, but it is extensible to other forms of technology-enhanced learning.

  19. Chemical Effects in the Separation Process of a Differential Mobility / Mass Spectrometer System

    PubMed Central

    Schneider, Bradley B.; Covey, Thomas R.; Coy, Stephen L.; Krylov, Evgeny V.; Nazarov, Erkinjon G.

    2013-01-01

    In differential mobility spectrometry (DMS, also referred to as high-field asymmetric waveform ion mobility spectrometry, FAIMS), ions are separated on the basis of the difference in their mobility under high and low electric fields. The addition of polar modifiers to the gas transporting the ions through a DMS enhances the formation of clusters in a field-dependent way and thus amplifies the difference between high- and low-field mobility, resulting in increased peak capacity and separation power. Observations of the increase in the field dependence of mobility are consistent with a cluster formation model, also referred to as the dynamic cluster-decluster model. The uniqueness of the chemical interactions that occur between an ion and cluster-forming neutrals increases the selectivity of the separation, and the depression of low-field mobility relative to high-field mobility increases the compensation voltage and peak capacity. The effect of polar modifiers on the peak capacity across a broad range of chemicals has been investigated. We discuss the theoretical underpinnings which explain the observed effects. In contrast to the result from polar modifiers, we find that using mixtures of inert gases as the transport gas improves resolution by reducing peak width but has very little effect on peak capacity or selectivity. Inert gases do not cluster and thus do not reduce low-field mobility relative to high-field mobility. The observed changes in the differential mobility α parameter exhibited by different classes of compounds when the transport gas contains polar modifiers or a significant fraction of inert gas can be explained on the basis of the physical mechanisms involved in the separation processes. PMID:20121077
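
    For readers unfamiliar with the α parameter mentioned above, the conventional DMS/FAIMS formulation (a standard expansion from the field's literature, not quoted from this paper) writes the field-dependent mobility as

        \[
        K(E/N) \;=\; K(0)\,\bigl[1 + \alpha(E/N)\bigr],
        \qquad
        \alpha(E/N) \;=\; \alpha_2\,(E/N)^2 + \alpha_4\,(E/N)^4 + \cdots
        \]

    and the compensation voltage is the DC offset at which the net ion drift over one period of the asymmetric waveform vanishes, i.e. \(\langle K(E(t))\,E(t)\rangle = 0\). Clustering modifiers depress the low-field part of \(K\) relative to its high-field part, which is why they raise the compensation voltage and peak capacity as described above.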

  20. GRASP/Ada: Graphical Representations of Algorithms, Structures, and Processes for Ada. The development of a program analysis environment for Ada: Reverse engineering tools for Ada, task 2, phase 3

    NASA Technical Reports Server (NTRS)

    Cross, James H., II

    1991-01-01

    The main objective is the investigation, formulation, and generation of graphical representations of algorithms, structures, and processes for Ada (GRASP/Ada). The presented task, in which various graphical representations that can be extracted or generated from source code are described and categorized, is focused on reverse engineering. The following subject areas are covered: the system model; control structure diagram generator; object oriented design diagram generator; user interface; and the GRASP library.

  1. The Longitudinal Impact of Cognitive Speed of Processing Training on Driving Mobility

    PubMed Central

    Edwards, Jerri D.; Myers, Charlsie; Ross, Lesley A.; Roenker, Daniel L.; Cissell, Gayla M.; McLaughlin, Alexis M.; Ball, Karlene K.

    2009-01-01

    Purpose: To examine how cognitive speed of processing training affects driving mobility across a 3-year period among older drivers. Design and Methods: Older drivers with poor Useful Field of View (UFOV) test performance (indicating greater risk for subsequent at-fault crashes and mobility declines) were randomly assigned to either a speed of processing training or a social and computer contact control group. Driving mobility of these 2 groups was compared with a group of older adults who did not score poorly on the UFOV test (reference group) across a 3-year period. Results: Older drivers with poor UFOV test scores who did not receive training experienced greater mobility declines as evidenced by decreased driving exposure and space and increased driving difficulty at 3 years. Those at risk for mobility decline who received training did not differ across the 3-year period from older adults in the reference group with regard to driving exposure, space, and most aspects of driving difficulty. Implications: Cognitive speed of processing training can not only improve cognitive performance but also protect against mobility declines among older drivers. Scientifically proven cognitive training regimens have the potential to enhance the everyday lives of older adults. PMID:19491362

  2. A graphical language for reliability model generation

    NASA Technical Reports Server (NTRS)

    Howell, Sandra V.; Bavuso, Salvatore J.; Haley, Pamela J.

    1990-01-01

    A graphical interface capability of the hybrid automated reliability predictor (HARP) is described. The graphics-oriented (GO) module provides the user with a graphical language for modeling system failure modes through the selection of various fault tree gates, including sequence dependency gates, or by a Markov chain. With this graphical input language, a fault tree becomes a convenient notation for describing a system. In accounting for any sequence dependencies, HARP converts the fault-tree notation to a complex stochastic process that is reduced to a Markov chain which it can then solve for system reliability. The graphics capability is available for use on an IBM-compatible PC, a Sun, and a VAX workstation. The GO module is written in the C programming language and uses the Graphical Kernel System (GKS) standard for graphics implementation. The PC, VAX, and Sun versions of the HARP GO module are currently in beta-testing.

  4. Linear-scaling self-consistent field calculations based on divide-and-conquer method using resolution-of-identity approximation on graphical processing units.

    PubMed

    Yoshikawa, Takeshi; Nakai, Hiromi

    2015-01-30

    Graphical processing units (GPUs) are emerging in computational chemistry, with applications including Hartree-Fock (HF) methods and electron-correlation theories. However, ab initio calculations of large molecules face technical difficulties such as slow memory access between the central processing unit and the GPU, and shortfalls of GPU memory. The divide-and-conquer (DC) method, which is a linear-scaling scheme that divides a total system into several fragments, can avoid these bottlenecks by separately solving local equations in individual fragments. In addition, the resolution-of-the-identity (RI) approximation enables an effective reduction in computational cost with respect to the GPU memory. The present study implemented the DC-RI-HF code on GPUs using math libraries, which guarantee compatibility with future developments of the GPU architecture. Numerical applications confirmed that the present code using GPUs significantly accelerated the HF calculations while maintaining accuracy.

  5. Space-Time Processing for Tactical Mobile Ad Hoc Networks

    DTIC Science & Technology

    2008-08-01

    Table-of-contents fragments only: Maximization in Multi-User MIMO Channels with Linear Processing; 2.9 Using Feedback in Ad Hoc Networks; 2.10 Feedback MIMO…; MIMO Ad Hoc Interference Networks.

  6. MPE graphics -- Scalable X11 graphics in MPI

    SciTech Connect

    Gropp, W.; Karrels, E.; Lusk, E.

    1994-12-31

    As parallel programs enter the mainstream, they need to provide the same facilities and ease-of-use features expected of uniprocessor programs. For many applications, this means that they need to provide graphical output. This talk discusses a library of routines that provide scalable X Window System graphics. These routines make use of the MPI message-passing standard to provide a safe and reliable system that can be easily used in parallel programs. At the same time they encapsulate commonly-used services to provide a convenient interface to X graphics facilities. The easiest way to provide X11 graphics to a parallel program is to allow each process to draw on the same X11 Window. That is, each process opens a connection to the X11 server and draws directly to it. In one sense, this is as scalable a system as possible, since the single graphics display is an unavoidable point of sequential access. However, in reality, an X server can only accept a relatively small number of connections. In addition, the latency associated with each transmission between a parallel process and the X Window server is relatively high. This talk addresses these issues.
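
    The scalability problem described above (every rank opening its own X11 connection) is typically solved by funneling drawing traffic through one rank that owns the display. A schematic mpi4py sketch of that funneling idea follows; it illustrates the concept only and does not reproduce the MPE library's actual C interface.

        from mpi4py import MPI

        def draw_parallel(local_commands, render):
            """Gather per-rank drawing commands to rank 0, which holds the
            single X11 connection and replays them; `render` is any callable
            that executes one drawing command locally (hypothetical)."""
            comm = MPI.COMM_WORLD
            gathered = comm.gather(local_commands, root=0)
            if comm.rank == 0:
                for rank_commands in gathered:
                    for command in rank_commands:
                        render(command)  # e.g. draw a point or line via Xlib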

  7. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control, Acquisition, Processing, and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF).

    PubMed

    Vasan, S N Swetadri; Ionita, Ciprian N; Titus, A H; Cartwright, A N; Bednarek, D R; Rudin, S

    2012-02-23

    We present the image processing upgrades implemented on a Graphics Processing Unit (GPU) in the Control, Acquisition, Processing, and Image Display System (CAPIDS) for the custom Micro-Angiographic Fluoroscope (MAF) detector. Most of the image processing currently implemented in the CAPIDS system is pixel-independent; that is, the operation on each pixel is the same and the operation on one does not depend upon the result of the operation on another, allowing the entire image to be processed in parallel. GPU hardware was developed for exactly this kind of massively parallel implementation. Thus, for an algorithm with a high degree of parallelism, a GPU implementation is much faster than a CPU implementation. The image processing algorithm upgrades implemented on the CAPIDS system include flat-field correction, temporal filtering, image subtraction, roadmap mask generation, and display windowing and leveling. A comparison between the previous and the upgraded version of CAPIDS is presented to demonstrate how the improvement is achieved. By performing the image processing on a GPU, significant improvements (with respect to timing or frame rate) have been achieved, including stable operation of the system at 30 fps during a fluoroscopy run, a DSA run, a roadmap procedure, and automatic image windowing and leveling during each frame.
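
    Two of the pixel-independent stages listed above are easy to sketch. The following NumPy fragment shows a generic flat-field correction followed by a recursive temporal filter; the array names, filter weight, and gain normalization are illustrative assumptions rather than CAPIDS internals, but each output pixel depends only on the corresponding input pixels, which is what lets the real system assign one GPU thread per pixel.

        import numpy as np

        def correct_frame(raw, dark, flat, prev, alpha=0.25):
            """Flat-field correction plus a recursive temporal filter, applied
            identically and independently at every pixel (float arrays)."""
            gain = flat.mean() / np.clip(flat - dark, 1e-6, None)
            corrected = (raw - dark) * gain                    # flat-field correction
            return alpha * corrected + (1.0 - alpha) * prev    # temporal filtering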

  9. ICT and mobile health to improve clinical process delivery. a research project for therapy management process innovation.

    PubMed

    Locatelli, Paolo; Montefusco, Vittorio; Sini, Elena; Restifo, Nicola; Facchini, Roberta; Torresani, Michele

    2013-01-01

    The volume and the complexity of clinical and administrative information make Information and Communication Technologies (ICTs) essential for running and innovating healthcare. This paper describes a project aimed at designing, developing and implementing a set of organizational models, acknowledged procedures and ICT tools (Mobile & Wireless solutions and Automatic Identification and Data Capture technologies) to improve the actual support, safety, reliability and traceability of a specific therapy management process (stem cells). The value of the project lies in designing a solution based on mobile and identification technology in tight collaboration with the physicians and other actors involved in the process, to ensure usability and effectiveness in process management.

  10. Effects of Mobile Instant Messaging on Collaborative Learning Processes and Outcomes: The Case of South Korea

    ERIC Educational Resources Information Center

    Kim, Hyewon; Lee, MiYoung; Kim, Minjeong

    2014-01-01

    The purpose of this paper was to investigate the effects of mobile instant messaging on collaborative learning processes and outcomes. The collaborative processes were measured in terms of different types of interactions. We measured the outcomes of the collaborations through both the students' taskwork and their teamwork. The collaborative…

  11. Twitter Micro-Blogging Based Mobile Learning Approach to Enhance the Agriculture Education Process

    ERIC Educational Resources Information Center

    Dissanayeke, Uvasara; Hewagamage, K. P.; Ramberg, Robert; Wikramanayake, G. N.

    2013-01-01

    The study intends to see how to introduce mobile learning within the domain of agriculture so as to enhance the agriculture education process. We propose to use the Activity theory together with other methodologies such as participatory methods to design, implement, and evaluate mLearning activities. The study explores the process of introducing…

  12. Developing a Mobile Application "Educational Process Remote Management System" on the Android Operating System

    ERIC Educational Resources Information Center

    Abildinova, Gulmira M.; Alzhanov, Aitugan K.; Ospanova, Nazira N.; Taybaldieva, Zhymatay; Baigojanova, Dametken S.; Pashovkin, Nikita O.

    2016-01-01

    Nowadays, when there is a need to introduce various innovations into the educational process, most efforts are aimed at simplifying the learning process. To that end, electronic textbooks, testing systems and other software are being developed. Most of them are intended to run on personal computers with limited mobility. Smart education is…

  13. Emergency healthcare process automation using mobile computing and cloud services.

    PubMed

    Poulymenopoulou, M; Malamateniou, F; Vassilacopoulos, G

    2012-10-01

    Emergency care is basically concerned with the provision of pre-hospital and in-hospital medical and/or paramedical services, and it typically involves a wide variety of interdependent and distributed activities that can be interconnected to form emergency care processes within and between Emergency Medical Service (EMS) agencies and hospitals. Hence, in developing an information system for emergency care processes, it is essential to support individual process activities and to satisfy collaboration and coordination needs by providing ready access to patient and operational information regardless of location and time. Filling this information gap by enabling the provision of the right information, to the right people, at the right time poses new challenges, including the specification of a common information format, interoperability among heterogeneous institutional information systems, and the development of new, ubiquitous trans-institutional systems. This paper is concerned with the development of integrated computer support for emergency care processes by evolving and cross-linking institutional healthcare systems. To this end, an integrated EMS cloud-based architecture has been developed that allows authorized users to access emergency case information in standardized document form, as proposed by the Integrating the Healthcare Enterprise (IHE) profile, uses the Organization for the Advancement of Structured Information Standards (OASIS) standard Emergency Data Exchange Language (EDXL) Hospital Availability Exchange (HAVE) for exchanging operational data with hospitals, and incorporates an intelligent module that supports triaging and selecting the most appropriate ambulances and hospitals for each case.

  14. The Occupational Mobility Process: An Analysis of Occupational Careers.

    ERIC Educational Resources Information Center

    Sorensen, Aage B.

    To further understanding of the process through which occupational careers are formed, this study analyzed life histories to determine job-shifts undertaken by a cohort of men 30-39 years of age. It was proposed that individuals will decide to leave a job when they perceive that a gain in achievement is possible. The analysis of the outcome of job…

  15. A Web Graphics Primer.

    ERIC Educational Resources Information Center

    Buchanan, Larry

    1999-01-01

    Discusses the basic technical concepts of using graphics in World Wide Web pages, including: color depth and dithering, dots per inch, image size, file types, the Graphics Interchange Format (GIF), the Joint Photographic Experts Group (JPEG) format, and software recommendations. (AEF)

  16. Repellency Awareness Graphic

    EPA Pesticide Factsheets

    Companies can apply to use the voluntary new graphic on product labels of skin-applied insect repellents. This graphic is intended to help consumers easily identify the protection time for mosquitoes and ticks and select appropriately.

  17. Adaptive Sampling for Learning Gaussian Processes Using Mobile Sensor Networks

    PubMed Central

    Xu, Yunfei; Choi, Jongeun

    2011-01-01

    This paper presents a novel class of self-organizing sensing agents that adaptively learn an anisotropic, spatio-temporal Gaussian process using noisy measurements and move in order to improve the quality of the estimated covariance function. This approach is based on a class of anisotropic covariance functions of Gaussian processes introduced to model a broad range of spatio-temporal physical phenomena. The covariance function is assumed to be unknown a priori. Hence, it is estimated by the maximum a posteriori probability (MAP) estimator. The prediction of the field of interest is then obtained based on the MAP estimate of the covariance function. An optimal sampling strategy is proposed to minimize the information-theoretic cost function of the Fisher Information Matrix. Simulation results demonstrate the effectiveness and the adaptability of the proposed scheme. PMID:22163785
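
    The estimation step described above (MAP fitting of an unknown covariance, then prediction) can be sketched compactly. Below is an illustrative Python version using an anisotropic squared-exponential kernel with a Gaussian prior on the log length scales; the kernel family, prior, and optimizer are assumptions for the sketch, not the paper's exact covariance class.

        import numpy as np
        from scipy.optimize import minimize

        def map_length_scales(X, y, prior_mean, prior_var, noise=0.01):
            """MAP estimate of anisotropic length scales for a squared-
            exponential GP from noisy samples y at locations X (n x d),
            with an independent Gaussian prior on the log length scales."""
            def neg_log_posterior(log_ell):
                ell = np.exp(log_ell)
                diff = (X[:, None, :] - X[None, :, :]) / ell
                K = np.exp(-0.5 * (diff ** 2).sum(-1)) + noise * np.eye(len(X))
                _, logdet = np.linalg.slogdet(K)
                loglik = -0.5 * (y @ np.linalg.solve(K, y) + logdet)
                logprior = -0.5 * ((log_ell - prior_mean) ** 2 / prior_var).sum()
                return -(loglik + logprior)
            res = minimize(neg_log_posterior, x0=np.asarray(prior_mean, float),
                           method="Nelder-Mead")
            return np.exp(res.x)

    Prediction then proceeds with the fitted kernel exactly as in standard GP regression, and the sampling strategy moves the agents to reduce the Fisher-information-based cost evaluated under this MAP estimate.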

  19. A service protocol for post-processing of medical images on the mobile device

    NASA Astrophysics Data System (ADS)

    He, Longjun; Ming, Xing; Xu, Lang; Liu, Qian

    2014-03-01

    With computing capability and display size growing, the mobile device has become a tool that helps clinicians view patient information and medical images anywhere and anytime. It is difficult and time-consuming to transfer medical images with large data sizes from a picture archiving and communication system to a mobile client, since the wireless network is unstable and limited in bandwidth. Besides, limited by computing capability, memory and power endurance, it is hard to provide a satisfactory quality of experience for radiologists handling complex post-processing of medical images on the mobile device, such as real-time direct interactive three-dimensional visualization. In this work, remote rendering technology is employed to implement the post-processing of medical images instead of local rendering, and a service protocol is developed to standardize the communication between the render server and the mobile client. In order to allow mobile devices with different platforms to access post-processing of medical images, the Extensible Markup Language is used to describe this protocol, which contains four main parts: user authentication, medical image query/retrieval, 2D post-processing (e.g. window leveling, pixel value retrieval) and 3D post-processing (e.g. maximum intensity projection, multi-planar reconstruction, curved planar reformation and direct volume rendering). An instance was then implemented to verify the protocol. This instance enables mobile devices to access post-processing services for medical images on the render server via a client application or a web page.
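
    To make the shape of such a protocol concrete, here is a hypothetical XML request in the spirit of the 2D post-processing part described above; every element and attribute name is invented for illustration and is not taken from the paper.

        import xml.etree.ElementTree as ET

        def build_window_level_request(session_id, series_uid, center, width):
            """Build a hypothetical window-leveling request message."""
            req = ET.Element("Request", type="PostProcess2D", session=session_id)
            ET.SubElement(req, "Series", uid=series_uid)
            ET.SubElement(req, "WindowLevel",
                          center=str(center), width=str(width))
            return ET.tostring(req, encoding="unicode")

        # e.g. build_window_level_request("s42", "1.2.840...", center=40, width=400)
        # -> '<Request type="PostProcess2D" session="s42">...</Request>'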

  20. On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

    SciTech Connect

    D'Azevedo, Ed F; Nintcheu Fata, Sylvain

    2012-01-01

    A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved more than an eight-times speedup in matrix assembly and about 56 Gflops/sec in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
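
    The out-of-core structure referenced above follows the classic blocked right-looking LU pattern: factor a panel that fits in device memory, then stream the trailing matrix through the GPU block by block. A minimal NumPy sketch of that blocked factorization follows, without pivoting and with an illustrative block size; the actual code adds pivoting and the host-device staging.

        import numpy as np

        def lu_nopiv(M):
            """Unblocked LU without pivoting, L and U stored compactly in M."""
            M = M.copy()
            for j in range(M.shape[0] - 1):
                M[j+1:, j] /= M[j, j]
                M[j+1:, j+1:] -= np.outer(M[j+1:, j], M[j, j+1:])
            return M

        def blocked_lu(A, b=512):
            """Right-looking blocked LU: factor the diagonal block, solve the
            panel row/column, then update the trailing matrix -- the update is
            the part an out-of-core solver streams through limited GPU memory."""
            n = A.shape[0]
            for k in range(0, n, b):
                e = min(k + b, n)
                A[k:e, k:e] = lu_nopiv(A[k:e, k:e])
                L = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
                U = np.triu(A[k:e, k:e])
                A[k:e, e:] = np.linalg.solve(L, A[k:e, e:])       # panel row
                A[e:, k:e] = np.linalg.solve(U.T, A[e:, k:e].T).T  # panel column
                A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]               # trailing update
            return A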