single nvidia tesla: Topics by Science.gov

Sample records for single nvidia tesla

Visual Media Reasoning - Terrain-based Geolocation

DTIC Science & Technology

2015-06-01

the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to...3.4 Alternative Metric Investigation This section describes a graphics processor unit (GPU) based implementation in the NVIDIA CUDA programming...utilizing 2 concurrent CPU cores, each controlling a single Nvidia C2075 Tesla Fermi CUDA card. Figure 22 shows a comparison of the CPU and the GPU powered
Examination of Multi-Core Architectures

DTIC Science & Technology

2010-11-01

NOVEMBER 2010 2. REPORT TYPE Interim Technical Report 3. DATES COVERED (From - To) February 2010 – July 2010 4 . TITLE AND SUBTITLE EXAMINATION OF...STATEMENT 1 2.0 BACKGROUND 1 3.0 ARCHITECTURE CHARACTERISTICS 3 3.1 NVIDIA Tesla 3 3.2 TILE64 4 ...1 Tesla Architecture 3 2 TILE64 Architecture 4 3 Single Tile Architecture 4 4 STI Cell Broadband Engine
Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters

DTIC Science & Technology

2012-12-01

Experimental results were generated with 10 nVidia Tesla C2050 GPUs having maximum throughput of 972 Gflop/s. Our approach scales well for output...
Simultaneous Range-Velocity Processing and SNR Analysis of AFIT’s Random Noise Radar

DTIC Science & Technology

2012-03-22

reducing the overall processing time. Two computers, equipped with NVIDIA® GPUs, were used to process the collected data. The specifications for each...gather the results back to the CPU. Another company, AccelerEyes®, has developed a product called Jacket® that claims to be better than the parallel...Number of Processing Cores: 4, 8; Processor Speed: 3.33 GHz, 3.07 GHz; Installed Memory: 48 GB, 48 GB; GPU Make: NVIDIA, NVIDIA; GPU Model: Tesla 1060, Tesla C2070; GPU
Numerical Integration with Graphical Processing Unit for QKD Simulation

DTIC Science & Technology

2014-03-27

Windows system application programming interface (API) timer. The problem sizes studied produce speedups greater than 60x on the NVIDIA Tesla C2075...2.3.3 CUDA API...2.3.4 CUDA and NVIDIA GPU Hardware...Theoretical Floating-Point Operations per Second for Intel CPUs and NVIDIA GPUs [3
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

NASA Astrophysics Data System (ADS)

Lyakh, Dmitry I.

2015-04-01

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). The tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
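For orientation, the baseline that the abstract calls the naïve scattering algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `strides_for` and `naive_transpose`, the flat row-major list storage, and the `perm` convention are hypothetical and are not taken from the paper or from TAL-SH. The optimized cache-aware and shared-memory variants are not shown.

```python
from itertools import product


def strides_for(shape):
    """Return row-major strides (in elements) for a dense tensor shape."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def naive_transpose(data, shape, perm):
    """Permute the axes of a dense row-major tensor stored as a flat list.

    Output axis i corresponds to input axis perm[i]. Elements are read in
    input order and scattered to their output positions, with no attempt
    to optimize memory access (hypothetical baseline, for illustration).
    """
    out_shape = [shape[p] for p in perm]
    out_strides = strides_for(out_shape)
    # inverse[a] is the position of input axis a in the output index tuple.
    inverse = [0] * len(perm)
    for out_axis, in_axis in enumerate(perm):
        inverse[in_axis] = out_axis
    out = [None] * len(data)
    for flat_in, index in enumerate(product(*(range(n) for n in shape))):
        flat_out = sum(index[a] * out_strides[inverse[a]]
                       for a in range(len(shape)))
        out[flat_out] = data[flat_in]
    return out, out_shape
```

Calling `naive_transpose(list(range(6)), [2, 3], [1, 0])` transposes a 2 x 3 matrix. The reads are contiguous but the writes are strided, which is the access pattern an optimized routine would try to avoid.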
Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs

PubMed Central

Lin, Chun-Yuan; Wang, Chung-Hung; Hung, Che-Lun; Lin, Yu-Shiang

2015-01-01

Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be found and then used in pharmacy experiments. The time complexity of a pairwise compound comparison is O(n^2), where n is the maximal length of compounds. In general, the length of compounds is tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering more than tens of millions. Therefore, comparing against a large number of compounds (seen as a multiple compound comparison problem, abbreviated to MCC) is still time-consuming. The intrinsic time complexity of the MCC problem is O(k^2 n^2) with k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, on single- and multi-GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation speed among thread blocks on GPUs. CUDA-MCC was implemented in C+OpenMP+CUDA. In the experiments, CUDA-MCC ran 45 times and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and a dual-NVIDIA Tesla K20m GPU card, respectively. PMID:26491652
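The O(k^2 n^2) cost quoted above comes from running an O(n^2) pairwise comparison over every pair of k compounds. The sketch below makes that structure concrete on the CPU. It is an assumption-laden illustration: the longest-common-substring score is a stand-in chosen only because it has the stated O(n^2) shape, not the LINGO-based comparison used by CUDA-MCC, and the function names are hypothetical.

```python
from itertools import combinations


def common_substring_length(a, b):
    """Length of the longest common substring, via an O(len(a) * len(b)) table.

    Stand-in scoring function (assumption), not the LINGO similarity.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def compare_all(compounds):
    """Score every unordered pair: k * (k - 1) / 2 pairwise comparisons."""
    return {
        (i, j): common_substring_length(compounds[i], compounds[j])
        for i, j in combinations(range(len(compounds)), 2)
    }
```

Because each pair is independent, the outer loop is the natural unit to distribute across thread blocks or devices. Pairs of long compounds cost more than pairs of short ones, which is why a load-balancing strategy matters.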
Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs.

PubMed

Lin, Chun-Yuan; Wang, Chung-Hung; Hung, Che-Lun; Lin, Yu-Shiang

2015-01-01

Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be found and then used in pharmacy experiments. The time complexity of a pairwise compound comparison is O(n^2), where n is the maximal length of compounds. In general, the length of compounds is tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering more than tens of millions. Therefore, comparing against a large number of compounds (seen as a multiple compound comparison problem, abbreviated to MCC) is still time-consuming. The intrinsic time complexity of the MCC problem is O(k^2 n^2) with k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, on single- and multi-GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation speed among thread blocks on GPUs. CUDA-MCC was implemented in C+OpenMP+CUDA. In the experiments, CUDA-MCC ran 45 times and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and a dual-NVIDIA Tesla K20m GPU card, respectively.
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

DOE PAGES

Lyakh, Dmitry I.

2015-01-05

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
Integrating the Nqueens Algorithm into a Parameterized Benchmark Suite

DTIC Science & Technology

2016-02-01

FOB is a 64-node heterogeneous cluster consisting of 16-IBM dx360M4 nodes, each with one NVIDIA Kepler K20M GPUs and 48-IBM dx360M4 nodes, and each...nodes have 256-GB of memory and an NVIDIA Tesla K40 GPU. More details on Excalibur can be found on the US Army DSRC website.19 Figures 3 and 4 show the
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

NASA Astrophysics Data System (ADS)

Hause, Benjamin; Parker, Scott

2012-10-01

We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core i7 gaming PC with an NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC will be presented. We will also discuss progress on optimizing the comprehensive three-dimensional general geometry GEM code.
A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

DTIC Science & Technology

2012-02-17

to be solved. Disclaimer: Reference herein to any specific commercial company, product, process, or service by trade name, trademark...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of
Modeling & Analysis of Multicore Architectures for Embedded SIGINT Applications

DTIC Science & Technology

2015-03-01

NVIDIA Kepler K20 [7][8]: 2496e, 706, 225, 3520, 15.6; Intel Xeon Phi 5110P [9]: 60, 1050, 225, 1010, 4.5; Adapteva Epiphany [10]: 16 – 4K, 800, 0.270, 19, 70.4...Cortex A15 and a Kepler GPU with 192 “CUDA” cores, and is more comparable as an HPEEC platform than Tesla series GPUs, such as the NVIDIA C2075 and K20
Construction of the Fock Matrix on a Grid-Based Molecular Orbital Basis Using GPGPUs.

PubMed

Losilla, Sergio A; Watson, Mark A; Aspuru-Guzik, Alán; Sundholm, Dage

2015-05-12

We present a GPGPU implementation of the construction of the Fock matrix in the molecular orbital basis using the fully numerical, grid-based bubbles representation. For a test set of molecules containing up to 90 electrons, the total Hartree-Fock energies obtained from reference GTO-based calculations are reproduced within 10^-4 Eh to 10^-8 Eh for most of the molecules studied. Despite the very large number of arithmetic operations involved, the high performance obtained made the calculations possible on a single Nvidia Tesla K40 GPGPU card.
Investigating the Mobility of Light Autonomous Tracked Vehicles Using a High Performance Computing Simulation Capability

DTIC Science & Technology

2012-08-01

UNCLASSIFIED: Distribution Statement A. Approved for public release. DISCLAIMER: Reference herein to any specific commercial company, product...FunctionBay, S. Korea – NVIDIA – Caterpillar – MSC.Software – Advanced Micro Devices (AMD) 14-16 AUG 2012; Aaron Bartholomew; Makarand Datar...16GB DDR2; Graphics: 4x NVIDIA Tesla C1060; Power supply 1: 1000W; Power supply 2: 750W; Assembled Quad GPU Machine
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

PubMed

Han, Bing; Taha, Tarek M

2010-04-01

There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
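As a rough illustration of the per-neuron work involved, the following Python sketch integrates a single Izhikevich neuron with forward Euler. It is a minimal sketch, not the authors' GPU implementation: the function name, the 0.5 ms step, the constant input current, and the default regular-spiking parameters (a=0.02, b=0.2, c=-65, d=8, as commonly quoted for this model) are assumptions made for the example.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Return spike times (ms) of one Izhikevich neuron under constant input.

    All defaults are illustrative assumptions, not values from the paper.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spikes = []
    for step in range(int(duration_ms / dt)):
        # Forward-Euler update of the two coupled model equations.
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= 30.0:  # spike: record the time, then reset
            spikes.append((step + 1) * dt)
            v = c
            u += d
    return spikes
```

Each neuron's update depends only on its own state and its input current, so a network maps naturally onto one GPU thread per neuron. The Hodgkin-Huxley model needs considerably more arithmetic per step, which is consistent with the larger speedup reported for it.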
Spiking neural networks on high performance computer clusters

NASA Astrophysics Data System (ADS)

Chen, Chong; Taha, Tarek M.

2011-09-01

In this paper we examine the acceleration of two spiking neural network models on three clusters of multicore processors representing three categories of processors: x86, STI Cell, and NVIDIA GPGPUs. The x86 cluster utilized consists of 352 dual-core AMD Opterons, the Cell cluster consists of 320 Sony Playstation 3s, while the GPGPU cluster contains 32 NVIDIA Tesla S1070 systems. The results indicate that the GPGPU platform can dominate in performance compared to the Cell and x86 platforms examined. From a cost perspective, the GPGPU is more expensive in terms of neuron/s throughput. If the cost of GPGPUs goes down in the future, this platform will become very cost effective for these models.
Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template

DTIC Science & Technology

2011-08-09

fastest 10 supercomputers in the world. Both systems rely on GPU co-processing, one using AMD cards, the second, called Nebulae, using NVIDIA Tesla...capability of almost 3 petaflop/s, the highest in TOP500, Nebulae only holds the No. 2 position on the TOP500 list of the
Multi-GPU accelerated three-dimensional FDTD method for electromagnetic simulation.

PubMed

Nagaoka, Tomoaki; Watanabe, Soichi

2011-01-01

Numerical simulation with a numerical human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the numerical human model, we adapt three-dimensional FDTD code to a multi-GPU environment using Compute Unified Device Architecture (CUDA). In this study, we used NVIDIA Tesla C2070 as GPGPU boards. The performance of multi-GPU is evaluated in comparison with that of a single GPU and vector supercomputer. The calculation speed with four GPUs was approximately 3.5 times faster than with a single GPU, and was slightly (approx. 1.3 times) slower than with the supercomputer. Calculation speed of the three-dimensional FDTD method using GPUs can significantly improve with an expanding number of GPUs.
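The scaling figure in this abstract can be restated as a parallel efficiency. The helper below is a small illustrative sketch (the function name is hypothetical): it divides the measured speedup by the number of devices, so the reported 3.5 times speedup on four GPUs corresponds to 87.5% of ideal linear scaling.

```python
def parallel_efficiency(speedup, num_devices):
    """Fraction of ideal linear scaling achieved by a measured speedup."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


# Figures quoted in the abstract: about 3.5x faster with four GPUs.
efficiency = parallel_efficiency(3.5, 4)
print(f"{efficiency:.1%}")
```

An efficiency below 100% is expected for a multi-GPU stencil code such as FDTD, since neighbouring subdomains must exchange boundary data at every time step.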
Genetically improved BarraCUDA.

PubMed

Langdon, W B; Lam, Brian Yee Hong

2017-01-01

BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. The speed up was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.

A High Performance Computing Framework for Physics-based Modeling and Simulation of Military Ground Vehicles

DTIC Science & Technology

2011-03-25

number one and Nebulae at number three. Both systems rely on GPU co-processing and use Intel Xeon processors cards and NVIDIA Tesla C2050 GPUs. In...spite of a theoretical peak capability of almost 3 Petaflop/s, Nebulae clocked at 1.271 PFlop/s when running the Linpack benchmark, which puts it
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

NASA Astrophysics Data System (ADS)

Hause, Benjamin; Parker, Scott; Chen, Yang

2013-10-01

We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and CUDA Fortran. A mixed implementation of both OpenACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third-generation Core i7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speedups comparable to or better than those of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three-dimensional general geometry GEM code.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
Ultraviolet Communication for Medical Applications

DTIC Science & Technology

2014-05-01

parent company Imaging Systems Technology (IST) demonstrated feasibility of several key concepts are being developed into a working prototype in the...program using multiple high-end GPUs (NVIDIA Tesla K20). Finally, the Monte Carlo simulation task will be resumed after the Milestone 2 demonstration...is acceptable for automated printing and handling. Next, the option of having our shells electroded by an external company was investigated and DEI
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lyakh, Dmitry I.

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
DOE Office of Scientific and Technical Information (OSTI.GOV)

Allada, Veerendra; Benjegerdes, Troy; Bode, Brett

Commodity clusters augmented with application accelerators are evolving as competitive high performance computing systems. The Graphical Processing Unit (GPU), with a very high arithmetic density and performance per price ratio, is a good platform for scientific application acceleration. In addition to the interconnect bottlenecks among the cluster compute nodes, the cost of memory copies between the host and the GPU device has to be carefully amortized to improve the overall efficiency of the application. Scientific applications also rely on efficient implementation of the BAsic Linear Algebra Subroutines (BLAS), among which the General Matrix Multiply (GEMM) is considered the workhorse subroutine. In this paper, the authors study the performance of the memory copies and GEMM subroutines that are critical to porting the computational chemistry algorithms to GPU clusters. To that end, a benchmark based on the NetPIPE framework is developed to evaluate the latency and bandwidth of the memory copies between the host and the GPU device. The performance of the single and double precision GEMM subroutines from the NVIDIA CUBLAS 2.0 library is studied. The results have been compared with those of the BLAS routines from the Intel Math Kernel Library (MKL) to understand the computational trade-offs. The test bed is an Intel Xeon cluster equipped with NVIDIA Tesla GPUs.
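A latency and bandwidth benchmark of this kind is usually interpreted with a simple linear cost model, time = latency + bytes / bandwidth. The sketch below assumes that model and is not drawn from the paper or from NetPIPE itself; the function names and the sample figures in the usage lines are hypothetical.

```python
def transfer_time(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Predicted copy time under the linear latency/bandwidth model."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


def fit_latency_bandwidth(size_a, time_a, size_b, time_b):
    """Recover (latency, bandwidth) from two timed copies of different size."""
    if size_a == size_b:
        raise ValueError("the two transfers must differ in size")
    bandwidth = (size_b - size_a) / (time_b - time_a)
    latency = time_a - size_a / bandwidth
    return latency, bandwidth


# Hypothetical measurements, for illustration only (not from the paper).
small = transfer_time(1_000, 1e-5, 5e9)
large = transfer_time(1_000_000, 1e-5, 5e9)
latency, bandwidth = fit_latency_bandwidth(1_000, small, 1_000_000, large)
```

Under this model small copies are dominated by the fixed latency term, which is why batching many small host-to-device transfers into fewer large ones helps amortize the copy cost.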
A parallel algorithm for the initial screening of space debris collisions prediction using the SGP4/SDP4 models and GPU acceleration

NASA Astrophysics Data System (ADS)

Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

2017-05-01

Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
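The benefit of overlapping the data transfer of one batch with the computation of the previous one can be estimated with a two-stage pipeline model. The sketch below is a simplified timing model under stated assumptions, not the authors' CUDA code: it assumes fixed per-batch copy and compute times, one copy engine, one compute engine, and enough device buffers that a copy never waits for a kernel. The function names are hypothetical.

```python
def serial_time(num_batches, copy_s, compute_s):
    """Total time when each batch is copied, then computed, one after another."""
    return num_batches * (copy_s + compute_s)


def overlapped_time(num_batches, copy_s, compute_s):
    """Finish time when the next batch is copied while the current one computes."""
    copy_done = 0.0
    compute_done = 0.0
    for _ in range(num_batches):
        copy_done += copy_s  # copies run back to back on the copy engine
        # A batch starts computing once it has arrived and the previous
        # batch has finished.
        compute_done = max(compute_done, copy_done) + compute_s
    return compute_done
```

With four batches, a copy time of 1 and a compute time of 2 (arbitrary units), the serial schedule takes 12 units and the overlapped schedule takes 9. In the model, all transfers after the first are hidden whenever computation is the slower stage.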
GPU-accelerated phase-field simulation of dendritic solidification in a binary alloy

NASA Astrophysics Data System (ADS)

Yamanaka, Akinori; Aoki, Takayuki; Ogawa, Satoi; Takaki, Tomohiro

2011-03-01

The phase-field simulation for dendritic solidification of a binary alloy has been accelerated by using a graphics processing unit (GPU). To perform the phase-field simulation of the alloy solidification on GPU, a program code was developed with compute unified device architecture (CUDA). In this paper, the implementation technique of the phase-field model on GPU is presented. Also, we evaluated the acceleration performance of the three-dimensional solidification simulation by using a single NVIDIA TESLA C1060 GPU and the developed program code. The results showed that the GPU calculation for 576^3 computational grids achieved the performance of 170 GFLOPS by utilizing the shared memory as a software-managed cache. Furthermore, it can be demonstrated that the computation with the GPU is 100 times faster than that with a single CPU core. From the obtained results, we confirmed the feasibility of realizing a real-time full three-dimensional phase-field simulation of microstructure evolution on a personal desktop computer.
Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

PubMed Central

Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

2017-01-01

High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA has released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been applied in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also showed, by comparing an STK platform with a desktop CPU and GPU, that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform will be constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk will be ported to the MTK platform. The experimental results showed speedup ratios of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments. PMID:28835734
Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

PubMed

Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

2017-01-01

High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA has released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been applied in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also showed, by comparing an STK platform with a desktop CPU and GPU, that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform will be constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk will be ported to the MTK platform. The experimental results showed speedup ratios of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Trędak, Przemysław; Rudnicki, Witold R. (Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw)

The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
CUDA-based acceleration of collateral filtering in brain MR images

NASA Astrophysics Data System (ADS)

Li, Cheng-Yuan; Chang, Herng-Hua

2017-02-01

Image denoising is one of the fundamental and essential tasks within image processing. In medical imaging, finding an effective algorithm that can remove random noise in MR images is important. This paper proposes an effective noise reduction method for brain magnetic resonance (MR) images. Our approach is based on the collateral filter, which is a more powerful method than the bilateral filter in many cases. However, the computation of the collateral filter algorithm is quite time-consuming. To solve this problem, we improved the collateral filter algorithm with parallel computing using a GPU. We adopted CUDA, an application programming interface for GPUs by NVIDIA, to accelerate the computation. Our experimental evaluation on an Intel Xeon CPU E5-2620 v3 2.40 GHz with an NVIDIA Tesla K40c GPU indicated that the proposed implementation runs dramatically faster than the traditional collateral filter. We believe that the proposed framework has established a general blueprint for achieving fast and robust filtering in a wide variety of medical image denoising applications.
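The abstract does not spell out the collateral filter itself, so the sketch below shows only the bilateral filter that it names as the point of comparison, in one dimension and in plain Python. The parameter names and values (`radius`, `sigma_s`, `sigma_r`) and the test signal are assumptions chosen for illustration. The sketch is meant to show why this family of filters suits a GPU: every output sample depends only on a small input neighborhood, so each one could be computed by its own thread.

```python
import math

def bilateral_filter_1d(signal, radius=2, sigma_s=1.5, sigma_r=10.0):
    """Plain 1D bilateral filter.

    Each weight is the product of a spatial Gaussian (distance between
    samples) and a range Gaussian (difference between sample values).
    """
    out = []
    for i, center in enumerate(signal):      # iterations are independent
        num = den = 0.0
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        for j in range(lo, hi):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                 * math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w
        out.append(num / den)                # den >= 1 because j == i gives w = 1
    return out

# Hypothetical noisy step edge: low values on the left, high on the right.
noisy_step = [1, -2, 2, 0, 101, 98, 100, 102]
smoothed = bilateral_filter_1d(noisy_step)
print([round(v, 1) for v in smoothed])
```

The range term keeps samples on opposite sides of the edge from mixing, so the noise is averaged out while the step is preserved.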
A comparative study of history-based versus vectorized Monte Carlo methods in the GPU/CUDA environment for a simple neutron eigenvalue problem

NASA Astrophysics Data System (ADS)

Liu, Tianyu; Du, Xining; Ji, Wei; Xu, X. George; Brown, Forrest B.

2014-06-01

For nuclear reactor analysis such as neutron eigenvalue calculations, time-consuming Monte Carlo (MC) simulations can be accelerated by using graphics processing units (GPUs). However, traditional MC methods are often history-based, and their performance on GPUs is affected significantly by the thread divergence problem. In this paper we describe the development of a newly designed event-based vectorized MC algorithm for solving the neutron eigenvalue problem. The code was implemented using NVIDIA's Compute Unified Device Architecture (CUDA), and tested on an NVIDIA Tesla M2090 GPU card. We found that although the vectorized MC algorithm greatly reduces the occurrence of thread divergence, thus enhancing the warp execution efficiency, the overall simulation speed is roughly ten times slower than the history-based MC code on GPUs. Profiling results suggest that the slow speed is probably due to the memory access latency caused by the large amount of global memory transactions. Possible solutions to improve the code efficiency are discussed.
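The difference between the two loop structures can be illustrated with a toy problem. The sketch below is not the authors' neutron transport code: it assumes a made-up model in which every collision absorbs the particle with a fixed probability `P_ABSORB`, and it uses plain Python lists as a stand-in for the GPU particle bank.

```python
import random

P_ABSORB = 0.25  # assumed absorption probability per collision (toy value)

def history_based(n_particles: int, rng: random.Random) -> int:
    """Follow each particle from birth to absorption, one history at a time."""
    collisions = 0
    for _ in range(n_particles):
        while True:                 # loop length differs from particle to particle;
            collisions += 1         # on a GPU this is the source of thread divergence
            if rng.random() < P_ABSORB:
                break
    return collisions

def event_based(n_particles: int, rng: random.Random) -> int:
    """Apply one collision event to every live particle, then compact the bank."""
    bank = list(range(n_particles))  # indices of particles still alive
    collisions = 0
    while bank:
        collisions += len(bank)      # the same operation for the whole bank
        bank = [p for p in bank if rng.random() >= P_ABSORB]  # keep survivors
    return collisions

n = 20_000
mean_history = history_based(n, random.Random(1)) / n
mean_event = event_based(n, random.Random(2)) / n
print(mean_history, mean_event)      # both estimate 1 / P_ABSORB
```

Both functions estimate the same mean number of collisions per particle. The event-based form removes the variable-length inner loop but rebuilds the particle bank on every pass, which hints at the additional memory traffic the abstract points to.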
Accelerating three-dimensional FDTD calculations on GPU clusters for electromagnetic field simulation.

PubMed

Nagaoka, Tomoaki; Watanabe, Soichi

2012-01-01

Electromagnetic simulation with anatomically realistic computational human models using the finite-difference time-domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes, with seven GPU boards (NVIDIA Tesla C2070) mounted on each node. We examined the performance of the FDTD calculation in the multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU cluster is faster than that on a multi-GPU system (a single workstation), and we also found that the GPU cluster system calculates faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform large-scale FDTD calculations because we were able to use over 100 GB of GPU memory.
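The "over 100 GB" figure can be checked against the cluster layout with a short calculation. The abstract does not give the per-board memory, so the sketch assumes the nominal 6 GB of a Tesla C2070; usable memory would be somewhat lower with ECC enabled.

```python
NODES = 3
GPUS_PER_NODE = 7
GB_PER_GPU = 6  # assumed nominal Tesla C2070 memory; not stated in the abstract

total_gpus = NODES * GPUS_PER_NODE
total_memory_gb = total_gpus * GB_PER_GPU
print(f"{total_gpus} GPUs, {total_memory_gb} GB nominal GPU memory")
```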
Implementation of metal-friendly EAM/FS-type semi-empirical potentials in HOOMD-blue: A GPU-accelerated molecular dynamics software

NASA Astrophysics Data System (ADS)

Yang, Lin; Zhang, Feng; Wang, Cai-Zhuang; Ho, Kai-Ming; Travesset, Alex

2018-04-01

We present an implementation of EAM and FS interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software designed to perform classical molecular dynamics simulations using GPU accelerations. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as demonstrated in test simulations of calculations of the glass-transition temperature of Cu64.5Zr35.5, and pair correlation function g (r) of liquid Ni3Al. Our code scales well with the size of the simulating system on NVIDIA Tesla M40 and P100 GPUs. Compared with another popular software LAMMPS running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. The source code can be accessed through the HOOMD-blue web page for free by any interested user.
Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

NASA Astrophysics Data System (ADS)

Kemal, Jonathan Yashar

For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.
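The grid sizes and speedups quoted above are related by simple arithmetic, sketched here in Python. The last quantity, the scaling of 8 CPUs over 1 CPU, is derived from the two quoted speedups and is not stated in the abstract.

```python
BLOCKS = 13
POINTS_PER_SIDE = 1033

points_per_block = POINTS_PER_SIDE ** 2        # 1033 x 1033 points in each block
total_points = BLOCKS * points_per_block

speedup_8gpu_vs_8cpu = 6.0
speedup_8gpu_vs_1cpu = 39.5
# Derived here, not quoted in the abstract: how much faster 8 CPUs are than 1 CPU.
implied_cpu_scaling = speedup_8gpu_vs_1cpu / speedup_8gpu_vs_8cpu

print(f"{points_per_block / 1e6:.2f} million points per block")
print(f"{total_points / 1e6:.2f} million points in total")
print(f"implied 8-CPU over 1-CPU scaling: {implied_cpu_scaling:.2f}x")
```

The two point counts reproduce the 1.07 million and 13.87 million figures in the abstract.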
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

NASA Astrophysics Data System (ADS)

Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

2016-09-01

The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
Parallel fuzzy connected image segmentation on GPU.

PubMed

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W

2011-07-01

Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on the GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, correspondingly, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, correspondingly, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.
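The two tasks named in the abstract, affinity and connectedness, can be sketched sequentially in Python. This is a CPU illustration under stated assumptions, not the authors' CUDA kernels: the affinity function (a linear fall-off with intensity difference), the 4-neighbor adjacency, and the tiny test image are all invented for the example. Connectedness follows the standard FC definition, in which a path is as strong as its weakest link and a pixel's connectedness to the seed is the strength of its best path.

```python
import heapq

def affinity(a: float, b: float, max_val: float = 255.0) -> float:
    """Toy affinity: 1 for equal intensities, falling linearly to 0."""
    return 1.0 - abs(a - b) / max_val

def fuzzy_connectedness(image, seed):
    """Best-path strength from the seed to every pixel (max over paths of min affinity)."""
    rows, cols = len(image), len(image[0])
    strength = [[0.0] * cols for _ in range(rows)]
    strength[seed[0]][seed[1]] = 1.0
    heap = [(-1.0, seed)]                       # max-heap via negated strength
    while heap:
        neg_s, (r, c) = heapq.heappop(heap)
        s = -neg_s
        if s < strength[r][c]:
            continue                            # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                cand = min(s, affinity(image[r][c], image[nr][nc]))
                if cand > strength[nr][nc]:     # found a stronger path
                    strength[nr][nc] = cand
                    heapq.heappush(heap, (-cand, (nr, nc)))
    return strength

# Hypothetical 3x3 image: a dark object (left two columns) next to a bright region.
image = [
    [10, 12, 200],
    [11, 13, 210],
    [10, 12, 205],
]
fc_map = fuzzy_connectedness(image, seed=(0, 0))
segmented = [[v >= 0.5 for v in row] for row in fc_map]
```

Thresholding the connectedness map at 0.5 keeps the two dark columns and rejects the bright one, because every path into the bright region must cross a low-affinity edge.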
Aspects of GPU performance in algorithms with random memory access

NASA Astrophysics Data System (ADS)

Kashkovsky, Alexander V.; Shershnev, Anton A.; Vashchenkov, Pavel V.

2017-10-01

The numerical code for solving the Boltzmann equation on a hybrid computational cluster using the Direct Simulation Monte Carlo (DSMC) method showed that on Tesla K40 accelerators computational performance drops dramatically as the percentage of occupied GPU memory increases. Testing revealed that memory access time increases by tens of times after a certain critical percentage of memory is occupied. Moreover, this seems to be a common problem of all NVIDIA GPUs arising from their architecture. A few modifications of the numerical algorithm were suggested to overcome this problem. One of them, based on splitting the memory into "virtual" blocks, resulted in a 2.5 times speedup.
cudaMap: a GPU accelerated program for gene expression connectivity mapping.

PubMed

McArt, Darragh G; Bankhead, Peter; Dunne, Philip D; Salto-Tellez, Manuel; Hamilton, Peter; Zhang, Shu-Dong

2013-10-11

Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.

Implementation of metal-friendly EAM/FS-type semi-empirical potentials in HOOMD-blue: A GPU-accelerated molecular dynamics software

DOE PAGES

Yang, Lin; Zhang, Feng; Wang, Cai-Zhuang; ...

2018-01-12

We present an implementation of EAM and FS interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software designed to perform classical molecular dynamics simulations using GPU accelerations. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as demonstrated in test simulations of calculations of the glass-transition temperature of Cu64.5Zr35.5, and pair correlation function of liquid Ni3Al. Our code scales well with the size of the simulating system on NVIDIA Tesla M40 and P100 GPUs. Compared with another popular software LAMMPS running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. In conclusion, the source code can be accessed through the HOOMD-blue web page for free by any interested user.
Implementation of metal-friendly EAM/FS-type semi-empirical potentials in HOOMD-blue: A GPU-accelerated molecular dynamics software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Lin; Zhang, Feng; Wang, Cai-Zhuang

We present an implementation of EAM and FS interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software designed to perform classical molecular dynamics simulations using GPU accelerations. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as demonstrated in test simulations of calculations of the glass-transition temperature of Cu64.5Zr35.5, and pair correlation function of liquid Ni3Al. Our code scales well with the size of the simulating system on NVIDIA Tesla M40 and P100 GPUs. Compared with another popular software LAMMPS running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. In conclusion, the source code can be accessed through the HOOMD-blue web page for free by any interested user.
Statistical tools for analysis and modeling of cosmic populations and astronomical time series: CUDAHM and TSE

NASA Astrophysics Data System (ADS)

Loredo, Thomas; Budavari, Tamas; Scargle, Jeffrey D.

2018-01-01

This presentation provides an overview of open-source software packages addressing two challenging classes of astrostatistics problems. (1) CUDAHM is a C++ framework for hierarchical Bayesian modeling of cosmic populations, leveraging graphics processing units (GPUs) to enable applying this computationally challenging paradigm to large datasets. CUDAHM is motivated by measurement error problems in astronomy, where density estimation and linear and nonlinear regression must be addressed for populations of thousands to millions of objects whose features are measured with possibly complex uncertainties, potentially including selection effects. An example calculation demonstrates accurate GPU-accelerated luminosity function estimation for simulated populations of $10^6$ objects in about two hours using a single NVIDIA Tesla K40c GPU. (2) Time Series Explorer (TSE) is a collection of software in Python and MATLAB for exploratory analysis and statistical modeling of astronomical time series. It comprises a library of stand-alone functions and classes, as well as an application environment for interactive exploration of time series data. The presentation will summarize key capabilities of this emerging project, including new algorithms for analysis of irregularly sampled time series.
GPU-based relative fuzzy connectedness image segmentation.

PubMed

Zhuge, Ying; Ciesielski, Krzysztof C; Udupa, Jayaram K; Miller, Robert W

2013-01-01

Recently, clinical radiological research and practice have become increasingly quantitative. Further, images continue to increase in size and volume. For quantitative radiology to become practical, it is crucial that image segmentation algorithms and their implementations are rapid and yield practical run time on very large data sets. The purpose of this paper is to present a parallel version of an algorithm that belongs to the family of fuzzy connectedness (FC) algorithms, to achieve an interactive speed for segmenting large medical image data sets. The most common FC segmentations, optimizing an ℓ∞-based energy, are known as relative fuzzy connectedness (RFC) and iterative relative fuzzy connectedness (IRFC). Both RFC and IRFC objects (of which IRFC contains RFC) can be found via linear time algorithms, linear with respect to the image size. The new algorithm, P-ORFC (for parallel optimal RFC), which is implemented by using NVIDIA's Compute Unified Device Architecture (CUDA) platform, considerably improves the computational speed of the above-mentioned CPU-based IRFC algorithm. Experiments based on four data sets of small, medium, large, and super data size achieved speedup factors of 32.8×, 22.9×, 20.9×, and 17.5×, correspondingly, on the NVIDIA Tesla C1060 platform. Although the output of P-ORFC need not precisely match the IRFC output, it is very close to it and, as the authors prove, always lies between the RFC and IRFC objects. A parallel version of a top-of-the-line algorithm in the family of FC has been developed on the NVIDIA GPUs. An interactive speed of segmentation has been achieved, even for the largest medical image data set. Such GPU implementations may play a crucial role in automatic anatomy recognition in clinical radiology.
HASEonGPU-An adaptive, load-balanced MPI/GPU-code for calculating the amplified spontaneous emission in high power laser media

NASA Astrophysics Data System (ADS)

Eckert, C. H. J.; Zenker, E.; Bussmann, M.; Albach, D.

2016-10-01

We present an adaptive Monte Carlo algorithm for computing the amplified spontaneous emission (ASE) flux in laser gain media pumped by pulsed lasers. With the design of high power lasers in mind, which require large size gain media, we have developed the open source code HASEonGPU that is capable of utilizing multiple graphics processing units (GPUs). With HASEonGPU, time to solution is reduced to minutes on a medium size GPU cluster of 64 NVIDIA Tesla K20m GPUs and excellent speedup is achieved when scaling to multiple GPUs. Comparison of simulation results to measurements of ASE in Yb3+:YAG ceramics shows perfect agreement.
Computational Modeling and Numerical Methods for Spatiotemporal Calcium Cycling in Ventricular Myocytes

PubMed Central

Nivala, Michael; de Lange, Enno; Rovetti, Robert; Qu, Zhilin

2012-01-01

Intracellular calcium (Ca) cycling dynamics in cardiac myocytes is regulated by a complex network of spatially distributed organelles, such as sarcoplasmic reticulum (SR), mitochondria, and myofibrils. In this study, we present a mathematical model of intracellular Ca cycling and numerical and computational methods for computer simulations. The model consists of a coupled Ca release unit (CRU) network, which includes a SR domain and a myoplasm domain. Each CRU contains 10 L-type Ca channels and 100 ryanodine receptor channels, with individual channels simulated stochastically using a variant of Gillespie’s method, modified here to handle time-dependent transition rates. Both the SR domain and the myoplasm domain in each CRU are modeled by 5 × 5 × 5 voxels to maintain proper Ca diffusion. Advanced numerical algorithms implemented on graphical processing units were used for fast computational simulations. For a myocyte containing 100 × 20 × 10 CRUs, a 1-s heart time simulation takes about 10 min of machine time on a single NVIDIA Tesla C2050. Examples of simulated Ca cycling dynamics, such as Ca sparks, Ca waves, and Ca alternans, are shown. PMID:22586402
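The stochastic channel gating mentioned above can be illustrated with the textbook constant-rate form of Gillespie's direct method. The sketch below is not the authors' variant, which is modified to handle time-dependent transition rates. It assumes a simple two-state (closed/open) channel, and the rate constants and simulation length are hypothetical values chosen for the example; only the count of 100 channels is taken from the abstract.

```python
import random

def gillespie_channels(n_channels, k_open, k_close, t_end, rng):
    """Direct-method Gillespie for n two-state channels with constant rates.

    Returns the time-averaged number of open channels over [0, t_end].
    """
    t, n_open, open_time_integral = 0.0, 0, 0.0
    while True:
        a_open = k_open * (n_channels - n_open)   # propensity: a closed channel opens
        a_close = k_close * n_open                # propensity: an open channel closes
        a_total = a_open + a_close
        dt = rng.expovariate(a_total)             # waiting time to the next event
        if t + dt >= t_end:
            open_time_integral += n_open * (t_end - t)
            return open_time_integral / t_end
        open_time_integral += n_open * dt
        t += dt
        if rng.random() * a_total < a_open:       # choose which event fired
            n_open += 1
        else:
            n_open -= 1

# 100 channels as in one CRU's ryanodine receptor cluster; rates are hypothetical.
mean_open = gillespie_channels(100, k_open=1.0, k_close=3.0, t_end=200.0,
                               rng=random.Random(7))
print(f"time-averaged open channels: {mean_open:.1f}")
```

With constant rates the long-run open fraction should approach k_open / (k_open + k_close), which gives about 25 open channels out of 100 for the values assumed here.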
Novel hybrid GPU-CPU implementation of parallelized Monte Carlo parametric expectation maximization estimation method for population pharmacokinetic data analysis.

PubMed

Ng, C M

2013-10-01

The development of a population PK/PD model, an essential component for model-based drug development, is both time- and labor-intensive. Graphical-processing unit (GPU) computing technology has been proposed and used to accelerate many scientific computations. The objective of this study was to develop a hybrid GPU-CPU implementation of the parallelized Monte Carlo parametric expectation maximization (MCPEM) estimation algorithm for population PK data analysis. A hybrid GPU-CPU implementation of the MCPEM algorithm (MCPEMGPU) and an identical algorithm designed for a single CPU (MCPEMCPU) were developed using MATLAB on a single computer equipped with dual Xeon 6-Core E5690 CPUs and an NVIDIA Tesla C2070 GPU parallel computing card that contained 448 stream processors. Two different PK models with rich/sparse sampling design schemes were used to simulate population data in assessing the performance of MCPEMCPU and MCPEMGPU. Results were analyzed by comparing the parameter estimation and model computation times. Speedup factor was used to assess the relative benefit of parallelized MCPEMGPU over MCPEMCPU in shortening model computation time. The MCPEMGPU consistently achieved shorter computation time than the MCPEMCPU and can offer more than 48-fold speedup using a single GPU card. The novel hybrid GPU-CPU implementation of the parallelized MCPEM algorithm developed in this study holds great promise in serving as the core for the next generation of modeling software for population PK/PD analysis.
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

PubMed

Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

2014-06-25

The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
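The cost of exhaustive evaluation follows from the number of SNP pairs, which grows quadratically with the number of SNPs. The sketch below counts unordered pairs for a few data set sizes in the range the abstract mentions; the specific sizes are illustrative, and only pairwise (SNP-SNP) interactions are considered.

```python
import math

def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return math.comb(n_snps, 2)

for n in (100_000, 500_000, 1_000_000):
    print(f"{n:>9,} SNPs -> {pair_count(n):,} pairwise tests")
```

A data set of one million SNPs already implies about 5 × 10^11 pairwise tests, each of which is independent of the others, which is what makes the problem a natural fit for massively parallel hardware.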
Heterogeneous computing architecture for fast detection of SNP-SNP interactions

PubMed Central

2014-01-01

Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems. PMID:24964802
cudaMap: a GPU accelerated program for gene expression connectivity mapping

PubMed Central

2013-01-01

Background Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. Results cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Conclusion Emerging ‘omics’ technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. 
cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap. PMID:24112435
Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method

DTIC Science & Technology

2015-06-01

5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Proton Testing of nVidia GTX 1050 GPU

NASA Technical Reports Server (NTRS)

Wyrwas, E. J.

2017-01-01

Single-Event Effects (SEE) testing was conducted on the nVidia GTX 1050 Graphics Processor Unit (GPU), herein referred to as the device under test (DUT). Testing was conducted at Massachusetts General Hospital's (MGH) Francis H. Burr Proton Therapy Center on April 9th, 2017 using 200-MeV protons. The purpose of this testing trip was to provide a baseline assessment of the radiation susceptibility of the DUT, as no previous testing had been conducted on this component.
Proton Testing of nVidia Jetson TX1

NASA Technical Reports Server (NTRS)

Wyrwas, Edward J.

2017-01-01

Single-Event Effects (SEE) testing was conducted on the nVidia Jetson TX1 System on Chip (SOC), herein referred to as the device under test (DUT). Testing was conducted at Massachusetts General Hospital's (MGH) Francis H. Burr Proton Therapy Center on October 16th, 2016 using 200-MeV protons. The purpose of this testing trip was to provide a baseline assessment of the radiation susceptibility of the DUT, as no previous testing had been conducted on this component.
On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Azevedo, Ed F; Nintcheu Fata, Sylvain

2012-01-01

A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than available GPU memory. The code achieved over eight times speedup in matrix assembly and about 56 Gflops/sec in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
OpenACC acceleration of an unstructured CFD solver based on a reconstructed discontinuous Galerkin method for compressible flows

DOE PAGES

Xia, Yidong; Lou, Jialin; Luo, Hong; ...

2015-02-09

Here, an OpenACC directive-based graphics processing unit (GPU) parallel scheme is presented for solving the compressible Navier–Stokes equations on 3D hybrid unstructured grids with a third-order reconstructed discontinuous Galerkin method. The developed scheme requires the minimum code intrusion and algorithm alteration for upgrading a legacy solver with the GPU computing capability at very little extra effort in programming, which leads to a unified and portable code development strategy. A face coloring algorithm is adopted to eliminate the memory contention because of the threading of internal and boundary face integrals. A number of flow problems are presented to verify the implementation of the developed scheme. Timing measurements were obtained by running the resulting GPU code on one Nvidia Tesla K20c GPU card (Nvidia Corporation, Santa Clara, CA, USA) and compared with those obtained by running the equivalent Message Passing Interface (MPI) parallel CPU code on a compute node (consisting of two AMD Opteron 6128 eight-core CPUs (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)). Speedup factors of up to 24× and 1.6× for the GPU code were achieved with respect to one and 16 CPU cores, respectively. The numerical results indicate that this OpenACC-based parallel scheme is an effective and extensible approach to port unstructured high-order CFD solvers to GPU computing.
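The two speedup factors quoted in this abstract also say something about how well the MPI CPU code itself scales. The short Python calculation below is only a rough illustration: it assumes both "up to" figures refer to the same test case, which the abstract does not state, and the function name is made up for this sketch.

```python
def implied_cpu_scaling(speedup_vs_one_core, speedup_vs_n_cores, n_cores):
    """Infer the CPU code's own parallel speedup and efficiency.

    If the GPU is S1 times faster than 1 core and Sn times faster than
    n cores, then the n-core CPU run is S1 / Sn times faster than 1 core.
    """
    cpu_speedup = speedup_vs_one_core / speedup_vs_n_cores
    efficiency = cpu_speedup / n_cores
    return cpu_speedup, efficiency


# Figures from the abstract: 24x versus 1 core, 1.6x versus 16 cores.
cpu_speedup, efficiency = implied_cpu_scaling(24.0, 1.6, 16)
print(f"MPI code on 16 cores: {cpu_speedup:.1f}x, efficiency {efficiency:.0%}")
```

Under that assumption, one K20c card delivers roughly the throughput of 1.6 fully loaded 16-core nodes of this type.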
GPU accelerated population annealing algorithm

NASA Astrophysics Data System (ADS)

Barash, Lev Yu.; Weigel, Martin; Borovský, Michal; Janke, Wolfhard; Shchur, Lev N.

2017-11-01

Population annealing is a promising recent approach for Monte Carlo simulations in statistical physics, in particular for the simulation of systems with complex free-energy landscapes. It is a hybrid method, combining importance sampling through Markov chains with elements of sequential Monte Carlo in the form of population control. While it appears to provide algorithmic capabilities for the simulation of such systems that are roughly comparable to those of more established approaches such as parallel tempering, it is intrinsically much more suitable for massively parallel computing. Here, we tap into this structural advantage and present a highly optimized implementation of the population annealing algorithm on GPUs that promises speed-ups of several orders of magnitude as compared to a serial implementation on CPUs. While the sample code is for simulations of the 2D ferromagnetic Ising model, it should be easily adapted for simulations of other spin models, including disordered systems. Our code includes implementations of some advanced algorithmic features that have only recently been suggested, namely the automatic adaptation of temperature steps and a multi-histogram analysis of the data at different temperatures. Program Files doi:http://dx.doi.org/10.17632/sgzt4b7b3m.1 Licensing provisions: Creative Commons Attribution license (CC BY 4.0) Programming language: C, CUDA External routines/libraries: NVIDIA CUDA Toolkit 6.5 or newer Nature of problem: The program calculates the internal energy, specific heat, several magnetization moments, entropy and free energy of the 2D Ising model on square lattices of edge length L with periodic boundary conditions as a function of inverse temperature β. Solution method: The code uses population annealing, a hybrid method combining Markov chain updates with population control. 
The code is implemented for NVIDIA GPUs using the CUDA language and employs advanced techniques such as multi-spin coding, adaptive temperature steps and multi-histogram reweighting. Additional comments: Code repository at https://github.com/LevBarash/PAising. The system size and size of the population of replicas are limited depending on the memory of the GPU device used. For the default parameter values used in the sample programs, L = 64, θ = 100, β0 = 0, βf = 1, Δβ = 0.005, R = 20 000, a typical run time on an NVIDIA Tesla K80 GPU is 151 seconds for the single spin coded (SSC) and 17 seconds for the multi-spin coded (MSC) program (see Section 2 for a description of these parameters).
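The combination of Markov chain updates and population control described in this record can be sketched on the CPU in a few lines. The Python below is a minimal, serial illustration for a tiny 2D Ising lattice, not the authors' CUDA code: the function names and the small default values are assumptions chosen for this sketch, and it uses fixed temperature steps with no multi-spin coding or histogram reweighting. The parameter names L, R, theta, beta_f and d_beta mirror the symbols in the abstract.

```python
import math
import random


def energy(spins, L):
    """Energy of a 2D Ising configuration, periodic boundaries, J = 1."""
    e = 0
    for x in range(L):
        for y in range(L):
            s = spins[x * L + y]
            # Count each bond once: right and down neighbours only.
            e -= s * (spins[((x + 1) % L) * L + y] + spins[x * L + (y + 1) % L])
    return e


def metropolis_sweep(spins, L, beta, rng):
    """One sweep of L*L single-spin Metropolis updates at inverse temperature beta."""
    for _ in range(L * L):
        x, y = rng.randrange(L), rng.randrange(L)
        s = spins[x * L + y]
        nb = (spins[((x + 1) % L) * L + y] + spins[((x - 1) % L) * L + y]
              + spins[x * L + (y + 1) % L] + spins[x * L + (y - 1) % L])
        d_e = 2 * s * nb
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            spins[x * L + y] = -s


def population_annealing(L=4, R=200, theta=5, beta_f=0.5, d_beta=0.05, seed=1):
    """Anneal R replicas from beta = 0 to beta_f; return (beta, size, energy per spin)."""
    rng = random.Random(seed)
    pop = [[rng.choice((-1, 1)) for _ in range(L * L)] for _ in range(R)]
    beta = 0.0
    history = []
    for _ in range(round(beta_f / d_beta)):
        # Population control: reweight by exp(-d_beta * E) and resample.
        energies = [energy(s, L) for s in pop]
        e_min = min(energies)  # shift for numerical stability only
        weights = [math.exp(-d_beta * (e - e_min)) for e in energies]
        q = sum(weights) / len(weights)
        new_pop = []
        for s, w in zip(pop, weights):
            tau = (R / len(pop)) * w / q  # expected number of copies
            copies = int(tau) + (1 if rng.random() < tau - int(tau) else 0)
            new_pop.extend(list(s) for _ in range(copies))
        pop = new_pop
        beta += d_beta
        # Markov chain part: theta equilibrating sweeps per replica.
        for s in pop:
            for _ in range(theta):
                metropolis_sweep(s, L, beta, rng)
        mean_e = sum(energy(s, L) for s in pop) / (len(pop) * L * L)
        history.append((beta, len(pop), mean_e))
    return history


for beta, size, e in population_annealing():
    print(f"beta={beta:.2f}  replicas={size}  energy/spin={e:+.3f}")
```

Each replica is updated independently between resampling steps, which is the property that makes the method a natural fit for one-thread-block-per-replica GPU layouts.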
Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions

DTIC Science & Technology

2014-05-01

processor developed by IBM and other companies, incorporates the POWER5 processor as the Power Processor Element (PPE), one of the early general...deliver a power-efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA ...algorithms, including IBM’s Cell/B.E., GPUs from NVidia and AMD and many-core CPUs from Intel. The vast growth of digital video content has been a
AESS: Accelerated Exact Stochastic Simulation

NASA Astrophysics Data System (ADS)

Jenkins, David D.; Peterson, Gregory D.

2011-12-01

The Stochastic Simulation Algorithm (SSA) developed by Gillespie provides a powerful mechanism for exploring the behavior of chemical systems with small species populations or with important noise contributions. Gene circuit simulations for systems biology commonly employ the SSA method, as do ecological applications. This algorithm tends to be computationally expensive, so researchers seek an efficient implementation of SSA. In this program package, the Accelerated Exact Stochastic Simulation Algorithm (AESS) contains optimized implementations of Gillespie's SSA that improve the performance of individual simulation runs or ensembles of simulations used for sweeping parameters or to provide statistically significant results. Program summary. Program title: AESS Catalogue identifier: AEJW_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: University of Tennessee copyright agreement No. of lines in distributed program, including test data, etc.: 10 861 No. of bytes in distributed program, including test data, etc.: 394 631 Distribution format: tar.gz Programming language: C for processors, CUDA for NVIDIA GPUs Computer: Developed and tested on various x86 computers and NVIDIA C1060 Tesla and GTX 480 Fermi GPUs. The system targets x86 workstations, optionally with multicore processors or NVIDIA GPUs as accelerators. Operating system: Tested under Ubuntu Linux OS and CentOS 5.5 Linux OS Classification: 3, 16.12 Nature of problem: Simulation of chemical systems, particularly with low species populations, can be accurately performed using Gillespie's method of stochastic simulation. Numerous variations on the original stochastic simulation algorithm have been developed, including approaches that produce results with statistics that exactly match the chemical master equation (CME) as well as other approaches that approximate the CME.
Solution method: The Accelerated Exact Stochastic Simulation (AESS) tool provides implementations of a wide variety of popular variations on the Gillespie method. Users can select the specific algorithm considered most appropriate. Comparisons between the methods and with other available implementations indicate that AESS provides the fastest known implementation of Gillespie's method for a variety of test models. Users may wish to execute ensembles of simulations to sweep parameters or to obtain better statistical results, so AESS supports acceleration of ensembles of simulation using parallel processing with MPI, SSE vector units on x86 processors, and/or using NVIDIA GPUs with CUDA.
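For readers unfamiliar with the method this record accelerates, the following Python sketch shows Gillespie's direct method in its textbook form. It is not taken from AESS: the function name, argument layout and the toy A → B reaction are assumptions made for illustration, and none of the package's optimizations or parallel ensemble support is reproduced.

```python
import math
import random


def gillespie_direct(x0, stoich, propensity_fns, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method.

    x0             -- initial species counts
    stoich         -- stoich[j][i] is the change of species i when reaction j fires
    propensity_fns -- one function per reaction, mapping the state to a propensity
    """
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = [f(x) for f in propensity_fns]
        a0 = sum(a)
        if a0 <= 0.0:  # no reaction can fire any more
            break
        # Waiting time to the next reaction is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Pick reaction j with probability a[j] / a0.
        r = rng.random() * a0
        acc = 0.0
        for j, aj in enumerate(a):
            acc += aj
            if r < acc:
                break
        for i, change in enumerate(stoich[j]):
            x[i] += change
        trajectory.append((t, tuple(x)))
    return trajectory


# Toy system: irreversible isomerization A -> B with rate constant 1.0.
traj = gillespie_direct(
    x0=(50, 0),
    stoich=[(-1, +1)],
    propensity_fns=[lambda x: 1.0 * x[0]],
    t_end=10.0,
    rng=random.Random(42),
)
print(f"{len(traj) - 1} reaction events, final state {traj[-1][1]}")
```

Because every trajectory draws its own random numbers and shares no state with the others, ensembles of such runs parallelize trivially, which is the use case the abstract highlights for MPI, SSE and CUDA.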
GPU-based relative fuzzy connectedness image segmentation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhuge Ying; Ciesielski, Krzysztof C.; Udupa, Jayaram K.

2013-01-15

Purpose: Recently, clinical radiological research and practice are becoming increasingly quantitative. Further, images continue to increase in size and volume. For quantitative radiology to become practical, it is crucial that image segmentation algorithms and their implementations are rapid and yield practical run time on very large data sets. The purpose of this paper is to present a parallel version of an algorithm that belongs to the family of fuzzy connectedness (FC) algorithms, to achieve an interactive speed for segmenting large medical image data sets. Methods: The most common FC segmentations, optimizing an ℓ∞-based energy, are known as relative fuzzy connectedness (RFC) and iterative relative fuzzy connectedness (IRFC). Both RFC and IRFC objects (of which IRFC contains RFC) can be found via linear time algorithms, linear with respect to the image size. The new algorithm, P-ORFC (for parallel optimal RFC), which is implemented by using NVIDIA's Compute Unified Device Architecture (CUDA) platform, considerably improves the computational speed of the above mentioned CPU based IRFC algorithm. Results: Experiments based on four data sets of small, medium, large, and super data size, achieved speedup factors of 32.8×, 22.9×, 20.9×, and 17.5×, correspondingly, on the NVIDIA Tesla C1060 platform. Although the output of P-ORFC need not precisely match that of IRFC output, it is very close to it and, as the authors prove, always lies between the RFC and IRFC objects. Conclusions: A parallel version of a top-of-the-line algorithm in the family of FC has been developed on the NVIDIA GPUs. An interactive speed of segmentation has been achieved, even for the largest medical image data set. Such GPU implementations may play a crucial role in automatic anatomy recognition in clinical radiology.
GOTHIC: Gravitational oct-tree code accelerated by hierarchical time step controlling

NASA Astrophysics Data System (ADS)

Miki, Yohei; Umemura, Masayuki

2017-04-01

The tree method is a widely implemented algorithm for collisionless N-body simulations in astrophysics well suited for GPU(s). Adopting hierarchical time stepping can accelerate N-body simulations; however, it is infrequently implemented and its potential remains untested in GPU implementations. We have developed a Gravitational Oct-Tree code accelerated by HIerarchical time step Controlling named GOTHIC, which adopts both the tree method and the hierarchical time step. The code adopts some adaptive optimizations by monitoring the execution time of each function on-the-fly and minimizes the time-to-solution by balancing the measured time of multiple functions. Results of performance measurements with realistic particle distribution performed on NVIDIA Tesla M2090, K20X, and GeForce GTX TITAN X, which are representative GPUs of the Fermi, Kepler, and Maxwell generation of GPUs, show that the hierarchical time step achieves a speedup by a factor of around 3-5 times compared to the shared time step. The measured elapsed time per step of GOTHIC is 0.30 s or 0.44 s on GTX TITAN X when the particle distribution represents the Andromeda galaxy or the NFW sphere, respectively, with 2^24 = 16,777,216 particles. The averaged performance of the code corresponds to 10-30% of the theoretical single precision peak performance of the GPU.

A multi-port 10GbE PCIe NIC featuring UDP offload and GPUDirect capabilities.

NASA Astrophysics Data System (ADS)

Ammendola, Roberto; Biagioni, Andrea; Frezza, Ottorino; Lamanna, Gianluca; Lo Cicero, Francesca; Lonardo, Alessandro; Martinelli, Michele; Stanislao Paolucci, Pier; Pastorelli, Elena; Pontisso, Luca; Rossetti, Davide; Simula, Francesco; Sozzi, Marco; Tosoratto, Laura; Vicini, Piero

2015-12-01

NaNet-10 is a four-port 10GbE PCIe Network Interface Card designed for low-latency real-time operations with GPU systems. To this purpose the design includes a UDP offload module, for fast and clock-cycle deterministic handling of the transport layer protocol, plus a GPUDirect P2P/RDMA engine for low-latency communication with NVIDIA Tesla GPU devices. A dedicated module (Multi-Stream) can optionally process input UDP streams before data is delivered through PCIe DMA to their destination devices, re-organizing data from different streams to guarantee computational optimization. NaNet-10 is going to be integrated in the NA62 CERN experiment in order to assess the suitability of GPGPU systems as real-time triggers; results and lessons learned while performing this activity will be reported herein.
Development of an Implicit, Charge and Energy Conserving 2D Electromagnetic PIC Code on Advanced Architectures

NASA Astrophysics Data System (ADS)

Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel

2012-10-01

In order to solve problems such as the ion coalescence and slow MHD shocks fully kinetically we developed a fully implicit 2D energy and charge conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.
A Large Scale, High Resolution Agent-Based Insurgency Model

DTIC Science & Technology

2013-09-30

CUDA) is NVIDIA Corporation’s software development model for General Purpose Programming on Graphics Processing Units (GPGPU) (NVIDIA Corporation...Conference. Argonne National Laboratory, Argonne, IL, October, 2005. NVIDIA Corporation. NVIDIA CUDA Programming Guide 2.0 [Online]. NVIDIA Corporation
Multi-GPU Accelerated Admittance Method for High-Resolution Human Exposure Evaluation.

PubMed

Xiong, Zubiao; Feng, Shi; Kautz, Richard; Chandra, Sandeep; Altunyurt, Nevin; Chen, Ji

2015-12-01

A multi-graphics processing unit (GPU) accelerated admittance method solver is presented for solving the induced electric field in high-resolution anatomical models of human body when exposed to external low-frequency magnetic fields. In the solver, the anatomical model is discretized as a three-dimensional network of admittances. The conjugate orthogonal conjugate gradient (COCG) iterative algorithm is employed to take advantage of the symmetric property of the complex-valued linear system of equations. Compared against the widely used biconjugate gradient stabilized method, the COCG algorithm can reduce the solving time by 3.5 times and reduce the storage requirement by about 40%. The iterative algorithm is then accelerated further by using multiple NVIDIA GPUs. The computations and data transfers between GPUs are overlapped in time by using asynchronous concurrent execution design. The communication overhead is well hidden so that the acceleration is nearly linear with the number of GPU cards. Numerical examples show that our GPU implementation running on four NVIDIA Tesla K20c cards can reach 90 times faster than the CPU implementation running on eight CPU cores (two Intel Xeon E5-2603 processors). The implemented solver is able to solve large dimensional problems efficiently. A whole adult body discretized in 1-mm resolution can be solved in just several minutes. The high efficiency achieved makes it practical to investigate human exposure involving a large number of cases with a high resolution that meets the requirements of international dosimetry guidelines.
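The COCG iteration named in this abstract differs from ordinary conjugate gradients only in using the unconjugated product x^T y, which is what lets it exploit a complex symmetric (not Hermitian) matrix. The pure-Python sketch below illustrates the textbook recurrence on a small dense system; the helper names and the 3 × 3 test matrix are made up for illustration, and nothing here reflects the authors' multi-GPU implementation or their admittance network.

```python
def matvec(a, v):
    """Dense matrix-vector product for lists of complex numbers."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot_unconjugated(u, v):
    """Bilinear form u^T v (no complex conjugation), as COCG requires."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(v):
    return sum(abs(vi) ** 2 for vi in v) ** 0.5


def cocg(a, b, tol=1e-12, max_iter=100):
    """Solve a x = b for complex symmetric a (a == a^T) with COCG."""
    x = [0j] * len(b)
    r = list(b)                       # residual for the zero initial guess
    p = list(r)
    rho = dot_unconjugated(r, r)
    b_norm = norm(b)
    for _ in range(max_iter):
        q = matvec(a, p)
        alpha = rho / dot_unconjugated(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if norm(r) <= tol * b_norm:   # converged; skip the beta update
            break
        rho_new = dot_unconjugated(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


# Hypothetical complex symmetric test system (a equals its transpose).
A = [[4 + 1j, 1, 0],
     [1, 3 + 2j, 1j],
     [0, 1j, 5 - 1j]]
b = [1, 2j, 3]
x = cocg(A, b)
residual = norm([bi - yi for bi, yi in zip(b, matvec(A, x))])
print("residual norm:", residual)
```

Each iteration needs a single matrix-vector product with the matrix itself and no product with its transpose or conjugate, which is one reason the method is cheaper per step than biconjugate-gradient variants on symmetric systems.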
Parallel fuzzy connected image segmentation on GPU

PubMed Central

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

2011-01-01

Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA’s Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, correspondingly, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, correspondingly, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037
77 FR 26789 - Certain Semiconductor Chips Having Synchronous Dynamic Random Access Memory Controllers and...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-05-07

... patents. 73 FR 75131. The principal respondent was NVIDIA Corporation of Santa Clara, California (``NVIDIA''). Joining NVIDIA as respondents were approximately twenty of NVIDIA's customers. The Commission found a... accused products in the United States: NVIDIA; Hewlett-Packard Co. of Palo Alto, California; ASUS Computer...
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures.

PubMed

Kim, Daehyun; Trzasko, Joshua; Smelyanskiy, Mikhail; Haider, Clifton; Dubey, Pradeep; Manduca, Armando

2011-01-01

Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.
GPU-Powered Coherent Beamforming

NASA Astrophysics Data System (ADS)

Magro, A.; Adami, K. Zarb; Hickish, J.

2015-03-01

Graphics processing units (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves ~1.3 TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel and for the transient detection pipeline with beamforming capabilities, as well as results of test observations.
A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis.

PubMed

Nagaoka, Tomoaki; Watanabe, Soichi

2010-01-01

Numerical simulations with the numerical human model using the finite-difference time domain (FDTD) method have recently been performed frequently in a number of fields in biomedical engineering. However, the FDTD calculation runs too slowly. We focus, therefore, on general purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the GPU using Compute Unified Device Architecture (CUDA). In this study, we used the NVIDIA Tesla C1060 as a GPGPU board. The performance of the GPU is evaluated in comparison with the performance of a conventional CPU and a vector supercomputer. The results indicate that three-dimensional FDTD calculations using a GPU can significantly reduce run time in comparison with that using a conventional CPU, even a native GPU implementation of the three-dimensional FDTD method, while the GPU/CPU speed ratio varies with the calculation domain and thread block size.
Fast analytical scatter estimation using graphics processing units.

PubMed

Ingleby, Harry; Lippuner, Jonas; Rickey, Daniel W; Li, Yue; Elbakri, Idris

2015-01-01

To develop a fast patient-specific analytical estimator of first-order Compton and Rayleigh scatter in cone-beam computed tomography, implemented using graphics processing units. The authors developed an analytical estimator for first-order Compton and Rayleigh scatter in a cone-beam computed tomography geometry. The estimator was coded using NVIDIA's CUDA environment for execution on an NVIDIA graphics processing unit. Performance of the analytical estimator was validated by comparison with high-count Monte Carlo simulations for two different numerical phantoms. Monoenergetic analytical simulations were compared with monoenergetic and polyenergetic Monte Carlo simulations. Analytical and Monte Carlo scatter estimates were compared both qualitatively, from visual inspection of images and profiles, and quantitatively, using a scaled root-mean-square difference metric. Reconstruction of simulated cone-beam projection data of an anthropomorphic breast phantom illustrated the potential of this method as a component of a scatter correction algorithm. The monoenergetic analytical and Monte Carlo scatter estimates showed very good agreement. The monoenergetic analytical estimates showed good agreement for Compton single scatter and reasonable agreement for Rayleigh single scatter when compared with polyenergetic Monte Carlo estimates. For a voxelized phantom with dimensions 128 × 128 × 128 voxels and a detector with 256 × 256 pixels, the analytical estimator required 669 seconds for a single projection, using a single NVIDIA 9800 GX2 video card. Accounting for first order scatter in cone-beam image reconstruction improves the contrast to noise ratio of the reconstructed images. The analytical scatter estimator, implemented using graphics processing units, provides rapid and accurate estimates of single scatter and with further acceleration and a method to account for multiple scatter may be useful for practical scatter correction schemes.
A GPU-accelerated semi-implicit fractional step method for numerical solutions of incompressible Navier-Stokes equations

NASA Astrophysics Data System (ADS)

Ha, Sanghyun; Park, Junshin; You, Donghyun

2017-11-01

Utility of the computational power of modern Graphics Processing Units (GPUs) is elaborated for solutions of incompressible Navier-Stokes equations which are integrated using a semi-implicit fractional-step method. Due to its serial and bandwidth-bound nature, the present choice of numerical methods is considered to be a good candidate for evaluating the potential of GPUs for solving Navier-Stokes equations using non-explicit time integration. An efficient algorithm is presented for GPU acceleration of the Alternating Direction Implicit (ADI) and the Fourier-transform-based direct solution method used in the semi-implicit fractional-step method. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while Navier-Stokes equations are computed on a GPU. Extension to multiple NVIDIA GPUs is implemented using NVLink supported by the Pascal architecture. Performance of the present method is experimented on multiple Tesla P100 GPUs compared with a single-core Xeon E5-2650 v4 CPU in simulations of boundary-layer flow over a flat plate. Supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (Ministry of Science, ICT and Future Planning NRF-2016R1E1A2A01939553, NRF-2014R1A2A1A11049599, and Ministry of Trade, Industry and Energy 201611101000230).
Application of a GPU-Assisted Maxwell Code to Electromagnetic Wave Propagation in ITER

NASA Astrophysics Data System (ADS)

Kubota, S.; Peebles, W. A.; Woodbury, D.; Johnson, I.; Zolfaghari, A.

2014-10-01

The Low Field Side Reflectometer (LFSR) on ITER is envisioned to provide capabilities for electron density profile and fluctuations measurements in both the plasma core and edge. The current design for the Equatorial Port Plug 11 (EPP11) employs seven monostatic antennas for use with both fixed-frequency and swept-frequency systems. The present work examines the characteristics of this layout using the 3-D version of the GPU-Assisted Maxwell Code (GAMC-3D). Previous studies in this area were performed with either 2-D full wave codes or 3-D ray- and beam-tracing. GAMC-3D is based on the FDTD method and can be run with either a fixed-frequency or modulated (e.g. FMCW) source, and with either a stationary or moving target (e.g. Doppler backscattering). The code is designed to run on a single NVIDIA Tesla GPU accelerator, and utilizes a technique based on the moving window method to overcome the size limitation of the onboard memory. Effects such as beam drift, linear mode conversion, and diffraction/scattering will be examined. Comparisons will be made with beam-tracing calculations using the complex eikonal method. Supported by U.S. DoE Grants DE-FG02-99ER54527 and DE-AC02-09CH11466, and the DoE SULI Program at PPPL.
GASPRNG: GPU accelerated scalable parallel random number generator library

NASA Astrophysics Data System (ADS)

Gao, Shuang; Peterson, Gregory D.

2013-04-01

Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs) along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to be able to use GASPRNG the same way as SPRNG on traditional serial or parallel computers as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications. Catalogue identifier: AEOI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: UTK license. No. of lines in distributed program, including test data, etc.: 167900 No. of bytes in distributed program, including test data, etc.: 1422058 Distribution format: tar.gz Programming language: C and CUDA. 
Computer: Any PC or workstation with NVIDIA GPU (Tested on Fermi GTX480, Tesla C1060, Tesla M2070). Operating system: Linux with CUDA version 4.0 or later. Should also run on MacOS, Windows, or UNIX. Has the code been vectorized or parallelized?: Yes. Parallelized using MPI directives. RAM: 512 MB ~ 732 MB (main memory on host CPU, depending on the data type of random numbers.) / 512 MB (GPU global memory) Classification: 4.13, 6.5. Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations are able to consume limitless random numbers for the computation as long as resources for the computing are supported. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The GASPRNG library presented here accelerates the generators of independent streams of random numbers using graphical processing units (GPUs). Solution method: Multiple copies of random number generators in GPUs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. GASPRNG is a random number generators library to allow a computational science application to employ multiple copies of random number generators to boost performance. Users can interface GASPRNG with software code executing on microprocessors and/or GPUs. Running time: The tests provided take a few minutes to run.
Multistage Analysis of Cyber Threats for Quick Mission Impact Assessment (CyberIA)

DTIC Science & Technology

2015-09-01

Corporation. NVIDIA® is a registered trademark of the NVIDIA Corporation. CUDA™ is a trademark of the NVIDIA Corporation. Released by J. Lee...for developing and integrating different high-performance C/C++ algorithms. This capability is significant because NVIDIA® CUDA™ architecture
Real-time volumetric image reconstruction and 3D tumor localization based on a single x-ray projection image for lung cancer radiotherapy.

PubMed

Li, Ruijiang; Jia, Xun; Lewis, John H; Gu, Xuejun; Folkerts, Michael; Men, Chunhua; Jiang, Steve B

2010-06-01

To develop an algorithm for real-time volumetric image reconstruction and 3D tumor localization based on a single x-ray projection image for lung cancer radiotherapy. Given a set of volumetric images of a patient at N breathing phases as the training data, deformable image registration was performed between a reference phase and the other N-1 phases, resulting in N-1 deformation vector fields (DVFs). These DVFs can be represented efficiently by a few eigenvectors and coefficients obtained from principal component analysis (PCA). By varying the PCA coefficients, new DVFs can be generated, which, when applied on the reference image, lead to new volumetric images. A volumetric image can then be reconstructed from a single projection image by optimizing the PCA coefficients such that its computed projection matches the measured one. The 3D location of the tumor can be derived by applying the inverted DVF on its position in the reference image. The algorithm was implemented on graphics processing units (GPUs) to achieve real-time efficiency. The training data were generated using a realistic and dynamic mathematical phantom with ten breathing phases. The testing data were 360 cone beam projections corresponding to one gantry rotation, simulated using the same phantom with a 50% increase in breathing amplitude. The average relative image intensity error of the reconstructed volumetric images is 6.9% +/- 2.4%. The average 3D tumor localization error is 0.8 +/- 0.5 mm. On an NVIDIA Tesla C1060 GPU card, the average computation time for reconstructing a volumetric image from each projection is 0.24 s (range: 0.17 and 0.35 s). The authors have shown the feasibility of reconstructing volumetric images and localizing tumor positions in 3D in near real-time from a single x-ray image.
Particle-in-cell simulations with charge-conserving current deposition on graphic processing units

NASA Astrophysics Data System (ADS)

Ren, Chuang; Kong, Xianglong; Huang, Michael; Decyk, Viktor; Mori, Warren

2011-10-01

Using CUDA, we have recently developed an electromagnetic Particle-in-Cell (PIC) code with charge-conserving current deposition for Nvidia graphics processing units (GPUs) (Kong et al., Journal of Computational Physics 230, 1676 (2011)). On a Tesla M2050 (Fermi) card, the GPU PIC code can achieve a one-particle-step process time of 1.2 - 3.2 ns in 2D and 2.3 - 7.2 ns in 3D, depending on plasma temperatures. In this talk we will discuss novel algorithms for GPU-PIC, including a charge-conserving current deposition scheme with little branching and parallel particle sorting. These algorithms make efficient use of the GPU shared memory. We will also discuss how to replace the computation kernels of existing parallel CPU codes while keeping their parallel structures. This work was supported by U.S. Department of Energy under Grant Nos. DE-FG02-06ER54879 and DE-FC02-04ER54789 and by NSF under Grant Nos. PHY-0903797 and CCF-0747324.
Building a Terabyte Memory Bandwidth Compute Node with Four Consumer Electronics GPUs

NASA Astrophysics Data System (ADS)

Omlin, Samuel; Räss, Ludovic; Podladchikov, Yuri

2014-05-01

GPUs released for consumer electronics are generally built with the same chip architectures as the GPUs released for professional usage. With regards to scientific computing, there are no obvious important differences in functionality or performance between the two types of releases, yet the price can differ up to one order of magnitude. For example, the consumer electronics release of the most recent NVIDIA Kepler architecture (GK110), named GeForce GTX TITAN, performed equally well in conducted memory bandwidth tests as the professional release, named Tesla K20; the consumer electronics release costs about one third of the professional release. We explain how to design and assemble a well adjusted computer with four high-end consumer electronics GPUs (GeForce GTX TITAN) combining more than 1 terabyte/s memory bandwidth. We compare the system's performance and precision with the one of hardware released for professional usage. The system can be used as a powerful workstation for scientific computing or as a compute node in a home-built GPU cluster.
Accelerating a three-dimensional eco-hydrological cellular automaton on GPGPU with OpenCL

NASA Astrophysics Data System (ADS)

Senatore, Alfonso; D'Ambrosio, Donato; De Rango, Alessio; Rongo, Rocco; Spataro, William; Straface, Salvatore; Mendicino, Giuseppe

2016-10-01

This work presents an effective implementation of a numerical model for complete eco-hydrological Cellular Automata modeling on Graphical Processing Units (GPU) with OpenCL (Open Computing Language) for heterogeneous computation (i.e., on CPUs and/or GPUs). Different types of parallel implementations were carried out (e.g., use of fast local memory, loop unrolling, etc.), showing increasing performance improvements in terms of speedup and also adopting some original optimization strategies. Moreover, numerical analysis of the results (i.e., comparison of CPU and GPU outcomes in terms of rounding errors) has proven to be satisfactory. Experiments were carried out on a workstation with two CPUs (Intel Xeon E5440 at 2.83GHz), one AMD R9 280X GPU and one nVIDIA Tesla K20c GPU. Results have been extremely positive, but further testing should be performed to assess the functionality of the adopted strategies on other complete models and their ability to fruitfully exploit parallel systems resources.
Accelerating image reconstruction in dual-head PET system by GPU and symmetry properties.

PubMed

Chou, Cheng-Ying; Dong, Yun; Hung, Yukai; Kao, Yu-Jiun; Wang, Weichung; Kao, Chien-Min; Chen, Chin-Tu

2012-01-01

Positron emission tomography (PET) is an important imaging modality in both clinical usage and research studies. We have developed a compact high-sensitivity PET system that consists of two large-area panel PET detector heads, which produce more than 224 million lines of response and thus impose heavy computational demands. In this work, we employed a state-of-the-art graphics processing unit (GPU), NVIDIA Tesla C2070, to yield an efficient reconstruction process. Our approach integrates the symmetry properties of the imaging system with features of the GPU architecture, including block/warp/thread assignments and effective memory usage, to accelerate the computations for ordered subset expectation maximization (OSEM) image reconstruction. The OSEM reconstruction algorithms were implemented employing both CPU-based and GPU-based codes, and their computational performance was quantitatively analyzed and compared. The results showed that the GPU-accelerated scheme can drastically reduce the reconstruction time and thus can largely expand the applicability of the dual-head PET system.
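For readers unfamiliar with OSEM, the multiplicative update it applies can be sketched in a few lines. The following is a minimal CPU illustration on a hypothetical 4 x 2 toy system matrix with noise-free data; it does not reflect the authors' GPU code, their symmetry-based optimizations, or the real scanner geometry, and all names are chosen here for illustration.

```python
def osem(system_matrix, measured, subsets, n_iterations=30):
    """Ordered subset expectation maximization on a small dense system matrix."""
    n_pixels = len(system_matrix[0])
    image = [1.0] * n_pixels  # uniform, strictly positive starting image
    for _ in range(n_iterations):
        for subset in subsets:
            # Forward-project the current image onto the lines of response in this subset.
            ratios = {}
            for i in subset:
                expected = sum(a * x for a, x in zip(system_matrix[i], image))
                ratios[i] = measured[i] / expected if expected > 0 else 0.0
            # Back-project the measured/expected ratios and apply the multiplicative update.
            for j in range(n_pixels):
                sensitivity = sum(system_matrix[i][j] for i in subset)
                if sensitivity > 0:
                    back = sum(system_matrix[i][j] * ratios[i] for i in subset)
                    image[j] *= back / sensitivity
    return image

# Hypothetical toy problem: 4 lines of response, 2 pixels, true image [2, 5].
toy_matrix = [[1.0, 0.5], [0.0, 0.5], [0.5, 0.0], [0.5, 1.0]]
true_image = [2.0, 5.0]
toy_data = [sum(a * x for a, x in zip(row, true_image)) for row in toy_matrix]
reconstruction = osem(toy_matrix, toy_data, subsets=[[0, 1], [2, 3]])
print(reconstruction)
```

Each pass over one subset costs a forward and a back projection restricted to that subset, which is why the number of lines of response dominates the run time and why the projections are the natural target for GPU parallelization.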
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures

PubMed Central

Kim, Daehyun; Trzasko, Joshua; Smelyanskiy, Mikhail; Haider, Clifton; Dubey, Pradeep; Manduca, Armando

2011-01-01

Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability. PMID:21922017

Efficient Acceleration of the Pair-HMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units.

PubMed

Ren, Shanshan; Bertels, Koen; Al-Ars, Zaid

2018-01-01

GATK HaplotypeCaller (HC) is a popular variant caller, which is widely used to identify variants in complex genomes. However, due to its high variants detection accuracy, it suffers from long execution time. In GATK HC, the pair-HMMs forward algorithm accounts for a large percentage of the total execution time. This article proposes to accelerate the pair-HMMs forward algorithm on graphics processing units (GPUs) to improve the performance of GATK HC. This article presents several GPU-based implementations of the pair-HMMs forward algorithm. It also analyzes the performance bottlenecks of the implementations on an NVIDIA Tesla K40 card with various data sets. Based on these results and the characteristics of GATK HC, we are able to identify the GPU-based implementations with the highest performance for the various analyzed data sets. Experimental results show that the GPU-based implementations of the pair-HMMs forward algorithm achieve a speedup of up to 5.47× over existing GPU-based implementations.
GPU-based fast cone beam CT reconstruction from undersampled and noisy projection data via total variation.

PubMed

Jia, Xun; Lou, Yifei; Li, Ruijiang; Song, William Y; Jiang, Steve B

2010-04-01

Cone-beam CT (CBCT) plays an important role in image guided radiation therapy (IGRT). However, the large radiation dose from serial CBCT scans in most IGRT procedures raises a clinical concern, especially for pediatric patients who are essentially excluded from receiving IGRT for this reason. The goal of this work is to develop a fast GPU-based algorithm to reconstruct CBCT from undersampled and noisy projection data so as to lower the imaging dose. The CBCT is reconstructed by minimizing an energy functional consisting of a data fidelity term and a total variation regularization term. The authors developed a GPU-friendly version of the forward-backward splitting algorithm to solve this model. A multigrid technique is also employed. It is found that 20-40 x-ray projections are sufficient to reconstruct images with satisfactory quality for IGRT. The reconstruction time ranges from 77 to 130 s on an NVIDIA Tesla C1060 (NVIDIA, Santa Clara, CA) GPU card, depending on the number of projections used, which is estimated about 100 times faster than similar iterative reconstruction approaches. Moreover, phantom studies indicate that the algorithm enables the CBCT to be reconstructed under a scanning protocol with as low as 0.1 mA s/projection. Comparing with currently widely used full-fan head and neck scanning protocol of approximately 360 projections with 0.4 mA s/projection, it is estimated that an overall 36-72 times dose reduction has been achieved in our fast CBCT reconstruction algorithm. This work indicates that the developed GPU-based CBCT reconstruction algorithm is capable of lowering imaging dose considerably. The high computation efficiency in this algorithm makes the iterative CBCT reconstruction approach applicable in real clinical environments.
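The 36-72 times dose reduction quoted in the abstract above follows from simple arithmetic on the stated protocols. The check below assumes, as a simplification, that imaging dose scales with the total tube current-time product (projections times mAs per projection); the function names are illustrative.

```python
# Protocol figures quoted in the abstract.
CONVENTIONAL_PROJECTIONS = 360
CONVENTIONAL_MAS = 0.4   # mAs per projection, full-fan head and neck protocol
LOW_DOSE_MAS = 0.1       # mAs per projection, lowest protocol studied

def total_exposure(projections, mas_per_projection):
    """Total tube current-time product (mAs) for one scan."""
    return projections * mas_per_projection

def dose_reduction(low_dose_projections):
    """Ratio of conventional to low-dose total exposure (assumes dose is proportional to mAs)."""
    conventional = total_exposure(CONVENTIONAL_PROJECTIONS, CONVENTIONAL_MAS)
    low_dose = total_exposure(low_dose_projections, LOW_DOSE_MAS)
    return conventional / low_dose

for n_projections in (40, 20):
    print(n_projections, "projections:", round(dose_reduction(n_projections)), "times lower exposure")
```

With 40 projections the ratio is 144 mAs / 4 mAs = 36, and with 20 projections it is 144 mAs / 2 mAs = 72, matching the range in the abstract.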
GPU-based relative fuzzy connectedness image segmentation

PubMed Central

Zhuge, Ying; Ciesielski, Krzysztof C.; Udupa, Jayaram K.; Miller, Robert W.

2013-01-01

Purpose: Recently, clinical radiological research and practice are becoming increasingly quantitative. Further, images continue to increase in size and volume. For quantitative radiology to become practical, it is crucial that image segmentation algorithms and their implementations are rapid and yield practical run time on very large data sets. The purpose of this paper is to present a parallel version of an algorithm that belongs to the family of fuzzy connectedness (FC) algorithms, to achieve an interactive speed for segmenting large medical image data sets. Methods: The most common FC segmentations, optimizing an ℓ∞-based energy, are known as relative fuzzy connectedness (RFC) and iterative relative fuzzy connectedness (IRFC). Both RFC and IRFC objects (of which IRFC contains RFC) can be found via linear time algorithms, linear with respect to the image size. The new algorithm, P-ORFC (for parallel optimal RFC), which is implemented by using NVIDIA’s Compute Unified Device Architecture (CUDA) platform, considerably improves the computational speed of the above mentioned CPU based IRFC algorithm. Results: Experiments based on four data sets of small, medium, large, and super data size, achieved speedup factors of 32.8×, 22.9×, 20.9×, and 17.5×, correspondingly, on the NVIDIA Tesla C1060 platform. Although the output of P-ORFC need not precisely match that of IRFC output, it is very close to it and, as the authors prove, always lies between the RFC and IRFC objects. Conclusions: A parallel version of a top-of-the-line algorithm in the family of FC has been developed on the NVIDIA GPUs. An interactive speed of segmentation has been achieved, even for the largest medical image data set. Such GPU implementations may play a crucial role in automatic anatomy recognition in clinical radiology. PMID:23298094
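The notion of relative fuzzy connectedness used in the abstract above can be illustrated with a small sequential sketch. It assumes the standard definitions (the strength of a path is its weakest affinity, connectedness is the strength of the best path, and an element belongs to the object when it is more strongly connected to the object seed than to the background seed). This is a plain Dijkstra-style CPU version on a hypothetical four-node graph, not the parallel P-ORFC algorithm.

```python
import heapq

def connectedness(affinity, seed):
    """Max-min path strength from the seed to every node (Dijkstra-style propagation)."""
    strength = {node: 0.0 for node in affinity}
    strength[seed] = 1.0
    heap = [(-1.0, seed)]  # max-heap emulated with negated strengths
    while heap:
        negated, node = heapq.heappop(heap)
        current = -negated
        if current < strength[node]:
            continue  # stale heap entry
        for neighbor, weight in affinity[node].items():
            candidate = min(current, weight)  # a path is as strong as its weakest link
            if candidate > strength[neighbor]:
                strength[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return strength

def relative_fuzzy_connectedness(affinity, object_seed, background_seed):
    """Nodes more strongly connected to the object seed than to the background seed."""
    to_object = connectedness(affinity, object_seed)
    to_background = connectedness(affinity, background_seed)
    return {node for node in affinity if to_object[node] > to_background[node]}

# Hypothetical chain a - b - c - d with a weak link between b and c.
toy_affinity = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.2},
    "c": {"b": 0.2, "d": 0.8},
    "d": {"c": 0.8},
}
segmented = relative_fuzzy_connectedness(toy_affinity, "a", "d")
print(sorted(segmented))
```

In an image, nodes would be voxels and affinities would be derived from intensity similarity between neighbors; the weak link plays the role of an object boundary.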
Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces

DOE PAGES

Eghtesad, Adnan; Germaschewski, Kai; Beyerlein, Irene J.; ...

2017-10-14

We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation takes advantage of the portable OpenACC standard directive pragmas along with Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library called CUFFT to execute the FFT computations within the PFDD formulation on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OPENACC-CUDA inter-operability, in which calls to CUDA functions are replaced with the OPENACC data regions for a host central processing unit (CPU) and device (GPU). A comprehensive benchmark study has been conducted, which compares a number of FFT routines, the Numerical Recipes FFT (FOURN), Fastest Fourier Transform in the West (FFTW), and the CUFFT. The last one exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified using the analytical solutions for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu–Ni) interface. It is demonstrated that the ACCPFDD-CUFFT implementation on a single TESLA K80 GPU offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the 22-multicore Intel Xeon CPU E5-2699 v4 @ 2.20 GHz version of the code.
Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eghtesad, Adnan; Germaschewski, Kai; Beyerlein, Irene J.

We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation takes advantage of the portable OpenACC standard directive pragmas along with Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library called CUFFT to execute the FFT computations within the PFDD formulation on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OPENACC-CUDA inter-operability, in which calls to CUDA functions are replaced with the OPENACC data regions for a host central processing unit (CPU) and device (GPU). A comprehensive benchmark study has been conducted, which compares a number of FFT routines, the Numerical Recipes FFT (FOURN), Fastest Fourier Transform in the West (FFTW), and the CUFFT. The last one exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified using the analytical solutions for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu–Ni) interface. It is demonstrated that the ACCPFDD-CUFFT implementation on a single TESLA K80 GPU offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the 22-multicore Intel Xeon CPU E5-2699 v4 @ 2.20 GHz version of the code.
Musrfit-Real Time Parameter Fitting Using GPUs

NASA Astrophysics Data System (ADS)

Locans, Uldis; Suter, Andreas

High transverse field μSR (HTF-μSR) experiments typically lead to rather large data sets, since it is necessary to follow the high frequencies present in the positron decay histograms. The analysis of these data sets can be very time consuming, usually due to the limited computational power of the hardware. To overcome the limited computing resources, the rotating reference frame (RRF) transformation is often used to reduce the data sets that need to be handled. This comes at a price that the μSR community is typically not aware of: (i) due to the RRF transformation the fitting parameter estimate is of poorer precision, i.e., longer and more expensive beamtime is needed; (ii) RRF introduces systematic errors which hamper the statistical interpretation of χ² or the maximum log-likelihood. We will briefly discuss these issues in a non-exhaustive practical way. The sole reason for using the RRF transformation is limited computing power. Therefore, during this work GPU (Graphical Processing Unit) based fitting was developed, which allows real-time full data analysis without RRF. GPUs have become increasingly popular in scientific computing in recent years. Due to their highly parallel architecture they provide the opportunity to accelerate many applications at considerably lower cost than upgrading the CPU computational power. With the emergence of frameworks such as CUDA and OpenCL these devices have become more easily programmable. During this work GPU support was added to Musrfit, a data analysis framework for μSR experiments. The new fitting algorithm uses CUDA or OpenCL to offload the most time consuming parts of the calculations to Nvidia or AMD GPUs. Using the current CPU implementation in Musrfit, parameter fitting can take hours for certain data sets, while the GPU version allows real-time data analysis on the same data sets.
This work describes the challenges that arise in adding GPU support to Musrfit, as well as results obtained using the GPU version. The speedups achieved with the GPU were measured relative to the CPU implementation. Two different GPUs were used for the comparison: a high-end Nvidia Tesla K40c GPU designed for HPC applications and an AMD Radeon R9 390X GPU designed for the gaming industry.
Accelerated Adaptive MGS Phase Retrieval

NASA Technical Reports Server (NTRS)

Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang

2011-01-01

The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
Solving lattice QCD systems of equations using mixed precision solvers on GPUs

NASA Astrophysics Data System (ADS)

Clark, M. A.; Babich, R.; Barros, K.; Brower, R. C.; Rebbi, C.

2010-09-01

Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40, 135 and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.
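The abstract above contrasts its reliable-update scheme with "the usual defect-correction approach". As background, that baseline can be sketched as follows: the residual is computed in double precision, a correction is obtained from a low-precision inner solve, and the correction is accumulated in double precision. This sketch is not the authors' method. Jacobi sweeps stand in for the Krylov solver, single precision is emulated by rounding through `struct`, and the 2 x 2 system is a hypothetical example.

```python
import struct

def to_single(value):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def matvec(matrix, vector):
    """Dense matrix-vector product in double precision."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def low_precision_solve(matrix, rhs, sweeps=10):
    """Approximate inner solve: Jacobi sweeps with every result rounded to single precision."""
    n = len(rhs)
    x = [0.0] * n
    for _ in range(sweeps):
        x = [
            to_single((rhs[i] - sum(matrix[i][j] * x[j] for j in range(n) if j != i)) / matrix[i][i])
            for i in range(n)
        ]
    return x

def defect_correction(matrix, rhs, tolerance=1e-12, max_outer=50):
    """Outer loop in double precision, corrections from the low-precision inner solver."""
    x = [0.0] * len(rhs)
    for _ in range(max_outer):
        residual = [b - ax for b, ax in zip(rhs, matvec(matrix, x))]  # true residual, double
        if max(abs(r) for r in residual) < tolerance:
            break
        correction = low_precision_solve(matrix, residual)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x

# Hypothetical diagonally dominant system with exact solution (1/11, 7/11).
solution = defect_correction([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

The final accuracy is set by the double-precision residual, while most of the arithmetic happens in the cheaper inner solve, which is the property that mixed-precision GPU solvers exploit.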
Supporting Real-Time Computer Vision Workloads using OpenVX on Multicore+GPU Platforms

DTIC Science & Technology

2015-05-01

a registered trademark of the NVIDIA Corporation. Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection...from NVIDIA, we adapted an alpha-version of an NVIDIA OpenVX implementation called VisionWorks® [3] to run atop PGMRT (a graph-based middleware...time support to an OpenVX implementation by NVIDIA called VisionWorks. Our modifications were applied to an alpha-version of VisionWorks. This alpha
SU (2) lattice gauge theory simulations on Fermi GPUs

NASA Astrophysics Data System (ADS)

Cardoso, Nuno; Bicudo, Pedro

2011-05-01

In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations.
Non-Enhanced MR Imaging of Cerebral Arteriovenous Malformations at 7 Tesla.

PubMed

Wrede, Karsten H; Dammann, Philipp; Johst, Sören; Mönninghoff, Christoph; Schlamann, Marc; Maderwald, Stefan; Sandalcioglu, I Erol; Ladd, Mark E; Forsting, Michael; Sure, Ulrich; Umutlu, Lale

2016-03-01

To evaluate prospectively 7 Tesla time-of-flight (TOF) magnetic resonance angiography (MRA) and 7 Tesla non-contrast-enhanced magnetization-prepared rapid acquisition gradient-echo (MPRAGE) for delineation of intracerebral arteriovenous malformations (AVMs) in comparison to 1.5 Tesla TOF MRA and digital subtraction angiography (DSA). Twenty patients with single or multifocal AVMs were enrolled in this trial. The study protocol comprised 1.5 and 7 Tesla TOF MRA and 7 Tesla non-contrast-enhanced MPRAGE sequences. All patients underwent an additional four-vessel 3D DSA. Image analysis of the following five AVM features was performed individually by two radiologists on a five-point scale: nidus, feeder(s), draining vein(s), relationship to adjacent vessels, and overall image quality and presence of artefacts. A total of 21 intracerebral AVMs were detected. Both sequences at 7 Tesla were rated superior over 1.5 Tesla TOF MRA in the assessment of all considered AVM features. Image quality at 7 Tesla was comparable with DSA considering both sequences. Inter-observer accordance was good to excellent for the majority of ratings. This study demonstrates excellent image quality for depiction of intracerebral AVMs using non-contrast-enhanced 7 Tesla MRA, comparable with DSA. Assessment of untreated AVMs is a promising clinical application of ultra-high-field MRA. • Non-contrast-enhanced 7 Tesla MRA demonstrates excellent image quality for intracerebral AVM depiction. • Image quality at 7 Tesla was comparable with DSA considering both sequences. • Assessment of intracerebral AVMs is a promising clinical application of ultra-high-field MRA.
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores.

PubMed

Chikkagoudar, Satish; Wang, Kai; Li, Mingyao

2011-05-26

Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
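The fragment-based work decomposition described in the abstract above can be sketched as follows. This is an illustration only, not GENIE's actual code or API: the function names are hypothetical, and pairing each fragment only with later fragments is a choice made here so that every SNP pair is produced exactly once.

```python
from itertools import combinations

def partition(snps, fragment_size):
    """Split the SNP list into fragments with non-overlapping sets of SNPs."""
    return [snps[i:i + fragment_size] for i in range(0, len(snps), fragment_size)]

def interaction_pairs(fragments):
    """Yield every SNP pair once, grouped the way the abstract describes."""
    for index, fragment in enumerate(fragments):
        # 1) interactions among SNPs within the current fragment
        yield from combinations(fragment, 2)
        # 2) interactions between the current fragment and each later fragment
        for other in fragments[index + 1:]:
            for first in fragment:
                for second in other:
                    yield (first, second)

# Hypothetical SNP identifiers.
snps = [f"rs{i}" for i in range(7)]
fragments = partition(snps, 3)
pairs = list(interaction_pairs(fragments))
print(len(fragments), "fragments,", len(pairs), "pairs")
```

Each within-fragment or fragment-to-fragment block is independent of the others, so the blocks can be handed to separate GPU threads or CPU cores without synchronization.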
A real-time coherent dedispersion pipeline for the giant metrewave radio telescope

NASA Astrophysics Data System (ADS)

De, Kishalay; Gupta, Yashwant

2016-02-01

A fully real-time coherent dedispersion system has been developed for the pulsar back-end at the Giant Metrewave Radio Telescope (GMRT). The dedispersion pipeline uses the single phased array voltage beam produced by the existing GMRT software back-end (GSB) to produce coherently dedispersed intensity output in real time, for the currently operational bandwidths of 16 MHz and 32 MHz. Provision has also been made to coherently dedisperse voltage beam data from observations recorded on disk. We discuss the design and implementation of the real-time coherent dedispersion system, describing the steps carried out to optimise the performance of the pipeline. Presently functioning on an Intel Xeon X5550 CPU equipped with a NVIDIA Tesla C2075 GPU, the pipeline allows dispersion free, high time resolution data to be obtained in real-time. We illustrate the significant improvements over the existing incoherent dedispersion system at the GMRT, and present some preliminary results obtained from studies of pulsars using this system, demonstrating its potential as a useful tool for low frequency pulsar observations. We describe the salient features of our implementation, comparing it with other recently developed real-time coherent dedispersion systems. This implementation of a real-time coherent dedispersion pipeline for a large, low frequency array instrument like the GMRT, will enable long-term observing programs using coherent dedispersion to be carried out routinely at the observatory. We also outline the possible improvements for such a pipeline, including prospects for the upgraded GMRT which will have bandwidths about ten times larger than at present.
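To give a sense of the effect that dedispersion removes, the frequency-dependent arrival delay of a pulse can be estimated from the standard cold-plasma dispersion relation. The dispersion constant below is the commonly used value and is not quoted in the abstract; the band edges and dispersion measure in the example are hypothetical and not taken from the paper.

```python
# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard value, an assumption here).
DISPERSION_CONSTANT = 4.148808e3

def dispersion_delay_s(dm, f_low_mhz, f_high_mhz):
    """Delay (s) of the low band edge relative to the high edge for a dispersion measure dm (pc cm^-3)."""
    return DISPERSION_CONSTANT * dm * (f_low_mhz ** -2 - f_high_mhz ** -2)

# Hypothetical example: a 32 MHz band from 306 to 338 MHz and DM = 30 pc cm^-3.
smearing = dispersion_delay_s(30.0, 306.0, 338.0)
print(f"delay across the band: {smearing * 1e3:.1f} ms")
```

Incoherent dedispersion corrects this delay only between frequency channels, leaving residual smearing inside each channel. Coherent dedispersion instead applies the inverse phase response to the voltage data, which is the FFT-heavy step that makes a GPU attractive.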
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

PubMed Central

2011-01-01

Background: Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings: Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/. PMID:21615923
Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware.

PubMed

Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad; Shi, Lin; Li, Keqin

2015-01-01

Multiple sequence alignment (MSA) constitutes an extremely powerful tool for many biological applications including phylogenetic tree estimation, secondary structure prediction, and critical residue identification. However, aligning large biological sequences with popular tools such as MAFFT requires long runtimes on sequential architectures. Due to the ever increasing sizes of sequence databases, there is increasing demand to accelerate this task. In this paper, we demonstrate how graphics processing units (GPUs), powered by the compute unified device architecture (CUDA), can be used as an efficient computational platform to accelerate the MAFFT algorithm. To fully exploit the GPU's capabilities for accelerating MAFFT, we have optimized the sequence data organization to eliminate the bandwidth bottleneck of memory access, designed a memory allocation and reuse strategy to make full use of the limited memory of GPUs, proposed a new modified-run-length encoding (MRLE) scheme to reduce memory consumption, and used high-performance shared memory to speed up I/O operations. Our implementation, tested on three NVIDIA GPUs, achieves a speedup of up to 11.28 on a Tesla K20m GPU compared to the sequential MAFFT 7.015.
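The abstract above does not say how the MRLE scheme modifies run-length encoding, so the sketch below shows only plain run-length encoding, the technique it builds on, applied to a hypothetical gapped alignment row. Long gap runs are common in aligned sequences, which is why the idea saves memory.

```python
from itertools import groupby

def run_length_encode(sequence):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(sequence)]

def run_length_decode(pairs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)

# Hypothetical aligned row with gap runs.
row = "AC----GT--"
encoded = run_length_encode(row)
print(encoded)
```

Plain run-length encoding can grow the data when runs are short, as for the single residues here; a practical scheme would need to handle that case, and how MRLE does so is not stated in the abstract.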
Interactions between Nanoparticles and Polymer Brushes: Molecular Dynamics Simulations and Self-consistent Field Theory Calculations

NASA Astrophysics Data System (ADS)

Cheng, Shengfeng; Wen, Chengyuan; Egorov, Sergei

2015-03-01

Molecular dynamics simulations and self-consistent field theory calculations are employed to study the interactions between a nanoparticle and a polymer brush at various densities of chains grafted to a plane. Simulations with both implicit and explicit solvent are performed. In either case the nanoparticle is loaded to the brush at a constant velocity. Then a series of simulations are performed to compute the force exerted on the nanoparticle that is fixed at various distances from the grafting plane. The potential of mean force is calculated and compared to the prediction based on a self-consistent field theory. Our simulations show that the explicit solvent leads to effects that are not captured in simulations with implicit solvent, indicating the importance of including explicit solvent in molecular simulations of such systems. Our results also demonstrate an interesting correlation between the force on the nanoparticle and the density profile of the brush. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.
Global magnetohydrodynamic simulations on multiple GPUs

NASA Astrophysics Data System (ADS)

Wong, Un-Hong; Wong, Hon-Cheng; Ma, Yonghui

2014-01-01

Global magnetohydrodynamic (MHD) models play a major role in investigating the solar wind-magnetosphere interaction. However, the huge computational requirement of global MHD simulations is also the main problem that needs to be solved. With the recent development of modern graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), it is possible to perform global MHD simulations in a more efficient manner. In this paper, we present a global MHD simulator on multiple GPUs using CUDA 4.0 with GPUDirect 2.0. Our implementation is based on the modified leapfrog scheme, which is a combination of the leapfrog scheme and the two-step Lax-Wendroff scheme. GPUDirect 2.0 is used in our implementation to drive multiple GPUs. All data transfers and kernel processing are managed with the CUDA 4.0 API instead of using MPI or OpenMP. Performance measurements are made on a multi-GPU system with eight NVIDIA Tesla M2050 (Fermi architecture) graphics cards. These measurements show that our multi-GPU implementation achieves a peak performance of 97.36 GFLOPS in double precision.
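The abstract does not give the details of the modified leapfrog scheme, so the sketch below shows only one of its named ingredients, the two-step (Richtmyer) Lax-Wendroff scheme, applied to scalar 1D advection u_t + a u_x = 0 with periodic boundaries rather than to the MHD equations. The function name and the test profile are hypothetical.

```python
def lax_wendroff_step(u, a, dt, dx):
    """Advance u by one time step with the two-step Lax-Wendroff scheme."""
    n = len(u)
    c = a * dt / dx  # Courant number; the scheme is stable for |c| <= 1
    # Step 1: provisional values at the half time level on cell edges i + 1/2.
    half = [
        0.5 * (u[i] + u[(i + 1) % n]) - 0.5 * c * (u[(i + 1) % n] - u[i])
        for i in range(n)
    ]
    # Step 2: full update from the flux difference between edges i+1/2 and i-1/2.
    # half[i - 1] wraps around for i = 0, which gives the periodic boundary.
    return [u[i] - c * (half[i] - half[i - 1]) for i in range(n)]


# Hypothetical test profile: a single pulse on a periodic grid.
u0 = [0.0, 1.0, 0.0, 0.0]
u1 = lax_wendroff_step(u0, a=1.0, dt=1.0, dx=1.0)
print(u1)
```

With a Courant number of exactly 1 the scheme shifts the profile by one cell without distortion, which makes a convenient sanity check. Each cell update reads only its immediate neighbours, which is the property that lets such stencils map onto GPU threads.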
Implementation of EAM and FS potentials in HOOMD-blue

NASA Astrophysics Data System (ADS)

Yang, Lin; Zhang, Feng; Travesset, Alex; Wang, Caizhuang; Ho, Kaiming

HOOMD-blue is a general-purpose software to perform classical molecular dynamics simulations entirely on GPUs. We provide full support for EAM and FS type potentials in HOOMD-blue, and report accuracy and efficiency benchmarks, including comparisons with the LAMMPS GPU package. Two problems were selected to test the accuracy: the determination of the glass transition temperature of Cu64.5Zr35.5 alloy using an FS potential and the calculation of pair distribution functions of Ni3Al using an EAM potential. In both cases, the results using HOOMD-blue are indistinguishable from those obtained by the GPU package in LAMMPS within statistical uncertainties. As tests for time efficiency, we benchmark time-steps per second using LAMMPS GPU and HOOMD-blue on one NVIDIA Tesla GPU. Compared to our typical LAMMPS simulations on one CPU cluster node which has 16 CPUs, LAMMPS GPU can be 3-3.5 times faster, and HOOMD-blue can be 4-5.5 times faster. We acknowledge the support from Laboratory Directed Research and Development (LDRD) of Ames Laboratory.
A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

PubMed

Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

2017-10-01

The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on an Intel Xeon multi-core CPU (using OpenMP and OpenCL) and an NVIDIA Tesla many-core GPU (using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results, albeit with the most complex source code. The parallel SCE-UA has bright prospects for real-world applications.
Flux free single crystal growth and characterization of FeTe1-xSx (x=0.00 and 0.10) crystals

NASA Astrophysics Data System (ADS)

Maheshwari, P. K.; Awana, V. P. S.

2018-05-01

We report the synthesis of S-doped FeTe1-xSx (x = 0.00 and 0.10) single crystals using a flux-free method via solid state reaction. Single crystal XRD patterns of FeTe1-xSx (x = 0.00 and 0.10) confirm the single crystalline property, as the crystals are grown in the (00l) plane only. Powder XRD results for the FeTe1-xSx (x = 0.00 and 0.10) crystals show that they crystallize in a tetragonal structure with the P4/nmm space group. Rietveld refinement results show that both the a and c lattice parameters decrease with 10% S doping at the Te site in FeTe1-xSx. A detailed scanning electron microscopy (SEM) image of FeTe0.90S0.10 shows that the crystal grows in a slab-like morphology. Electrical resistivity measurements confirm superconductivity in the sample with 10% S doping at the Te site: the superconducting transition Tc(onset) occurs at 9.5 K and Tc(offset) (ρ = 0) occurs at 6.5 K. ρ-T measurements have been performed under various magnetic fields up to 12 Tesla and down to 2 K. The upper critical field Hc2(0) for x = 0.10 comes out at around 70 Tesla, 60 Tesla and 45 Tesla for the normal resistivity criteria ρn = 90%, 50% and 10%, respectively.

Implementation of fast macromolecular proton fraction mapping on 1.5 and 3 Tesla clinical MRI scanners: preliminary experience

NASA Astrophysics Data System (ADS)

Yarnykh, V.; Korostyshevskaya, A.

2017-08-01

Macromolecular proton fraction (MPF) is a biophysical parameter describing the amount of macromolecular protons involved in magnetization exchange with water protons in tissues. MPF is of significant interest as a magnetic resonance imaging (MRI) biomarker of myelin for clinical applications. A recent fast MPF mapping method enabled clinical translation of MPF measurements due to time-efficient acquisition based on the single-point constrained fit algorithm. However, previous MPF mapping applications utilized only 3 Tesla MRI scanners and modified pulse sequences, which are not commonly available. This study aimed to test the feasibility of MPF mapping implementation on a 1.5 Tesla clinical scanner using standard manufacturer’s sequences and compare the performance of this method between 1.5 and 3 Tesla scanners. MPF mapping was implemented on 1.5 and 3 Tesla MRI units of one manufacturer with either optimized custom-written or standard product pulse sequences. Whole-brain three-dimensional MPF maps obtained from a single volunteer were compared between field strengths and implementation options. MPF maps demonstrated similar quality at both field strengths. MPF values in segmented brain tissues and specific anatomic regions appeared in close agreement. This experiment demonstrates the feasibility of fast MPF mapping using standard sequences on 1.5 T and 3 T clinical scanners.
SU (2) lattice gauge theory simulations on Fermi GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cardoso, Nuno, E-mail: nunocardoso@cftp.ist.utl.p; Bicudo, Pedro, E-mail: bicudo@ist.utl.p

2011-05-10

In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200x the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2x slower) than single precision computations.
Non-enhanced MR imaging of cerebral aneurysms: 7 Tesla versus 1.5 Tesla.

PubMed

Wrede, Karsten H; Dammann, Philipp; Mönninghoff, Christoph; Johst, Sören; Maderwald, Stefan; Sandalcioglu, I Erol; Müller, Oliver; Özkan, Neriman; Ladd, Mark E; Forsting, Michael; Schlamann, Marc U; Sure, Ulrich; Umutlu, Lale

2014-01-01

To prospectively evaluate 7 Tesla time-of-flight (TOF) magnetic resonance angiography (MRA) in comparison to 1.5 Tesla TOF MRA and 7 Tesla non-contrast enhanced magnetization-prepared rapid acquisition gradient-echo (MPRAGE) for delineation of unruptured intracranial aneurysms (UIA). Sixteen neurosurgical patients (male n = 5, female n = 11) with single or multiple UIA were enrolled in this trial. All patients were accordingly examined at 7 Tesla and 1.5 Tesla MRI utilizing dedicated head coils. The following sequences were obtained: 7 Tesla TOF MRA, 1.5 Tesla TOF MRA and 7 Tesla non-contrast enhanced MPRAGE. Image analysis was performed by two radiologists with regard to delineation of aneurysm features (dome, neck, parent vessel), presence of artifacts, vessel-tissue-contrast and overall image quality. Interobserver accordance and intermethod comparisons were calculated by kappa coefficient and Lin's concordance correlation coefficient. A total of 20 intracranial aneurysms were detected in 16 patients, with two patients showing multiple aneurysms (n = 2, n = 4). Out of 20 intracranial aneurysms, 14 aneurysms were located in the anterior circulation and 6 aneurysms in the posterior circulation. 7 Tesla MPRAGE imaging was superior over 1.5 and 7 Tesla TOF MRA in the assessment of all considered aneurysm and image quality features (e.g. image quality: mean MPRAGE7T: 5.0; mean TOF7T: 4.3; mean TOF1.5T: 4.3). Ratings for 7 Tesla TOF MRA were equal or higher over 1.5 Tesla TOF MRA for all assessed features except for artifact delineation (mean TOF7T: 4.3; mean TOF1.5T 4.4). Interobserver accordance was good to excellent for most ratings. 7 Tesla MPRAGE imaging demonstrated its superiority in the detection and assessment of UIA as well as overall imaging features, offering excellent interobserver accordance and highest scores for all ratings. Hence, it may bear the potential to serve as a high-quality diagnostic tool for pretherapeutic assessment and follow-up of untreated UIA.
Non-Enhanced MR Imaging of Cerebral Aneurysms: 7 Tesla versus 1.5 Tesla

PubMed Central

Wrede, Karsten H.; Dammann, Philipp; Mönninghoff, Christoph; Johst, Sören; Maderwald, Stefan; Sandalcioglu, I. Erol; Müller, Oliver; Özkan, Neriman; Ladd, Mark E.; Forsting, Michael; Schlamann, Marc U.; Sure, Ulrich; Umutlu, Lale

2014-01-01

Purpose: To prospectively evaluate 7 Tesla time-of-flight (TOF) magnetic resonance angiography (MRA) in comparison to 1.5 Tesla TOF MRA and 7 Tesla non-contrast enhanced magnetization-prepared rapid acquisition gradient-echo (MPRAGE) for delineation of unruptured intracranial aneurysms (UIA). Material and Methods: Sixteen neurosurgical patients (male n = 5, female n = 11) with single or multiple UIA were enrolled in this trial. All patients were accordingly examined at 7 Tesla and 1.5 Tesla MRI utilizing dedicated head coils. The following sequences were obtained: 7 Tesla TOF MRA, 1.5 Tesla TOF MRA and 7 Tesla non-contrast enhanced MPRAGE. Image analysis was performed by two radiologists with regard to delineation of aneurysm features (dome, neck, parent vessel), presence of artifacts, vessel-tissue-contrast and overall image quality. Interobserver accordance and intermethod comparisons were calculated by kappa coefficient and Lin's concordance correlation coefficient. Results: A total of 20 intracranial aneurysms were detected in 16 patients, with two patients showing multiple aneurysms (n = 2, n = 4). Out of 20 intracranial aneurysms, 14 aneurysms were located in the anterior circulation and 6 aneurysms in the posterior circulation. 7 Tesla MPRAGE imaging was superior over 1.5 and 7 Tesla TOF MRA in the assessment of all considered aneurysm and image quality features (e.g. image quality: mean MPRAGE7T: 5.0; mean TOF7T: 4.3; mean TOF1.5T: 4.3). Ratings for 7 Tesla TOF MRA were equal or higher over 1.5 Tesla TOF MRA for all assessed features except for artifact delineation (mean TOF7T: 4.3; mean TOF1.5T 4.4). Interobserver accordance was good to excellent for most ratings. Conclusion: 7 Tesla MPRAGE imaging demonstrated its superiority in the detection and assessment of UIA as well as overall imaging features, offering excellent interobserver accordance and highest scores for all ratings. Hence, it may bear the potential to serve as a high-quality diagnostic tool for pretherapeutic assessment and follow-up of untreated UIA. PMID:24400100
High Resolution Imaging Testbed Utilizing Sodium Laser Guide Star Adaptive Optics: The Real Time Wavefront Reconstructor Computer

DTIC Science & Technology

2008-07-31

Unlike the Lyrtech, each DSP on a Bittware board offers 3 MB of on-chip memory and 3 GFLOPs of 32-bit peak processing power. Based on the performance...Each NVIDIA 8800 Ultra features 576 GFLOPS on 128 612-MHz single-precision floating-point SIMD processors, arranged in 16 clusters of eight. Each
76 FR 47639 - Tesla Motors, Inc.; Receipt of Petition for Temporary Exemption From the Electronic Stability...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-05

... demonstrated that these systems reduce fatal single-vehicle crashes of passenger cars by 36 percent and fatal... the potential to prevent 70 percent of the fatal passenger car rollovers and 88 percent of the fatal..., the Roadster. Tesla began production of the all-electric Roadster in 2008 plans to conclude production...
Dynamic contrast-enhanced breast MRI at 7 Tesla utilizing a single-loop coil: a feasibility trial.

PubMed

Umutlu, Lale; Maderwald, Stefan; Kraff, Oliver; Theysohn, Jens M; Kuemmel, Sherko; Hauth, Elke A; Forsting, Michael; Antoch, Gerald; Ladd, Mark E; Quick, Harald H; Lauenstein, Thomas C

2010-08-01

The aim of this study was to assess the feasibility of dynamic contrast-enhanced ultra-high-field breast imaging at 7 Tesla. A total of 15 subjects, including 5 patients with histologically proven breast cancer, were examined on a 7 Tesla whole-body magnetic resonance imaging system using a unilateral linearly polarized single-loop coil. Subjects were placed in prone position on a biopsy support system, with the coil placed directly below the region of interest. The examination protocol included the following sequences: 1) T2-weighted turbo spin echo sequence; 2) six dynamic T1-weighted spoiled gradient-echo sequences; and 3) subtraction imaging. Contrast-enhanced T1-weighted imaging at 7 Tesla could be obtained at high spatial resolution with short acquisition times, providing good image accuracy and a conclusively good delineation of small anatomical and pathological structures. T2-weighted imaging could be obtained with high spatial resolution at adequate acquisition times. Because of coil limitations, four high-field magnetic resonance examinations showed decreased diagnostic value. This first scientific approach of dynamic contrast-enhanced breast magnetic resonance imaging at 7 Tesla demonstrates the complexity of ultra-high-field breast magnetic resonance imaging and countenances the implementation of further advanced bilateral coil concepts to circumvent current limitations from the coil and ultra-high-field magnetic strength. 2010 AUR. Published by Elsevier Inc. All rights reserved.
Challenges and Opportunities in Propulsion Simulations

DTIC Science & Technology

2015-09-24

leverage Nvidia GPU accelerators •  Release common computational infrastructure as Distro A for collaboration •  Add physics modules as either...Gemini (6.4 GB/s) Dual Rail EDR-IB (23 GB/s) Interconnect Topology 3D Torus Non-blocking Fat Tree Processors AMD Opteron™ NVIDIA Kepler™ IBM...POWER9 NVIDIA Volta™ File System 32 PB, 1 TB/s, Lustre® 120 PB, 1 TB/s, GPFS™ Peak power consumption 9 MW 10 MW Titan vs. Summit Source: R
Tensor Algebra Library for NVidia Graphics Processing Units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liakh, Dmitry

This is a general purpose math library implementing basic tensor algebra operations on NVidia GPU accelerators. This software is a tensor algebra library that can perform basic tensor algebra operations, including tensor contractions, tensor products, tensor additions, etc., on NVidia GPU accelerators, asynchronously with respect to the CPU host. It supports a simultaneous use of multiple NVidia GPUs. Each asynchronous API function returns a handle which can later be used for querying the completion of the corresponding tensor algebra operation on a specific GPU. The tensors participating in a particular tensor operation are assumed to be stored in local RAM of a node or GPU RAM. The main research area where this library can be utilized is the quantum many-body theory (e.g., in electronic structure theory).
A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers.

PubMed

Cooper, Christopher D; Bardhan, Jaydeep P; Barba, L A

2014-03-01

The continuum theory applied to biomolecular electrostatics leads to an implicit-solvent model governed by the Poisson-Boltzmann equation. Solvers relying on a boundary integral representation typically do not consider features like solvent-filled cavities or ion-exclusion (Stern) layers, due to the added difficulty of treating multiple boundary surfaces. This has hindered meaningful comparisons with volume-based methods, and the effects on accuracy of including these features has remained unknown. This work presents a solver called PyGBe that uses a boundary-element formulation and can handle multiple interacting surfaces. It was used to study the effects of solvent-filled cavities and Stern layers on the accuracy of calculating solvation energy and binding energy of proteins, using the well-known APBS finite-difference code for comparison. The results suggest that if required accuracy for an application allows errors larger than about 2% in solvation energy, then the simpler, single-surface model can be used. When calculating binding energies, the need for a multi-surface model is problem-dependent, becoming more critical when ligand and receptor are of comparable size. Comparing with the APBS solver, the boundary-element solver is faster when the accuracy requirements are higher. The cross-over point for the PyGBe code is on the order of 1-2% error, when running on one GPU card (NVIDIA Tesla C2075), compared with APBS running on six Intel Xeon CPU cores. PyGBe achieves algorithmic acceleration of the boundary element method using a treecode, and hardware acceleration using GPUs via PyCuda from a user-visible code that is all Python. The code is open-source under MIT license.
Speeding up tsunami wave propagation modeling

NASA Astrophysics Data System (ADS)

Lavrentyev, Mikhail; Romanenko, Alexey

2014-05-01

Trans-oceanic wave propagation is one of the most time/CPU consuming parts of the tsunami modeling process. The so-called Method Of Splitting Tsunami (MOST) software package, developed at PMEL NOAA USA (Pacific Marine Environmental Laboratory of the National Oceanic and Atmospheric Administration, USA), is widely used to evaluate the tsunami parameters. However, simulating trans-ocean wave propagation takes time: up to 5 hours of CPU time to "drive" the wave from Chile (epicenter) to the coast of Japan (even using a rather coarse computational mesh). Accurate wave height prediction requires fine meshes, which leads to a dramatic increase in simulation time. Computation time is a critical parameter, as it takes only about 20 minutes for a tsunami wave to approach the coast of Japan after an earthquake at the Japan trench or Sagami trench (as it was after the Great East Japan Earthquake on March 11, 2011). MOST numerically solves the hyperbolic system for three unknown functions, namely the velocity vector and the wave height (shallow water approximation). The system can be split into two independent systems along orthogonal directions (splitting method). Each system can be treated independently. This calculation scheme is well suited for SIMD architectures and for GPUs as well. We adapted the MOST package to the GPU. Several numerical tests showed a 40x performance gain for an NVIDIA Tesla C2050 GPU vs. a single core of an Intel i7 processor. Results of numerical experiments were compared with other available simulation data. Calculation results obtained on the GPU differ from the reference ones by 10^-3 cm of wave height when simulating 24 hours of wave propagation. This allows us to speak about the possibility of developing a real-time system for evaluating tsunami danger.
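The splitting idea described above can be illustrated on a much simpler problem. The sketch below is not the MOST scheme: it applies directional splitting to scalar 2D advection on a periodic grid, using a first-order upwind step as a stand-in 1D solver. All names and the test grid are hypothetical. The point it shows is that each sweep breaks into independent 1D problems (one per row, then one per column), which is what makes the approach a natural fit for SIMD hardware and GPUs.

```python
def upwind_1d(line, courant):
    """One first-order upwind step on a periodic 1D line (0 <= courant <= 1)."""
    # line[i - 1] wraps around for i = 0, giving the periodic boundary.
    return [line[i] - courant * (line[i] - line[i - 1]) for i in range(len(line))]


def split_step(grid, cx, cy):
    """Advance a 2D field by an x-sweep over rows, then a y-sweep over columns."""
    # x-sweep: every row is an independent 1D problem.
    swept_rows = [upwind_1d(row, cx) for row in grid]
    # y-sweep: transpose so columns become rows, sweep them, then transpose back.
    swept_cols = [upwind_1d(list(col), cy) for col in zip(*swept_rows)]
    return [list(row) for row in zip(*swept_cols)]


# Hypothetical test field: a single unit pulse in the corner of a 3x3 grid.
grid = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
result = split_step(grid, cx=1.0, cy=1.0)
print(result)
```

With both Courant numbers equal to 1 the pulse moves exactly one cell in x and one cell in y, so it ends up in the centre of the grid. On a GPU, each list comprehension over rows or columns would become a set of threads that run without needing to communicate within a sweep.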
A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers

NASA Astrophysics Data System (ADS)

Cooper, Christopher D.; Bardhan, Jaydeep P.; Barba, L. A.

2014-03-01

The continuum theory applied to biomolecular electrostatics leads to an implicit-solvent model governed by the Poisson-Boltzmann equation. Solvers relying on a boundary integral representation typically do not consider features like solvent-filled cavities or ion-exclusion (Stern) layers, due to the added difficulty of treating multiple boundary surfaces. This has hindered meaningful comparisons with volume-based methods, and the effects on accuracy of including these features has remained unknown. This work presents a solver called PyGBe that uses a boundary-element formulation and can handle multiple interacting surfaces. It was used to study the effects of solvent-filled cavities and Stern layers on the accuracy of calculating solvation energy and binding energy of proteins, using the well-known APBS finite-difference code for comparison. The results suggest that if required accuracy for an application allows errors larger than about 2% in solvation energy, then the simpler, single-surface model can be used. When calculating binding energies, the need for a multi-surface model is problem-dependent, becoming more critical when ligand and receptor are of comparable size. Comparing with the APBS solver, the boundary-element solver is faster when the accuracy requirements are higher. The cross-over point for the PyGBe code is on the order of 1-2% error, when running on one GPU card (NVIDIA Tesla C2075), compared with APBS running on six Intel Xeon CPU cores. PyGBe achieves algorithmic acceleration of the boundary element method using a treecode, and hardware acceleration using GPUs via PyCuda from a user-visible code that is all Python. The code is open-source under MIT license.
GPU-Accelerated Voxelwise Hepatic Perfusion Quantification

PubMed Central

Wang, H; Cao, Y

2012-01-01

Voxelwise quantification of hepatic perfusion parameters from dynamic contrast enhanced (DCE) imaging greatly contributes to assessment of liver function in response to radiation therapy. However, the efficiency of the estimation of hepatic perfusion parameters voxel-by-voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical applications. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation, while maintaining the same accuracy as the conventional method. Using CUDA-GPU, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, non-linear least squares fitting of the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations of different time points are performed simultaneously and synchronously. An efficient fast Fourier transform in a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with those by the CPU using the simulated DCE data and the experimental DCE MR images from patients. The computation speed is improved by 30 times using an NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU processor. To obtain liver perfusion maps with 626400 voxels in a patient’s liver, it takes 0.9 min with the GPU-accelerated voxelwise computation, compared to 110 min with the CPU, while both methods result in perfusion parameter differences of less than 10^-6. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645
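As a worked check of the whole-liver timings quoted in the abstract, the short calculation below converts them into per-second voxel throughput. Only the three input numbers come from the abstract; the variable names are illustrative. The ratio of the two whole-liver timings is not the same quantity as the 30-fold speed figure quoted earlier in the abstract, and the abstract does not say how the two figures relate.

```python
# Figures quoted in the abstract.
voxels = 626_400
gpu_minutes = 0.9
cpu_minutes = 110.0

# Throughput in voxels per second for each platform.
gpu_rate = voxels / (gpu_minutes * 60.0)
cpu_rate = voxels / (cpu_minutes * 60.0)

# Ratio of the two whole-liver wall-clock times.
timing_ratio = cpu_minutes / gpu_minutes

print(f"GPU: {gpu_rate:.0f} voxels/s")
print(f"CPU: {cpu_rate:.0f} voxels/s")
print(f"whole-liver timing ratio: {timing_ratio:.0f}x")
```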
Investigating the Importance of Stereo Displays for Helicopter Landing Simulation

DTIC Science & Technology

2016-08-11

visualization. The two instances of X Plane® were implemented using two separate PCs, each incorporating Intel i7 processors and Nvidia Quadro K4200... Nvidia GeForce GTX 680 graphics card was used to administer the stereo acuity and fusion range tests. The tests were displayed on an Asus VG278HE 3D...monitor with 1920x1080 pixels that was compatible with Nvidia 3D Vision2 and that used active shutter glasses. At a 1-m viewing distance, the
A Distributed GPU-Based Framework for Real-Time 3D Volume Rendering of Large Astronomical Data Cubes

NASA Astrophysics Data System (ADS)

Hassan, A. H.; Fluke, C. J.; Barnes, D. G.

2012-05-01

We present a framework to volume-render three-dimensional data cubes interactively using distributed ray-casting and volume-bricking over a cluster of workstations powered by one or more graphics processing units (GPUs) and a multi-core central processing unit (CPU). The main design target for this framework is to provide an in-core visualization solution able to provide three-dimensional interactive views of terabyte-sized data cubes. We tested the presented framework using a computing cluster comprising 64 nodes with a total of 128 GPUs. The framework proved to be scalable to render a 204 GB data cube with an average of 30 frames per second. Our performance analyses also compare the use of NVIDIA Tesla 1060 and 2050 GPU architectures and the effect of increasing the visualization output resolution on the rendering performance. Although our initial focus, as shown in the examples presented in this work, is volume rendering of spectral data cubes from radio astronomy, we contend that our approach has applicability to other disciplines where close to real-time volume rendering of terabyte-order three-dimensional data sets is a requirement.
GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

NASA Astrophysics Data System (ADS)

Gong, Chunye; Liu, Jie; Chi, Lihua; Huang, Haowei; Fang, Jingyue; Gong, Zhenghu

2011-07-01

The Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now offers great capability for solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU accelerated simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for simulations with a vacuum boundary condition. The relative advantages and disadvantages of the GPU implementation, the simulation on multiple GPUs, the programming effort and code portability are also discussed. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip for no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.
Utilizing GPUs to Accelerate Turbomachinery CFD Codes

NASA Technical Reports Server (NTRS)

MacCalla, Weylin; Kulkarni, Sameer

2016-01-01

GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.
High performance in silico virtual drug screening on many-core processors.

PubMed

McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A

2015-05-01

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
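The two performance figures in the abstract (sustained rate and fraction of peak) together imply a peak rate for the card. The short calculation below makes that arithmetic explicit. The inputs are taken from the abstract, the variable names are illustrative, and the result is only as precise as the rounded 46% figure allows.

```python
# Figures quoted in the abstract for a single Nvidia GTX 680.
sustained_tflops = 1.43
fraction_of_peak = 0.46

# Peak rate implied by the two quoted numbers.
implied_peak_tflops = sustained_tflops / fraction_of_peak

print(f"implied peak: {implied_peak_tflops:.2f} TFLOP/s")
```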
High performance in silico virtual drug screening on many-core processors

PubMed Central

Price, James; Sessions, Richard B; Ibarra, Amaurys A

2015-01-01

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727
75 FR 44989 - In the Matter of Certain Semiconductor Chips Having Synchronous Dynamic Random Access Memory...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-30

... following respondents: NVIDIA Corporation of Santa Clara, California; Asustek Computer, Inc. of Taipei... exclusion order and cease- and-desist orders against respondents NVIDIA Corp.; Hewlett-Packard Co.; ASUS...

Accelerating Monte Carlo simulations with an NVIDIA ® graphics processor

NASA Astrophysics Data System (ADS)

Martinsen, Paul; Blaschke, Johannes; Künnemeyer, Rainer; Jordan, Robert

2009-10-01

Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA ® 8800 GT graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectory of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer.
Program summary
Program title: Phoogle-C/Phoogle-G
Catalogue identifier: AEEB_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEB_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 51 264
No. of bytes in distributed program, including test data, etc.: 2 238 805
Distribution format: tar.gz
Programming language: C++
Computer: Designed for Intel PCs. Phoogle-G requires an NVIDIA graphics card with support for CUDA 1.1
Operating system: Windows XP
Has the code been vectorised or parallelized?: Phoogle-G is written for SIMD architectures
RAM: 1 GB
Classification: 21.1
External routines: Charles Karney Random number library. Microsoft Foundation Class library. NVIDIA CUDA library [1].
Nature of problem: The Monte Carlo technique is an effective algorithm for exploring the propagation of light in turbid media. However, accurate results require tracing the path of many photons within the media. The independence of photons naturally lends the Monte Carlo technique to implementation on parallel architectures. Generally, parallel computing can be expensive, but recent advances in consumer grade graphics cards have opened the possibility of high-performance desktop parallel-computing.
Solution method: In this pair of programmes we have implemented the Monte Carlo algorithm described by Prahl et al. [2] for photon transport in infinite scattering media to compare the performance of two readily accessible architectures: a standard desktop PC and a consumer grade graphics card from NVIDIA.
Restrictions: The graphics card implementation uses single precision floating point numbers for all calculations. Only photon transport from an isotropic point-source is supported. The graphics-card version has no user interface. The simulation parameters must be set in the source code. The desktop version has a simple user interface; however some properties can only be accessed through an ActiveX client (such as Matlab).
Additional comments: The random number library used has an LGPL (http://www.gnu.org/copyleft/lesser.html) licence.
Running time: Runtime can range from minutes to months depending on the number of photons simulated and the optical properties of the medium.
References:
[1] http://www.nvidia.com/object/cuda_home.html
[2] S. Prahl, M. Keijzer, S.L. Jacques, A. Welch, SPIE Institute Series 5 (1989) 102.
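The photon-tracing loop described in this record can be illustrated with a minimal single-threaded sketch. It is not the Phoogle code: the function names, the use of analog absorption (a photon is absorbed outright with probability mu_a/mu_t at each interaction) and purely isotropic scattering are simplifying assumptions made here for illustration. Each photon is independent of the others, which is why the method maps naturally onto a GPU.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere (isotropic)."""
    cos_theta = 2.0 * rng.random() - 1.0
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    phi = 2.0 * math.pi * rng.random()
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def simulate_photons(n_photons, mu_a, mu_s, seed=0):
    """Trace photons from an isotropic point source in an infinite medium.

    mu_a and mu_s are the absorption and scattering coefficients (per unit
    length); mu_a must be positive so that every photon is eventually absorbed.
    Returns (mean scattering events per photon, mean squared absorption radius).
    """
    rng = random.Random(seed)
    mu_t = mu_a + mu_s          # total interaction coefficient
    albedo = mu_s / mu_t        # probability that an interaction is a scatter
    total_scatters = 0
    total_r2 = 0.0
    for _ in range(n_photons):
        x = y = z = 0.0
        ux, uy, uz = random_unit_vector(rng)
        while True:
            # Free path length sampled from an exponential distribution.
            step = -math.log(1.0 - rng.random()) / mu_t
            x += ux * step
            y += uy * step
            z += uz * step
            if rng.random() >= albedo:
                break           # photon absorbed at this interaction site
            total_scatters += 1
            ux, uy, uz = random_unit_vector(rng)
        total_r2 += x * x + y * y + z * z
    return total_scatters / n_photons, total_r2 / n_photons


if __name__ == "__main__":
    scatters, r2 = simulate_photons(5000, mu_a=1.0, mu_s=9.0, seed=1)
    print("mean scattering events per photon:", scatters)
    print("mean squared absorption radius:", r2)
```

Under these assumptions the number of scattering events per photon is geometrically distributed with mean mu_s/mu_a, so the first returned value should approach 9 for the parameters in the usage example as the photon count grows.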
Large-Signal Code TESLA: Current Status and Recent Development

DTIC Science & Technology

2008-04-01

K.Eppley, J.J.Petillo, “High-power four cavity S-band multiple-beam klystron design”, IEEE Trans. Plasma Sci., vol. 32, pp. 1119-1135, June 2004. 4...advances in the development of the large-signal code TESLA, mainly used for the modeling of high-power single-beam and multiple-beam klystron ...amplifiers. Keywords: large-signal code; multiple-beam klystrons; serial and parallel versions. Introduction The optimization and design of new high power
[Comparison of Quantification of Myocardial Infarct Size by One Breath Hold Single Shot PSIR Sequence and Segmented FLASH-PSIR Sequence at 3.0 Tesla MR].

PubMed

Cheng, Wei; Cai, Shu; Sun, Jia-yu; Xia, Chun-chao; Li, Zhen-lin; Chen, Yu-cheng; Zhong, Yao-zu

2015-05-01

To compare two sequences, single shot true-FISP-PSIR (single shot-PSIR) and segmented-turbo-FLASH-PSIR (segmented-PSIR), for quantifying myocardial infarct size at 3.0 Tesla MRI. Thirty-eight patients with clinically confirmed myocardial infarction underwent comprehensive gadolinium-enhanced cardiac MRI on a 3.0 Tesla MRI system (Trio, Siemens). Myocardial delayed enhancement (MDE) imaging was performed with the single shot-PSIR and segmented-PSIR sequences separately, 12-20 min after gadopentetate dimeglumine injection (0.15 mmol/kg). The quality of the MDE images was analysed by experienced physicians. Signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were compared between the two techniques. Myocardial infarct size was quantified automatically by dedicated software (Q-mass, Medis). All subjects were scanned successfully on the 3.0 T MR system. No significant difference was found between the two sequences in SNR or CNR (P>0.05), or in total myocardial volume (P>0.05). Furthermore, there was no significant difference between the two sequences (P>0.05) in infarct size [single shot-PSIR (30.87 ± 15.72) mL, segmented-PSIR (29.26 ± 14.07) mL] or ratio [single shot-PSIR (22.94% ± 10.94%), segmented-PSIR (20.75% ± 8.78%)]. However, the average acquisition time of single shot-PSIR (21.4 s) was less than that of segmented-PSIR (380 s). Single shot-PSIR is equivalent to segmented-PSIR in quantifying myocardial infarct size and requires less acquisition time, which is valuable for clinical application and further research.
Bayesian Methods and Confidence Intervals for Automatic Target Recognition of SAR Canonical Shapes

DTIC Science & Technology

2014-03-27

and DirectX [22]. The CUDA platform was developed by the NVIDIA Corporation to allow programmers access to the computational capabilities of the...were used for the intense repetitive computations. Developing CUDA software requires writing code for specialized compilers provided by NVIDIA and
High resolution human diffusion tensor imaging using 2-D navigated multi-shot SENSE EPI at 7 Tesla

PubMed Central

Jeong, Ha-Kyu; Gore, John C.; Anderson, Adam W.

2012-01-01

The combination of parallel imaging with partial Fourier acquisition has greatly improved the performance of diffusion-weighted single-shot EPI and is the preferred method for acquisitions at low to medium magnetic field strength such as 1.5 or 3 Tesla. Increased off-resonance effects and reduced transverse relaxation times at 7 Tesla, however, generate more significant artifacts than at lower magnetic field strength and limit data acquisition. Additional acceleration of k-space traversal using a multi-shot approach, which acquires a subset of k-space data after each excitation, reduces these artifacts relative to conventional single-shot acquisitions. However, corrections for motion-induced phase errors are not straightforward in accelerated, diffusion-weighted multi-shot EPI because of phase aliasing. In this study, we introduce a simple acquisition and corresponding reconstruction method for diffusion-weighted multi-shot EPI with parallel imaging suitable for use at high field. The reconstruction uses a simple modification of the standard SENSE algorithm to account for shot-to-shot phase errors; the method is called Image Reconstruction using Image-space Sampling functions (IRIS). Using this approach, reconstruction from highly aliased in vivo image data using 2-D navigator phase information is demonstrated for human diffusion-weighted imaging studies at 7 Tesla. The final reconstructed images show submillimeter in-plane resolution with no ghosts and much reduced blurring and off-resonance artifacts. PMID:22592941
A 12 coil superconducting bumpy torus magnet facility for plasma research

NASA Technical Reports Server (NTRS)

Roth, J. R.; Holmes, A. D.; Keller, T. A.; Krawczonek, W. M.

1972-01-01

A summary is presented of the performance of the two-coil superconducting pilot rig which preceded the NASA Lewis bumpy torus. This pilot rig was operated for 550 experimental runs over a period of 7 years. The NASA Lewis bumpy torus facility consists of 12 superconducting coils, each 19 cm in diameter and capable of producing magnetic field strengths of 3.0 teslas on their axes. The magnets are equally spaced around a major circumference 1.52 m in diameter, and are mounted with the major axis of the torus vertical in a single vacuum tank 2.59 m in diameter. The design value of maximum magnetic field on the magnetic axis (3.0 teslas) was reached and exceeded. A maximum magnetic field of 3.23 teslas was held for a period of 60 minutes, and the coils did not go to normal. When the coils were charged to a maximum magnetic field of 3.35 teslas, the coil system was driven normal without damage to the facility.
Tissue expander stimulated lengthening of arteries (TESLA) induces early endothelial cell proliferation in a novel rodent model.

PubMed

Potanos, Kristina; Fullington, Nora; Cauley, Ryan; Purcell, Patricia; Zurakowski, David; Fishman, Steven; Vakili, Khashayar; Kim, Heung Bae

2016-04-01

We examine the mechanism of aortic lengthening in a novel rodent model of tissue expander stimulated lengthening of arteries (TESLA). A rat model of TESLA was examined with a single stretch stimulus applied at the time of tissue expander insertion, with evaluation of the aorta at 2, 4 and 7 day time points. Measurements as well as histology and proliferation assays were performed and compared to sham controls. The aortic length was increased at all time points without histologic signs of tissue injury. Nuclear density remained unchanged despite the increase in length, suggesting cellular hyperplasia. Cellular proliferation was confirmed in the endothelial cell layer by Ki-67 stain. Aortic lengthening may be achieved using TESLA. The increase in aortic length can be achieved without tissue injury and results at least partially from cellular hyperplasia. Further studies are required to define the mechanisms involved in the growth of arteries under increased longitudinal stress. Copyright © 2015 Elsevier Inc. All rights reserved.
GPU-based ultra-fast dose calculation using a finite size pencil beam model.

PubMed

Gu, Xuejun; Choi, Dongju; Men, Chunhua; Pan, Hubert; Majumdar, Amitava; Jiang, Steve B

2009-10-21

Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposit coefficient calculation is a critical component of the online planning process that is required for plan optimization of intensity-modulated radiation therapy (IMRT). Computer graphics processing units (GPUs) are well suited to provide the requisite fast performance for the data-parallel nature of dose calculation. In this work, we develop a dose calculation engine based on a finite-size pencil beam (FSPB) algorithm and a GPU parallel computing framework. The developed framework can accommodate any FSPB model. We test our implementation in the case of a water phantom and the case of a prostate cancer patient with varying beamlet and voxel sizes. All testing scenarios achieved speedups ranging from 200 to 400 times when using an NVIDIA Tesla C1060 card in comparison with a 2.27 GHz Intel Xeon CPU. The computational time for calculating dose deposition coefficients for a nine-field prostate IMRT plan with this new framework is less than 1 s. This indicates that the GPU-based FSPB algorithm is well suited for online re-planning for adaptive radiotherapy.
Spectral turning bands for efficient Gaussian random fields generation on GPUs and accelerators

NASA Astrophysics Data System (ADS)

Hunger, L.; Cosenza, B.; Kimeswenger, S.; Fahringer, T.

2015-11-01

A random field (RF) is a set of correlated random variables associated with different spatial locations. RF generation algorithms are of crucial importance for many scientific areas, such as astrophysics, geostatistics, computer graphics, and many others. Current approaches commonly make use of 3D fast Fourier transform (FFT), which does not scale well for RF bigger than the available memory; they are also limited to regular rectilinear meshes. We introduce random field generation with the turning band method (RAFT), an RF generation algorithm based on the turning band method that is optimized for massively parallel hardware such as GPUs and accelerators. Our algorithm replaces the 3D FFT with a lower-order, one-dimensional FFT followed by a projection step and is further optimized with loop unrolling and blocking. RAFT can easily generate RF on non-regular (non-uniform) meshes and efficiently produce fields with mesh sizes bigger than the available device memory by using a streaming, out-of-core approach. Our algorithm generates RF with the correct statistical behavior and is tested on a variety of modern hardware, such as NVIDIA Tesla, AMD FirePro and Intel Phi. RAFT is faster than the traditional methods on regular meshes and has been successfully applied to two real case scenarios: planetary nebulae and cosmological simulations.
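The turning band idea in this record, in which the field at a point is built from one-dimensional processes evaluated at the point's projection onto randomly oriented lines, can be sketched as follows. This is not RAFT: it uses a single random cosine per line in place of a 1D FFT, assumes a Gaussian covariance exp(-|h|^2 / (2 * corr_len^2)), and every function and parameter name is hypothetical. Because each point is evaluated independently, the points need not lie on a regular mesh.

```python
import math
import random


def sample_bands(n_bands, corr_len, rng):
    """Draw random lines (unit directions) with a 1D cosine process on each.

    For the assumed Gaussian covariance, the wave vector is normally
    distributed with standard deviation 1/corr_len per component, which gives
    a uniformly distributed line direction and a random wavenumber along it.
    """
    bands = []
    for _ in range(n_bands):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in g))
        direction = tuple(c / norm for c in g)   # line orientation
        wavenumber = norm / corr_len             # frequency of the 1D process
        phase = rng.uniform(0.0, 2.0 * math.pi)
        bands.append((direction, wavenumber, phase))
    return bands


def field_at(point, bands):
    """Evaluate the field at an arbitrary 3D point (no regular mesh needed)."""
    total = 0.0
    for direction, wavenumber, phase in bands:
        # Projection step: position of the point along the line.
        t = sum(p * d for p, d in zip(point, direction))
        total += math.cos(wavenumber * t + phase)
    return math.sqrt(2.0 / len(bands)) * total


def empirical_covariance(x, y, corr_len, n_bands, n_realizations, seed=0):
    """Average field(x) * field(y) over independent realizations."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_realizations):
        bands = sample_bands(n_bands, corr_len, rng)
        acc += field_at(x, bands) * field_at(y, bands)
    return acc / n_realizations


if __name__ == "__main__":
    est = empirical_covariance((0, 0, 0), (1, 0, 0), 1.0, 32, 3000)
    print("estimated covariance:", est, "target:", math.exp(-0.5))
```

Averaged over many realizations, the product of the field at two points should approach the target covariance, which is one way to check that the generated field has the intended statistical behavior.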
Accelerating large-scale protein structure alignments with graphics processing units

PubMed Central

2012-01-01

Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132
Improved preconditioned conjugate gradient algorithm and application in 3D inversion of gravity-gradiometry data

NASA Astrophysics Data System (ADS)

Wang, Tai-Han; Huang, Da-Nian; Ma, Guo-Qing; Meng, Zhao-Hai; Li, Ye

2017-06-01

With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is increasingly used in oil and gas exploration. For fast processing and interpretation of large-scale, high-precision data, the graphics processing unit (GPU) and preconditioning methods are very important in data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique and the incomplete Cholesky decomposition conjugate gradient algorithm (ICCG). Since preparing the preconditioner requires extra time, a parallel implementation based on GPU is proposed. The improved method is then applied to the inversion of noise-contaminated synthetic data to prove its adaptability in the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm running on an NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times over a serial program using a 2.0 GHz central processing unit (CPU). Real airborne gravity-gradiometry data from the Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method in fast inversion of 3D FTG data.
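As a rough illustration of the preconditioning idea in this record, the sketch below runs a conjugate gradient solver with a plain SSOR preconditioner on a small dense symmetric positive definite system. It is a serial sketch, not the authors' combined SSOR-ICCG method or their GPU code; the relaxation factor omega = 1.2, the 1D Laplacian test matrix and all function names are assumptions chosen for illustration.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ssor_apply(A, r, omega):
    """Solve M z = r for the SSOR preconditioner
    M = (D + omega*L) D^-1 (D + omega*U) / (omega * (2 - omega)),
    where D, L, U are the diagonal, strict lower and strict upper parts of A.
    """
    n = len(r)
    scale = omega * (2.0 - omega)
    y = [0.0] * n
    for i in range(n):                      # forward sweep: (D + omega*L) y = scale*r
        s = scale * r[i] - omega * sum(A[i][j] * y[j] for j in range(i))
        y[i] = s / A[i][i]
    w = [A[i][i] * y[i] for i in range(n)]  # multiply by D
    z = [0.0] * n
    for i in reversed(range(n)):            # backward sweep: (D + omega*U) z = w
        s = w[i] - omega * sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / A[i][i]
    return z


def pcg(A, b, omega=1.2, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                             # residual for the zero initial guess
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, k + 1
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(n):
    """Tridiagonal test matrix: 2 on the diagonal, -1 on the off-diagonals."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = laplacian_1d(20)
    x_true = [float(i + 1) for i in range(20)]
    x, iters = pcg(A, matvec(A, x_true))
    print("iterations:", iters)
    print("max error:", max(abs(a - t) for a, t in zip(x, x_true)))
```

The forward and backward sweeps are inherently sequential in this form, which is one reason a GPU implementation of such a preconditioner needs additional restructuring.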
Sub-second pencil beam dose calculation on GPU for adaptive proton therapy.

PubMed

da Silva, Joakim; Ansorge, Richard; Jena, Rajesh

2015-06-21

Although proton therapy delivered using scanned pencil beams has the potential to produce better dose conformity than conventional radiotherapy, the created dose distributions are more sensitive to anatomical changes and patient motion. Therefore, the introduction of adaptive treatment techniques where the dose can be monitored as it is being delivered is highly desirable. We present a GPU-based dose calculation engine relying on the widely used pencil beam algorithm, developed for on-line dose calculation. The calculation engine was implemented from scratch, with each step of the algorithm parallelized and adapted to run efficiently on the GPU architecture. To ensure fast calculation, it employs several application-specific modifications and simplifications, and a fast scatter-based implementation of the computationally expensive kernel superposition step. The calculation time for a skull base treatment plan using two beam directions was 0.22 s on an Nvidia Tesla K40 GPU, whereas a test case of a cubic target in water from the literature took 0.14 s to calculate. The accuracy of the patient dose distributions was assessed by calculating the γ-index with respect to a gold standard Monte Carlo simulation. The passing rates were 99.2% and 96.7%, respectively, for the 3%/3 mm and 2%/2 mm criteria, matching those produced by a clinical treatment planning system.
Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation.

PubMed

Ziegenhein, Peter; Pirner, Sven; Ph Kamerling, Cornelis; Oelfke, Uwe

2015-08-07

Monte-Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Their clinical application, however, is still limited by the long runtimes that conventional implementations of MC algorithms require to deliver sufficiently accurate results on high resolution imaging data. In order to overcome this obstacle we developed the software package PhiMC, which is capable of computing precise dose distributions in a sub-minute time-frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well verified dose planning method (DPM). We could demonstrate that PhiMC delivers dose distributions which are in excellent agreement with DPM. The multi-core implementation of PhiMC scales well between different computer architectures and achieves a speed-up of up to 37× compared to the original DPM code executed on a modern system. Furthermore, we could show that our CPU-based implementation on a modern workstation is between 1.25× and 1.95× faster than a well-known GPU implementation of the same simulation method on an NVIDIA Tesla C2050. Since CPUs work on several hundreds of GB of RAM, the typical GPU memory limitation does not apply to our implementation, and high resolution clinical plans can be calculated.
Leveraging FPGAs for Accelerating Short Read Alignment.

PubMed

Arram, James; Kaplan, Thomas; Luk, Wayne; Jiang, Peiyong

2017-01-01

One of the key challenges facing genomics today is how to efficiently analyze the massive amounts of data produced by next-generation sequencing platforms. With general-purpose computing systems struggling to address this challenge, specialized processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. The means by which to leverage this technology for accelerating genomic data analysis is however largely unexplored. In this paper, we present a runtime reconfigurable architecture for accelerating short read alignment using FPGAs. This architecture exploits the reconfigurability of FPGAs to allow the development of fast yet flexible alignment designs. We apply this architecture to develop an alignment design which supports exact and approximate alignment with up to two mismatches. Our design is based on the FM-index, with optimizations to improve the alignment performance. In particular, the n-step FM-index, index oversampling, a seed-and-compare stage, and bi-directional backtracking are included. Our design is implemented and evaluated on a 1U Maxeler MPC-X2000 dataflow node with eight Altera Stratix-V FPGAs. Measurements show that our design is 28 times faster than Bowtie2 running with 16 threads on dual Intel Xeon E5-2640 CPUs, and nine times faster than Soap3-dp running on an NVIDIA Tesla C2070 GPU.
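The FM-index search at the core of this kind of aligner can be sketched in a few lines. The sketch covers exact matching only, with a naively built suffix array; it does not attempt the n-step FM-index, oversampling, seed-and-compare or backtracking optimizations the abstract lists, and the function names are hypothetical. It also assumes the text does not contain the sentinel character "$".

```python
def build_fm_index(text):
    """Build suffix array, C table and occurrence table for text + '$'."""
    text += "$"                                     # sentinel, sorts first
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)          # Burrows-Wheeler transform
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):                       # C[ch]: characters smaller than ch
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted 0-based positions where pattern occurs exactly."""
    lo, hi = 0, len(sa)                             # current suffix-array interval
    for ch in reversed(pattern):                    # extend the match to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_fm_index("banana")
    print(backward_search("ana", *index))
```

For the text "banana", the pattern "ana" is found at 0-based offsets 1 and 3. Each pattern character costs two table lookups regardless of text length, which is what makes the structure attractive for hardware pipelines.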
Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs.

PubMed

Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long

2016-04-01

Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using a Nvidia Tesla K20X GPU. 
The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration time was reduced to 2.9 min on the GPU version, compared to 12.8 min on the twelve-threaded CPU version and 112.5 min on a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use with other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kartsaklis, Christos; Civario, G

This paper discusses ongoing progress in the development of a Java-based library for rapid kernel prototyping in NVIDIA PTX and PTX instruction scheduling. It is aimed at developers seeking total control of emitted PTX, highly parametric emission, and tunable instruction reordering. It is primarily used for code development at ICHEC, but it is hoped that the NVIDIA GPU community will also find it beneficial.
Fast T1 and T2 mapping methods: the zoomed U-FLARE sequence compared with EPI and snapshot-FLASH for abdominal imaging at 11.7 Tesla.

PubMed

Pastor, Géraldine; Jiménez-González, María; Plaza-García, Sandra; Beraza, Marta; Reese, Torsten

2017-06-01

A newly adapted zoomed ultrafast low-angle RARE (U-FLARE) sequence is described for abdominal imaging applications at 11.7 Tesla and compared with the standard echo-planar imaging (EPI) and snapshot fast low angle shot (FLASH) methods. Ultrafast EPI and snapshot-FLASH protocols were evaluated to determine relaxation times in phantoms and in the mouse kidney in vivo. Owing to their apparent shortcomings, imaging artefacts, signal-to-noise ratio (SNR), and variability in the determination of relaxation times, these methods are compared with the newly implemented zoomed U-FLARE sequence. Snapshot-FLASH has a lower SNR when compared with the zoomed U-FLARE sequence and EPI. The variability in the measurement of relaxation times is higher in the Look-Locker sequences than in inversion recovery experiments. Respectively, the average T1 and T2 values at 11.7 Tesla are as follows: kidney cortex, 1810 and 29 ms; kidney medulla, 2100 and 25 ms; subcutaneous tumour, 2365 and 28 ms. This study demonstrates that the zoomed U-FLARE sequence yields single-shot single-slice images with good anatomical resolution and high SNR at 11.7 Tesla. Thus, it offers a viable alternative to standard protocols for mapping very fast parameters, such as T1 and T2, or dynamic processes in vivo at high field.
Performance of a 12-coil superconducting bumpy torus magnet facility

NASA Technical Reports Server (NTRS)

Roth, J. R.; Holmes, A. D.; Keller, T. A.; Krawczonek, W. M.

1972-01-01

The bumpy torus facility consists of 12 superconducting coils, each 19 cm i.d. and capable of 3.0 teslas on their axes. The coils are equally spaced around a toroidal array with a major diameter of 1.52 m, and are mounted with the major axis of the torus vertical in a single vacuum tank 2.6 m in diameter. Final shakedown tests of the facility mapped out its magnetic, cryogenic, vacuum, mechanical, and electrical performance. The facility is now ready for use as a plasma physics research facility. A maximum magnetic field on the magnetic axis of 3.23 teslas was held for a period of more than sixty minutes without a coil normalcy. The design field was 3.00 teslas. The steady-state liquid helium boil-off rate was 87 liters per hour of liquid helium without the coils charged. The coil array was stable when subjected to an impulsive loading, even with the magnets fully charged. When the coils were charged to a maximum magnetic field of 3.35 teslas, the system was driven normal without damage.
Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

NASA Astrophysics Data System (ADS)

Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

2012-11-01

In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
Single voxel magnetic resonance spectroscopy at 3 Tesla in a memory disorders clinic: early right hippocampal NAA/Cr loss in mildly impaired subjects.

PubMed

Caserta, Maria T; Ragin, Ann; Hermida, Adriana P; Ahrens, R John; Wise, Leon

2008-11-30

In this study, we use magnetic resonance spectroscopy (MRS) at 3 Tesla to measure N-acetyl aspartate (NAA), myo-inositol (mI) and choline (Cho) to creatine (Cr) ratios in R (right) and L (left) hippocampi (H) in 8 mildly memory impaired (MMI), 6 probable Alzheimer's Disease (PRAD), and 17 control subjects. NAA/Cr was significantly reduced in the RH in the MMI group and bilaterally in the PRAD group vs. controls. No other metabolite differences were noted between the three groups. Five MMI subjects have converted to PRAD in follow-up. These findings suggest that RH NAA/Cr ratios measured at 3 Tesla may be a sensitive marker of future progression to dementia in a clinically defined population with isolated memory complaints.

Multishot versus Single-Shot Pulse Sequences in Very High Field fMRI: A Comparison Using Retinotopic Mapping

PubMed Central

Gatenby, J. Christopher; Gore, John C.; Tong, Frank

2012-01-01

High-resolution functional MRI is a leading application for very high field (7 Tesla) human MR imaging. Though higher field strengths promise improvements in signal-to-noise ratios (SNR) and BOLD contrast relative to fMRI at 3 Tesla, these benefits may be partially offset by accompanying increases in geometric distortion and other off-resonance effects. Such effects may be especially pronounced with the single-shot EPI pulse sequences typically used for fMRI at standard field strengths. As an alternative, one might consider multishot pulse sequences, which may lead to somewhat lower temporal SNR than standard EPI, but which are also often substantially less susceptible to off-resonance effects. Here we consider retinotopic mapping of human visual cortex as a practical test case by which to compare examples of these sequence types for high-resolution fMRI at 7 Tesla. We performed polar angle retinotopic mapping at each of 3 isotropic resolutions (2.0, 1.7, and 1.1 mm) using both accelerated single-shot 2D EPI and accelerated multishot 3D gradient-echo pulse sequences. We found that single-shot EPI indeed led to greater temporal SNR and contrast-to-noise ratios (CNR) than the multishot sequences. However, additional distortion correction in postprocessing was required in order to fully realize these advantages, particularly at higher resolutions. The retinotopic maps produced by both sequence types were qualitatively comparable, and showed equivalent test/retest reliability. Thus, when surface-based analyses are planned, or in other circumstances where geometric distortion is of particular concern, multishot pulse sequences could provide a viable alternative to single-shot EPI. PMID:22514646
Multishot versus single-shot pulse sequences in very high field fMRI: a comparison using retinotopic mapping.

PubMed

Swisher, Jascha D; Sexton, John A; Gatenby, J Christopher; Gore, John C; Tong, Frank

2012-01-01

High-resolution functional MRI is a leading application for very high field (7 Tesla) human MR imaging. Though higher field strengths promise improvements in signal-to-noise ratios (SNR) and BOLD contrast relative to fMRI at 3 Tesla, these benefits may be partially offset by accompanying increases in geometric distortion and other off-resonance effects. Such effects may be especially pronounced with the single-shot EPI pulse sequences typically used for fMRI at standard field strengths. As an alternative, one might consider multishot pulse sequences, which may lead to somewhat lower temporal SNR than standard EPI, but which are also often substantially less susceptible to off-resonance effects. Here we consider retinotopic mapping of human visual cortex as a practical test case by which to compare examples of these sequence types for high-resolution fMRI at 7 Tesla. We performed polar angle retinotopic mapping at each of 3 isotropic resolutions (2.0, 1.7, and 1.1 mm) using both accelerated single-shot 2D EPI and accelerated multishot 3D gradient-echo pulse sequences. We found that single-shot EPI indeed led to greater temporal SNR and contrast-to-noise ratios (CNR) than the multishot sequences. However, additional distortion correction in postprocessing was required in order to fully realize these advantages, particularly at higher resolutions. The retinotopic maps produced by both sequence types were qualitatively comparable, and showed equivalent test/retest reliability. Thus, when surface-based analyses are planned, or in other circumstances where geometric distortion is of particular concern, multishot pulse sequences could provide a viable alternative to single-shot EPI.
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

PubMed

Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

2012-01-01

We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is computationally demanding, with optimization on a high performance Linux cluster typically lasting several days. To remove this computational bottleneck we ported our optimization algorithm to run on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point at which the algorithm is parallelized is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) in a way that massively reduces memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Operational Based Vision Assessment

DTIC Science & Technology

2014-02-01

formulated or supplied the drawings, specifications, or other data does not license the holder or any other person or corporation or convey any...expensive than other developers’ software. The sources for the GPUs (Nvidia) and the host computer (Concurrent’s iHawk) were identified. The...boundaries, which is a distracting artifact when performing visual tests. The problem has been isolated by the OBVA team to the Nvidia GPUs. The OBVA system
Sub-second pencil beam dose calculation on GPU for adaptive proton therapy

NASA Astrophysics Data System (ADS)

da Silva, Joakim; Ansorge, Richard; Jena, Rajesh

2015-06-01

Although proton therapy delivered using scanned pencil beams has the potential to produce better dose conformity than conventional radiotherapy, the created dose distributions are more sensitive to anatomical changes and patient motion. Therefore, the introduction of adaptive treatment techniques where the dose can be monitored as it is being delivered is highly desirable. We present a GPU-based dose calculation engine relying on the widely used pencil beam algorithm, developed for on-line dose calculation. The calculation engine was implemented from scratch, with each step of the algorithm parallelized and adapted to run efficiently on the GPU architecture. To ensure fast calculation, it employs several application-specific modifications and simplifications, and a fast scatter-based implementation of the computationally expensive kernel superposition step. The calculation time for a skull base treatment plan using two beam directions was 0.22 s on an Nvidia Tesla K40 GPU, whereas a test case of a cubic target in water from the literature took 0.14 s to calculate. The accuracy of the patient dose distributions was assessed by calculating the γ-index with respect to a gold standard Monte Carlo simulation. The passing rates were 99.2% and 96.7%, respectively, for the 3%/3 mm and 2%/2 mm criteria, matching those produced by a clinical treatment planning system.
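The γ-index passing rates quoted above combine a dose-difference tolerance with a distance-to-agreement tolerance. The following is a minimal one-dimensional sketch of that idea, not the authors' implementation. It assumes global normalization to the maximum reference dose and a brute-force search over all reference points; the function name `gamma_pass_rate` and its parameters are hypothetical.

```python
import math

def gamma_pass_rate(positions, dose_eval, dose_ref, dose_tol=0.03, dist_tol=3.0):
    """Percentage of evaluated points with gamma <= 1.

    Assumptions (not from the abstract): 1D profile, positions in mm,
    dose tolerance taken as a fraction of the maximum reference dose.
    """
    d_norm = max(dose_ref)
    passed = 0
    for x_e, d_e in zip(positions, dose_eval):
        # gamma is the smallest combined distance/dose deviation
        # between this evaluated point and any reference point
        gamma = min(
            math.hypot((x_e - x_r) / dist_tol, (d_e - d_r) / (dose_tol * d_norm))
            for x_r, d_r in zip(positions, dose_ref)
        )
        if gamma <= 1.0:
            passed += 1
    return 100.0 * passed / len(positions)

# Toy profile: the last point is 10% too hot and fails the 3%/3 mm criterion.
rate = gamma_pass_rate([0.0, 1.0, 2.0, 3.0],
                       [1.0, 1.02, 1.0, 1.1],
                       [1.0, 1.0, 1.0, 1.0])
```

Calling the same function with `dose_tol=0.02, dist_tol=2.0` would correspond to the tighter 2%/2 mm criterion.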
Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel

String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications string matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. Many software based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance requirements, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize. The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and Cray XMT), distributed memory with homogeneous processing elements (InfiniBand cluster of x86 multicores) and heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.
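For readers unfamiliar with the Aho-Corasick algorithm named in this record, here is a small sequential sketch in Python. It is only an illustration of the automaton (goto table, failure links, output sets); it does not reflect the parallel implementations the paper compares, and the function names are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first traversal sets failure links level by level
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # inherit matches that end at the failure target
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(text, patterns):
    """Return (start_index, pattern) for every dictionary match in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches

hits = search("ushers", ["he", "she", "his", "hers"])
```

The text is scanned once regardless of dictionary size, which is why the algorithm suits the large-dictionary workloads described above.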
Real-time implementation of optimized maximum noise fraction transform for feature extraction of hyperspectral images

NASA Astrophysics Data System (ADS)

Wu, Yuanfeng; Gao, Lianru; Zhang, Bing; Zhao, Haina; Li, Jun

2014-01-01

We present a parallel implementation of the optimized maximum noise fraction (G-OMNF) transform algorithm for feature extraction of hyperspectral images on commodity graphics processing units (GPUs). The proposed approach exploits the data-level concurrency of the algorithm and optimizes the computing flow. We first defined a three-dimensional grid, in which each thread processes a sub-block of data to facilitate the spatial and spectral neighborhood data searches in noise estimation, which is one of the most important steps involved in OMNF. Then, we optimized the processing flow and computed the noise covariance matrix before computing the image covariance matrix to reduce the original hyperspectral image data transmission. These optimization strategies can greatly improve the computing efficiency and can be applied to other feature extraction algorithms. The proposed parallel feature extraction algorithm was implemented on an Nvidia Tesla GPU using the compute unified device architecture and basic linear algebra subroutines library. In experiments on several real hyperspectral images, our GPU parallel implementation provides a significant speedup of the algorithm compared with the CPU implementation, especially for highly data parallelizable and arithmetically intensive algorithm parts, such as noise estimation. To further evaluate the effectiveness of G-OMNF, we used two applications: spectral unmixing and classification. Considering the sensor scanning rate and the data acquisition time, the proposed parallel implementation met the requirements of on-board real-time feature extraction.
GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda

PubMed Central

2014-01-01

Background Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally demanding, since they need to handle not only several million query microRNA and reference RNA pairs, but also several million nucleotide comparisons within each given pair. Thus, the need to perform microRNA identification at such large scale has increased the demand for parallel computing. Methods Although most CMTI programs (e.g., the miRanda algorithm) are based on a modified Smith-Waterman (SW) algorithm, the existing parallel SW implementations (e.g., CUDASW++ 2.0/3.0, SWIPE) are unable to meet this demand in CMTI tasks. We present CUDA-miRanda, a fast microRNA target identification algorithm that takes advantage of massively parallel computing on Graphics Processing Units (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA-miRanda specifically focuses on the local alignment of short (i.e., ≤ 32 nucleotides) sequences against longer reference sequences (e.g., 20K nucleotides). Moreover, the proposed algorithm is able to report multiple alignments (up to 191 top scores) and the corresponding traceback sequences for any given (query sequence, reference sequence) pair. Results Speeds over 5.36 Giga Cell Updates Per Second (GCUPs) are achieved on a server with 4 NVIDIA Tesla M2090 GPUs. Compared to the original miRanda algorithm, which is evaluated on an Intel Xeon E5620@2.4 GHz CPU, the experimental results show up to 166 times performance gains in terms of execution time. 
In addition, we have verified that the exact same targets were predicted in both CUDA-miRanda and the original miRanda implementations through multiple test datasets. Conclusions We offer a GPU-based alternative to high performance compute (HPC) that can be developed locally at a relatively small cost. The community of GPU developers in the biomedical research community, particularly for genome analysis, is still growing. With increasing shared resources, this community will be able to advance CMTI in a very significant manner. Our source code is available at https://sourceforge.net/projects/cudamiranda/. PMID:25077821
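The Smith-Waterman recurrence underlying the alignment step can be sketched in a few lines of Python. This is a plain sequential illustration under stated assumptions: it scores simple character identity with a linear gap penalty, whereas miRanda uses its own modified scoring scheme, and the parameter values and function name are hypothetical. Each inner-loop iteration is one "cell update", the unit counted in the GCUPS figure above.

```python
def smith_waterman(query, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score using two rolling rows of the DP matrix."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for q in query:
        cur = [0]
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            # a local alignment may restart anywhere, hence the floor at 0
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best

score = smith_waterman("ACGT", "TTACGTTT")
cell_updates = len("ACGT") * len("TTACGTTT")  # work performed for this pair
```

Dividing the total number of cell updates by the run time in seconds and by 10^9 gives throughput in GCUPS.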
GPU Acceleration of DSP for Communication Receivers.

PubMed

Gunther, Jake; Gunther, Hyrum; Moon, Todd

2017-09-01

Graphics processing unit (GPU) implementations of signal processing algorithms can outperform CPU-based implementations. This paper describes the GPU implementation of several algorithms encountered in a wide range of high-data rate communication receivers including filters, multirate filters, numerically controlled oscillators, and multi-stage digital down converters. These structures are tested by processing the 20 MHz wide FM radio band (88-108 MHz). Two receiver structures are explored: a single channel receiver and a filter bank channelizer. Both run in real time on an NVIDIA GeForce GTX 1080 graphics card.
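Two of the building blocks listed in this record, the numerically controlled oscillator (NCO) and the digital down converter, can be illustrated with a short CPU-side sketch. It is an assumption-laden toy: the low-pass stage is a simple block average rather than a designed filter, and all names and parameters are hypothetical rather than taken from the paper.

```python
import cmath
import math

def nco(freq_hz, sample_rate_hz, n):
    """Yield n samples of exp(-j*phase) from a wrapping phase accumulator."""
    phase = 0.0
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    for _ in range(n):
        yield cmath.exp(-1j * phase)
        phase = (phase + step) % (2.0 * math.pi)

def down_convert(signal, freq_hz, sample_rate_hz, decim):
    """Mix the signal to baseband, then average and decimate by decim."""
    mixed = [s * lo for s, lo in zip(signal, nco(freq_hz, sample_rate_hz, len(signal)))]
    # block average acts as a crude low-pass filter before decimation
    return [sum(mixed[i:i + decim]) / decim
            for i in range(0, len(mixed) - decim + 1, decim)]

# A complex tone at 1 kHz sampled at 8 kHz lands at DC after down conversion.
tone = [cmath.exp(2j * math.pi * 1000 * k / 8000) for k in range(64)]
baseband = down_convert(tone, 1000, 8000, 8)
```

On a GPU, the per-sample multiply and the per-block reduction are independent across samples and blocks, which is what makes these stages natural candidates for parallel execution.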
Beyond 100 Tesla: Scientific experiments using single-turn coils

NASA Astrophysics Data System (ADS)

Portugall, Oliver; Solane, Pierre Yves; Plochocka, Paulina; Maude, Duncan K.; Nicholas, Robin J.

2013-01-01

Current opportunities and recent examples for research in magnetic fields well above 100 T using single-turn coils are discussed. After a general introduction into basic principles and technical constraints associated with the generation of Megagauss fields we discuss data obtained at the LNCMI Toulouse, where such fields are routinely used for scientific applications.
Single-row vs. double-row arthroscopic rotator cuff repair: clinical and 3 Tesla MR arthrography results.

PubMed

Tudisco, Cosimo; Bisicchia, Salvatore; Savarese, Eugenio; Fiori, Roberto; Bartolucci, Dario A; Masala, Salvatore; Simonetti, Giovanni

2013-01-27

Arthroscopic rotator cuff repair has become popular in the last few years because it avoids large skin incisions and deltoid detachment and dysfunction. Earlier arthroscopic single-row (SR) repair methods achieved only partial restoration of the original footprint of the tendons of the rotator cuff, while double-row (DR) repair methods presented many biomechanical advantages and higher rates of tendon-to-bone healing. However, DR repair failed to demonstrate better clinical results than SR repair in clinical trials. MR imaging at 3 Tesla, especially with intra-articular contrast medium (MRA), showed a better diagnostic performance than 1.5 Tesla in the musculoskeletal setting. The objective of this study was to retrospectively evaluate the clinical and 3 Tesla MRA results in two groups of patients operated on for a medium-sized full-thickness rotator cuff tear with two different techniques. The first group consisted of 20 patients operated on with the SR technique; the second group consisted of 20 patients operated on with the DR technique. All patients were evaluated at a minimum of 3 years after surgery. The primary end point was the re-tear rate at 3 Tesla MRA. The secondary end points were the Constant-Murley Scale (CMS), the Simple Shoulder Test (SST) scores, surgical time and implant expense. The mean follow-up was 40 months in the SR group and 38.9 months in the DR group. The mean postoperative CMS was 70 in the SR group and 68 in the DR group. The mean SST score was 9.4 in the SR group and 10.1 in the DR group. The re-tear rate was 60% in the SR group and 25% in the DR group. Leakage of the contrast medium was observed in all patients. To the best of our knowledge, this is the first report on 3 Tesla MRA in the evaluation of two different techniques of rotator cuff repair. DR repair resulted in a statistically significant lower re-tear rate, with longer surgical time and higher implant expense, despite no difference in clinical outcomes. 
We think that leakage of the contrast medium is due to an incomplete tendon-to-bone sealing, which is not a re-tear. This phenomenon could have important medicolegal implications. Level of evidence III. Treatment study: Case-control study.
Single-row vs. double-row arthroscopic rotator cuff repair: clinical and 3 Tesla MR arthrography results

PubMed Central

2013-01-01

Background Arthroscopic rotator cuff repair has become popular in the last few years because it avoids large skin incisions and deltoid detachment and dysfunction. Earlier arthroscopic single-row (SR) repair methods achieved only partial restoration of the original footprint of the tendons of the rotator cuff, while double-row (DR) repair methods presented many biomechanical advantages and higher rates of tendon-to-bone healing. However, DR repair failed to demonstrate better clinical results than SR repair in clinical trials. MR imaging at 3 Tesla, especially with intra-articular contrast medium (MRA), showed a better diagnostic performance than 1.5 Tesla in the musculoskeletal setting. The objective of this study was to retrospectively evaluate the clinical and 3 Tesla MRA results in two groups of patients operated on for a medium-sized full-thickness rotator cuff tear with two different techniques. Methods The first group consisted of 20 patients operated on with the SR technique; the second group consisted of 20 patients operated on with the DR technique. All patients were evaluated at a minimum of 3 years after surgery. The primary end point was the re-tear rate at 3 Tesla MRA. The secondary end points were the Constant-Murley Scale (CMS), the Simple Shoulder Test (SST) scores, surgical time and implant expense. Results The mean follow-up was 40 months in the SR group and 38.9 months in the DR group. The mean postoperative CMS was 70 in the SR group and 68 in the DR group. The mean SST score was 9.4 in the SR group and 10.1 in the DR group. The re-tear rate was 60% in the SR group and 25% in the DR group. Leakage of the contrast medium was observed in all patients. Conclusions To the best of our knowledge, this is the first report on 3 Tesla MRA in the evaluation of two different techniques of rotator cuff repair. 
DR repair resulted in a statistically significant lower re-tear rate, with longer surgical time and higher implant expense, despite no difference in clinical outcomes. We think that leakage of the contrast medium is due to an incomplete tendon-to-bone sealing, which is not a re-tear. This phenomenon could have important medicolegal implications. Level of evidence III. Treatment study: Case–control study. PMID:23351978
Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

NASA Astrophysics Data System (ADS)

Schultz, A.

2010-12-01

3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. 
We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
Design Tools for Accelerating Development and Usage of Multi-Core Computing Platforms

DTIC Science & Technology

2014-04-01

Government formulated or supplied the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey...multicore PDSP platforms. The GPU-based capabilities of TDIF are currently oriented towards NVIDIA GPUs, based on the Compute Unified Device Architecture...CUDA) programming language [NVIDIA 2007], which can be viewed as an extension of C. The multicore PDSP capabilities currently in TDIF are oriented
The RISC-V Instruction Set Manual Volume 2: Privileged Architecture Version 1.7

DTIC Science & Technology

2015-05-09

DIG07-10227). Additional support came from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung. • Project Isis: DoE Award DE-SC0003624. • ASPIRE...STARnet center funded by the Semiconductor Research Corporation. Additional support from ASPIRE industrial sponsor, Intel, and ASPIRE affiliates...Google, Huawei, Nokia, NVIDIA, Oracle, and Samsung. The content of this paper does not necessarily reflect the position or the policy of the US
Nonenhanced peripheral MR-angiography (MRA) at 3 Tesla: evaluation of quiescent-interval single-shot MRA in patients undergoing digital subtraction angiography.

PubMed

Wagner, Moritz; Knobloch, Gesine; Gielen, Martin; Lauff, Marie-Teres; Romano, Valentina; Hamm, Bernd; Kröncke, Thomas

2015-04-01

Quiescent-interval single-shot MRA (QISS-MRA) is a promising nonenhanced imaging technique for assessment of peripheral arterial disease (PAD). Previous studies at 3 Tesla included only very limited numbers of patients for correlation of QISS-MRA with digital subtraction angiography (DSA) as standard of reference (SOR). The aim of this prospective institutional review board-approved study was to compare QISS-MRA at 3 Tesla with DSA in a larger patient group. Our study included 32 consecutive patients who underwent QISS-MRA, contrast-enhanced MRA (CE-MRA), and DSA. Two readers independently performed a per-segment evaluation of QISS-MRA and CE-MRA for image quality and identification of non-significant stenosis (<50%) versus significant stenosis (50-100%). The final dataset included 1,027 vessel segments. Reader 1 and 2 rated image quality as diagnostic in 96.8 and 98.0% of the vessel segments on QISS-MRA and in 99.3 and 98.4% of the vessel segments on CE-MRA, respectively. DSA was available for 922 segments and detected significant stenosis in 133 segments (14.4%). Consensus reading yielded the following diagnostic parameters for QISS-MRA versus CE-MRA: sensitivity: 83.5% (111/133) versus 82.7% (110/133), p = 0.80; specificity: 93.9% (741/789) versus 95.7% (755/789), p = 0.25; and diagnostic accuracy: 92.4% (852/922) versus 93.8% (865/922), p = 0.35. In conclusion, using DSA as SOR, QISS-MRA and CE-MRA at 3 Tesla showed similar diagnostic accuracy in the assessment of PAD. A limitation of QISS-MRA was the lower rate of assessable vessel segments compared to CE-MRA.
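The diagnostic parameters reported above follow directly from the segment counts given in parentheses. As a quick arithmetic check, the sketch below recomputes them; the false-negative and false-positive counts are derived here from the stated totals (133 segments with significant stenosis, 789 without), and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Counts taken from the abstract; FN and FP derived from the group totals.
qiss = diagnostic_metrics(tp=111, fn=133 - 111, tn=741, fp=789 - 741)
ce = diagnostic_metrics(tp=110, fn=133 - 110, tn=755, fp=789 - 755)

qiss_pct = [round(100 * v, 1) for v in qiss]
ce_pct = [round(100 * v, 1) for v in ce]
```

The rounded percentages agree with the values quoted in the abstract for both QISS-MRA and CE-MRA.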
GPU Implementation of High Rayleigh Number Three-Dimensional Mantle Convection

NASA Astrophysics Data System (ADS)

Sanchez, D. A.; Yuen, D. A.; Wright, G. B.; Barnett, G. A.

2010-12-01

Although we have entered the age of petascale computing, many factors are still prohibiting high-performance computing (HPC) from infiltrating all suitable scientific disciplines. For this reason and others, application of GPU to HPC is gaining traction in the scientific world. With its low price point, high performance potential, and competitive scalability, GPU has been an option well worth considering for the last few years. Moreover with the advent of NVIDIA's Fermi architecture, which brings ECC memory, better double-precision performance, and more RAM to GPU, there is a strong message of corporate support for GPU in HPC. However many doubts linger concerning the practicality of using GPU for scientific computing. In particular, GPU has a reputation for being difficult to program and suitable for only a small subset of problems. Although inroads have been made in addressing these concerns, for many scientists GPU still has hurdles to clear before becoming an acceptable choice. We explore the applicability of GPU to geophysics by implementing a three-dimensional, second-order finite-difference model of Rayleigh-Benard thermal convection on an NVIDIA GPU using C for CUDA. Our code reaches sufficient resolution, on the order of 500x500x250 evenly-spaced finite-difference gridpoints, on a single GPU. We make extensive use of highly optimized CUBLAS routines, allowing us to achieve performance on the order of O( 0.1 ) µs per timestep*gridpoint at this resolution. This performance has allowed us to study high Rayleigh number simulations, on the order of 2x10^7, on a single GPU.
NVIDIA OptiX ray-tracing engine as a new tool for modelling medical imaging systems

NASA Astrophysics Data System (ADS)

Pietrzak, Jakub; Kacperski, Krzysztof; Cieślar, Marek

2015-03-01

The most accurate technique to model the X- and gamma radiation path through a numerically defined object is the Monte Carlo simulation which follows single photons according to their interaction probabilities. A simplified and much faster approach, which just integrates total interaction probabilities along selected paths, is known as ray tracing. Both techniques are used in medical imaging for simulating real imaging systems and as projectors required in iterative tomographic reconstruction algorithms. These approaches are ready for massively parallel implementation e.g. on Graphics Processing Units (GPU), which can greatly reduce the computation time at a relatively low cost. In this paper we describe the application of the NVIDIA OptiX ray-tracing engine, popular in professional graphics and rendering applications, as a new powerful tool for X- and gamma ray-tracing in medical imaging. It allows the implementation of a variety of physical interactions of rays with pixel-, mesh- or nurbs-based objects, and recording any required quantities, like path integrals, interaction sites, deposited energies, and others. Using the OptiX engine we have implemented a code for rapid Monte Carlo simulations of Single Photon Emission Computed Tomography (SPECT) imaging, as well as the ray-tracing projector, which can be used in reconstruction algorithms. The engine generates efficient, scalable and optimized GPU code, ready to run on multi GPU heterogeneous systems. We have compared the results of our simulations with the GATE package. With the OptiX engine the computation time of a Monte Carlo simulation can be reduced from days to minutes.
GPU-based multi-volume ray casting within VTK for medical applications.

PubMed

Bozorgi, Mohammadmehdi; Lindseth, Frank

2015-03-01

Multi-volume visualization is important for displaying relevant information in multimodal or multitemporal medical imaging studies. The main objective of the current study was to develop an efficient GPU-based multi-volume ray caster (MVRC) and validate the proposed visualization system in the context of image-guided surgical navigation. Ray casting can produce high-quality 2D images from 3D volume data but the method is computationally demanding, especially when multiple volumes are involved, so a parallel GPU version has been implemented. In the proposed MVRC, imaginary rays are sent through the volumes (one ray for each pixel in the view), and at equal and short intervals along the rays, samples are collected from each volume. Samples from all the volumes are composited using front to back α-blending. Since all the rays can be processed simultaneously, the MVRC was implemented in parallel on the GPU to achieve acceptable interactive frame rates. The method is fully integrated within the visualization toolkit (VTK) pipeline with the ability to apply different operations (e.g., transformations, clipping, and cropping) on each volume separately. The implemented method is cross-platform (Windows, Linux and Mac OSX) and runs on different graphics cards (NVidia and AMD). The speed of the MVRC was tested with one to five volumes of varying sizes: 128^3, 256^3, and 512^3. A Tesla C2070 GPU was used, and the output image size was 600 × 600 pixels. The original VTK single-volume ray caster and the MVRC were compared when rendering only one volume. The multi-volume rendering system achieved an interactive frame rate (> 15 fps) when rendering five small volumes (128^3 voxels), four medium-sized volumes (256^3 voxels), and two large volumes (512^3 voxels). When rendering single volumes, the frame rate of the MVRC was comparable to the original VTK ray caster for small and medium-sized datasets but was approximately 3 frames per second slower for large datasets. 
The MVRC was successfully integrated in an existing surgical navigation system and was shown to be clinically useful during an ultrasound-guided neurosurgical tumor resection. A GPU-based MVRC for VTK is a useful tool in medical visualization. The proposed multi-volume GPU-based ray caster for VTK provided high-quality images at reasonable frame rates. The MVRC was effective when used in a neurosurgical navigation application.
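The front-to-back α-blending step described in this record can be sketched for a single ray as follows. This is a scalar, CPU-side illustration only: it assumes grayscale color values and that the samples from all volumes have already been merged into one near-to-far list, and the early termination threshold is an added assumption, not something stated in the abstract. The function name is hypothetical.

```python
def composite_front_to_back(samples, opacity_cutoff=0.99):
    """Blend (color, alpha) samples ordered from nearest to farthest."""
    color_acc = 0.0
    alpha_acc = 0.0
    for color, alpha in samples:
        # each sample contributes only through the transparency left in front of it
        weight = (1.0 - alpha_acc) * alpha
        color_acc += weight * color
        alpha_acc += weight
        if alpha_acc >= opacity_cutoff:
            break  # assumed optimization: stop once the ray is nearly opaque
    return color_acc, alpha_acc

# A half-transparent white sample in front of a half-transparent black one.
pixel = composite_front_to_back([(1.0, 0.5), (0.0, 0.5)])
```

Because each ray is composited independently of every other ray, one GPU thread per output pixel is a natural mapping for this loop.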
Communication Efficient Gaussian Elimination with Partial Pivoting using a Shape Morphing Data Layout

DTIC Science & Technology

2013-02-21

support comes from ParLab affiliates National Instruments, Nokia, NVIDIA, Oracle and Samsung, as well as MathWorks. Research is also supported by DOE...affiliates National Instruments, Nokia, NVIDIA, Oracle and Samsung, as well as MathWorks. Research is also supported by DOE grants DE-SC0004938, DE-SC0005136...International Business Machines Company, 1966. [17] S. Toledo. Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl., 18

GPU Lossless Hyperspectral Data Compression System for Space Applications

NASA Technical Reports Server (NTRS)

Keymeulen, Didier; Aranki, Nazeeh; Hopson, Ben; Kiely, Aaron; Klimesh, Matthew; Benkrid, Khaled

2012-01-01

On-board lossless hyperspectral data compression reduces data volume in order to meet NASA and DoD limited downlink capabilities. At JPL, a novel, adaptive and predictive technique for lossless compression of hyperspectral data, named the Fast Lossless (FL) algorithm, was recently developed. This technique uses an adaptive filtering method and achieves state-of-the-art performance in both compression effectiveness and low complexity. Because of its outstanding performance and suitability for real-time onboard hardware implementation, the FL compressor is being formalized as the emerging CCSDS Standard for Lossless Multispectral & Hyperspectral image compression. The FL compressor is well-suited for parallel hardware implementation. A GPU hardware implementation was developed for FL targeting the current state-of-the-art GPUs from NVIDIA. The GPU implementation on an NVIDIA GeForce GTX 580 achieves a throughput performance of 583.08 Mbits/sec (44.85 MSamples/sec) and an acceleration of at least 6 times a software implementation running on a 3.47 GHz single core Intel Xeon processor. This paper describes the design and implementation of the FL algorithm on the GPU. The massively parallel implementation will provide in the future a fast and practical real-time solution for airborne and space applications.
Non-enhanced magnetic resonance imaging of the small bowel at 7 Tesla in comparison to 1.5 Tesla: First steps towards clinical application.

PubMed

Hahnemann, Maria L; Kraff, Oliver; Maderwald, Stefan; Johst, Soeren; Orzada, Stephan; Umutlu, Lale; Ladd, Mark E; Quick, Harald H; Lauenstein, Thomas C

2016-06-01

To perform non-enhanced (NE) magnetic resonance imaging (MRI) of the small bowel at 7 Tesla (7T) and to compare it with 1.5 Tesla (1.5T). Twelve healthy subjects were prospectively examined using a 1.5T and 7T MRI system. Coronal and axial true fast imaging with steady-state precession (TrueFISP) imaging and a coronal T2-weighted (T2w) half-Fourier acquisition single-shot turbo spin-echo (HASTE) sequence were acquired. Image analysis was performed by 1) visual evaluation of tissue contrast and detail detectability, 2) measurement and calculation of contrast ratios and 3) assessment of artifacts. NE MRI of the small bowel at 7T was technically feasible. In the vast majority of the cases, tissue contrast and image details were equivalent at both field strengths. At 7T, two cases revealed better detail detectability in the TrueFISP, and better contrast in the HASTE. Susceptibility artifacts and B1 inhomogeneities were significantly increased at 7T. This study provides first insights into NE ultra-high field MRI of the small bowel and may be considered an important step towards high quality T2w abdominal imaging at 7T MRI. Copyright © 2016 Elsevier Inc. All rights reserved.
Prism-based single-camera system for stereo display

NASA Astrophysics Data System (ADS)

Zhao, Yue; Cui, Xiaoyu; Wang, Zhiguo; Chen, Hongsheng; Fan, Heyu; Wu, Teresa

2016-06-01

This paper combines the prism and single camera and puts forward a method of stereo imaging with low cost. First of all, according to the principle of geometrical optics, we can deduce the relationship between the prism single-camera system and dual-camera system, and according to the principle of binocular vision we can deduce the relationship between binoculars and dual camera. Thus we can establish the relationship between the prism single-camera system and binoculars and get the positional relation of prism, camera, and object with the best effect of stereo display. Finally, using the active shutter stereo glasses of NVIDIA Company, we can realize the three-dimensional (3-D) display of the object. The experimental results show that the proposed approach can make use of the prism single-camera system to simulate the various observation manners of eyes. The stereo imaging system, which is designed by the method proposed by this paper, can restore the 3-D shape of the object being photographed factually.
Particle In Cell Codes on Highly Parallel Architectures

NASA Astrophysics Data System (ADS)

Tableman, Adam

2014-10-01

We describe strategies and examples of Particle-In-Cell codes running on Nvidia GPU and Intel Phi architectures. This includes basic implementations in skeleton codes and full-scale development versions (encompassing 1D, 2D, and 3D codes) in Osiris. Both the similarities and differences between Intel's and Nvidia's hardware will be examined. Work supported by grants NSF ACI 1339893, DOE DE SC 000849, DOE DE SC 0008316, DOE DE NA 0001833, and DOE DE FC02 04ER 54780.
Poster: Building a Large Tiled-Display Cluster

DTIC Science & Technology

2012-10-01

graphics cards ( Nvidia Quadro FX 5800), and each graphics ∗e-mail: mark.livingston@nrl.navy.mil †e-mail: jonathan.decker@nrl.navy.mil card in a display...such as DisplayPort and HDMI (see: Nvidia Quadro 6000). We recommend these formats because they are much easier to plug-and-play. 3.4 Leverage Open...will find yourself with all the issues related to owning a server room. Today, there are a number of companies offering turn-key so- lutions for tiled
Breath-hold imaging of the coronary arteries using Quiescent-Interval Slice-Selective (QISS) magnetic resonance angiography: pilot study at 1.5 Tesla and 3 Tesla.

PubMed

Edelman, Robert R; Giri, S; Pursnani, A; Botelho, M P F; Li, W; Koktzoglou, I

2015-11-23

Coronary magnetic resonance angiography (MRA) is usually obtained with a free-breathing navigator-gated 3D acquisition. Our aim was to develop an alternative breath-hold approach that would allow the coronary arteries to be evaluated in a much shorter time and without risk of degradation by respiratory motion artifacts. For this purpose, we implemented a breath-hold, non-contrast-enhanced, quiescent-interval slice-selective (QISS) 2D technique. Sequence performance was compared at 1.5 and 3 Tesla using both radial and Cartesian k-space trajectories. The left coronary circulation was imaged in six healthy subjects and two patients with coronary artery disease. Breath-hold QISS was compared with T2-prepared 2D balanced steady-state free-precession (bSSFP) and free-breathing, navigator-gated 3D bSSFP. Approximately 10 2.1-mm thick slices were acquired in a single ~20-s breath-hold using two-shot QISS. QISS contrast-to-noise ratio (CNR) was 1.5-fold higher at 3 Tesla than at 1.5 Tesla. Cartesian QISS provided the best coronary-to-myocardium CNR, whereas radial QISS provided the sharpest coronary images. QISS image quality exceeded that of free-breathing 3D coronary MRA with few artifacts at either field strength. Compared with T2-prepared 2D bSSFP, multi-slice capability was not restricted by the specific absorption rate at 3 Tesla and pericardial fluid signal was better suppressed. In addition to depicting the coronary arteries, QISS could image intra-cardiac structures, pericardium, and the aortic root in arbitrary slice orientations. Breath-hold QISS is a simple, versatile, and time-efficient method for coronary MRA that provides excellent image quality at both 1.5 and 3 Tesla. Image quality exceeded that of free-breathing, navigator-gated 3D MRA in a much shorter scan time. QISS also allowed rapid multi-slice bright-blood, diastolic phase imaging of the heart, which may have complementary value to multi-phase cine imaging. 
We conclude that, with further clinical validation, QISS might provide an efficient alternative to commonly used free-breathing coronary MRA techniques.
PLACD-7T Study: Atherosclerotic Carotid Plaque Components Correlated with Cerebral Damage at 7 Tesla Magnetic Resonance Imaging.

PubMed

den Hartog, A G; Bovens, S M; Koning, W; Hendrikse, J; Pasterkamp, G; Moll, F L; de Borst, G J

2011-02-01

In patients with carotid artery stenosis histological plaque composition is associated with plaque stability and with presenting symptomatology. Preferentially, plaque vulnerability should be taken into account in pre-operative work-up of patients with severe carotid artery stenosis. However, currently no appropriate and conclusive (non-) invasive technique to differentiate between the high and low risk carotid artery plaque in vivo is available. We propose that 7 Tesla human high resolution MRI scanning will visualize carotid plaque characteristics more precisely and will enable correlation of these specific components with cerebral damage. The aim of the PlaCD-7T study is 1: to correlate 7T imaging with carotid plaque histology (gold standard); and 2: to correlate plaque characteristics with cerebral damage ((clinically silent) cerebral (micro) infarcts or bleeds) on 7 Tesla high resolution (HR) MRI. We propose a single center prospective study for either symptomatic or asymptomatic patients with haemodynamic significant (70%) stenosis of at least one of the carotid arteries. The Athero-Express (AE) biobank histological analysis will be derived according to standard protocol. Patients included in the AE and our prospective study will undergo a pre-operative 7 Tesla HR-MRI scan of both the head and neck area. We hypothesize that the 7 Tesla MRI scanner will allow early identification of high risk carotid plaques being associated with micro infarcted cerebral areas, and will thus be able to identify patients with a high risk of periprocedural stroke, by identification of surrogate measures of increased cardiovascular risk.
PLACD-7T Study: Atherosclerotic Carotid Plaque Components Correlated with Cerebral Damage at 7 Tesla Magnetic Resonance Imaging

PubMed Central

den Hartog, A.G; Bovens, S.M; Koning, W; Hendrikse, J; Pasterkamp, G; Moll, F.L; de Borst, G.J

2011-01-01

Introduction: In patients with carotid artery stenosis histological plaque composition is associated with plaque stability and with presenting symptomatology. Preferentially, plaque vulnerability should be taken into account in pre-operative work-up of patients with severe carotid artery stenosis. However, currently no appropriate and conclusive (non-) invasive technique to differentiate between the high and low risk carotid artery plaque in vivo is available. We propose that 7 Tesla human high resolution MRI scanning will visualize carotid plaque characteristics more precisely and will enable correlation of these specific components with cerebral damage. Study objective: The aim of the PlaCD-7T study is 1: to correlate 7T imaging with carotid plaque histology (gold standard); and 2: to correlate plaque characteristics with cerebral damage ((clinically silent) cerebral (micro) infarcts or bleeds) on 7 Tesla high resolution (HR) MRI. Design: We propose a single center prospective study for either symptomatic or asymptomatic patients with haemodynamic significant (70%) stenosis of at least one of the carotid arteries. The Athero-Express (AE) biobank histological analysis will be derived according to standard protocol. Patients included in the AE and our prospective study will undergo a pre-operative 7 Tesla HR-MRI scan of both the head and neck area. Discussion: We hypothesize that the 7 Tesla MRI scanner will allow early identification of high risk carotid plaques being associated with micro infarcted cerebral areas, and will thus be able to identify patients with a high risk of periprocedural stroke, by identification of surrogate measures of increased cardiovascular risk. PMID:22294972
Phase-encoded single-voxel magnetic resonance spectroscopy for suppressing outer volume signals at 7 Tesla.

PubMed

Li, Ningzhi; An, Li; Johnson, Christopher; Shen, Jun

2017-01-01

Due to imperfect slice profiles, unwanted signals from outside the selected voxel may significantly contaminate metabolite signals acquired using in vivo magnetic resonance spectroscopy (MRS). The use of outer volume suppression may exceed the SAR threshold, especially at high field. We propose using phase-encoding gradients after radiofrequency (RF) excitation to spatially encode unwanted signals originating from outside of the selected single voxel. Phase-encoding gradients were added to a standard single voxel point-resolved spectroscopy (PRESS) sequence which selects a 2 × 2 × 2 cm³ voxel. Subsequent spatial Fourier transform was used to encode outer volume signals. Phantom and in vivo experiments were performed using both phase-encoded PRESS and standard PRESS at 7 Tesla. Quantification was performed using fitting software developed in-house. Both phantom and in vivo studies showed that spectra from the phase-encoded PRESS sequence were relatively immune from contamination by oil signals and have more accurate quantification results than standard PRESS spectra of the same voxel. The proposed phase-encoded single-voxel PRESS method can significantly suppress outer volume signals that may appear in the spectra of standard PRESS without increasing RF power deposition.
PBOOST: a GPU-based tool for parallel permutation tests in genome-wide association studies.

PubMed

Yang, Guangyuan; Jiang, Wei; Yang, Qiang; Yu, Weichuan

2015-05-01

The importance of testing associations allowing for interactions has been demonstrated by Marchini et al. (2005). A fast method detecting associations allowing for interactions has been proposed by Wan et al. (2010a). The method is based on the likelihood ratio test with the assumption that the statistic follows the χ² distribution. Many single nucleotide polymorphism (SNP) pairs with significant associations allowing for interactions have been detected using their method. However, the assumption of the χ² test requires the expected values in each cell of the contingency table to be at least five. This assumption is violated in some identified SNP pairs. In this case, the likelihood ratio test may not be applicable any more. The permutation test is an ideal approach to checking the P-values calculated in the likelihood ratio test because of its non-parametric nature. The P-values of SNP pairs having significant associations with disease are always extremely small. Thus, we need a huge number of permutations to achieve correspondingly high resolution for the P-values. In order to investigate whether the P-values from likelihood ratio tests are reliable, a fast permutation tool to accomplish a large number of permutations is desirable. We developed a permutation tool named PBOOST. It is based on GPU with highly reliable P-value estimation. By using simulation data, we found that the P-values from likelihood ratio tests will have relative error of >100% when 50% of cells in the contingency table have expected count less than five or when there is zero expected count in any of the contingency table cells. In terms of speed, PBOOST completed 10⁷ permutations for a single SNP pair from the Wellcome Trust Case Control Consortium (WTCCC) genome data (Wellcome Trust Case Control Consortium, 2007) within 1 min on a single Nvidia Tesla M2090 device, while it took 60 min on a single Intel Xeon E5-2650 CPU to finish the same task.
More importantly, when simultaneously testing 256 SNP pairs for 10⁷ permutations, our tool took only 5 min, while the CPU program took 10 h. By permuting on a GPU cluster consisting of 40 nodes, we completed 10¹² permutations for all 280 SNP pairs reported with P-values smaller than 1.6 × 10⁻¹² in the WTCCC datasets in 1 week. The source code and sample data are available at http://bioinformatics.ust.hk/PBOOST.zip. Contact: gyang@ust.hk; eeyu@ust.hk. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
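The permutation idea described in the PBOOST abstract can be sketched in a few lines. What follows is a minimal CPU-only Python illustration, not the PBOOST GPU implementation. The function names `g_statistic` and `permutation_p_value`, the encoding of each individual's SNP-pair genotype as a single class label, and the (hits + 1)/(n_perm + 1) estimator are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def g_statistic(genotypes, labels):
    """Likelihood-ratio (G) statistic of a genotype-class x case/control table.

    genotypes: one class label per individual (hypothetical encoding of a SNP pair).
    labels:    one case/control label per individual.
    """
    n = len(labels)
    cell = Counter(zip(genotypes, labels))  # observed cell counts
    row = Counter(genotypes)                # genotype-class totals
    col = Counter(labels)                   # case/control totals
    g = 0.0
    for (gt, lab), observed in cell.items():
        expected = row[gt] * col[lab] / n
        g += 2.0 * observed * math.log(observed / expected)
    return g


def permutation_p_value(genotypes, labels, n_perm=1000, seed=0):
    """Estimate a P-value by shuffling the case/control labels n_perm times."""
    rng = random.Random(seed)
    observed = g_statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that ties are counted despite rounding.
        if g_statistic(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one estimator (an assumption of this sketch): never returns exactly 0.
    return (hits + 1) / (n_perm + 1)
```

With this estimator the smallest P-value that `n_perm` permutations can report is 1/(n_perm + 1), which illustrates why P-values near 10⁻¹² call for permutation counts on the order of 10¹².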
Metabolite-cycled density-weighted concentric rings k-space trajectory (DW-CRT) enables high-resolution ¹H magnetic resonance spectroscopic imaging at 3-Tesla.

PubMed

Steel, Adam; Chiew, Mark; Jezzard, Peter; Voets, Natalie L; Plaha, Puneet; Thomas, Michael Albert; Stagg, Charlotte J; Emir, Uzay E

2018-05-17

Magnetic resonance spectroscopic imaging (MRSI) is a promising technique in both experimental and clinical settings. However, to date, MRSI has been hampered by prohibitively long acquisition times and artifacts caused by subject motion and hardware-related frequency drift. In the present study, we demonstrate that density weighted concentric ring trajectory (DW-CRT) k-space sampling in combination with semi-LASER excitation and metabolite-cycling enables high-resolution MRSI data to be rapidly acquired at 3 Tesla. Single-slice full-intensity MRSI data (short echo time (TE) semi-LASER TE = 32 ms) were acquired from 6 healthy volunteers with an in-plane resolution of 5 × 5 mm in 13 min 30 sec using this approach. Using LCModel analysis, we found that the acquired spectra allowed for the mapping of total N-acetylaspartate (median Cramer-Rao Lower Bound [CRLB] = 3%), glutamate+glutamine (8%), and glutathione (13%). In addition, we demonstrate potential clinical utility of this technique by optimizing the TE to detect 2-hydroxyglutarate (long TE semi-LASER, TE = 110 ms), to produce relevant high-resolution metabolite maps of grade III IDH-mutant oligodendroglioma in a single patient. This study demonstrates the potential utility of MRSI in the clinical setting at 3 Tesla.
An open data repository and a data processing software toolset of an equivalent Nordic grid model matched to historical electricity market data.

PubMed

Vanfretti, Luigi; Olsen, Svein H; Arava, V S Narasimham; Laera, Giuseppe; Bidadfar, Ali; Rabuzin, Tin; Jakobsen, Sigurd H; Lavenius, Jan; Baudette, Maxime; Gómez-López, Francisco J

2017-04-01

This article presents an open data repository, the methodology to generate it and the associated data processing software developed to consolidate an hourly snapshot historical data set for the year 2015 to an equivalent Nordic power grid model (aka Nordic 44). The consolidation was achieved by matching the model's physical response with respect to historical power flow records in the bidding regions of the Nordic grid that are available from the Nordic electricity market agent, Nord Pool. The model is made available in the form of CIM v14, Modelica and PSS/E (Siemens PTI) files. The Nordic 44 model in Modelica and PSS/E were first presented in the paper titled "iTesla Power Systems Library (iPSL): A Modelica library for phasor time-domain simulations" (Vanfretti et al., 2016) [1] for a single snapshot. In the digital repository being made available with the submission of this paper (SmarTSLab_Nordic44 Repository at Github, 2016) [2], a total of 8760 snapshots (for the year 2015) that can be used to initialize and execute dynamic simulations using tools compatible with CIM v14, the Modelica language and the proprietary PSS/E tool are provided. The Python scripts to generate the snapshots (processed data) are also available with all the data in the GitHub repository (SmarTSLab_Nordic44 Repository at Github, 2016) [2]. This Nordic 44 equivalent model was also used in the iTesla project (iTesla) [3] to carry out simulations within a dynamic security assessment toolset (iTesla, 2016) [4], and has been further enhanced during the ITEA3 OpenCPS project (iTEA3) [5]. The raw, processed data and output models utilized within the iTesla platform (iTesla, 2016) [4] are also available in the repository. The CIM and Modelica snapshots of the "Nordic 44" model for the year 2015 are available in a Zenodo repository.
1.5 versus 3 versus 7 Tesla in abdominal MRI: A comparative study.

PubMed

Laader, Anja; Beiderwellen, Karsten; Kraff, Oliver; Maderwald, Stefan; Wrede, Karsten; Ladd, Mark E; Lauenstein, Thomas C; Forsting, Michael; Quick, Harald H; Nassenstein, Kai; Umutlu, Lale

2017-01-01

The aim of this study was to investigate and compare the feasibility as well as potential impact of altered magnetic field properties on image quality and potential artifacts of 1.5 Tesla, 3 Tesla and 7 Tesla non-enhanced abdominal MRI. Magnetic Resonance (MR) imaging of the upper abdomen was performed in 10 healthy volunteers on a 1.5 Tesla, a 3 Tesla and a 7 Tesla MR system. The study protocol comprised a (1) T1-weighted fat-saturated spoiled gradient-echo sequence (2D FLASH), (2) T1-weighted fat-saturated volumetric interpolated breath hold examination sequence (3D VIBE), (3) T1-weighted 2D in and opposed phase sequence, (4) True fast imaging with steady-state precession sequence (TrueFISP) and (5) T2-weighted turbo spin-echo (TSE) sequence. For comparison reasons field of view and acquisition times were kept comparable for each correlating sequence at all three field strengths, while trying to achieve the highest possible spatial resolution. Qualitative and quantitative analyses were tested for significant differences. While 1.5 and 3 Tesla MRI revealed comparable results in all assessed features and sequences, 7 Tesla MRI yielded considerable differences in T1 and T2 weighted imaging. Benefits of 7 Tesla MRI encompassed a higher spatial resolution and a non-enhanced hyperintense vessel signal at 7 Tesla, potentially offering a more accurate diagnosis of abdominal parenchymatous and vasculature disease. 7 Tesla MRI was also shown to be more impaired by artifacts, including residual B1 inhomogeneities, susceptibility and chemical shift artifacts, resulting in reduced overall image quality and overall image impairment ratings. While 1.5 and 3 Tesla T2w imaging showed equivalently high image quality, 7 Tesla revealed strong impairments in its diagnostic value. 
Our results demonstrate the feasibility and overall comparable imaging ability of T1-weighted 7 Tesla abdominal MRI towards 3 Tesla and 1.5 Tesla MRI, yielding a promising diagnostic potential for non-enhanced Magnetic Resonance Angiography (MRA). 1.5 Tesla and 3 Tesla offer comparably high-quality T2w imaging, showing superior diagnostic quality over 7 Tesla MRI.
1.5 versus 3 versus 7 Tesla in abdominal MRI: A comparative study

PubMed Central

Beiderwellen, Karsten; Kraff, Oliver; Maderwald, Stefan; Wrede, Karsten; Ladd, Mark E.; Lauenstein, Thomas C.; Forsting, Michael; Quick, Harald H.; Nassenstein, Kai; Umutlu, Lale

2017-01-01

Objectives The aim of this study was to investigate and compare the feasibility as well as potential impact of altered magnetic field properties on image quality and potential artifacts of 1.5 Tesla, 3 Tesla and 7 Tesla non-enhanced abdominal MRI. Materials and methods Magnetic Resonance (MR) imaging of the upper abdomen was performed in 10 healthy volunteers on a 1.5 Tesla, a 3 Tesla and a 7 Tesla MR system. The study protocol comprised a (1) T1-weighted fat-saturated spoiled gradient-echo sequence (2D FLASH), (2) T1-weighted fat-saturated volumetric interpolated breath hold examination sequence (3D VIBE), (3) T1-weighted 2D in and opposed phase sequence, (4) True fast imaging with steady-state precession sequence (TrueFISP) and (5) T2-weighted turbo spin-echo (TSE) sequence. For comparison reasons field of view and acquisition times were kept comparable for each correlating sequence at all three field strengths, while trying to achieve the highest possible spatial resolution. Qualitative and quantitative analyses were tested for significant differences. Results While 1.5 and 3 Tesla MRI revealed comparable results in all assessed features and sequences, 7 Tesla MRI yielded considerable differences in T1 and T2 weighted imaging. Benefits of 7 Tesla MRI encompassed a higher spatial resolution and a non-enhanced hyperintense vessel signal at 7 Tesla, potentially offering a more accurate diagnosis of abdominal parenchymatous and vasculature disease. 7 Tesla MRI was also shown to be more impaired by artifacts, including residual B1 inhomogeneities, susceptibility and chemical shift artifacts, resulting in reduced overall image quality and overall image impairment ratings. While 1.5 and 3 Tesla T2w imaging showed equivalently high image quality, 7 Tesla revealed strong impairments in its diagnostic value. 
Conclusions Our results demonstrate the feasibility and overall comparable imaging ability of T1-weighted 7 Tesla abdominal MRI towards 3 Tesla and 1.5 Tesla MRI, yielding a promising diagnostic potential for non-enhanced Magnetic Resonance Angiography (MRA). 1.5 Tesla and 3 Tesla offer comparably high-quality T2w imaging, showing superior diagnostic quality over 7 Tesla MRI. PMID:29125850
Multicore and GPU algorithms for Nussinov RNA folding

PubMed Central

2014-01-01

Background One segment of a RNA sequence might be paired with another segment of the same RNA sequence due to the force of hydrogen bonds. This two-dimensional structure is called the RNA sequence's secondary structure. Several algorithms have been proposed to predict an RNA sequence's secondary structure. These algorithms are referred to as RNA folding algorithms. Results We develop cache efficient, multicore, and GPU algorithms for RNA folding using Nussinov's algorithm. Conclusions Our cache efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive straightforward single core code. The multicore version of the cache efficient single core algorithm provides a speedup, relative to the naive single core algorithm, between 7.5 and 14.0 on a 6 core hyperthreaded CPU. Our GPU algorithm for the NVIDIA C2050 is up to 1582 times as fast as the naive single core algorithm and between 5.1 and 11.2 times as fast as the fastest previously known GPU algorithm for Nussinov RNA folding. PMID:25082539
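For orientation, the dynamic-programming recurrence that Nussinov-style folding rests on can be sketched as below. This is a naive single-core O(n³) Python version written for illustration only; it is not the authors' cache-efficient, multicore, or GPU code. The function name `nussinov_max_pairs`, the restriction to Watson-Crick pairs (A-U, G-C), and the `min_loop` parameter are assumptions of this sketch.

```python
def nussinov_max_pairs(seq, min_loop=0):
    """Return the maximum number of nested base pairs in an RNA sequence.

    seq:      string over the alphabet A, C, G, U.
    min_loop: minimum number of unpaired bases enclosed by a pair
              (hypothetical knob; 0 reproduces the plain recurrence).
    """
    # Assumption: only Watson-Crick pairs are allowed in this sketch.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, splitting the interval in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + inner + 1)
            dp[i][j] = best
    return dp[0][n - 1]
```

Each cell depends only on cells covering shorter spans, so all cells on one anti-diagonal of the table can be computed independently; this is the general property that parallel versions of the algorithm can exploit.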
Optimization of Selected Remote Sensing Algorithms for Embedded NVIDIA Kepler GPU Architecture

NASA Technical Reports Server (NTRS)

Riha, Lubomir; Le Moigne, Jacqueline; El-Ghazawi, Tarek

2015-01-01

This paper evaluates the potential of embedded Graphic Processing Units in NVIDIA's Tegra K1 for onboard processing. The performance is compared to a general-purpose multi-core CPU and a full-fledged GPU accelerator. This study uses two algorithms: Wavelet Spectral Dimension Reduction of Hyperspectral Imagery and Automated Cloud-Cover Assessment (ACCA) Algorithm. Tegra K1 achieved 51 for ACCA algorithm and 20 for the dimension reduction algorithm, as compared to the performance of the high-end 8-core server Intel Xeon CPU with 13.5 times higher power consumption.
NIKOLA TESLA AND MEDICINE: 160TH ANNIVERSARY OF THE BIRTH OF THE GENIUS WHO GAVE LIGHT TO THE WORLD - PART II.

PubMed

Vucevic, Danijela; Dordevic, Drago; Radosavljevic, Tatjana

2016-11-01

Nikola Tesla (1856-1943) was a genius inventor and scientist, whose contribution to medicine is remarkable. Part I of this article reviewed special contributions of the world renowned scientist to the establishment of radiology as a new discipline in medicine. This paper deals with the use of Tesla currents in medicine. Tesla Currents in Medicine. Tesla's greatest impact on medicine is his invention of a transformer (Tesla coil) for producing high frequency and high voltage currents (Tesla currents). Tesla currents are used in diathermy, as they, while passing through the body, transform electrical energy into a therapeutic heat. In 1891, Tesla passed currents through his own body and was the first to experience their beneficial effects. He kept correspondence on electrotherapy with J. Dugan and S. H. Monell. In 1896, he used high frequency currents and designed an ozone generator for producing ozone, with powerful antiseptic and antibacterial properties. Tesla is famous for his extensive experiments with mechanical vibrations and resonance, examining their effects on the organism and pioneering their use for medical purposes. Tesla also designed an oscillator to relieve fatigue of the leg muscles. It is less known that Tesla's inventions (Tesla coil and wireless remote control) are widely used in modern medical equipment. Apart from this, wireless technology is nowadays widely applied in numerous diagnostic and therapeutic procedures. Nikola Tesla was the last Renaissance figure of the modern era. Tesla bridged three centuries and two millennia by his inventions, and permanently indebted humankind by his epochal discoveries.
[Studies of three-dimensional cardiac late gadolinium enhancement MRI at 3.0 Tesla].

PubMed

Ishimoto, Takeshi; Ishihara, Masaru; Ikeda, Takayuki; Kawakami, Momoe

2008-12-20

Cardiac late gadolinium enhancement MR imaging has been shown to allow assessment of myocardial viability in patients with ischemic heart disease. The current standard approach is a 3D inversion recovery sequence at 1.5 Tesla. The aims of this study were to evaluate the technical feasibility and clinical utility of MR viability imaging at 3.0 Tesla in patients with myocardial infarction and cardiomyopathy. In phantom and volunteer studies, the inversion time required to suppress the signal of interests and tissues was prolonged at 3.0 Tesla. In the clinical study, the average inversion time to suppress the signal of myocardium at 3.0 Tesla with respect to MR viability imaging at 1.5 Tesla was at 15 min after the administration of contrast agent (304.0+/-29.2 at 3.0 Tesla vs. 283.9+/-20.9 at 1.5 Tesla). The contrast between infarction and viable myocardium was equal at both field strengths (4.06+/-1.30 at 3.0 Tesla vs. 4.42+/-1.85 at 1.5 Tesla). Even at this early stage, MR viability imaging at 3.0 Tesla provides high quality images in patients with myocardial infarction. The inversion time is significantly prolonged at 3.0 Tesla. The contrast between infarction and viable myocardium at 3.0 Tesla is equal to that at 1.5 Tesla. Further investigation is needed for this technical improvement, for clinical evaluation, and for limitations.
Artefacts induced by coiled intracranial aneurysms on 3.0-Tesla versus 1.5-Tesla MR angiography--An in vivo and in vitro study.

PubMed

Schaafsma, Joanna D; Velthuis, Birgitta K; Vincken, Koen L; de Kort, Gerard A P; Rinkel, Gabriel J E; Bartels, Lambertus W

2014-05-01

To compare metal-induced artefacts from coiled intracranial aneurysms on 3.0-Tesla and 1.5-Tesla magnetic resonance angiography (MRA), since concerns persist on artefact enlargement at 3.0 Tesla. We scanned 19 patients (mean age 53; 16 women) with 20 saccular aneurysms treated with coils only, at 1.5 and 3.0 Tesla according to standard clinical 3D TOF-MRA protocols containing a shorter echo-time but weaker read-out gradient at 3.0 Tesla in addition to intra-arterial digital subtraction angiography (IA-DSA). Per modality two neuro-radiologists assessed the occlusion status, measured residual flow, and indicated whether coil artefacts disturbed this assessment on MRA. We assessed relative risks for disturbance by coil artefacts, weighted kappa's for agreement on occlusion levels, and we compared remnant sizes. For artefact measurements, a coil model was created and scanned with the same protocols followed by 2D MR scans with variation of echo-time and read-out gradient strength. Coil artefacts disturbed assessments less frequently at 3.0 Tesla than at 1.5 Tesla (RR: 0.3; 95% CI: 0.1-0.8). On 3.0-Tesla MRA, remnants were larger than on 1.5-Tesla MRA (difference: 0.7 mm; 95% CI: 0.3-1.1) and larger than on IA-DSA (difference: 1.0 mm; 95% CI: 0.6-1.5) with similar agreement on occlusion levels with IA-DSA for both field strengths (κ 0.53; 95% CI: 0.23-0.84 for 1.5-Tesla MRA and IA-DSA; κ 0.47; 95% CI: 0.19-0.76 for 3.0-Tesla MRA and IA-DSA). Coil model artefacts were smaller at 3.0 Tesla than at 1.5 Tesla. The echo-time influenced artefact size more than the read-out gradient. Artefacts were not larger, but smaller at 3.0 Tesla because a shorter echo-time at 3.0 Tesla negated artefact enlargement. Despite smaller artefacts and larger remnants at 3.0 Tesla, occlusion levels were similar for both field strengths. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Experimental determination of the bulk Rashba parameters in BiTeBr

NASA Astrophysics Data System (ADS)

Martin, C.; Suslov, A. V.; Buvaev, S.; Hebard, A. F.; Bugnon, P.; Berger, H.; Magrez, A.; Tanner, D. B.

2016-12-01

Shubnikov-de Haas (SdH) oscillations, Hall effect, and optical reflectance (R(ω)) measurements have been performed on single crystals of BiTeBr. Under magnetic fields up to 32 tesla and at temperatures as low as 0.4 K, the SdH data shows a single oscillation frequency F = 102 ± 5 tesla. The combined transport and optical studies establish that the SdH effect originates from the Rashba spin-split bulk conduction band, with the chemical potential situated about 13 meV below the crossing (Dirac) point. The bulk carrier concentration was nₑ ≈ 5×10¹⁸ cm⁻³ and the effective mass m₁* = 0.16 m₀. Combining SdH and optical data, we reliably determine the Rashba parameters for the bulk conduction band of BiTeBr: the Rashba energy E_R = 28 meV and the momentum spin-split k_R = 0.033 Å⁻¹. Hence, the bulk Rashba coupling strength α_R = 2E_R/k_R is found to be 1.7 eVÅ.
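As a quick consistency check on the figures quoted in this abstract, the coupling strength follows directly from the stated Rashba energy and momentum splitting. The arithmetic below uses only the rounded values given above, so the last digit is approximate.

```latex
\alpha_R \;=\; \frac{2E_R}{k_R}
        \;=\; \frac{2 \times 28\ \mathrm{meV}}{0.033\ \text{\AA}^{-1}}
        \;\approx\; 1.7 \times 10^{3}\ \mathrm{meV\,\text{\AA}}
        \;=\; 1.7\ \mathrm{eV\,\text{\AA}}
```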

Extreme Material Physical Properties and Measurements above 100 tesla

NASA Astrophysics Data System (ADS)

Mielke, Charles

2011-03-01

The National High Magnetic Field Laboratory (NHMFL) Pulsed Field Facility (PFF) at Los Alamos National Laboratory (LANL) offers extreme environments of ultra high magnetic fields above 100 tesla by use of the Single Turn method as well as fields approaching 100 tesla with more complex methods. The challenge of metrology in the extreme magnetic field generating devices is complicated by the millions of amperes of current and tens of thousands of volts that are required to deliver the pulsed power needed for field generation. Methods of detecting physical properties of materials are essential parts of the science that seeks to understand and eventually control the fundamental functionality of materials in extreme environments. De-coupling the signal of the sample from the electro-magnetic interference associated with the magnet system is required to make these state-of-the-art magnetic fields useful to scientists studying materials in high magnetic fields. The cutting edge methods that are being used as well as methods in development will be presented with recent results in Graphene and High-Tc superconductors along with the methods and challenges. National Science Foundation DMR-Award 0654118.
[70 years of Nikola Tesla studies].

PubMed

Juznic, Stanislav

2013-01-01

Nikola Tesla's studies of chemistry are described, including his not very scholarly affair in Maribor. After almost a century and a half of hypotheses, an at least usable scenario of Tesla's life and "work" in Maribor is provided. The chemistry achievements of Tesla's most influential professor, Martin Sekulić, and of Tesla's Graz professors are put into the limelight. The focus is on the fact that in Graz Tesla studied at the technological chemistry faculty of the Polytechnic.
Do Recent Advances in MR Technologies Contribute to Better Gamma Knife Radiosurgery Treatment Results for Brain Metastases?

PubMed

Hayashi, M; Yamamoto, M; Nishimura, C; Satoh, H

2007-10-31

The detection of intracerebral lesions has improved greatly with advancements in MR imaging, especially the greater sensitivity of the 1.5 Tesla unit versus the older 1.0 Tesla unit. We aimed to determine whether improvements in MR imaging have actually improved diagnostic capabilities and treatment outcomes in gamma knife radiosurgery (GKRS) for brain metastases (METs). Ours was a retrospective study of a consecutive series of 1179 patients (441 females, 738 males, mean age: 63 years, range: 19-92 years) with brain METs who underwent GKRS from 1998 to 2004. Our treatment policy was to irradiate all lesions visible on MR images during a single GKRS session. Mean and median tumor numbers were seven and three (range: 1-74). The 1179 patients were divided into two groups: a 1.0 T-group of 660 patients examined using a 1.0 Tesla MR unit before August 2002, and a 1.5 T-group of 519 examined using a 1.5 Tesla MR unit after September 2002. In the 1.5 T-group, lesion volumes as small as 0.004 cc were detected with a 5 mm slice thickness. The corresponding lesion size was 0.013 cc in the 1.0 T-group. One or more lesions invisible on a 5 mm slice study were additionally detected on a 2 mm slice study in 47.8% of patients in the 1.0 T-group and 25.2% in the 1.5 T-group (p<.0001). The median survival time (MST) in the 1.5 T-group was significantly longer than that in the 1.0 T-group (8.4 vs. 6.3 months, p=.0004). Due to biases in patient numbers between the two groups, we analyzed subgroups with KPS of 80% or better, no neurological deficits, stable primary tumors, lung cancer, tumor numbers of four or less and tumor volumes of 10.0 cc or smaller. In every subgroup analysis, the MSTs of the 1.5-Tesla group were significantly longer than those of the 1.0-Tesla group. The prognosis of a cancer patient is undoubtedly influenced by multiple factors.
Nevertheless, we conclude that application of the 1.5 Tesla MR unit has had a favorable impact on diagnosis and GKRS treatment results in patients with brain METs.
Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, CaffeNet, AlexNet and GoogleNet topologies using the Cifar10 and ImageNet datasets. The workloads are vendor optimized for each architecture. GPUs provide the highest overall raw performance. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and KNL can be competitive when considering performance/watt. Furthermore, NVLink is critical to GPU scaling.
Performance of a 12-coil superconducting 'bumpy torus' magnet facility.

NASA Technical Reports Server (NTRS)

Roth, J. R.; Holmes, A. D.; Keller, T. A.; Krawczonek, W. M.

1972-01-01

The NASA-Lewis 'bumpy torus' facility consists of 12 superconducting coils, each 19 cm ID and capable of 3.0 tesla on their axes. The coils are equally spaced around a toroidal array with a major diameter of 1.52 m, and are mounted with the major axis of the torus vertical in a single vacuum tank 2.6 m in diameter. Final shakedown tests of the facility mapped out its magnetic, cryogenic, vacuum, mechanical, and electrical performance. The facility is now ready for use as a plasma physics research facility. A maximum magnetic field on the magnetic axis of 3.23 teslas has been held for a period of more than sixty minutes without a coil normalcy.
Plasma generating apparatus for large area plasma processing

DOEpatents

Tsai, C.C.; Gorbatkin, S.M.; Berry, L.A.

1991-07-16

A plasma generating apparatus for plasma processing applications is based on a permanent magnet line-cusp plasma confinement chamber coupled to a compact single-coil microwave waveguide launcher. The device creates an electron cyclotron resonance (ECR) plasma in the launcher and a second ECR plasma is created in the line cusps due to a 0.0875 tesla magnetic field in that region. Additional special magnetic field configuring reduces the magnetic field at the substrate to below 0.001 tesla. The resulting plasma source is capable of producing large-area (20-cm diam), highly uniform (±5%) ion beams with current densities above 5 mA/cm². The source has been used to etch photoresist on 5-inch diam silicon wafers with good uniformity. 3 figures.
Plasma generating apparatus for large area plasma processing

DOEpatents

Tsai, Chin-Chi; Gorbatkin, Steven M.; Berry, Lee A.

1991-01-01

A plasma generating apparatus for plasma processing applications is based on a permanent magnet line-cusp plasma confinement chamber coupled to a compact single-coil microwave waveguide launcher. The device creates an electron cyclotron resonance (ECR) plasma in the launcher and a second ECR plasma is created in the line cusps due to a 0.0875 tesla magnetic field in that region. Additional special magnetic field configuring reduces the magnetic field at the substrate to below 0.001 tesla. The resulting plasma source is capable of producing large-area (20-cm diam), highly uniform (±5%) ion beams with current densities above 5 mA/cm². The source has been used to etch photoresist on 5-inch diam silicon wafers with good uniformity.
Rapid Parametric Mapping of the Longitudinal Relaxation Time T1 Using Two-Dimensional Variable Flip Angle Magnetic Resonance Imaging at 1.5 Tesla, 3 Tesla, and 7 Tesla

PubMed Central

Dieringer, Matthias A.; Deimling, Michael; Santoro, Davide; Wuerfel, Jens; Madai, Vince I.; Sobesky, Jan; von Knobelsdorff-Brenkenhoff, Florian; Schulz-Menger, Jeanette; Niendorf, Thoralf

2014-01-01

Introduction Visual but subjective reading of longitudinal relaxation time (T1) weighted magnetic resonance images is commonly used for the detection of brain pathologies. For this non-quantitative measure, diagnostic quality depends on hardware configuration, imaging parameters, radio frequency transmission field (B1+) uniformity, as well as observer experience. Parametric quantification of the tissue T1 relaxation parameter offsets the propensity for these effects, but is typically time consuming. For this reason, this study examines the feasibility of rapid 2D T1 quantification using a variable flip angle (VFA) approach at magnetic field strengths of 1.5 Tesla, 3 Tesla, and 7 Tesla. These efforts include validation in phantom experiments and application for brain T1 mapping. Methods T1 quantification included simulations of the Bloch equations to correct for slice profile imperfections, and a correction for B1+. Fast gradient echo acquisitions were conducted using three adjusted flip angles for the proposed T1 quantification approach that was benchmarked against slice profile uncorrected 2D VFA and an inversion-recovery spin-echo based reference method. Brain T1 mapping was performed in six healthy subjects, one multiple sclerosis patient, and one stroke patient. Results Phantom experiments showed a mean T1 estimation error of (-63±1.5)% for slice profile uncorrected 2D VFA and (0.2±1.4)% for the proposed approach compared to the reference method. Scan time for single slice T1 mapping including B1+ mapping could be reduced to 5 seconds using an in-plane resolution of (2×2) mm², which equals a scan time reduction of more than 99% compared to the reference method. Conclusion Our results demonstrate that rapid 2D T1 quantification using a variable flip angle approach is feasible at 1.5T/3T/7T. It represents a valuable alternative for rapid T1 mapping due to the gain in speed versus conventional approaches.
This progress may serve to enhance the capabilities of parametric MR based lesion detection and brain tissue characterization. PMID:24621588
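The variable flip angle idea can be sketched with the standard spoiled gradient echo signal model and its linearized form, S/sin(α) = E1 · S/tan(α) + M0(1 − E1) with E1 = exp(−TR/T1). This is a minimal illustration under stated assumptions and not the authors' method: it omits the slice profile and B1+ corrections described in the abstract, and the flip angles, TR, T1, and M0 values are hypothetical.

```python
import numpy as np

def spgr_signal(m0, t1, tr, flip_rad):
    """Ideal spoiled gradient echo signal (no slice profile or B1+ effects)."""
    e1 = np.exp(-tr / t1)
    return m0 * np.sin(flip_rad) * (1 - e1) / (1 - e1 * np.cos(flip_rad))

def vfa_t1(signals, flip_rad, tr):
    """Estimate T1 and M0 from the linearized VFA relation."""
    y = signals / np.sin(flip_rad)
    x = signals / np.tan(flip_rad)
    slope, intercept = np.polyfit(x, y, 1)   # slope = E1, intercept = M0 * (1 - E1)
    t1 = -tr / np.log(slope)
    m0 = intercept / (1 - slope)
    return t1, m0

# Hypothetical acquisition: three flip angles, TR = 5 ms, true T1 = 1.2 s.
flips = np.deg2rad([3.0, 10.0, 20.0])
tr = 0.005
signals = spgr_signal(1000.0, 1.2, tr, flips)

t1_est, m0_est = vfa_t1(signals, flips, tr)
print(f"T1 = {t1_est:.3f} s, M0 = {m0_est:.1f}")
```

With noise-free synthetic data the fit recovers the input T1; on real 2D acquisitions the uncorrected version is what the abstract reports as strongly biased, which is why the corrections matter.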
Rapid parametric mapping of the longitudinal relaxation time T1 using two-dimensional variable flip angle magnetic resonance imaging at 1.5 Tesla, 3 Tesla, and 7 Tesla.

PubMed

Dieringer, Matthias A; Deimling, Michael; Santoro, Davide; Wuerfel, Jens; Madai, Vince I; Sobesky, Jan; von Knobelsdorff-Brenkenhoff, Florian; Schulz-Menger, Jeanette; Niendorf, Thoralf

2014-01-01

Visual but subjective reading of longitudinal relaxation time (T1) weighted magnetic resonance images is commonly used for the detection of brain pathologies. For this non-quantitative measure, diagnostic quality depends on hardware configuration, imaging parameters, radio frequency transmission field (B1+) uniformity, as well as observer experience. Parametric quantification of the tissue T1 relaxation parameter offsets the propensity for these effects, but is typically time consuming. For this reason, this study examines the feasibility of rapid 2D T1 quantification using a variable flip angle (VFA) approach at magnetic field strengths of 1.5 Tesla, 3 Tesla, and 7 Tesla. These efforts include validation in phantom experiments and application for brain T1 mapping. T1 quantification included simulations of the Bloch equations to correct for slice profile imperfections, and a correction for B1+. Fast gradient echo acquisitions were conducted using three adjusted flip angles for the proposed T1 quantification approach that was benchmarked against slice profile uncorrected 2D VFA and an inversion-recovery spin-echo based reference method. Brain T1 mapping was performed in six healthy subjects, one multiple sclerosis patient, and one stroke patient. Phantom experiments showed a mean T1 estimation error of (-63±1.5)% for slice profile uncorrected 2D VFA and (0.2±1.4)% for the proposed approach compared to the reference method. Scan time for single slice T1 mapping including B1+ mapping could be reduced to 5 seconds using an in-plane resolution of (2×2) mm², which equals a scan time reduction of more than 99% compared to the reference method. Our results demonstrate that rapid 2D T1 quantification using a variable flip angle approach is feasible at 1.5T/3T/7T. It represents a valuable alternative for rapid T1 mapping due to the gain in speed versus conventional approaches.
This progress may serve to enhance the capabilities of parametric MR based lesion detection and brain tissue characterization.
[Examination of upper abdominal region in high spatial resolution diffusion-weighted imaging using 3-Tesla MRI].

PubMed

Terada, Masaki; Matsushita, Hiroki; Oosugi, Masanori; Inoue, Kazuyasu; Yaegashi, Taku; Anma, Takeshi

2009-03-20

The higher signal-to-noise ratio (SNR) of 3-Tesla magnetic resonance imaging (3-Tesla) may allow higher spatial resolution without image deterioration. In this study, we compared the SNR and the apparent diffusion coefficient (ADC) value at 3-Tesla with those at 1.5-Tesla magnetic resonance imaging (1.5-Tesla) under the same diffusion-weighted imaging (DWI) parameters, and we examined high spatial resolution DWI of the upper abdominal region at 3-Tesla with respect to the imaging method [respiratory-triggering (RT) method and breath free (BF) method] and artifacts (motion and zebra). Scan parameters were optimized based on phantom and in vivo studies. As a result, 3-Tesla yielded about 1.5 times the SNR of 1.5-Tesla, while ADC values differed little. Moreover, the RT method was more effective than the BF method in correcting the influence of respiratory movement, and it improved image quality through a gain in SNR and a reduction of artifacts. Thus, DWI of the upper abdominal region is a useful sequence for high spatial resolution imaging at 3-Tesla.
Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method

NASA Astrophysics Data System (ADS)

Januszewski, M.; Kostur, M.

2014-09-01

We present Sailfish, an open source fluid simulation package implementing the lattice Boltzmann method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL. We take a novel approach to GPU code implementation and use run-time code generation techniques and a high level programming language (Python) to achieve state of the art performance, while allowing easy experimentation with different LBM models and tuning for various types of hardware. We discuss the general design principles of the code, scaling to multiple GPUs in a distributed environment, as well as the GPU implementation and optimization of many different LBM models, both single component (BGK, MRT, ELBM) and multicomponent (Shan-Chen, free energy). The paper also presents results of performance benchmarks spanning the last three NVIDIA GPU generations (Tesla, Fermi, Kepler), which we hope will be useful for researchers working with this type of hardware and similar codes. Catalogue identifier: AETA_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AETA_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License, version 3 No. of lines in distributed program, including test data, etc.: 225864 No. of bytes in distributed program, including test data, etc.: 46861049 Distribution format: tar.gz Programming language: Python, CUDA C, OpenCL. Computer: Any with an OpenCL or CUDA-compliant GPU. Operating system: No limits (tested on Linux and Mac OS X). RAM: Hundreds of megabytes to tens of gigabytes for typical cases. Classification: 12, 6.5. External routines: PyCUDA/PyOpenCL, Numpy, Mako, ZeroMQ (for multi-GPU simulations), scipy, sympy Nature of problem: GPU-accelerated simulation of single- and multi-component fluid flows. 
Solution method: A wide range of relaxation models (LBGK, MRT, regularized LB, ELBM, Shan-Chen, free energy, free surface) and boundary conditions within the lattice Boltzmann method framework. Simulations can be run in single or double precision using one or more GPUs. Restrictions: The lattice Boltzmann method works for low Mach number flows only. Unusual features: The actual numerical calculations run exclusively on GPUs. The numerical code is built dynamically at run-time in CUDA C or OpenCL, using templates and symbolic formulas. The high-level control of the simulation is maintained by a Python process. Additional comments: The distribution file for this program is over 45 Mbytes and therefore is not delivered directly when Download or Email is requested. Instead a html file giving details of how the program can be obtained is sent. Running time: Problem-dependent, typically minutes (for small cases or short simulations) to hours (large cases or long simulations).
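For readers unfamiliar with the method, one lattice Boltzmann update (a single-relaxation-time BGK collision followed by streaming) can be sketched on a periodic D2Q9 lattice with NumPy. This is not Sailfish code and does not use its API; the grid size, relaxation time, and function names are hypothetical choices made only for illustration, and the sketch runs on the CPU.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their weights.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions, shape (9, nx, ny)."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)

def lbm_step(f, tau):
    """One BGK collision plus periodic streaming step."""
    rho = f.sum(axis=0)
    ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
    f = f - (f - equilibrium(rho, ux, uy)) / tau      # relax toward equilibrium
    for i in range(9):                                # shift each population along its velocity
        f[i] = np.roll(f[i], shift=(int(C[i, 0]), int(C[i, 1])), axis=(0, 1))
    return f

# Hypothetical setup: fluid at rest with a small density bump in the middle.
nx, ny, tau = 32, 16, 0.8
rho0 = np.ones((nx, ny))
rho0[nx // 2, ny // 2] += 0.01
f = equilibrium(rho0, np.zeros((nx, ny)), np.zeros((nx, ny)))

mass_before = f.sum()
for _ in range(20):
    f = lbm_step(f, tau)
mass_after = f.sum()
print(mass_before, mass_after)
```

Both collision and periodic streaming conserve total mass, so the two printed values should agree to rounding error. Every lattice site is updated independently during collision, which is the property that makes the method map well onto GPUs.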
3D tumor localization through real-time volumetric x-ray imaging for lung cancer radiotherapy.

PubMed

Li, Ruijiang; Lewis, John H; Jia, Xun; Gu, Xuejun; Folkerts, Michael; Men, Chunhua; Song, William Y; Jiang, Steve B

2011-05-01

To evaluate an algorithm for real-time 3D tumor localization from a single x-ray projection image for lung cancer radiotherapy. Recently, we have developed an algorithm for reconstructing volumetric images and extracting 3D tumor motion information from a single x-ray projection [Li et al., Med. Phys. 37, 2822-2826 (2010)]. We have demonstrated its feasibility using a digital respiratory phantom with regular breathing patterns. In this work, we present a detailed description and a comprehensive evaluation of the improved algorithm. The algorithm was improved by incorporating respiratory motion prediction. The accuracy and efficiency of using this algorithm for 3D tumor localization were then evaluated on (1) a digital respiratory phantom, (2) a physical respiratory phantom, and (3) five lung cancer patients. These evaluation cases include both regular and irregular breathing patterns that are different from the training dataset. For the digital respiratory phantom with regular and irregular breathing, the average 3D tumor localization error is less than 1 mm which does not seem to be affected by amplitude change, period change, or baseline shift. On an NVIDIA Tesla C1060 graphic processing unit (GPU) card, the average computation time for 3D tumor localization from each projection ranges between 0.19 and 0.26 s, for both regular and irregular breathing, which is about a 10% improvement over previously reported results. For the physical respiratory phantom, an average tumor localization error below 1 mm was achieved with an average computation time of 0.13 and 0.16 s on the same graphic processing unit (GPU) card, for regular and irregular breathing, respectively. For the five lung cancer patients, the average tumor localization error is below 2 mm in both the axial and tangential directions. The average computation time on the same GPU card ranges between 0.26 and 0.34 s. 
Through a comprehensive evaluation of our algorithm, we have established its accuracy in 3D tumor localization to be on the order of 1 mm on average and 2 mm at 95 percentile for both digital and physical phantoms, and within 2 mm on average and 4 mm at 95 percentile for lung cancer patients. The results also indicate that the accuracy is not affected by the breathing pattern, be it regular or irregular. High computational efficiency can be achieved on GPU, requiring 0.1-0.3 s for each x-ray projection.
NIKOLA TESLA AND MEDICINE: 160TH ANNIVERSARY OF THE BIRTH OF THE GENIUS WHO GAVE LIGHT TO THE WORLD - PART I.

PubMed

Vucevic, Danijela; Dordevic, Drago; Radosavljevic, Tatjana

2016-09-01

The interest in Nikola Tesla, a scientist, physicist, engineer and inventor, is constantly growing. In the millennia-long history of human civilization, it is almost impossible to find another person whose life and work has been under so much scrutiny of such a wide range of researchers, medical professionals included. Although Tesla was not primarily dedicated to biomedical research, his work significantly contributed to the development of radiology, and high frequency electrotherapy. This paper deals with the impact of Tesla's work on the development of a new medical branch - radiology. Nikola Tesla and the Discovery of X-ray radiation. Tesla pioneered the use of X-rays for medical purposes, practically laying the foundations of radiology. Namely, since 1887, Tesla periodically experimented with X-rays, at that time still unknown and unnamed, which he called "shadowgraphs". Moreover, at the end of 1894, he conducted extensive research focusing on X-rays, but unfortunately it was interrupted after a fire burned down his laboratory in 1895. In 1896 and 1897, Tesla published ten papers on the biologic effects of X-ray radiation. All his studies on X-rays were experimental. During 1896 and 1897, Tesla continued improving X-ray devices. Apart from this, Tesla was the first to point out the harmful effects of exposure to X-ray radiation on the human body. Nikola Tesla was a visionary genius of the future. Tesla's pioneer steps, made more than a century ago in the domain of radiology, are still being used today.
Diffusion-weighted MR imaging of the normal pancreas: reproducibility and variations of apparent diffusion coefficient measurement at 1.5- and 3.0-Tesla.

PubMed

Barral, M; Soyer, P; Ben Hassen, W; Gayat, E; Aout, M; Chiaradia, M; Rahmouni, A; Luciani, A

2013-04-01

To evaluate reproducibility and variations in apparent diffusion coefficient (ADC) measurement in normal pancreatic parenchyma at 1.5- and 3.0-Tesla and determine if differences may exist between the four pancreatic segments. Diffusion-weighted MR imaging of the pancreas was performed at 1.5-Tesla in 20 patients and at 3.0-Tesla in 20 other patients strictly matched for gender and age using the same b values (0, 400 and 800 s/mm²). Two independent observers placed regions of interest within the four pancreatic segments to measure ADC at both fields. Intra- and inter-observer agreement in ADC measurement was assessed using Bland-Altman analysis and comparison between ADC values obtained at both fields using non-parametrical tests. There were no significant differences in ADC between repeated measurements and between ADC obtained at 1.5-Tesla and those at 3.0-Tesla. The 95% limits of intra-observer agreement between ADC were 2.3%-22.7% at 1.5-Tesla and 1%-24.2% at 3.0-Tesla and those for inter-observer agreement between 1.9%-14% at 1.5-Tesla and 8%-25% at 3.0-Tesla. ADC values were similar in all pancreatic segments at 3.0-T whereas the tail had lower ADC at 1.5-Tesla. ADC measurement conveys high degrees of intra- and inter-observer reproducibility. ADC values have a homogeneous distribution among the four pancreatic segments at 3.0-Tesla.
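As an illustration of how an ADC value is typically derived from the three b values named above, the sketch below fits the standard mono-exponential model S(b) = S0 · exp(−b · ADC) by linear regression on log-signal. The abstract does not state which fitting procedure the authors used, so the model choice is an assumption, and the signal values are synthetic.

```python
import numpy as np

def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC)."""
    slope, intercept = np.polyfit(b_values, np.log(signals), 1)
    return -slope, np.exp(intercept)      # ADC in mm^2/s, S0 in signal units

# b values from the abstract (s/mm^2); signals are synthetic,
# generated with a hypothetical ADC of 1.5e-3 mm^2/s.
b = np.array([0.0, 400.0, 800.0])
true_adc = 1.5e-3
s = 1000.0 * np.exp(-b * true_adc)

adc, s0 = fit_adc(b, s)
print(f"ADC = {adc:.2e} mm^2/s, S0 = {s0:.0f}")
```

In practice the fit would be applied per voxel or to the mean signal of each region of interest, and noise makes the estimate depend on the chosen b values.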
A fast three-dimensional gamma evaluation using a GPU utilizing texture memory for on-the-fly interpolations.

PubMed

Persoon, Lucas C G G; Podesta, Mark; van Elmpt, Wouter J C; Nijsten, Sebastiaan M J J G; Verhaegen, Frank

2011-07-01

A widely accepted method to quantify differences in dose distributions is the gamma (γ) evaluation. Currently, almost all gamma implementations utilize the central processing unit (CPU). Recently, the graphics processing unit (GPU) has become a powerful platform for specific computing tasks. In this study, we describe the implementation of a 3D gamma evaluation using a GPU to improve calculation time. The gamma evaluation algorithm was implemented on an NVIDIA Tesla C2050 GPU using the compute unified device architecture (CUDA). First, several cubic virtual phantoms were simulated. These phantoms were tested with varying dose cube sizes and set-ups, introducing artificial dose differences. Second, to show applicability in clinical practice, five patient cases have been evaluated using the 3D dose distribution from a treatment planning system as the reference and the delivered dose determined during treatment as the comparison. A calculation time comparison between the CPU and GPU was made with varying thread-block sizes including the option of using texture or global memory. A GPU over CPU speed-up of 66 ± 12 was achieved for the virtual phantoms. For the patient cases, a speed-up of 57 ± 15 using the GPU was obtained. A thread-block size of 16 x 16 performed best in all cases. The use of texture memory improved the total calculation time, especially when interpolation was applied. Differences between the CPU and GPU gammas were negligible. The GPU and its features, such as texture memory, decreased the calculation time for gamma evaluations considerably without loss of accuracy.
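The gamma evaluation itself can be sketched as a brute-force search: for each reference point, take the minimum over all evaluated points of the combined distance and dose difference, each scaled by its acceptance criterion. The example below is a 1D CPU illustration only, not the authors' CUDA implementation; it does no interpolation between grid points, and the 3 mm / 3% criteria and the dose profiles are assumptions chosen for the example.

```python
import numpy as np

def gamma_1d(x, dose_ref, dose_eval, dta=3.0, dd=0.03):
    """Brute-force gamma index on a 1D grid.

    dta: distance-to-agreement criterion (same units as x).
    dd:  dose criterion as a fraction of the reference maximum (global).
    """
    dd_abs = dd * dose_ref.max()
    dist2 = ((x[None, :] - x[:, None]) / dta) ** 2                 # [ref, eval]
    dose2 = ((dose_eval[None, :] - dose_ref[:, None]) / dd_abs) ** 2
    return np.sqrt((dist2 + dose2).min(axis=1))

# Synthetic Gaussian dose profile on a 1 mm grid, and a copy shifted by 1 mm.
x = np.arange(0.0, 60.0, 1.0)
ref = 2.0 * np.exp(-((x - 30.0) / 8.0) ** 2)
shifted = 2.0 * np.exp(-((x - 1.0 - 30.0) / 8.0) ** 2)

gamma_same = gamma_1d(x, ref, ref)
gamma_shift = gamma_1d(x, ref, shifted)
pass_rate = np.mean(gamma_shift <= 1.0)
print(gamma_shift.max(), pass_rate)
```

Each reference point is handled independently of the others, which is what makes the search a natural fit for one GPU thread per point; in 3D the search neighbourhood would normally be limited rather than exhaustive.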
An improved parallel fuzzy connected image segmentation method based on CUDA.

PubMed

Wang, Liansheng; Li, Dong; Huang, Shaohui

2016-05-12

The fuzzy connectedness (FC) method is an effective method for extracting fuzzy objects from medical images. However, when FC is applied to large medical image datasets, its running time becomes very long. Therefore, a parallel CUDA version of FC (CUDA-kFOE) was proposed by Ying et al. to accelerate the original FC. Unfortunately, CUDA-kFOE does not consider the edges between GPU blocks, which causes miscalculation of edge points. In this paper, an improved algorithm is proposed by adding a correction step on the edge points. The improved algorithm can greatly enhance the calculation accuracy. The improved method works iteratively. In the first iteration, the affinity computation strategy is changed and a look-up table is employed for memory reduction. In the second iteration, the voxels that are in error because of asynchronism are updated again. Three different CT sequences of hepatic vasculature with different sizes were used in the experiments with three different seeds. An NVIDIA Tesla C2075 is used to evaluate our improved method on these three data sets. Experimental results show that the improved algorithm achieves faster segmentation than the CPU version and higher accuracy than CUDA-kFOE. The calculation results were consistent with the CPU version, which demonstrates that it corrects the edge point calculation error of the original CUDA-kFOE. The proposed method has a comparable time cost and fewer errors compared to the original CUDA-kFOE, as demonstrated in the experimental results. In the future, we will focus on automatic acquisition and automatic processing methods.
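The underlying fuzzy connectedness computation can be sketched sequentially: the strength of a path is the weakest affinity along it, and the connectedness of a pixel to the seed is the strength of its strongest path. The sketch below uses a Dijkstra-style best-first search on a 2D image. It is a CPU illustration only, not CUDA-kFOE or the improved algorithm, and the Gaussian intensity affinity, the `sigma` parameter, and the test image are hypothetical.

```python
import heapq
import numpy as np

def fuzzy_connectedness(image, seed, sigma=10.0):
    """Connectedness map: max over paths of the min affinity along the path."""
    h, w = image.shape
    conn = np.zeros((h, w))
    conn[seed] = 1.0
    heap = [(-1.0, seed)]                         # max-heap via negated strength
    while heap:
        neg_strength, (r, c) = heapq.heappop(heap)
        strength = -neg_strength
        if strength < conn[r, c]:
            continue                              # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                diff = float(image[r, c]) - float(image[nr, nc])
                affinity = float(np.exp(-diff * diff / (2.0 * sigma * sigma)))
                cand = min(strength, affinity)    # path strength = weakest link
                if cand > conn[nr, nc]:
                    conn[nr, nc] = cand
                    heapq.heappush(heap, (-cand, (nr, nc)))
    return conn

# Two homogeneous regions; the seed lies in the left one.
image = np.full((6, 8), 100.0)
image[:, 4:] = 200.0
conn = fuzzy_connectedness(image, seed=(2, 1), sigma=10.0)
print(conn[2, 0], conn[2, 7])
```

Pixels in the seed's region keep connectedness 1, while every path into the other region must cross the intensity step and is capped by that weak link. A parallel version updates many pixels at once, which is where the block-edge and asynchronism issues described in the abstract arise.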
Grace: A cross-platform micromagnetic simulator on graphics processing units

NASA Astrophysics Data System (ADS)

Zhu, Ru

2015-12-01

A micromagnetic simulator running on graphics processing units (GPUs) is presented. Unlike GPU implementations from other research groups, which predominantly run on NVidia's CUDA platform, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and is hardware platform independent. It runs on GPUs from vendors including NVidia, AMD and Intel, and achieves a significant performance boost compared to previous central processing unit (CPU) simulators, up to two orders of magnitude. The simulator paves the way for running large micromagnetic simulations on both high-end workstations with dedicated graphics cards and low-end personal computers with integrated graphics cards, and is freely available to download.
Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

NASA Astrophysics Data System (ADS)

Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

2016-05-01

This study introduces a practical approach to developing a real-time signal processing chain for general phased array radar on NVIDIA GPUs (Graphical Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBLAS and cuFFT, which are adopted from open source libraries and optimized for NVIDIA GPUs. The processed results are rigorously verified against those from CPUs. Performance, benchmarked as computation time for various input data cube sizes, is compared across GPUs and CPUs. The analysis demonstrates that real-time GPGPU (General Purpose GPU) processing of array radar data is possible with relatively low-cost commercial GPUs.
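One FFT-based stage that commonly appears in such a chain is pulse compression by matched filtering, sketched below with NumPy on the CPU. The abstract does not list the stages of the authors' chain, so treating pulse compression as one of them is an assumption; the chirp parameters, array sizes, and delays are hypothetical.

```python
import numpy as np

def pulse_compress(echoes, chirp):
    """Matched-filter each row of `echoes` against `chirp` using FFTs."""
    n = echoes.shape[-1] + chirp.size - 1           # zero-pad to avoid wrap-around
    spectrum = np.fft.fft(echoes, n=n, axis=-1) * np.conj(np.fft.fft(chirp, n=n))
    return np.fft.ifft(spectrum, axis=-1)           # cross-correlation with the chirp

# Hypothetical linear FM pulse and three noise-free echoes at known delays.
t = np.arange(64)
chirp = np.exp(1j * np.pi * 0.005 * t ** 2)
delays = [10, 40, 75]
echoes = np.zeros((len(delays), 256), dtype=complex)
for row, d in enumerate(delays):
    echoes[row, d:d + chirp.size] = chirp

compressed = pulse_compress(echoes, chirp)
peaks = np.abs(compressed).argmax(axis=-1)
print(peaks)
```

The compressed output peaks at the sample index of each echo delay. On a GPU the same batched forward FFT, pointwise multiply, and inverse FFT pattern is the kind of work a library such as cuFFT handles, with results checked against a CPU reference as the abstract describes.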
Micromagnetics on high-performance workstation and mobile computational platforms

NASA Astrophysics Data System (ADS)

Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.

2015-05-01

The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing unit, Nvidia desktop graphics processing units, and Nvidia Jetson TK1 Platform. FastMag finite element method-based micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.
Employing OpenCL to Accelerate Ab Initio Calculations on Graphics Processing Units.

PubMed

Kussmann, Jörg; Ochsenfeld, Christian

2017-06-13

We present an extension of our graphics processing units (GPU)-accelerated quantum chemistry package to employ OpenCL compute kernels, which can be executed on a wide range of computing devices like CPUs, Intel Xeon Phi, and AMD GPUs. Here, we focus on the use of AMD GPUs and discuss differences as compared to CUDA-based calculations on NVIDIA GPUs. First illustrative timings are presented for hybrid density functional theory calculations using serial as well as parallel compute environments. The results show that AMD GPUs are as fast or faster than comparable NVIDIA GPUs and provide a viable alternative for quantum chemical applications.

Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes

PubMed Central

2017-01-01

To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation. PMID:28582389
Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes.

PubMed

Einkemmer, Lukas

2017-01-01

To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation.
Singular value decomposition utilizing parallel algorithms on graphical processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kotas, Charlotte W; Barhen, Jacob

2011-01-01

One of the current challenges in underwater acoustic array signal processing is the detection of quiet targets in the presence of noise. In order to enable robust detection, one of the key processing steps requires data and replica whitening. This, in turn, involves the eigen-decomposition of the sample spectral matrix, $C_x = \frac{1}{K}\sum_{k=1}^{K} X(k)X^H(k)$, where $X(k)$ denotes a single frequency snapshot with an element for each element of the array. By employing the singular value decomposition (SVD) method, the eigenvectors and eigenvalues can be determined directly from the data without computing the sample covariance matrix, reducing the computational requirements for a given level of accuracy (van Trees, Optimum Array Processing). (Recall that the SVD of a complex matrix $A$ involves determining $V$, $\Sigma$, and $U$ such that $A = U\Sigma V^H$, where $U$ and $V$ are orthonormal and $\Sigma$ is a positive, real, diagonal matrix containing the singular values of $A$. $U$ and $V$ are the eigenvectors of $AA^H$ and $A^HA$, respectively, while the singular values are the square roots of the eigenvalues of $AA^H$.) Because it is desirable to be able to compute these quantities in real time, an efficient technique for computing the SVD is vital. In addition, emerging multicore processors like graphical processing units (GPUs) are bringing parallel processing capabilities to an ever increasing number of users. Since the computational tasks involved in array signal processing are well suited for parallelization, it is expected that these computations will be implemented using GPUs as soon as users have the necessary computational tools available to them. Thus, it is important to have an SVD algorithm that is suitable for these processors. This work explores the effectiveness of two different parallel SVD implementations on an NVIDIA Tesla C2050 GPU (14 multiprocessors, 32 cores per multiprocessor, 1.15 GHz clock speed).
The first algorithm is based on a two-step algorithm which bidiagonalizes the matrix using Householder transformations, and then diagonalizes the intermediate bidiagonal matrix through implicit QR shifts. This is similar to that implemented for real matrices by Lahabar and Narayanan ("Singular Value Decomposition on GPU using CUDA", IEEE International Parallel Distributed Processing Symposium 2009). The implementation is done in a hybrid manner, with the bidiagonalization stage done using the GPU while the diagonalization stage is done using the CPU, with the GPU used to update the $U$ and $V$ matrices. The second algorithm is based on a one-sided Jacobi scheme utilizing a sequence of pair-wise column orthogonalizations such that $A$ is replaced by $AV$ until the resulting matrix is sufficiently orthogonal (that is, equal to $U\Sigma$). $V$ is obtained from the sequence of orthogonalizations, while $\Sigma$ can be found from the square root of the diagonal elements of $A^HA$ and, once $\Sigma$ is known, $U$ can be found from column scaling the resulting matrix. These implementations utilize CUDA Fortran and NVIDIA's CUBLAS library. The primary goal of this study is to quantify the comparative performance of these two techniques against themselves and other standard implementations (for example, MATLAB). Considering that there is significant overhead associated with transferring data to the GPU and with synchronization between the GPU and the host CPU, it is also important to understand when it is worthwhile to use the GPU in terms of the matrix size and number of concurrent SVDs to be calculated.
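The one-sided Jacobi scheme described in this abstract can be illustrated with a minimal serial sketch. It is not the authors' CUDA Fortran code: it assumes a real (not complex) matrix with at least as many rows as columns, uses the textbook Hestenes rotation, and the function name `one_sided_jacobi_svd` and its parameters are hypothetical.

```python
import math

def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """Hestenes one-sided Jacobi SVD of a real m x n matrix (list of rows), m >= n.

    Returns (u, sigma, v) with a ~= u * diag(sigma) * v^T. Values are not sorted.
    """
    m, n = len(a), len(a[0])
    w = [row[:] for row in a]  # working copy; converges to U * Sigma
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(w[i][p] ** 2 for i in range(m))
                beta = sum(w[i][q] ** 2 for i in range(m))
                gamma = sum(w[i][p] * w[i][q] for i in range(m))
                # Skip column pairs that are already orthogonal enough.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same plane rotation to the columns of W and of V.
                for mat, rows in ((w, m), (v, n)):
                    for i in range(rows):
                        xp, xq = mat[i][p], mat[i][q]
                        mat[i][p] = c * xp - s * xq
                        mat[i][q] = s * xp + c * xq
        if not rotated:
            break
    # Column norms of W are the singular values; scaling the columns gives U.
    sigma = [math.sqrt(sum(w[i][j] ** 2 for i in range(m))) for j in range(n)]
    u = [[w[i][j] / sigma[j] if sigma[j] > 0 else 0.0 for j in range(n)] for i in range(m)]
    return u, sigma, v

u, sigma, v = one_sided_jacobi_svd([[3.0, 0.0], [4.0, 5.0]])
print(sorted(sigma))  # singular values of the 2 x 2 example
```

Each column pair is rotated independently of the others within a sweep, which is presumably what makes the scheme attractive for parallel hardware; a GPU version would distribute disjoint column pairs across threads rather than loop over them serially.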
Evaluation of slice accelerations using multiband echo planar imaging at 3 Tesla

PubMed Central

Xu, Junqian; Moeller, Steen; Auerbach, Edward J.; Strupp, John; Smith, Stephen M.; Feinberg, David A.; Yacoub, Essa; Uğurbil, Kâmil

2013-01-01

We evaluate residual aliasing among simultaneously excited and acquired slices in slice accelerated multiband (MB) echo planar imaging (EPI). No in-plane accelerations were used in order to maximize and evaluate achievable slice acceleration factors at 3 Tesla. We propose a novel leakage (L-) factor to quantify the effects of signal leakage between simultaneously acquired slices. With a standard 32-channel receiver coil at 3 Tesla, we demonstrate that slice acceleration factors of up to eight (MB = 8) with blipped controlled aliasing in parallel imaging (CAIPI), in the absence of in-plane accelerations, can be used routinely with acceptable image quality and integrity for whole brain imaging. Spectral analyses of single-shot fMRI time series demonstrate that temporal fluctuations due to both neuronal and physiological sources were distinguishable and comparable up to slice-acceleration factors of nine (MB = 9). The increased temporal efficiency could be employed to achieve, within a given acquisition period, higher spatial resolution, increased fMRI statistical power, multiple TEs, faster sampling of temporal events in a resting state fMRI time series, increased sampling of q-space in diffusion imaging, or more quiet time during a scan. PMID:23899722
Progress in HTS trapped field magnets: J(sub c), area, and applications

NASA Technical Reports Server (NTRS)

Weinstein, Roy; Ren, Yanru; Liu, Jianxiong; Sawh, Ravi; Parks, Drew; Foster, Charles; Obot, Victor; Arndt, G. Dickey; Crapo, Alan

1995-01-01

Progress in trapped field magnets is reported. Single YBCO grains with diameters of 2 cm are made in production quantities, while 3 cm, 4 1/2 cm and 6 cm diameters are being explored. For single grain tiles: J(sub c) is approximately 10,000 A/cm(exp 2) for melt textured grains; J(sub c) is approximately 40,000 A/cm(exp 2) for light ion irradiation; and J(sub c) is approximately 85,000 A/cm(exp 2) for heavy ion irradiation. Using 2 cm diameter tiles bombarded by light ions, we have fabricated a mini-magnet which trapped 2.25 Tesla at 77K, and 5.3 Tesla at 65K. A previous generation of tiles, 1 cm x 1 cm, was used to trap 7.0 Tesla at 55K. Unirradiated 2.0 cm tiles were used to provide 8 magnets for an axial gap generator, in a collaborative experiment with Emerson Electric Co. This generator delivered 100 Watts to a resistive load, at 2265 rpm. In this experiment activation of the TFMs was accomplished by a current pulse of 15 ms duration. Tiles have also been studied for application as a bumper-tether system for the soft docking of spacecraft. A method for optimizing tether forces, and mechanisms of energy dissipation are discussed. A bus bar was constructed by welding three crystals while melt-texturing, such that their a,b planes were parallel and interleaved. The bus bar, an area of approximately 2 cm(exp 2), carried a transport current of 1000 amps, the limit of the testing equipment available.
Progress in HTS Trapped Field Magnets: J(sub c), Area, and Applications

NASA Technical Reports Server (NTRS)

Weinstein, Roy; Ren, Yanru; Liu, Jian-Xiong; Sawh, Ravi; Parks, Drew; Foster, Charles; Obot, Victor; Arndt, G. Dickey; Crapo, Alan

1995-01-01

Progress in trapped field magnets is reported. Single YBCO grains with diameters of 2 cm are made in production quantities, while 3 cm, 4 1/2 cm and 6 cm diameters are being explored. For single grain tiles: J(sub c) is approximately 10,000 A/sq cm for melt textured grains; J(sub c) is approximately 40,000 A/sq cm for light ion irradiation; and J(sub c) is approximately 85,000 A/sq cm for heavy ion irradiation. Using 2 cm diameter tiles bombarded by light ions, we have fabricated a mini-magnet which trapped 2.25 Tesla at 77K, and 5.3 Tesla at 65K. A previous generation of tiles, 1 cm x 1 cm, was used to trap 7.0 Tesla at 55K. Unirradiated 2.0 cm tiles were used to provide 8 magnets for an axial gap generator, in a collaborative experiment with Emerson Electric Co. This generator delivered 100 Watts to a resistive load, at 2265 rpm. In this experiment, activation of the TFMs was accomplished by a current pulse of 15 ms duration. Tiles have also been studied for application as a bumper-tether system for the soft docking of spacecraft. A method for optimizing tether forces, and mechanisms of energy dissipation are discussed. A bus bar was constructed by welding three crystals while melt-texturing, such that their a,b planes were parallel and interleaved. The bus bar, of area approx. 2 sq cm, carried a transport current of 1000 amps, the limit of the testing equipment available.
Gadolinium-based magnetic resonance contrast agents at 7 Tesla: in vitro T1 relaxivities in human blood plasma.

PubMed

Noebauer-Huhmann, Iris M; Szomolanyi, Pavol; Juras, Vladimír; Kraff, Oliver; Ladd, Mark E; Trattnig, Siegfried

2010-09-01

PURPOSE/INTRODUCTION: The aim of this study was to determine the T1 relaxivities (r1) of 8 gadolinium (Gd)-based MR contrast agents in human blood plasma at 7 Tesla, compared with 3 Tesla. Eight commercially available Gd-based MR contrast agents were diluted in human blood plasma to concentrations of 0, 0.25, 0.5, 1, and 2 mmol/L. In vitro measurements were performed at 37 degrees C, on a 7 Tesla and on a 3 Tesla whole-body magnetic resonance imaging scanner. For the determination of T1 relaxation times, Inversion Recovery Sequences with inversion times from 0 to 3500 ms were used. The relaxivities were calculated. The r1 relaxivities of all agents, diluted in human blood plasma at body temperature, were lower at 7 Tesla than at 3 Tesla. The values at 3 Tesla were comparable to those published earlier. Notably, in some agents, a minor negative correlation of r1 with a concentration of up to 2 mmol/L could be observed. This was most pronounced in the agents with the highest protein-binding capacity. At 7 Tesla, the in vitro r1 relaxivities of Gd-based contrast agents in human blood plasma are lower than those at 3 Tesla. This work may serve as a basis for the application of Gd-based MR contrast agents at 7 Tesla. Further studies are required to optimize the contrast agent dose in vivo.
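The abstract says only that "the relaxivities were calculated". A common way to do this, assumed here rather than taken from the paper, is to fit the linear model 1/T1 = 1/T1(0) + r1 · C and read r1 off as the slope. The sketch below uses the abstract's concentration series, but the T1 values are synthetic numbers generated from an arbitrary r1 of 4.0 L/(mmol·s); they are not measurements from the study, and `fit_relaxivity` is a hypothetical helper.

```python
def fit_relaxivity(conc_mmol_per_l, t1_seconds):
    """Least-squares fit of 1/T1 = intercept + r1 * C.

    Returns (r1, intercept): r1 in L/(mmol*s), intercept (1/T1 at C = 0) in 1/s.
    """
    rates = [1.0 / t for t in t1_seconds]
    n = len(conc_mmol_per_l)
    mean_c = sum(conc_mmol_per_l) / n
    mean_r = sum(rates) / n
    sxy = sum((c - mean_c) * (r - mean_r) for c, r in zip(conc_mmol_per_l, rates))
    sxx = sum((c - mean_c) ** 2 for c in conc_mmol_per_l)
    r1 = sxy / sxx
    return r1, mean_r - r1 * mean_c

# Concentrations from the abstract; T1 values are SYNTHETIC, for illustration only.
concentrations = [0.0, 0.25, 0.5, 1.0, 2.0]               # mmol/L
synthetic_t1 = [1.0 / (0.5 + 4.0 * c) for c in concentrations]  # seconds
r1, r1_0 = fit_relaxivity(concentrations, synthetic_t1)
print(r1, r1_0)
```

A concentration-dependent r1, such as the minor negative correlation the abstract reports for protein-binding agents, would show up as curvature that this straight-line fit cannot capture.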
Value of 3 Tesla diffusion-weighted magnetic resonance imaging for assessing liver fibrosis

PubMed Central

Papalavrentios, Lavrentios; Sinakos, Emmanouil; Chourmouzi, Danai; Hytiroglou, Prodromos; Drevelegas, Konstantinos; Constantinides, Manos; Drevelegas, Antonios; Talwalkar, Jayant; Akriviadis, Evangelos

2015-01-01

Background: Limited data are available regarding the role of magnetic resonance imaging (MRI), particularly the new generation 3 Tesla technology, and especially diffusion-weighted imaging (DWI) in predicting liver fibrosis. The aim of our pilot study was to assess the clinical performance of the apparent diffusion coefficient (ADC) of liver parenchyma for the assessment of liver fibrosis in patients with non-alcoholic fatty liver disease (NAFLD). Methods: 18 patients with biopsy-proven NAFLD underwent DWI with 3 Tesla MRI. DWI was performed with single-shot echo-planar technique at b values of 0-500 and 0-1000 s/mm². ADC was measured in four locations in the liver and the mean ADC value was used for analysis. Staging of fibrosis was performed according to the METAVIR system. Results: The median age of patients was 52 years (range 23-73). The distribution of patients in different fibrosis stages was: 0 (n=1), 1 (n=7), 2 (n=1), 3 (n=5), 4 (n=4). Fibrosis stage was poorly associated with ADC at b value of 0-500 s/mm² (r= -0.30, P=0.27). However it was significantly associated with ADC at b value of 0-1000 s/mm² (r= -0.57, P=0.01). For this b value (0-1000 s/mm²) the area under receiver-operating characteristic curve was 0.93 for fibrosis stage ≥3 and the optimal ADC cut-off value was 1.16 × 10⁻³ mm²/s. Conclusion: 3 Tesla DWI can possibly predict the presence of advanced fibrosis in patients with NAFLD. PMID:25608776
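For readers unfamiliar with how an ADC value arises from a pair of b values, the following sketch uses the standard two-point mono-exponential model S(b) = S(0) · exp(-b · ADC). The abstract does not state the fitting procedure actually used, so this model is an assumption. The signal intensities are made-up numbers chosen to land exactly on the reported cut-off, and both function names are hypothetical.

```python
import math

def two_point_adc(signal_b0, signal_b, b_value):
    """ADC in mm^2/s from signals at b = 0 and b = b_value (s/mm^2)."""
    return math.log(signal_b0 / signal_b) / b_value

def mean_adc(signal_pairs, b_value):
    """Average the ADC over several regions of interest, as (S0, Sb) pairs."""
    values = [two_point_adc(s0, sb, b_value) for s0, sb in signal_pairs]
    return sum(values) / len(values)

# ILLUSTRATIVE signals only: four regions whose ADC equals 1.16e-3 mm^2/s.
b = 1000.0
rois = [(s0, s0 * math.exp(-b * 1.16e-3)) for s0 in (900.0, 1000.0, 1100.0, 950.0)]
print(mean_adc(rois, b))
```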
Value of 3 Tesla diffusion-weighted magnetic resonance imaging for assessing liver fibrosis.

PubMed

Papalavrentios, Lavrentios; Sinakos, Emmanouil; Chourmouzi, Danai; Hytiroglou, Prodromos; Drevelegas, Konstantinos; Constantinides, Manos; Drevelegas, Antonios; Talwalkar, Jayant; Akriviadis, Evangelos

2015-01-01

Limited data are available regarding the role of magnetic resonance imaging (MRI), particularly the new generation 3 Tesla technology, and especially diffusion-weighted imaging (DWI) in predicting liver fibrosis. The aim of our pilot study was to assess the clinical performance of the apparent diffusion coefficient (ADC) of liver parenchyma for the assessment of liver fibrosis in patients with non-alcoholic fatty liver disease (NAFLD). 18 patients with biopsy-proven NAFLD underwent DWI with 3 Tesla MRI. DWI was performed with single-shot echo-planar technique at b values of 0-500 and 0-1000 s/mm². ADC was measured in four locations in the liver and the mean ADC value was used for analysis. Staging of fibrosis was performed according to the METAVIR system. The median age of patients was 52 years (range 23-73). The distribution of patients in different fibrosis stages was: 0 (n=1), 1 (n=7), 2 (n=1), 3 (n=5), 4 (n=4). Fibrosis stage was poorly associated with ADC at b value of 0-500 s/mm² (r= -0.30, P=0.27). However it was significantly associated with ADC at b value of 0-1000 s/mm² (r= -0.57, P=0.01). For this b value (0-1000 s/mm²) the area under receiver-operating characteristic curve was 0.93 for fibrosis stage ≥3 and the optimal ADC cut-off value was 1.16 × 10⁻³ mm²/s. 3 Tesla DWI can possibly predict the presence of advanced fibrosis in patients with NAFLD.
Tesla - A Flash of a Genius

NASA Astrophysics Data System (ADS)

Teodorani, M.

2005-10-01

This book, which is entirely dedicated to the inventions of scientist Nikola Tesla, is divided into three parts: a) all the most important innovative technological creations from alternating current to the death ray, Tesla's research in fundamental physics with particular attention to the concept of "ether", ball lightning physics; b) the life and the bright mind of Nikola Tesla and the reasons why some of his most recent findings were not accepted by the establishment; c) a critical discussion of the most important work by Tesla's followers.
Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

NASA Astrophysics Data System (ADS)

Grzeszczuk, A.; Kowalski, S.

2015-04-01

Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to speed up graphics by performing calculations in parallel. The success of this approach has opened General-Purpose Graphics Processing Unit (GPGPU) technology to applications unrelated to graphics. A GPGPU system can be applied as an effective tool for reducing the large volume of data in pulse shape analysis measurements, either by on-line recalculation or by a very fast compression scheme. Our poster contribution presents the simplified structure of the CUDA system and its programming model, using an Nvidia GeForce GTX 580 card as an example, both in a stand-alone version and as a ROOT application.
MILC Code Performance on High End CPU and GPU Supercomputer Clusters

NASA Astrophysics Data System (ADS)

DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

2018-03-01

With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
GPU acceleration for digitally reconstructed radiographs using bindless texture objects and CUDA/OpenGL interoperability.

PubMed

Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I

2015-01-01

This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of the current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048² and 4096² at interactive and semi-interactive frame rates using an NVIDIA GeForce GTX 970 device.
The gputools package enables GPU computing in R.

PubMed

Buckner, Joshua; Wilson, Justin; Seligman, Mark; Athey, Brian; Watson, Stanley; Meng, Fan

2010-01-01

By default, the R statistical environment does not make use of parallelism. Researchers may resort to expensive solutions such as cluster hardware for large analysis tasks. Graphics processing units (GPUs) provide an inexpensive and computationally powerful alternative. Using R and the CUDA toolkit from Nvidia, we have implemented several functions commonly used in microarray gene expression analysis for GPU-equipped computers. R users can take advantage of the better performance provided by an Nvidia GPU. The package is available from CRAN, the R project's repository of packages (http://cran.r-project.org/web/packages/gputools). More information about our gputools R package is available at http://brainarray.mbni.med.umich.edu/brainarray/Rgpgpu
k-t SENSE-accelerated Myocardial Perfusion MR Imaging at 3.0 Tesla - comparison with 1.5 Tesla

PubMed Central

Plein, Sven; Schwitter, Juerg; Suerder, Daniel; Greenwood, John P.; Boesiger, Peter; Kozerke, Sebastian

2008-01-01

Purpose: To determine the feasibility and diagnostic accuracy of high spatial resolution myocardial perfusion MR at 3.0 Tesla using k-space and time domain undersampling with sensitivity encoding (k-t SENSE). Materials and Methods: The study was reviewed and approved by the local ethics review board. k-t SENSE perfusion MR was performed at 1.5 Tesla and 3.0 Tesla (saturation recovery gradient echo pulse sequence, repetition time/echo time 3.0 ms/1.0 ms, flip angle 15°, 5x k-t SENSE acceleration, spatial resolution 1.3×1.3×10 mm³). Fourteen volunteers were studied at rest and 37 patients during adenosine stress. In volunteers, comparison was also made with standard-resolution (2.5×2.5×10 mm³) 2x SENSE perfusion MR at 3.0 Tesla. Image quality, artifact scores, signal-to-noise ratios (SNR) and contrast-enhancement ratios (CER) were derived. In patients, diagnostic accuracy of visual analysis to detect >50% diameter stenosis on quantitative coronary angiography was determined by receiver-operator-characteristics (ROC). Results: In volunteers, image quality and artifact scores were similar for 3.0 Tesla and 1.5 Tesla, while SNR was higher (11.6 vs. 5.6) and CER lower (1.1 vs. 1.5, p=0.012) at 3.0 Tesla. Compared with standard-resolution perfusion MR, image quality was higher for k-t SENSE (3.6 vs. 3.1, p=0.04), endocardial dark rim artifacts were reduced (artifact thickness 1.6 mm vs. 2.4 mm, p<0.001) and CER similar. In patients, area under the ROC curve for detection of coronary stenosis was 0.89 and 0.80, p=0.21 for 3.0 Tesla and 1.5 Tesla, respectively. Conclusions: k-t SENSE accelerated high-resolution perfusion MR at 3.0 Tesla is feasible with similar artifacts and diagnostic accuracy as at 1.5 Tesla. Compared with standard-resolution perfusion MR, image quality is improved and artifacts are reduced. PMID:18936311
Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD, and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling --- sometimes encouraged by restricted GPU memory --- NVLink is less important.
Age, gender, and skeletal variation in bone marrow composition: a preliminary study at 3.0 Tesla.

PubMed

Liney, Gary P; Bernard, Clare P; Manton, David J; Turnbull, Lindsay W; Langton, Chris M

2007-09-01

To evaluate the efficacy of MR Spectroscopy (MRS) at 3.0 Tesla for the assessment of normal bone marrow composition and assess the variation in terms of age, gender, and skeletal site. A total of 16 normal subjects (aged between eight and 57 years) were investigated on a 3.0 Tesla GE Signa system. To investigate axial and peripheral skeleton differences, non-water-suppressed spectra were acquired from single voxels in the calcaneus and lumbar spine. In addition, spectra were acquired at multiple vertebral bodies to assess variation within the lumbar spine. Data was also correlated with bone mineral density (BMD) measured in six subjects using dual-energy X-ray absorptiometry (DXA). Fat content was an order of magnitude greater in the heel compared to the spine. An age-related increase was demonstrated in the spine with values greater in men compared to female subjects. Significant trends in vertebral bodies within the same subjects were also shown, with fat content increasing L5 > L1. Population coefficient of variation (CV) was greater for fat fraction (FF) compared to BMD. Significant normal variations of marrow composition have been demonstrated, which provide important data for the future interpretation of patient investigations. (c) 2007 Wiley-Liss, Inc.
Wright Laboratory Research and Development Facilities Handbook

DTIC Science & Technology

1992-08-01

properties of superconductors SPECIAL/UNIQUE CAPABILITIES: Two superconducting coils: 3-inch bore, 10 Tesla coil. 20 kilojoule repetitively pulsed coil 7 inch...bore, cryogenically cooled 14 Tesla coil INSTRUMENTATION: Computer Controlled Variable Temperature (2-400K) and Field (0-5 Tesla) Squid Susceptometer...Variable Temperature (10-80K) and Field (0-10 Tesla) Transport Current Measurement Apparatus RF Source Sputtering Rig, Optical Microscope, Furnaces
Exact diagonalization of quantum lattice models on coprocessors

NASA Astrophysics Data System (ADS)

Siro, T.; Harju, A.

2016-10-01

We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
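Since the benchmark in this abstract times a single Lanczos step, it may help to see what one step consists of. The sketch below is the textbook recurrence for a real symmetric operator, written serially; it is not the authors' OpenMP or CUDA code, and `lanczos_step` and `dense_matvec` are hypothetical names. In a lattice-model setting the operator would typically be a large sparse Hamiltonian applied without ever being stored densely.

```python
import math

def lanczos_step(matvec, v_prev, v_curr, beta_curr):
    """One textbook Lanczos iteration for a real symmetric operator.

    Returns (alpha, beta_next, v_next): alpha and beta_next are the new diagonal
    and off-diagonal entries of the tridiagonal matrix, v_next the next basis vector.
    """
    w = matvec(v_curr)  # operator application; usually the most expensive part
    w = [wi - beta_curr * pi for wi, pi in zip(w, v_prev)]
    alpha = sum(wi * vi for wi, vi in zip(w, v_curr))
    w = [wi - alpha * vi for wi, vi in zip(w, v_curr)]
    beta_next = math.sqrt(sum(wi * wi for wi in w))
    if beta_next == 0.0:
        return alpha, 0.0, None  # Krylov space exhausted (invariant subspace)
    return alpha, beta_next, [wi / beta_next for wi in w]

def dense_matvec(h):
    """Wrap a small dense matrix (list of rows) as a matvec callable."""
    return lambda v: [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

apply_h = dense_matvec([[2.0, 1.0], [1.0, 2.0]])
alpha, beta, v_next = lanczos_step(apply_h, [0.0, 0.0], [1.0, 0.0], 0.0)
print(alpha, beta, v_next)
```

Apart from the operator application, the step is a handful of dot products and vector updates, all of which parallelize over the vector length.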
Edison vs. Tesla

ScienceCinema

Hogan, Kathleen; Wallace, Hal; Ivestor, Rob

2018-01-16

As Edison vs. Tesla week heats up at the Energy Department, we are exploring the rivalry between Thomas Edison and Nikola Tesla and how their work is still impacting the way we use energy today. Whether you're on Team Tesla or Team Edison, both inventors were key players in creating things like batteries, power plants and wireless technologies -- all innovations we still use today. And as we move toward a clean energy future, energy efficient lighting, like LED bulbs, and more efficient electric motors not only help us save money on electricity costs but help combat climate change. For this, Tesla and Edison both deserve our recognition.

Real-time magnetic resonance imaging-guided radiofrequency atrial ablation and visualization of lesion formation at 3 Tesla.

PubMed

Vergara, Gaston R; Vijayakumar, Sathya; Kholmovski, Eugene G; Blauer, Joshua J E; Guttman, Mike A; Gloschat, Christopher; Payne, Gene; Vij, Kamal; Akoum, Nazem W; Daccarett, Marcos; McGann, Christopher J; Macleod, Rob S; Marrouche, Nassir F

2011-02-01

Magnetic resonance imaging (MRI) allows visualization of location and extent of radiofrequency (RF) ablation lesion, myocardial scar formation, and real-time (RT) assessment of lesion formation. In this study, we report a novel 3-Tesla RT MRI-based porcine RF ablation model and visualization of lesion formation in the atrium during RF energy delivery. The purpose of this study was to develop a 3-Tesla RT MRI-based catheter ablation and lesion visualization system. RF energy was delivered to six pigs under RT MRI guidance. A novel MRI-compatible mapping and ablation catheter was used. Under RT MRI, this catheter was safely guided and positioned within either the left or right atrium. Unipolar and bipolar electrograms were recorded. The catheter tip-tissue interface was visualized with a T1-weighted gradient echo sequence. RF energy was then delivered in a power-controlled fashion. Myocardial changes and lesion formation were visualized with a T2-weighted (T2W) half Fourier acquisition with single-shot turbo spin echo (HASTE) sequence during ablation. RT visualization of lesion formation was achieved in 30% of the ablations performed. In the other cases, either the lesion was formed outside the imaged region (25%) or the lesion was not created (45%) presumably due to poor tissue-catheter tip contact. The presence of lesions was confirmed by late gadolinium enhancement MRI and macroscopic tissue examination. MRI-compatible catheters can be navigated and RF energy safely delivered under 3-Tesla RT MRI guidance. Recording electrograms during RT imaging also is feasible. RT visualization of the lesion as it forms during RF energy delivery is possible and was demonstrated using T2W HASTE imaging. Copyright © 2011 Heart Rhythm Society. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Seidl, P. A.; Waldron, W.

This report describes the prototype final focus solenoid (FFS-1G), or 1st generation FFS. In order to limit eddy currents, the solenoid winding consists of Litz wire wound on a non-conductive G-10 tube. For the same reason, the winding pack was inserted into an electrically insulating, but thermally conducting Polypropylene (Cool-Poly© D1202) housing and potted with highly viscous epoxy (to be able to wick the single strands of the Litz wire). The magnet is forced-air cooled through cooling channels. The magnet was designed for water cooling, but the cooling jacket cracked, and therefore cooling (beyond natural conduction and radiation) was exclusively by forced air. Though the design operating point was 8 Tesla, for the majority of running on NDCX-1 it operated up to about 5 Tesla. This was due mostly to limitations of voltage holding at the leads, where discharges at higher pulsed current damaged the leads. Generation 1 was replaced by the 2nd generation solenoid (FFS-2G) about a year later, which has operated reliably up to 8 Tesla, with a better lead design and water cooling. At this point, FFS-1G was used for plasma source R&D by LBNL and PPPL. The maximum field for those experiments was reduced to 3 Tesla due to continued difficulty with the leads and because higher field was not essential for those experiments. The pulser for the final focusing solenoid is an SCR-switched capacitor bank which produces a half-sine current waveform. The pulse width is ~800 µs and a charge voltage of 3 kV drives ~20 kA through the magnet producing ~8 T field.
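The quoted pulser figures (half-sine width of about 800 µs, 3 kV charge voltage, about 20 kA peak current) can be turned into rough circuit values with a back-of-the-envelope sketch. It assumes an ideal, lossless series LC discharge, for which the half-period is π·sqrt(LC) and the peak current is V·sqrt(C/L). The report does not give the capacitance or inductance, and a real circuit has resistance, so the results are order-of-magnitude estimates only; the function name is hypothetical.

```python
import math

def lossless_lc_from_pulse(half_period_s, charge_voltage_v, peak_current_a):
    """Infer L, C and stored energy for an IDEAL lossless LC half-sine discharge."""
    root_lc = half_period_s / math.pi                   # sqrt(L * C)
    root_c_over_l = peak_current_a / charge_voltage_v   # sqrt(C / L)
    capacitance_f = root_lc * root_c_over_l
    inductance_h = root_lc / root_c_over_l
    stored_energy_j = 0.5 * capacitance_f * charge_voltage_v ** 2
    return inductance_h, capacitance_f, stored_energy_j

# Approximate figures quoted in the abstract.
l_h, c_f, e_j = lossless_lc_from_pulse(800e-6, 3.0e3, 20.0e3)
print(l_h, c_f, e_j)
```

Under these idealized assumptions the numbers come out at roughly 38 µH and 1.7 mF; resistive losses in the actual circuit would shift both.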
GPUs, a New Tool of Acceleration in CFD: Efficiency and Reliability on Smoothed Particle Hydrodynamics Methods

PubMed Central

Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.

2011-01-01

Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
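To make concrete the kind of particle-pair work that SPH codes parallelise, here is a minimal density-summation sketch, ρ_i = Σ_j m_j W(|r_i − r_j|, h), using the standard 2D cubic-spline kernel. It is an assumption-laden illustration, not the code described in the abstract: it is two-dimensional, serial, and uses a naive all-pairs loop, whereas production SPH codes rely on neighbour lists. The function names are hypothetical.

```python
import math

def cubic_spline_2d(r, h):
    """Standard 2D cubic-spline SPH kernel with support radius 2h."""
    q = r / h
    norm = 10.0 / (7.0 * math.pi * h * h)
    if q <= 1.0:
        return norm * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q <= 2.0:
        return norm * 0.25 * (2.0 - q) ** 3
    return 0.0

def summation_density(positions, masses, h):
    """Naive O(N^2) density estimate; each outer iteration is independent."""
    densities = []
    for xi, yi in positions:
        rho = 0.0
        for (xj, yj), mj in zip(positions, masses):
            rho += mj * cubic_spline_2d(math.hypot(xi - xj, yi - yj), h)
        densities.append(rho)
    return densities

# 5 x 5 unit lattice; mass chosen so the nominal density is 1000 per unit area.
points = [(float(i), float(j)) for i in range(5) for j in range(5)]
rho = summation_density(points, [1000.0] * len(points), h=1.0)
print(rho[12])  # centre particle; edge particles read low (truncated support)
```

Because each particle's sum depends only on the positions of the others, the outer loop maps naturally onto one GPU thread per particle, which is consistent with the large speedups the abstract reports.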
Computational algorithms for simulations in atmospheric optics.

PubMed

Konyaev, P A; Lukin, V P

2016-04-20

A computer simulation technique for atmospheric and adaptive optics based on parallel programming is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 1.5 GHz processors.
Single-voxel and multi-voxel spectroscopy yield comparable results in the normal juvenile canine brain when using 3 Tesla magnetic resonance imaging.

PubMed

Lee, Alison M; Beasley, Michaela J; Barrett, Emerald D; James, Judy R; Gambino, Jennifer M

2018-06-10

Conventional magnetic resonance imaging (MRI) characteristics of canine brain diseases are often nonspecific. Single- and multi-voxel spectroscopy techniques allow quantification of chemical biomarkers for tissues of interest and may help to improve diagnostic specificity. However, published information is currently lacking for the in vivo performance of these two techniques in dogs. The aim of this prospective, methods comparison study was to compare the performance of single- and multi-voxel spectroscopy in the brains of eight healthy, juvenile dogs using 3 Tesla MRI. Ipsilateral regions of single- and multi-voxel spectroscopy were performed in symmetric regions of interest of each brain in the parietal (n = 3), thalamic (n = 2), and piriform lobes (n = 3). In vivo single-voxel spectroscopy and multi-voxel spectroscopy metabolite ratios from the same size and multi-voxel spectroscopy ratios from different sized regions of interest were compared. No significant difference was seen between single-voxel spectroscopy and multi-voxel spectroscopy metabolite ratios for any lobe when regions of interest were similar in size and shape. Significant lobar single-voxel spectroscopy and multi-voxel spectroscopy differences were seen between the parietal lobe and thalamus (P = 0.047) for the choline to N-acetyl aspartate ratios when large multi-voxel spectroscopy regions of interest were compared to very small multi-voxel spectroscopy regions of interest within the same lobe; and for the N-acetyl aspartate to creatine ratios in all lobes when single-voxel spectroscopy was compared to combined (pooled) multi-voxel spectroscopy datasets. Findings from this preliminary study indicated that single- and multi-voxel spectroscopy techniques using 3T MRI yield comparable results for similar sized regions of interest in the normal canine brain. Findings also supported using the contralateral side as an internal control for dogs with brain lesions.
© 2018 American College of Veterinary Radiology.
Dynamic interleaved 1H/31P STEAM MRS at 3 Tesla using a pneumatic force-controlled plantar flexion exercise rig

PubMed Central

Meyerspeer, M.; Krššák, M.; Kemp, G.J.; Roden, M.; Moser, E.

2016-01-01

Objective: To develop a measurement method for interleaved acquisition of 1H and 31P STEAM localised spectra of exercising human calf muscle. Materials and Methods: A nonmagnetic exercise rig with a pneumatic piston and sensors for force and pedal angle was constructed to enable plantar flexion measured in the 3 Tesla MR scanner, which holds the dual tuned (1H, 31P) surface coil used for signal transmission and reception. Results: 31P spectra acquired in interleaved mode benefit from higher SNR (factor of 1.34 ± 0.06 for PCr) compared to standard acquisition due to the Nuclear Overhauser effect (NOE), and substantial PCr/Pi changes during exercise can be observed in 31P spectra. 1H spectral quality is equal to that in single mode experiments and allows Cr2 changes to be monitored. Conclusion: The feasibility of dynamic interleaved localised 1H and 31P spectroscopy during plantar flexion exercise has been demonstrated using a custom-built pneumatic system for muscle activation. This opens the possibility of studying the dynamics of metabolism with multi nuclear MRS in a single run. PMID:16320091
Transport properties of Cu-doped bismuth selenide single crystals at high magnetic fields up to 60 Tesla: Shubnikov-de Haas oscillations and π-Berry phase

NASA Astrophysics Data System (ADS)

Romanova, Taisiia A.; Knyazev, Dmitry A.; Wang, Zhaosheng; Sadakov, Andrey V.; Prudkoglyad, Valery A.

2018-05-01

We report Shubnikov-de Haas (SdH) and Hall oscillations in Cu-doped high quality bismuth selenide single crystals. To increase the accuracy of Berry phase determination by means of SdH oscillation phase analysis, we present a study of n-type samples with bulk carrier density n ∼ 10^19 - 10^20 cm^-3 at high magnetic field up to 60 Tesla. In particular, a Landau level fan diagram starting from the value of the Landau index N = 4 was plotted. From our data we found a π-Berry phase, which directly indicates the Dirac nature of the carriers in the three-dimensional topological insulator (3D TI) based on Cu-doped bismuth selenide. We argue that in our samples the magnetotransport is determined by a general group of carriers that exhibit quasi-two-dimensional (2D) behaviour and are characterized by a topological π-Berry phase. Along with the main contribution to the conductivity, the presence of a small group of bulk carriers was registered. For the 3D pocket the Berry phase was identified as zero, which is characteristic of trivial metallic states.
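A Landau level fan diagram of the kind mentioned in this abstract is usually analysed with a straight-line fit of the Landau index N against inverse field 1/B. The sketch below shows only that generic fit, on synthetic points; it is not the authors' analysis, and the values `F_TRUE` and `N0_TRUE` are made up for illustration. Whether a given intercept corresponds to a Berry phase of π or of zero depends on the indexing convention (which extrema are assigned integer N), which the abstract does not state.

```python
def fit_fan_diagram(inv_fields, indices):
    """Least-squares line N = F * (1/B) + n0 through fan-diagram points."""
    n = len(inv_fields)
    mean_x = sum(inv_fields) / n
    mean_y = sum(indices) / n
    sxx = sum((x - mean_x) ** 2 for x in inv_fields)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(inv_fields, indices))
    slope = sxy / sxx                    # oscillation frequency F, in tesla
    intercept = mean_y - slope * mean_x  # extrapolated index at 1/B -> 0
    return slope, intercept


# Synthetic, illustrative points only (not data from the paper).
F_TRUE = 200.0   # assumed oscillation frequency in tesla
N0_TRUE = 0.5    # assumed intercept
indices = list(range(4, 12))                        # fan starts at N = 4
inv_fields = [(N - N0_TRUE) / F_TRUE for N in indices]

frequency, intercept = fit_fan_diagram(inv_fields, indices)
```

With these assumed values the N = 4 point sits at roughly 57 T, inside the 60 T range quoted above; the fit simply recovers the slope and intercept that were used to generate the points.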
Development of a Cloud Resolving Model for Heterogeneous Supercomputers

NASA Astrophysics Data System (ADS)

Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.

2017-12-01

A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection - such as convective storm systems. This research describes the porting effort to enable the SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphical Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization, including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results, lessons learned as well as optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes wherein each node has 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, the ACME-MMF component of the U.S. Department of Energy (DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of a global climate model. Super-parameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME, and explore its full potential to scientifically and computationally advance climate simulation and prediction.
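Loop fusion, one of the optimizations listed in this abstract, merges two passes over the same index range into one, so that intermediate arrays are not written and re-read (and, on a GPU, so that fewer kernels are launched). The following is a language-neutral illustration in Python rather than OpenACC code; the function and variable names (`unfused`, `fused`, `temperature`, `moisture`) and the coefficients are hypothetical and not taken from SAM.

```python
def unfused(temperature, moisture):
    # Pass 1: write an intermediate array.
    heating = [0.5 * t for t in temperature]
    # Pass 2: re-read the intermediate array.
    return [h + 0.1 * q for h, q in zip(heating, moisture)]


def fused(temperature, moisture):
    # One pass: the intermediate value stays local to each iteration.
    return [0.5 * t + 0.1 * q for t, q in zip(temperature, moisture)]


temperature = [280.0, 285.5, 290.25, 300.0]
moisture = [1.0, 2.0, 3.0, 4.0]
same = unfused(temperature, moisture) == fused(temperature, moisture)
```

Both versions perform the same arithmetic in the same order per element, so they give identical results; only the number of passes over memory changes.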
Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Men Chunhua; Romeijn, H. Edwin; Jia Xun

2010-11-15

Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting of two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with the consideration of MLC mechanic constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. Results: The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. Conclusions: The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.
Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT).

PubMed

Men, Chunhua; Romeijn, H Edwin; Jia, Xun; Jiang, Steve B

2010-11-01

To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting of two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with the consideration of MLC mechanic constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.
Patient-specific non-linear finite element modelling for predicting soft organ deformation in real-time: application to non-rigid neuroimage registration.

PubMed

Wittek, Adam; Joldes, Grand; Couton, Mathieu; Warfield, Simon K; Miller, Karol

2010-12-01

Long computation times of non-linear (i.e. accounting for geometric and material non-linearity) biomechanical models have been regarded as one of the key factors preventing application of such models in predicting organ deformation for image-guided surgery. This contribution presents real-time patient-specific computation of the deformation field within the brain for six cases of brain shift induced by craniotomy (i.e. surgical opening of the skull) using specialised non-linear finite element procedures implemented on a graphics processing unit (GPU). In contrast to commercial finite element codes that rely on an updated Lagrangian formulation and implicit integration in time domain for steady state solutions, our procedures utilise the total Lagrangian formulation with explicit time stepping and dynamic relaxation. We used patient-specific finite element meshes consisting of hexahedral and non-locking tetrahedral elements, together with realistic material properties for the brain tissue and appropriate contact conditions at the boundaries. The loading was defined by prescribing deformations on the brain surface under the craniotomy. Application of the computed deformation fields to register (i.e. align) the preoperative and intraoperative images indicated that the models very accurately predict the intraoperative deformations within the brain. For each case, computing the brain deformation field took less than 4 s using an NVIDIA Tesla C870 GPU, which is two orders of magnitude reduction in computation time in comparison to our previous study in which the brain deformation was predicted using a commercial finite element solver executed on a personal computer. Copyright © 2010 Elsevier Ltd. All rights reserved.
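The combination of explicit time stepping and dynamic relaxation described in this abstract can be illustrated on a toy problem: a damped, explicit central-difference integration is run until the motion dies out, and the resting displacement is taken as the steady-state solution. The sketch below uses a single degree of freedom with a made-up stiffening spring; the function names, stiffness values, damping and time step are all assumptions for illustration and have nothing to do with the authors' finite element code or brain tissue properties.

```python
def internal_force(u):
    # Hypothetical non-linear (stiffening) spring: linear plus cubic term.
    return 100.0 * u + 1000.0 * u ** 3


def relax_to_equilibrium(f_ext, mass=1.0, damping=20.0, dt=0.01,
                         tol=1e-9, max_steps=100_000):
    """Damped explicit central-difference stepping until the system rests."""
    alpha = damping * dt / (2.0 * mass)
    u_prev = 0.0  # displacement one step back
    u = 0.0       # current displacement (starts at rest)
    for _ in range(max_steps):
        residual = f_ext - internal_force(u)  # out-of-balance force
        u_next = (dt * dt * residual / mass
                  + 2.0 * u - (1.0 - alpha) * u_prev) / (1.0 + alpha)
        # Converged when forces balance and the displacement stops changing.
        if abs(residual) < tol and abs(u_next - u) < tol:
            return u
        u_prev, u = u, u_next
    raise RuntimeError("dynamic relaxation did not converge")


u_steady = relax_to_equilibrium(f_ext=10.0)
```

No stiffness matrix is assembled or inverted: each step needs only a force evaluation, which is the property that makes this family of methods map well onto GPUs. The time step must stay below the explicit stability limit, which the assumed values here satisfy.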
Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

PubMed Central

Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.

2012-01-01

In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards. PMID:22347787
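The block matching described in this abstract, with a summed absolute difference (SAD) criterion and full grid search, can be sketched in a few lines. This is a plain CPU illustration on an integer grid, not the authors' CUDA implementation; the function names and the tiny test frames are hypothetical.

```python
def sad(ref, cur, ref_y, ref_x, cur_y, cur_x, size):
    """Summed absolute difference between two size x size blocks."""
    return sum(abs(ref[ref_y + i][ref_x + j] - cur[cur_y + i][cur_x + j])
               for i in range(size) for j in range(size))


def full_search(ref, cur, top, left, size, radius):
    """Try every integer displacement within +/- radius; keep the lowest SAD.

    Returns (cost, dy, dx) such that the block of `cur` at (top, left)
    best matches the block of `ref` at (top + dy, left + dx).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the frame
            cost = sad(ref, cur, y, x, top, left, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def frame_with_patch(top, left):
    """8 x 8 frame of zeros with a distinctive 2 x 2 patch."""
    frame = [[0] * 8 for _ in range(8)]
    for i, row in enumerate([[9, 7], [5, 3]]):
        for j, value in enumerate(row):
            frame[top + i][left + j] = value
    return frame


ref = frame_with_patch(2, 3)   # patch position in the reference frame
cur = frame_with_patch(3, 1)   # same patch, moved down 1 and left 2
match = full_search(ref, cur, top=3, left=1, size=2, radius=2)
```

Every candidate displacement is evaluated independently of the others, which is why the full search parallelizes so readily across GPU threads.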
Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

PubMed

Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

2011-07-01

In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
SU-E-T-36: A GPU-Accelerated Monte-Carlo Dose Calculation Platform and Its Application Toward Validating a ViewRay Beam Model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Y; Mazur, T; Green, O

Purpose: To build a fast, accurate and easily-deployable research platform for Monte-Carlo dose calculations. We port the dose calculation engine PENELOPE to C++, and accelerate calculations using GPU acceleration. Simulations of a Co-60 beam model provided by ViewRay demonstrate the capabilities of the platform. Methods: We built software that incorporates a beam model interface, CT-phantom model, GPU-accelerated PENELOPE engine, and GUI front-end. We rewrote the PENELOPE kernel in C++ (from Fortran) and accelerated the code on a GPU. We seamlessly integrated a Co-60 beam model (obtained from ViewRay) into our platform. Simulations of various field sizes and SSDs using a homogeneous water phantom generated PDDs, dose profiles, and output factors that were compared to experiment data. Results: With GPU acceleration using a dated graphics card (Nvidia Tesla C2050), a highly accurate simulation – including 100*100*100 grid, 3×3×3 mm3 voxels, <1% uncertainty, and 4.2×4.2 cm2 field size – runs 24 times faster (20 minutes versus 8 hours) than when parallelizing on 8 threads across a new CPU (Intel i7-4770). Simulated PDDs, profiles and output ratios for the commercial system agree well with experiment data measured using radiographic film or ionization chamber. Based on our analysis, this beam model is precise enough for general applications. Conclusions: Using a beam model for a Co-60 system provided by ViewRay, we evaluate a dose calculation platform that we developed. Comparison to measurements demonstrates the promise of our software for use as a research platform for dose calculations, with applications including quality assurance and treatment plan verification.
Multi-GPGPU Tsunami simulation at Toyama-bay

NASA Astrophysics Data System (ADS)

Furuyama, Shoichi; Ueda, Yuki

2017-07-01

Accelerated multi General Purpose Graphics Processing Unit (GPGPU) computation of a Tsunami run-up simulation was achieved over a wide area (the whole of Toyama-bay in Japan). Toyama-bay has active faults on the sea bed, so a large earthquake there is likely to generate Tsunami waves; predicting the Tsunami run-up area is therefore important for reducing harm to residents. However, the simulation is very demanding on computer resources. Run-up simulation requires a high resolution of the order of several meters, because artificial structures on the ground such as roads, buildings, and houses are very small. At the same time, a very large area must be simulated. In the Toyama-bay case the area is 42 [km] × 15 [km]. When 5 [m] × 5 [m] computational cells are used, over 26,000,000 cells are generated. An ordinary CPU desktop computer took about 10 hours for this calculation. Reducing the calculation time is important for an immediate prediction system of Tsunami run-up, which would help protect many residents of the coastal region. The study reduced the calculation time by using a multi-GPGPU system equipped with six NVIDIA TESLA K20x GPUs, with InfiniBand network connections between computer nodes through the MVAPICH library. As a result, the calculation on six GPUs was 5.16 times faster than on one GPU, a parallel efficiency of 86% relative to linear speedup.
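The 86% figure quoted at the end of this abstract follows from the usual definition of parallel efficiency, speedup divided by the number of processors. A minimal check, with a hypothetical helper name:

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / workers


# Figures from the abstract: 5.16x on six GPUs relative to one GPU.
efficiency = parallel_efficiency(5.16, 6)
efficiency_percent = round(efficiency * 100)
```

The shortfall from 100% would typically come from inter-GPU communication and any serial portions of the run, although the abstract does not break this down.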
Real-time time-division color electroholography using a single GPU and a USB module for synchronizing reference light.

PubMed

Araki, Hiromitsu; Takada, Naoki; Niwase, Hiroaki; Ikawa, Shohei; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

2015-12-01

We propose real-time time-division color electroholography using a single graphics processing unit (GPU) and a simple synchronization system of reference light. To facilitate real-time time-division color electroholography, we developed a light emitting diode (LED) controller with a universal serial bus (USB) module and the drive circuit for reference light. A one-chip RGB LED connected to a personal computer via an LED controller was used as the reference light. A single GPU calculates three computer-generated holograms (CGHs) suitable for red, green, and blue colors in each frame of a three-dimensional (3D) movie. After CGH calculation using a single GPU, the CPU can synchronize the CGH display with the color switching of the one-chip RGB LED via the LED controller. Consequently, we succeeded in real-time time-division color electroholography for a 3D object consisting of around 1000 points per color when an NVIDIA GeForce GTX TITAN was used as the GPU. Furthermore, we implemented the proposed method in various GPUs. The experimental results showed that the proposed method was effective for various GPUs.
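The per-frame workload described in this abstract, three holograms computed from a point cloud, can be sketched with a common point-source formulation in which each hologram pixel sums a cosine term over all object points. This is an assumed textbook formulation, not the authors' GPU kernel; the wavelengths, pixel pitch, hologram size and function names below are all illustrative assumptions, since the abstract gives none of them.

```python
import math

# Assumed illustrative wavelengths in metres (not from the abstract).
WAVELENGTHS = {"red": 625e-9, "green": 525e-9, "blue": 470e-9}


def compute_cgh(points, wavelength, rows, cols, pitch):
    """Amplitude hologram: each pixel sums cos(k * r) over object points.

    `points` holds (x, y, z, amplitude) tuples, with z the distance
    from the hologram plane.
    """
    k = 2.0 * math.pi / wavelength  # wavenumber
    hologram = []
    for row in range(rows):
        y = (row - rows / 2) * pitch
        line = []
        for col in range(cols):
            x = (col - cols / 2) * pitch
            total = 0.0
            for px, py, pz, amplitude in points:
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                total += amplitude * math.cos(k * r)
            line.append(total)
        hologram.append(line)
    return hologram


def compute_frame(points_by_color, rows, cols, pitch):
    """One movie frame: a separate hologram for each colour channel."""
    return {color: compute_cgh(points_by_color[color], wavelength,
                               rows, cols, pitch)
            for color, wavelength in WAVELENGTHS.items()}


cloud = [(0.0, 0.0, 0.5, 1.0), (1e-4, -1e-4, 0.5, 1.0)]
frame = compute_frame({color: cloud for color in WAVELENGTHS},
                      rows=4, cols=4, pitch=8e-6)
```

The cost grows as (number of pixels) × (number of points) × 3 colours per frame, and every pixel is independent, which is why this calculation is normally handed to a GPU. In a time-division display, the three holograms would then be shown in turn while the reference light switches colour.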
Nikola Tesla: the man behind the magnetic field unit.

PubMed

Roguin, Ariel

2004-03-01

The magnetic field strength of both the magnet and gradient coils used in MR imaging equipment is measured in Tesla units, which are named for Nikola Tesla. This article presents the life and achievements of this Serbian-American inventor and researcher who discovered the rotating magnetic field, the basis of most alternating-current machinery. Nikola Tesla had 700 patents in the United States and Europe that covered every aspect of science and technology. Tesla's discoveries include the Tesla coil, AC electrical conduction, improved lighting, newer forms of turbine engines, robotics, fluorescent light, wireless transmission of electrical energy, radio, remote control, discovery of cosmic radio waves, and the use of the ionosphere for scientific purposes. He was a genius whose discoveries had a pivotal role in advancing us into the modern era. Copyright 2004 Wiley-Liss, Inc.
In vivo high-resolution 7 Tesla MRI shows early and diffuse cortical alterations in CADASIL.

PubMed

De Guio, François; Reyes, Sonia; Vignaud, Alexandre; Duering, Marco; Ropele, Stefan; Duchesnay, Edouard; Chabriat, Hugues; Jouvent, Eric

2014-01-01

Recent data suggest that early symptoms may be related to cortex alterations in CADASIL (Cerebral Autosomal-Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy), a monogenic model of cerebral small vessel disease (SVD). The aim of this study was to investigate cortical alterations using both high-resolution T2* acquisitions obtained with 7 Tesla MRI and structural T1 images with 3 Tesla MRI in CADASIL patients with no or only mild symptomatology (modified Rankin's scale ≤1 and Mini Mental State Examination (MMSE) ≥24). Complete reconstructions of the cortex using 7 Tesla T2* acquisitions with 0.7 mm isotropic resolution were obtained in 11 patients (52.1±13.2 years, 36% male) and 24 controls (54.8±11.0 years, 42% male). Seven Tesla T2* within the cortex and cortical thickness and morphology obtained from 3 Tesla images were compared between CADASIL and control subjects using general linear models. MMSE, brain volume, cortical thickness and global sulcal morphology did not differ between groups. By contrast, T2* measured by 7 Tesla MRI was significantly increased in frontal, parietal, occipital and cingulate cortices in patients after correction for multiple testing. These changes were not related to white matter lesions, lacunes or microhemorrhages in patients having no brain atrophy compared to controls. Seven Tesla MRI, by contrast to state of the art post-processing of 3 Tesla acquisitions, shows diffuse T2* alterations within the cortical mantle in CADASIL whose origin remains to be determined.
Comparison of Deep Brain Stimulation Lead Targeting Accuracy and Procedure Duration between 1.5- and 3-Tesla Interventional Magnetic Resonance Imaging Systems: An Initial 12-Month Experience.

PubMed

Southwell, Derek G; Narvid, Jared A; Martin, Alastair J; Qasim, Salman E; Starr, Philip A; Larson, Paul S

2016-01-01

Interventional magnetic resonance imaging (iMRI) allows deep brain stimulator lead placement under general anesthesia. While the accuracy of lead targeting has been described for iMRI systems utilizing 1.5-tesla magnets, a similar assessment of 3-tesla iMRI procedures has not been performed. To compare targeting accuracy, the number of lead targeting attempts, and surgical duration between procedures performed on 1.5- and 3-tesla iMRI systems. Radial targeting error, the number of targeting attempts, and procedure duration were compared between surgeries performed on 1.5- and 3-tesla iMRI systems (SmartFrame and ClearPoint systems). During the first year of operation of each system, 26 consecutive leads were implanted using the 1.5-tesla system, and 23 consecutive leads were implanted using the 3-tesla system. There was no significant difference in radial error (Mann-Whitney test, p = 0.26), number of lead placements that required multiple targeting attempts (Fisher's exact test, p = 0.59), or bilateral procedure durations between surgeries performed with the two systems (p = 0.15). Accurate DBS lead targeting can be achieved with iMRI systems utilizing either 1.5- or 3-tesla magnets. The use of a 3-tesla magnet, however, offers improved visualization of the target structures and allows comparable accuracy and efficiency of placement at the selected targets. © 2016 S. Karger AG, Basel.
Teslaphoresis of Carbon Nanotubes.

PubMed

Bornhoeft, Lindsey R; Castillo, Aida C; Smalley, Preston R; Kittrell, Carter; James, Dustin K; Brinson, Bruce E; Rybolt, Thomas R; Johnson, Bruce R; Cherukuri, Tonya K; Cherukuri, Paul

2016-04-26

This paper introduces Teslaphoresis, the directed motion and self-assembly of matter by a Tesla coil, and studies this electrokinetic phenomenon using single-walled carbon nanotubes (CNTs). Conventional directed self-assembly of matter using electric fields has been restricted to small scale structures, but with Teslaphoresis, we exceed this limitation by using the Tesla coil's antenna to create a gradient high-voltage force field that projects into free space. CNTs placed within the Teslaphoretic (TEP) field polarize and self-assemble into wires that span from the nanoscale to the macroscale, the longest thus far being 15 cm. We show that the TEP field not only directs the self-assembly of long nanotube wires at remote distances (>30 cm) but can also wirelessly power nanotube-based LED circuits. Furthermore, individualized CNTs self-organize to form long parallel arrays with high fidelity alignment to the TEP field. Thus, Teslaphoresis is effective for directed self-assembly from the bottom-up to the macroscale.

Using Large Signal Code TESLA for Wide Band Klystron Simulations

DTIC Science & Technology

2006-04-01

tuning procedure TESLA simulates of high power klystron [3]. accurately actual eigenmodes of the structure as a solution Wide band klystrons very often...on band klystrons with two-gap two-mode resonators. The decomposition of simulation region into an external results of TESLA simulations for NRL S ...UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP022454 TITLE: Using Large Signal Code TESLA for Wide Band Klystron
DOE Office of Scientific and Technical Information (OSTI.GOV)

Evans, III, Boyd Mccutchen; Kisner, Roger A.; Ludtka, Gail Mackiewicz

A method of making a single crystal comprises heating a material comprising magnetic anisotropy to a temperature T sufficient to form a melt of the material. A magnetic field of at least about 1 Tesla is applied to the melt at the temperature T, where a magnetic free energy difference ΔG_m between different crystallographic axes is greater than a thermal energy kT. While applying the magnetic field, the melt is cooled at a rate of about 30 °C/min or higher, and the melt solidifies to form a single crystal of the material.
Optically probing the fine structure of a single Mn atom in an InAs quantum dot.

PubMed

Kudelski, A; Lemaître, A; Miard, A; Voisin, P; Graham, T C M; Warburton, R J; Krebs, O

2007-12-14

We report on the optical spectroscopy of a single InAs/GaAs quantum dot doped with a single Mn atom in a longitudinal magnetic field of a few Tesla. Our findings show that the Mn impurity is a neutral acceptor state A0 whose effective spin J=1 is significantly perturbed by the quantum dot potential and its associated strain field. The spin interaction with photocarriers injected in the quantum dot is shown to be ferromagnetic for holes, with an effective coupling constant of a few hundred μeV, but vanishingly small for electrons.
Quantitative techniques for musculoskeletal MRI at 7 Tesla.

PubMed

Bangerter, Neal K; Taylor, Meredith D; Tarbox, Grayson J; Palmer, Antony J; Park, Daniel J

2016-12-01

Whole-body 7 Tesla MRI scanners have been approved solely for research since they appeared on the market over 10 years ago, but may soon be approved for selected clinical neurological and musculoskeletal applications in both the EU and the United States. There has been considerable research work on musculoskeletal applications at 7 Tesla over the past decade, including techniques for ultra-high resolution morphological imaging, 3D T2 and T2* mapping, ultra-short TE applications, diffusion tensor imaging of cartilage, and several techniques for assessing proteoglycan content in cartilage. Most of this work has been done in the knee or other extremities, due to technical difficulties associated with scanning areas such as the hip and torso at 7 Tesla. In this manuscript, we first provide some technical context for 7 Tesla imaging, including challenges and potential advantages. We then review the major quantitative MRI techniques being applied to musculoskeletal applications on 7 Tesla whole-body systems.
Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

DOE PAGES

Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles; ...

2018-05-05

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling (sometimes encouraged by restricted GPU memory), NVLink is less important.
Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles

Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling (sometimes encouraged by restricted GPU memory), NVLink is less important.
GPU acceleration of Dock6's Amber scoring computation.

PubMed

Yang, Hailong; Zhou, Qiongqiong; Li, Bo; Wang, Yongjian; Luan, Zhongzhi; Qian, Depei; Li, Hanlu

2010-01-01

Addressing the problem of virtual screening is a long-term goal in the drug discovery field; if properly solved, it can significantly shorten the R&D cycle of new drugs. The scoring functionality that evaluates the fitness of the docking result is one of the major challenges in virtual screening. In general, scoring functionality in docking requires a large number of floating-point calculations, which usually take several weeks or even months to finish. This time-consuming procedure is unacceptable, especially when highly fatal and infectious viruses such as SARS and H1N1 arise and force the scoring task to be done in a limited time. This paper presents how to leverage the computational power of the GPU to accelerate Dock6's (http://dock.compbio.ucsf.edu/DOCK_6/) Amber (J. Comput. Chem. 25: 1157-1174, 2004) scoring with the NVIDIA CUDA (Compute Unified Device Architecture) platform (NVIDIA Corporation Technical Staff, Compute Unified Device Architecture - Programming Guide, NVIDIA Corporation, 2008). We also discuss many factors that greatly influence performance after porting the Amber scoring to the GPU, including thread management, data transfer, and divergence hiding. Our experiments show that the GPU-accelerated Amber scoring achieves a 6.5× speedup with respect to the original version running on an AMD dual-core CPU for the same problem size. This acceleration makes the Amber scoring more competitive and efficient for large-scale virtual screening problems.
High-resolution motion compensated MRA in patients with congenital heart disease using extracellular contrast agent at 3 Tesla.

PubMed

Dabir, Darius; Naehle, Claas Philip; Clauberg, Ralf; Gieseke, Juergen; Schild, Hans H; Thomas, Daniel

2012-10-29

Using first-pass MRA (FP-MRA) spatial resolution is limited by breath-hold duration. In addition, image quality may be hampered by respiratory and cardiac motion artefacts. In order to overcome these limitations an ECG- and navigator-gated high-resolution-MRA sequence (HR-MRA) with slow infusion of extracellular contrast agent was implemented at 3 Tesla for the assessment of congenital heart disease and compared to standard first-pass-MRA (FP-MRA). 34 patients (median age: 13 years) with congenital heart disease (CHD) were prospectively examined on a 3 Tesla system. The CMR-protocol comprised functional imaging, FP- and HR-MRA, and viability imaging. After the acquisition of the FP-MRA sequence using a single dose of extracellular contrast agent the motion compensated HR-MRA sequence with isotropic resolution was acquired while injecting the second single dose, utilizing the timeframe before viability imaging. Qualitative scores for image quality (two independent reviewers) as well as quantitative measurements of vessel sharpness and relative contrast were compared using the Wilcoxon signed-rank test. Quantitative measurements of vessel diameters were compared using the Bland-Altman test. The mean image quality score revealed significantly better image quality of the HR-MRA sequence compared to the FP-MRA sequence in all vessels of interest (ascending aorta (AA), left pulmonary artery (LPA), left superior pulmonary vein (LSPV), coronary sinus (CS), and coronary ostia (CO); all p < 0.0001). In comparison to FP-MRA, HR-MRA revealed significantly better vessel sharpness for all considered vessels (AA, LSPV and LPA; all p < 0.0001). The relative contrast of the HR-MRA sequence was less compared to the FP-MRA sequence (AA: p <0.028, main pulmonary artery: p <0.004, LSPV: p <0.005). Both, the results of the intra- and interobserver measurements of the vessel diameters revealed closer correlation and closer 95 % limits of agreement for the HR-MRA. 
HR-MRA revealed one additional clinical finding, missed by FP-MRA. An ECG- and navigator-gated HR-MRA-protocol with infusion of extracellular contrast agent at 3 Tesla is feasible. HR-MRA delivers significantly better image quality and vessel sharpness compared to FP-MRA. It may be integrated into a standard CMR-protocol for patients with CHD without the need for additional contrast agent injection and without any additional examination time.
In Vivo High-Resolution 7 Tesla MRI Shows Early and Diffuse Cortical Alterations in CADASIL

PubMed Central

De Guio, François; Reyes, Sonia; Vignaud, Alexandre; Duering, Marco; Ropele, Stefan; Duchesnay, Edouard; Chabriat, Hugues; Jouvent, Eric

2014-01-01

Background and Purpose Recent data suggest that early symptoms may be related to cortex alterations in CADASIL (Cerebral Autosomal-Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy), a monogenic model of cerebral small vessel disease (SVD). The aim of this study was to investigate cortical alterations using both high-resolution T2* acquisitions obtained with 7 Tesla MRI and structural T1 images with 3 Tesla MRI in CADASIL patients with no or only mild symptomatology (modified Rankin’s scale ≤1 and Mini Mental State Examination (MMSE) ≥24). Methods Complete reconstructions of the cortex using 7 Tesla T2* acquisitions with 0.7 mm isotropic resolution were obtained in 11 patients (52.1±13.2 years, 36% male) and 24 controls (54.8±11.0 years, 42% male). Seven Tesla T2* within the cortex and cortical thickness and morphology obtained from 3 Tesla images were compared between CADASIL and control subjects using general linear models. Results MMSE, brain volume, cortical thickness and global sulcal morphology did not differ between groups. By contrast, T2* measured by 7 Tesla MRI was significantly increased in frontal, parietal, occipital and cingulate cortices in patients after correction for multiple testing. These changes were not related to white matter lesions, lacunes or microhemorrhages in patients having no brain atrophy compared to controls. Conclusions Seven Tesla MRI, by contrast to state of the art post-processing of 3 Tesla acquisitions, shows diffuse T2* alterations within the cortical mantle in CADASIL whose origin remains to be determined. PMID:25165824
Hydroforming of elliptical cavities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Singer, W.; Singer, X.; Jelezov, I.

Activities of the past several years in developing the technique of forming seamless (weldless) cavity cells by hydroforming are summarized. An overview of the technique developed at DESY for the fabrication of single cells and multicells of the TESLA cavity shape is given and the major rf results are presented. The forming is performed by expanding a seamless tube with internal water pressure while simultaneously swaging it axially. Prior to the expansion the tube is necked at the iris area and at the ends. Tube radii and axial displacements are computer controlled during the forming process in accordance with results of finite element method simulations for necking and expansion using the experimentally obtained strain-stress relationship of tube material. In cooperation with industry, different methods of niobium seamless tube production have been explored. The most appropriate and successful method is a combination of spinning or deep drawing with flow forming. Several single-cell niobium cavities of the 1.3 GHz TESLA shape were produced by hydroforming. They reached accelerating gradients E_acc up to 35 MV/m after buffered chemical polishing (BCP) and up to 42 MV/m after electropolishing (EP). More recent work concentrated on fabrication and testing of multicell and nine-cell cavities. Several seamless two- and three-cell units were explored. Accelerating gradients E_acc of 30–35 MV/m were measured after BCP and E_acc up to 40 MV/m were reached after EP. Nine-cell niobium cavities combining three three-cell units were completed at the company E. Zanon. These cavities reached accelerating gradients of E_acc = 30–35 MV/m. One cavity is successfully integrated in an XFEL cryomodule and is used in the operation of the FLASH linear accelerator at DESY. Additionally the fabrication of bimetallic single-cell and multicell NbCu cavities by hydroforming was successfully developed. Several NbCu clad single-cell and double-cell cavities of the TESLA shape have been fabricated. The clad seamless tubes were produced using hot bonding or explosive bonding and subsequent flow forming. The thicknesses of Nb and Cu layers in the tube wall are about 1 and 3 mm respectively. The rf performance of the best NbCu clad cavities is similar to that of bulk Nb cavities. The highest accelerating gradient achieved was 40 MV/m. The advantages and disadvantages of hydroformed cavities are discussed in this paper.
Hydroforming of elliptical cavities

DOE PAGES

Singer, W.; Singer, X.; Jelezov, I.; ...

2015-02-27

Activities of the past several years in developing the technique of forming seamless (weldless) cavity cells by hydroforming are summarized. An overview of the technique developed at DESY for the fabrication of single cells and multicells of the TESLA cavity shape is given and the major rf results are presented. The forming is performed by expanding a seamless tube with internal water pressure while simultaneously swaging it axially. Prior to the expansion the tube is necked at the iris area and at the ends. Tube radii and axial displacements are computer controlled during the forming process in accordance with results of finite element method simulations for necking and expansion using the experimentally obtained strain-stress relationship of tube material. In cooperation with industry, different methods of niobium seamless tube production have been explored. The most appropriate and successful method is a combination of spinning or deep drawing with flow forming. Several single-cell niobium cavities of the 1.3 GHz TESLA shape were produced by hydroforming. They reached accelerating gradients E_acc up to 35 MV/m after buffered chemical polishing (BCP) and up to 42 MV/m after electropolishing (EP). More recent work concentrated on fabrication and testing of multicell and nine-cell cavities. Several seamless two- and three-cell units were explored. Accelerating gradients E_acc of 30–35 MV/m were measured after BCP and E_acc up to 40 MV/m were reached after EP. Nine-cell niobium cavities combining three three-cell units were completed at the company E. Zanon. These cavities reached accelerating gradients of E_acc = 30–35 MV/m. One cavity is successfully integrated in an XFEL cryomodule and is used in the operation of the FLASH linear accelerator at DESY. Additionally the fabrication of bimetallic single-cell and multicell NbCu cavities by hydroforming was successfully developed. Several NbCu clad single-cell and double-cell cavities of the TESLA shape have been fabricated. The clad seamless tubes were produced using hot bonding or explosive bonding and subsequent flow forming. The thicknesses of Nb and Cu layers in the tube wall are about 1 and 3 mm respectively. The rf performance of the best NbCu clad cavities is similar to that of bulk Nb cavities. The highest accelerating gradient achieved was 40 MV/m. The advantages and disadvantages of hydroformed cavities are discussed in this paper.
Hydroforming of elliptical cavities

NASA Astrophysics Data System (ADS)

Singer, W.; Singer, X.; Jelezov, I.; Kneisel, P.

2015-02-01

Activities of the past several years in developing the technique of forming seamless (weldless) cavity cells by hydroforming are summarized. An overview of the technique developed at DESY for the fabrication of single cells and multicells of the TESLA cavity shape is given and the major rf results are presented. The forming is performed by expanding a seamless tube with internal water pressure while simultaneously swaging it axially. Prior to the expansion the tube is necked at the iris area and at the ends. Tube radii and axial displacements are computer controlled during the forming process in accordance with results of finite element method simulations for necking and expansion using the experimentally obtained strain-stress relationship of tube material. In cooperation with industry, different methods of niobium seamless tube production have been explored. The most appropriate and successful method is a combination of spinning or deep drawing with flow forming. Several single-cell niobium cavities of the 1.3 GHz TESLA shape were produced by hydroforming. They reached accelerating gradients E_acc up to 35 MV/m after buffered chemical polishing (BCP) and up to 42 MV/m after electropolishing (EP). More recent work concentrated on fabrication and testing of multicell and nine-cell cavities. Several seamless two- and three-cell units were explored. Accelerating gradients E_acc of 30–35 MV/m were measured after BCP and E_acc up to 40 MV/m were reached after EP. Nine-cell niobium cavities combining three three-cell units were completed at the company E. Zanon. These cavities reached accelerating gradients of E_acc = 30–35 MV/m. One cavity is successfully integrated in an XFEL cryomodule and is used in the operation of the FLASH linear accelerator at DESY. Additionally the fabrication of bimetallic single-cell and multicell NbCu cavities by hydroforming was successfully developed. Several NbCu clad single-cell and double-cell cavities of the TESLA shape have been fabricated. The clad seamless tubes were produced using hot bonding or explosive bonding and subsequent flow forming. The thicknesses of Nb and Cu layers in the tube wall are about 1 and 3 mm respectively. The rf performance of the best NbCu clad cavities is similar to that of bulk Nb cavities. The highest accelerating gradient achieved was 40 MV/m. The advantages and disadvantages of hydroformed cavities are discussed in this paper.
Intraindividual comparison of image quality in MR urography at 1.5 and 3 tesla in an animal model.

PubMed

Regier, M; Nolte-Ernsting, C; Adam, G; Kemper, J

2008-10-01

Experimental evaluation of image quality of the upper urinary tract in MR urography (MRU) at 1.5 and 3 Tesla in a porcine model. In this study four healthy domestic pigs, weighing between 71 and 80 kg (mean 73.6 kg), were examined with a standard T1w 3D-GRE and a high-resolution (HR) T1w 3D-GRE sequence at 1.5 and 3 Tesla. Additionally, at 3 Tesla both sequences were performed with parallel imaging (SENSE factor 2). The MR urographic scans were performed after intravenous injection of gadolinium-DTPA (0.1 mmol/kg body weight (bw)) and low-dose furosemide (0.1 mg/kg bw). Image evaluation was performed by two independent radiologists blinded to sequence parameters and field strength. Image analysis included grading of image quality of the segmented collecting system based on a five-point grading scale regarding anatomical depiction and artifacts observed (1: the majority of the segment (>50%) was not depicted or was obscured by major artifacts; 5: the segment was visualized without artifacts and had sharply defined borders). Signal-to-noise (SNR) and contrast-to-noise (CNR) ratios were determined. Statistical analysis included kappa statistics, the Wilcoxon test, and the paired Student's t-test. The mean scores for MR urographies at 1.5 Tesla were 2.83 for the 3D-GRE and 3.48 for the HR3D-GRE sequence. Significantly higher values were determined using the corresponding sequences at 3 Tesla, averaging 3.19 for the 3D-GRE (p = 0.047) and 3.92 for the HR3D-GRE (p = 0.023) sequence. Delineation of the pelvicaliceal system was rated significantly higher at 3 Tesla compared to 1.5 Tesla (3D-GRE: p = 0.015; HR3D-GRE: p = 0.006). At 3 Tesla the mean SNR and CNR were significantly higher (p < 0.05). A kappa of 0.67 indicated good interobserver agreement. In an experimental setup, MR urography at 3 Tesla allowed for significantly higher image quality and SNR compared to 1.5 Tesla, particularly for the visualization of the pelvicaliceal system.
3D gaze tracking system for NVidia 3D Vision®.

PubMed

Wibirama, Sunu; Hamamoto, Kazuhiko

2013-01-01

Inappropriate parallax settings in stereoscopic content generally cause visual fatigue and visual discomfort. To optimize three-dimensional (3D) effects in stereoscopic content while taking health issues into account, understanding how a user gazes in 3D in virtual space is currently an important research topic. In this paper, we report the development of a novel 3D gaze tracking system for Nvidia 3D Vision(®) to be used with desktop stereoscopic displays. We suggest an optimized geometric method to accurately measure the position of a virtual 3D object. Our experimental results show that the proposed system achieved better accuracy than the conventional geometric method, with average errors of 0.83 cm, 0.87 cm, and 1.06 cm in the X, Y, and Z dimensions, respectively.
A simple GPU-accelerated two-dimensional MUSCL-Hancock solver for ideal magnetohydrodynamics

NASA Astrophysics Data System (ADS)

Bard, Christopher M.; Dorelli, John C.

2014-02-01

We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of ≈126 for a 1024² grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
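To give a feel for the structure of a MUSCL-Hancock update (limited slopes, a half-step predictor at the cell faces, then a conservative flux update), here is a minimal NumPy sketch. It is not the authors' solver: it assumes one-dimensional linear advection with positive speed and periodic boundaries instead of two-dimensional ideal MHD, and the function names and the test problem are hypothetical.

```python
import numpy as np


def minmod(a, b):
    """Minmod limiter: the smaller-magnitude slope, or zero at local extrema."""
    return np.where(a * b > 0.0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)


def muscl_hancock_step(u, courant):
    """One periodic step for u_t + a u_x = 0 with a > 0, courant = a*dt/dx."""
    slope = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)
    # Right-face value of each cell, already advanced by half a time step.
    u_face = u + 0.5 * (1.0 - courant) * slope
    # Upwind flux through the right face of each cell, scaled by dt/dx.
    flux = courant * u_face
    # Conservative update: outflow through the right face, inflow from the left.
    return u - (flux - np.roll(flux, 1))


# Advect a square pulse on a periodic grid (hypothetical test problem).
x = np.linspace(0.0, 1.0, 128, endpoint=False)
u0 = np.where((x >= 0.25) & (x < 0.5), 1.0, 0.0)
u = u0.copy()
for _ in range(100):
    u = muscl_hancock_step(u, courant=0.5)
print(u.sum(), u0.sum())  # total is conserved up to round-off
```

Every cell update depends only on a few neighbouring cells, which is the property that makes this family of schemes map naturally onto one GPU thread per cell.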
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5×5 cm²) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256×256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 TeraOPS (operations per second). IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
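The peak-performance figure quoted for the optical core can be checked with a few lines of arithmetic. The sketch below assumes that each element of the 256×256 matrix contributes one multiply and one add per clock cycle, which reproduces the 128K operations per cycle stated in the abstract. The variable names are illustrative.

```python
matrix_dim = 256   # nominal matrix size quoted in the abstract
clock_hz = 125e6   # system clock quoted in the abstract (125 MHz)

# Assumption: one multiply and one add per matrix element per cycle.
ops_per_cycle = 2 * matrix_dim * matrix_dim
peak_ops_per_second = ops_per_cycle * clock_hz

print(ops_per_cycle)               # 131072, i.e. 128K
print(peak_ops_per_second / 1e12)  # 16.384, i.e. roughly 16 TeraOPS
```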
Swan: A tool for porting CUDA programs to OpenCL

NASA Astrophysics Data System (ADS)

Harvey, M. J.; De Fabritiis, G.

2011-04-01

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, "Swan", for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide best performance.
Program summary
Program title: Swan
Catalogue identifier: AEIH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Public License version 2
No. of lines in distributed program, including test data, etc.: 17 736
No. of bytes in distributed program, including test data, etc.: 131 177
Distribution format: tar.gz
Programming language: C
Computer: PC
Operating system: Linux
RAM: 256 Mbytes
Classification: 6.5
External routines: NVIDIA CUDA, OpenCL
Nature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programmed with the proprietary CUDA programming toolkit. An alternative programming model promoted as an industry standard, OpenCL, provides similar capabilities to CUDA and is also supported on non-NVIDIA hardware (including multicore x86 CPUs, AMD GPUs and IBM Cell processors). The adaptation of a program from CUDA to OpenCL is relatively straightforward but laborious. The Swan tool facilitates this conversion.
Solution method: Swan performs a translation of CUDA kernel source code into an OpenCL equivalent. It also generates the C source code for entry point functions, simplifying kernel invocation from the host program. A concise host-side API abstracts the CUDA and OpenCL APIs. A program adapted to use Swan has no dependency on the CUDA compiler for the host-side program. The converted program may be built for either CUDA or OpenCL, with the selection made at compile time.
Restrictions: No support for CUDA C++ features
Running time: Nominal
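To illustrate why the abstract calls such a port "relatively straightforward but laborious", the toy sketch below rewrites a handful of CUDA kernel tokens into their commonly cited OpenCL counterparts. It is not Swan's implementation and does not use Swan's API: the mapping table, the function name `translate_kernel`, and the sample kernel are all hypothetical, and a real translator must handle much more (for example, address-space qualifiers on pointer arguments).

```python
import re

# A few well-known CUDA-to-OpenCL correspondences; far from complete.
CUDA_TO_OPENCL = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
}


def translate_kernel(cuda_source):
    """Replace known CUDA tokens with OpenCL equivalents (toy example)."""
    pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_OPENCL))
    return pattern.sub(lambda m: CUDA_TO_OPENCL[m.group(0)], cuda_source)


cuda_kernel = """__global__ void scale(float *x, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] = a * x[i];
}"""
opencl_kernel = translate_kernel(cuda_kernel)
print(opencl_kernel)
```

The output is still not a valid OpenCL kernel, because the pointer argument would need a `__global` qualifier. Gaps like this are what make hand conversion tedious and a dedicated tool attractive.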
A Laminar Flow-Based Microfluidic Tesla Pump via Lithography Enabled 3D Printing.

PubMed

Habhab, Mohammed-Baker; Ismail, Tania; Lo, Joe Fujiou

2016-11-23

The Tesla turbine and its applications in power generation and fluid flow were demonstrated by Nikola Tesla in 1913. However, its real-world implementations were limited by the difficulty of maintaining laminar flow between rotor disks, transient efficiencies during rotor acceleration, and the lack of other applications that fully utilize the continuous flow outputs. All of the aforementioned limits of Tesla turbines can be addressed by scaling to the microfluidic flow regime. Demonstrated here is a microscale Tesla pump designed and fabricated using a Digital Light Processing (DLP) based 3D printer with 43 µm lateral and 30 µm thickness resolutions. The miniaturized pump is characterized by a low Reynolds number of 1000 and a flow rate of up to 12.6 mL/min at 1200 rpm, unloaded. It is capable of driving a mixer network to generate microfluidic gradients. The continuous, laminar flow from Tesla turbines is well-suited to the needs of flow-sensitive microfluidics, where the integrated pump will enable numerous compact lab-on-a-chip applications.
Quantitative techniques for musculoskeletal MRI at 7 Tesla

PubMed Central

Taylor, Meredith D.; Tarbox, Grayson J.; Palmer, Antony J.; Park, Daniel J.

2016-01-01

Whole-body 7 Tesla MRI scanners have been approved solely for research since they appeared on the market over 10 years ago, but may soon be approved for selected clinical neurological and musculoskeletal applications in both the EU and the United States. There has been considerable research work on musculoskeletal applications at 7 Tesla over the past decade, including techniques for ultra-high resolution morphological imaging, 3D T2 and T2* mapping, ultra-short TE applications, diffusion tensor imaging of cartilage, and several techniques for assessing proteoglycan content in cartilage. Most of this work has been done in the knee or other extremities, due to technical difficulties associated with scanning areas such as the hip and torso at 7 Tesla. In this manuscript, we first provide some technical context for 7 Tesla imaging, including challenges and potential advantages. We then review the major quantitative MRI techniques being applied to musculoskeletal applications on 7 Tesla whole-body systems. PMID:28090448
Distributing coil elements in three dimensions enhances parallel transmission multiband RF performance: A simulation study in the human brain at 7 Tesla.

PubMed

Wu, Xiaoping; Tian, Jinfeng; Schmitter, Sebastian; Vaughan, J Tommy; Uğurbil, Kâmil; Van de Moortele, Pierre-François

2016-06-01

We explore the advantages of using a double-ring radiofrequency (RF) array and slice orientation to design parallel transmission (pTx) multiband (MB) pulses for simultaneous multislice (SMS) imaging with whole-brain coverage at 7 Tesla (T). A double-ring head array with 16 elements split evenly in two rings stacked in the z-direction was modeled and compared with two single-ring arrays consisting of 8 or 16 elements. The array performance was evaluated by designing band-specific pTx MB pulses with local specific absorption rate (SAR) control. The impact of slice orientations was also investigated. The double-ring array consistently and significantly outperformed the two single-ring arrays, with peak local SAR reduced by up to 40% at a fixed excitation error of 0.024. For all three arrays, exciting sagittal or coronal slices yielded better RF performance than exciting axial or oblique slices. A double-ring RF array can be used to drastically improve the tradeoff between SAR and excitation fidelity in pTx MB pulse design for brain imaging at 7 T; therefore, it is preferable to single-ring RF array designs when pursuing various biomedical applications of pTx SMS imaging. In comparing the stripline arrays, coronal and sagittal slices are more advantageous than axial and oblique slices for pTx MB pulses. Magn Reson Med 75:2464-2472, 2016. © 2016 Wiley Periodicals, Inc.

Retinotopic mapping with Spin Echo BOLD at 7 Tesla

PubMed Central

Olman, Cheryl A.; Van de Moortele, Pierre-Francois; Schumacher, Jennifer F.; Guy, Joe; Uğurbil, Kâmil; Yacoub, Essa

2010-01-01

For blood oxygenation level-dependent (BOLD) functional MRI experiments, contrast-to-noise ratio (CNR) increases with increasing field strength for both gradient echo (GE) and spin echo (SE) BOLD techniques. However, susceptibility artifacts and non-uniform coil sensitivity profiles complicate large field-of-view fMRI experiments (e.g., experiments covering multiple visual areas instead of focusing on a single cortical region). Here, we use SE BOLD to acquire retinotopic mapping data in early visual areas, testing the feasibility of SE BOLD experiments spanning multiple cortical areas at 7 Tesla. We also use a recently developed method for normalizing signal intensity in T1-weighted anatomical images to enable automated segmentation of the cortical gray matter for scans acquired at 7T with either surface or volume coils. We find that the CNR of the 7T GE data (average single-voxel, single-scan stimulus coherence: 0.41) is almost twice that of the 3T GE BOLD data (average coherence: 0.25), with the CNR of the SE BOLD data (average coherence: 0.23) comparable to that of the 3T GE data. Repeated measurements in individual subjects find that maps acquired with 1.8 mm resolution at 3T and 7T with GE BOLD and at 7T with SE BOLD show no systematic differences in either the area or the boundary locations for V1, V2 and V3, demonstrating the feasibility of high-resolution SE BOLD experiments with good sensitivity throughout multiple visual areas. PMID:20656431
Combining microscopic and macroscopic probes to untangle the single-ion anisotropy and exchange energies in an S =1 quantum antiferromagnet

NASA Astrophysics Data System (ADS)

Brambleby, Jamie; Manson, Jamie L.; Goddard, Paul A.; Stone, Matthew B.; Johnson, Roger D.; Manuel, Pascal; Villa, Jacqueline A.; Brown, Craig M.; Lu, Helen; Chikara, Shalinee; Zapf, Vivien; Lapidus, Saul H.; Scatena, Rebecca; Macchi, Piero; Chen, Yu-sheng; Wu, Lai-Chin; Singleton, John

2017-04-01

The magnetic ground state of the quasi-one-dimensional spin-1 antiferromagnetic chain is sensitive to the relative sizes of the single-ion anisotropy (D) and the intrachain (J) and interchain (J') exchange interactions. The ratios D/J and J'/J dictate the material's placement in one of three competing phases: a Haldane gapped phase, a quantum paramagnet, and an XY-ordered state, with a quantum critical point at their junction. We have identified [Ni(HF2)(pyz)2]SbF6, where pyz = pyrazine, as a rare candidate in which this behavior can be explored in detail. Combining neutron scattering (elastic and inelastic) in applied magnetic fields of up to 10 tesla and magnetization measurements in fields of up to 60 tesla with numerical modeling of experimental observables, we are able to obtain accurate values of all of the parameters of the Hamiltonian [D = 13.3(1) K, J = 10.4(3) K, and J' = 1.4(2) K], despite the polycrystalline nature of the sample. Density-functional theory calculations result in similar couplings (J = 9.2 K, J' = 1.8 K) and predict that the majority of the total spin population resides on the Ni(II) ion, while the remaining spin density is delocalized over both ligand types. The general procedures outlined in this paper permit phase boundaries and quantum-critical points to be explored in anisotropic systems for which single crystals are as yet unavailable.
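For orientation, a form of the spin Hamiltonian commonly written for such S = 1 chains with these three parameters is sketched below. This is an assumed standard form and is not quoted from the paper, whose sign and double-counting conventions may differ. The first sum runs over nearest neighbours along the chain and the second over neighbours on adjacent chains.

```latex
\mathcal{H} = J \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j
            + J' \sum_{\langle i,j' \rangle} \mathbf{S}_i \cdot \mathbf{S}_{j'}
            + D \sum_i \left( S_i^z \right)^2
```

With the experimental values quoted in the abstract, the two ratios that locate the material in the phase diagram are D/J = 13.3/10.4 ≈ 1.28 and J'/J = 1.4/10.4 ≈ 0.13.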
Montblanc: GPU accelerated radio interferometer measurement equations in support of Bayesian inference for radio observations

NASA Astrophysics Data System (ADS)

Perkins, S. J.; Marais, P. C.; Zwart, J. T. L.; Natarajan, I.; Tasse, C.; Smirnov, O.

2015-09-01

We present Montblanc, a GPU implementation of the Radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters to produce multiple model visibilities. χ² values computed from the model and observed visibilities are used as likelihood values to drive the Bayesian sampling process and select the best sky model. As most of the elements of the RIME and χ² calculation are independent of one another, they are highly amenable to parallel computation. Additionally, Montblanc caters for iterative RIME evaluation to produce multiple χ² values. Modified model parameters are transferred to the GPU between each iteration. We implemented Montblanc as a Python package based upon NVIDIA's CUDA architecture. As such, it is easy to extend and implement different pipelines. At present, Montblanc supports point and Gaussian morphologies, but is designed for easy addition of new source profiles. Montblanc's RIME implementation is performant: On an NVIDIA K40, it is approximately 250 times faster than MEQTREES on a dual hexacore Intel E5-2620v2 CPU. Compared to the OSKAR simulator's GPU-implemented RIME components it is 7.7 and 12 times faster on the same K40 for single and double-precision floating point respectively. However, OSKAR's RIME implementation is more general than Montblanc's BIRO-tailored RIME. Theoretical analysis of Montblanc's dominant CUDA kernel suggests that it is memory bound. In practice, profiling shows that it is balanced between compute and memory, as much of the data required by the problem is retained in L1 and L2 caches.
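The χ² step described above can be sketched on the CPU as follows. This is an illustration only, not Montblanc's API: the function names are hypothetical, and a single noise standard deviation `sigma` shared by all visibilities is an assumption the abstract does not state.

```python
def chi_squared(model_vis, observed_vis, sigma):
    """Sum of squared residual magnitudes between complex model and observed
    visibilities, scaled by an assumed common noise variance sigma**2."""
    if len(model_vis) != len(observed_vis):
        raise ValueError("model and observed visibilities must have equal length")
    residual_power = sum(abs(o - m) ** 2 for m, o in zip(model_vis, observed_vis))
    return residual_power / sigma ** 2


def gaussian_log_likelihood(chi2):
    """Log-likelihood for Gaussian noise, up to an additive constant."""
    return -0.5 * chi2


# Hypothetical toy visibilities, for illustration only.
model = [1 + 1j, 2 + 0j]
observed = [1 + 0j, 2 + 2j]
chi2 = chi_squared(model, observed, sigma=1.0)
print(chi2, gaussian_log_likelihood(chi2))
```

Each term of the sum depends on one visibility only, which is the independence the abstract points to as the reason the calculation parallelizes well.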
Design and construction of the astronautics refrigerator magnet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dresner, L.

1994-05-01

This document reports on the design, construction, and testing of a 7-Tesla, 4-in. bore superconducting magnet for use in the Astronautics Refrigerator Experiment. The magnet is a single-strand, layer-wound, potted solenoid wound with Formvar-insulated SSC strands. The magnet was constructed by American Magnetics, Inc. of Oak Ridge and has been installed in the Astronautics Refrigerator Experiment at the Astronautics Technology Center in Madison, Wisconsin.
Using 3 Tesla magnetic resonance imaging in the pre-operative evaluation of tongue carcinoma.

PubMed

Moreno, K F; Cornelius, R S; Lucas, F V; Meinzen-Derr, J; Patil, Y J

2017-09-01

This study aimed to evaluate the role of 3 Tesla magnetic resonance imaging in predicting tongue tumour thickness via direct and reconstructed measures, and their correlations with corresponding histological measures, nodal metastasis and extracapsular spread. A prospective study was conducted of 25 patients with histologically proven squamous cell carcinoma of the tongue and pre-operative 3 Tesla magnetic resonance imaging from 2009 to 2012. Correlations between 3 Tesla magnetic resonance imaging and histological measures of tongue tumour thickness were assessed using the Pearson correlation coefficient: r values were 0.84 (p < 0.0001) and 0.81 (p < 0.0001) for direct and reconstructed measurements, respectively. For magnetic resonance imaging, direct measures of tumour thickness (mean ± standard deviation, 18.2 ± 7.3 mm) did not significantly differ from the reconstructed measures (mean ± standard deviation, 17.9 ± 7.2 mm; r = 0.879). Moreover, 3 Tesla magnetic resonance imaging had 83 per cent sensitivity, 82 per cent specificity, 82 per cent accuracy and a 90 per cent negative predictive value for detecting cervical lymph node metastasis. In this cohort, 3 Tesla magnetic resonance imaging measures of tumour thickness correlated highly with the corresponding histological measures. Further, 3 Tesla magnetic resonance imaging was an effective method of detecting malignant adenopathy with extracapsular spread.
Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax–Wendroff Correction

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Fu, Haohuan; Song, Shuaiwen

2014-07-18

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits an application's performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, NVIDIA Fermi C2070 GPU, NVIDIA Kepler K20x GPU, and the Intel Xeon Phi Co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best.
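An iterative stencil loop of the kind mentioned above can be illustrated with the textbook one-dimensional Lax-Wendroff scheme for linear advection. This is a deliberately small stand-in, not the three-dimensional kernel of the paper; the function names and the periodic boundary are assumptions made for the sketch.

```python
def lax_wendroff_step(u, courant):
    """Advance u by one time step of 1D linear advection on a periodic grid.

    courant is the Courant number a * dt / dx. Each output cell reads only
    its left and right neighbours, which is what makes this a stencil.
    """
    n = len(u)
    new_u = [0.0] * n
    for i in range(n):
        left = u[i - 1]          # index -1 wraps to the last cell (periodic)
        right = u[(i + 1) % n]
        new_u[i] = (u[i]
                    - 0.5 * courant * (right - left)
                    + 0.5 * courant ** 2 * (right - 2.0 * u[i] + left))
    return new_u


def run(u, courant, steps):
    """Apply the stencil repeatedly: the 'iterative' part of the loop."""
    for _ in range(steps):
        u = lax_wendroff_step(u, courant)
    return u


pulse = [0.0, 1.0, 0.0, 0.0]
print(lax_wendroff_step(pulse, 1.0))
```

Every cell in a step can be updated independently of the others, so the inner loop is the natural unit to distribute across CPU cores or GPU threads.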
GPU-based cone beam computed tomography.

PubMed

Noël, Peter B; Walczak, Alan M; Xu, Jinhui; Corso, Jason J; Hoffmann, Kenneth R; Schafer, Sebastian

2010-06-01

The use of cone beam computed tomography (CBCT) is growing in the clinical arena due to its ability to provide 3D information during interventions, its high diagnostic quality (sub-millimeter resolution), and its short scanning times (60 s). In many situations, the short scanning time of CBCT is followed by a time-consuming 3D reconstruction. The standard reconstruction algorithm for CBCT data is the filtered backprojection, which for a volume of size 256³ takes up to 25 min on a standard system. Recent developments in the area of Graphic Processing Units (GPUs) make it possible to have access to high-performance computing solutions at a low cost, allowing their use in many scientific problems. We have implemented an algorithm for 3D reconstruction of CBCT data using the Compute Unified Device Architecture (CUDA) provided by NVIDIA (NVIDIA Corporation, Santa Clara, California), which was executed on an NVIDIA GeForce GTX 280. Our implementation results in improved reconstruction times from minutes, and perhaps hours, to a matter of seconds, while also giving the clinician the ability to view 3D volumetric data at higher resolutions. We evaluated our implementation on ten clinical data sets and one phantom data set to observe if differences occur between CPU and GPU-based reconstructions. By using our approach, the computation time for 256³ is reduced from 25 min on the CPU to 3.2 s on the GPU. The GPU reconstruction time for 512³ volumes is 8.5 s. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.
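The timings quoted above imply a speedup and a voxel throughput that can be worked out directly. The sketch below uses only the figures in the abstract; the helper names are hypothetical, and since the CPU time is given as "up to 25 min" the resulting speedup should be read as an upper estimate.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline run time to accelerated run time."""
    return baseline_seconds / accelerated_seconds


def voxels_per_second(side, seconds):
    """Reconstruction throughput for a cubic volume with `side` voxels per edge."""
    return side ** 3 / seconds


cpu_256_seconds = 25 * 60   # 25 min on the CPU for a 256-cubed volume
gpu_256_seconds = 3.2       # GPU time for the same volume
gpu_512_seconds = 8.5       # GPU time for a 512-cubed volume

print(f"speedup at 256 cubed: {speedup(cpu_256_seconds, gpu_256_seconds):.0f}x")
print(f"GPU throughput at 256 cubed: {voxels_per_second(256, gpu_256_seconds):.3g} voxels/s")
print(f"GPU throughput at 512 cubed: {voxels_per_second(512, gpu_512_seconds):.3g} voxels/s")
```

The 256³ case works out to roughly a 470-fold speedup, and the 512³ volume holds eight times as many voxels yet takes under three times as long on the GPU.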
The National Geoelectromagnetic Facility - an open access resource for ultra wideband electromagnetic geophysics (Invited)

NASA Astrophysics Data System (ADS)

Schultz, A.; Urquhart, S.; Slater, M.

2010-12-01

At present, the US academic community has access to two national electromagnetic (EM) instrument pools that support long-period magnetotelluric (MT) equipment suitable for crust-mantle scale studies. The requirements of near surface geophysics, hydrology, glaciology, as well as the full range of crust and mantle investigations require development of new capabilities in data acquisition with broader frequency bandwidth than these existing units, increased instrument numbers, and concomitant developments in 3D/4D data interpretation. NSF Major Research Instrumentation support has been obtained to meet these requirements by developing an initial set of next-generation instruments as a National Geoelectromagnetic Facility (NGF), available to all PIs on a cost recovery basis, and operated by Oregon State University (OSU). In contrast to existing instruments with data acquisition systems specialized to operate within specific frequency bands and for specific electromagnetic methods, the NGF model "Zen/5" instruments being co-developed by OSU and Zonge Research and Engineering Organization are based on modular receivers with a flexible number of digital and analog input channels, designed to acquire EM data at dc, and from frequencies ranging from micro-Hz to MHz. These systems can be deployed in a compact, low power configuration for extended deployments (e.g. for crust-mantle scale experiments), or in a high frequency sampling mode for near surface work. The NGF is also acquiring controlled source EM transmitters, so that investigators may carry out magnetotelluric, audio-MT, radiofrequency-MT, as well as time-domain/transient EM and DC resistivity studies. The instruments are designed to simultaneously accommodate multiple electric field dipole sensors, magnetic fluxgates and induction coil sensors. Sample rates as high as 2.5 MHz with resolution between 24 and 32 bits, depending on sample rate, are specified to allow for high fidelity recording of waveforms. 
The NGF is accepting instrument use requests from investigators planning electromagnetic surveys via webform submission on its web site ngf.coas.oregonstate.edu. The site is also a port of entry to request access to the 46 long period magnetotelluric instruments also operated by OSU as national instrument pools. Cyberinfrastructure support is available to investigators, including field computers, EM data processing software, and access to a hybrid CPU-GPU parallel computing environment, currently configured with dual Intel Westmere hexacore CPUs and 960 NVIDIA Tesla and 1792 NVIDIA Fermi GPU cores. The capabilities of the Zen/5 receivers will be presented, with examples of data acquired from a recent shallow water marine controlled source experiment conducted in coastal Oregon as part of an effort to locate a buried submarine pipeline, using a 1.1 kW 256 Hz signal source imposed on the pipeline from shore. A Zen/5 prototype instrument, modified for marine use through support by the Oregon Wave Energy Trust, demonstrated the marine capabilities of the NGF instrument design.
MRI safety of a programmable shunt assistant at 3 and 7 Tesla.

PubMed

Mirzayan, M Javad; Klinge, Petra M; Samii, Madjid; Goetz, Friedrich; Krauss, Joachim K

2012-06-01

Several new shunt technologies have been developed to optimize hydrocephalus treatment within the past few years. Overdrainage, however, still remains an unresolved problem. One new technology which may reduce the frequency of this complication is the use of a programmable shunt assistant (proSA). Inactive in a horizontal position, it impedes CSF flow in a vertical position according to a prescribed pressure level ranging from 0 to 40 cm H₂O. We exposed the proSA valve in an ex vivo protocol to MR systems operating at 3 and 7 Tesla to investigate its MRI safety. Following 3 Tesla exposure, no changes in valve settings were noted. Adjustment to any pressure level was possible thereafter. The mean deflection angle was 23 ± 3°. After exposure to 7 Tesla, however, there were unintended pressure changes, and the mechanism for further adjustment of the valves even disintegrated. According to the results of this study, proSA is safe with heteropolar vertical magnet alignment at 3 Tesla. Following 7 Tesla exposure, the valves lost their functional capability.
A Laminar Flow-Based Microfluidic Tesla Pump via Lithography Enabled 3D Printing

PubMed Central

Habhab, Mohammed-Baker; Ismail, Tania; Lo, Joe Fujiou

2016-01-01

The Tesla turbine and its applications in power generation and fluid flow were demonstrated by Nikola Tesla in 1913. However, its real-world implementations were limited by the difficulty of maintaining laminar flow between rotor disks, transient efficiencies during rotor acceleration, and the lack of other applications that fully utilize the continuous flow outputs. All of the aforementioned limits of Tesla turbines can be addressed by scaling to the microfluidic flow regime. Demonstrated here is a microscale Tesla pump designed and fabricated using a Digital Light Processing (DLP) based 3D printer with 43 µm lateral and 30 µm thickness resolutions. The miniaturized pump is characterized by a low Reynolds number of 1000 and a flow rate of up to 12.6 mL/min at 1200 rpm, unloaded. It is capable of driving a mixer network to generate a microfluidic gradient. The continuous, laminar flow from Tesla turbines is well-suited to the needs of flow-sensitive microfluidics, where the integrated pump will enable numerous compact lab-on-a-chip applications. PMID:27886051
Human in vivo cardiac phosphorus NMR spectroscopy at 3.0 Tesla

NASA Astrophysics Data System (ADS)

Bruner, Angela Properzio

One of the newest methods with great potential for use in clinical diagnosis of heart disease is human, cardiac, phosphorus NMR spectroscopy (cardiac ³¹P MRS). Cardiac ³¹P MRS is able to provide quantitative, non-invasive, functional information about the myocardial energy metabolites such as pH, phosphocreatine (PCr), and adenosine triphosphate (ATP). In addition to the use of cardiac ³¹P MRS for other types of cardiac problems, studies have shown that the ratio of PCr/ATP and pH are sensitive and specific markers of ischemia at the myocardial level. In human studies, typically performed at 1.5 Tesla, PCr/ATP has been relatively easy to measure but often requires long scan times to provide adequate signal-to-noise (SNR). In addition, pH, which relies on identification of inorganic phosphate (Pi), has rarely been obtained. Significant improvement in the quality of cardiac ³¹P MRS was achieved through the use of the General Electric SIGNA™ 3.0 Tesla whole body magnet, improved coil designs and optimized pulse sequences. Phantom and human studies performed on many types of imaging and spectroscopy sequences identified breathhold gradient-echo imaging and oblique DRESS ³¹P spectroscopy as the best compromises between SNR, flexibility and quality localization. Both single-turn and quadrature 10-cm diameter ³¹P radiofrequency coils were tested, with the quadrature coil providing greater SNR, but at a greater depth to avoid skeletal muscle contamination. Cardiac ³¹P MRS obtained in just 6 to 8 minutes, gated, showed both improved SNR and discernment of Pi, allowing for pH measurement. A handgrip, in-magnet exerciser was designed, created and tested at 1.5 and 3.0 Tesla on volunteers and patients. In ischemic patients, this exercise was adequate to cause a repeated drop in PCr/ATP and pH with approximately eight minutes of isometric exercise at 30% maximum effort. As expected from the literature, this exercise did not cause a drop in PCr/ATP for reference volunteers.
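The dependence of pH on the Pi peak can be made concrete with the Henderson-Hasselbalch-type relation commonly used in ³¹P MRS, in which pH is estimated from the chemical shift of Pi relative to PCr. The sketch below is an assumption-laden illustration: the calibration constants are values often quoted in the general literature, not taken from this abstract, and the function names and example numbers are hypothetical.

```python
import math

# Assumed calibration constants (commonly quoted in the 31P MRS literature;
# NOT given in the abstract above).
PKA = 6.75          # apparent pK of inorganic phosphate
DELTA_ACID = 3.27   # Pi-PCr shift of the fully protonated form, in ppm
DELTA_BASE = 5.69   # Pi-PCr shift of the fully deprotonated form, in ppm


def ph_from_pi_shift(delta_ppm):
    """Estimate pH from the Pi chemical shift relative to PCr (ppm)."""
    if not DELTA_ACID < delta_ppm < DELTA_BASE:
        raise ValueError("shift outside the titration range of the calibration")
    return PKA + math.log10((delta_ppm - DELTA_ACID) / (DELTA_BASE - delta_ppm))


def pcr_atp_ratio(pcr_area, atp_area):
    """Ratio of fitted PCr and ATP peak areas (no saturation or blood correction)."""
    return pcr_area / atp_area


# Hypothetical example values, for illustration only.
print(f"pH      = {ph_from_pi_shift(4.9):.2f}")
print(f"PCr/ATP = {pcr_atp_ratio(1.8, 1.0):.2f}")
```

The relation shows why pH is hard to obtain when Pi cannot be resolved: without a measurable Pi peak position there is no shift to insert into the formula.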
A 4 Tesla Superconducting Magnet Developed for a 6 Circle Huber Diffractometer at the XMaS Beamline

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, P. B. J.; Brown, S. D.; Bouchenoire, L.

2007-01-19

We report here on the development and testing of a 4 Tesla cryogen free superconducting magnet designed to fit within the Euler cradle of a 6 circle Huber diffractometer, allowing scattering in both the vertical and horizontal planes. The geometry of this magnet allows the field to be applied in three orientations: first, along the beam direction; second, transverse to the beam direction in a horizontal plane; and finally, vertically with respect to the beam. The magnet has a warm bore and an open geometry of 180 deg., allowing large access to reciprocal space. A variable temperature insert has been developed, which is capable of working down to a temperature of 1.7 K and operating over a wide range of angles whilst maintaining a temperature stability of a few mK. Initial ferromagnetic diffraction measurements have been carried out on single crystal Tb and Dy samples.
Cervical external immobilization devices: evaluation of magnetic resonance imaging issues at 3.0 Tesla.

PubMed

Diaz, Francis L; Tweardy, Lisa; Shellock, Frank G

2010-02-15

Laboratory investigation, ex vivo. Currently, no studies have addressed the magnetic resonance imaging (MRI) issues for cervical external immobilization devices at 3-Tesla. Under certain conditions significant heating may occur, resulting in patient burns. Furthermore, artifacts can be substantial and prevent the diagnostic use of MRI. Therefore, the objective of this investigation was to evaluate MRI issues for 4 different cervical external immobilization devices at 3-Tesla. Excessive heating and substantial artifacts are 2 potential complications associated with performing MRI at 3-Tesla in patients with cervical external immobilization devices. Using ex vivo testing techniques, MRI-related heating and artifacts were evaluated for 4 different cervical devices during MRI at 3-Tesla. Four cervical external immobilization devices (Generation 80, Resolve Ring and Superstructure, Resolve Ring and Jerome Vest/Jerome Superstructure, and the V1 Halo System; Ossur Americas, Aliso Viejo, CA) underwent MRI testing at 3-Tesla. All devices were made from nonmetallic or nonmagnetic materials. Heating was determined using a gelled-saline-filled skull phantom with fluoroptic thermometry probes attached to the skull pins. MRI was performed at 3-Tesla, using a high level of RF energy. Artifacts were assessed at 3-Tesla, using standard cervical imaging techniques. The Generation 80 and V1 Halo devices exhibited substantial temperature rises (11.6 °C and 8.5 °C, respectively), with "sparking" evident for the Generation 80 during the MRI procedure. Artifacts were problematic for these devices, as well. By comparison, the 2 Resolve Ring-based cervical external immobilization devices showed little or no heating (≤ 0.6 °C) and the artifacts were acceptable for diagnostic MRI examinations.
The low degree of heating and minor artifacts associated with the Resolve-based cervical external immobilization devices indicated that these products are safe for patients undergoing MRI at 3-Tesla.
Combining microscopic and macroscopic probes to untangle the single-ion anisotropy and exchange energies in an S = 1 quantum antiferromagnet [Combining micro- and macroscopic probes to untangle single-ion and spatial exchange anisotropies in a S = 1 quantum antiferromagnet]

DOE PAGES

Brambleby, Jamie; Manson, Jamie L.; Goddard, Paul A.; ...

2017-04-20

The magnetic ground state of the quasi-one-dimensional spin-1 antiferromagnetic chain is sensitive to the relative sizes of the single-ion anisotropy (D) and the intrachain (J) and interchain (J') exchange interactions. The ratios D/J and J'/J dictate the material's placement in one of three competing phases: a Haldane gapped phase, a quantum paramagnet, and an XY-ordered state, with a quantum critical point at their junction. We have identified [Ni(HF2)(pyz)2]SbF6, where pyz = pyrazine, as a rare candidate in which this behavior can be explored in detail. Combining neutron scattering (elastic and inelastic) in applied magnetic fields of up to 10 tesla and magnetization measurements in fields of up to 60 tesla with numerical modeling of experimental observables, we are able to obtain accurate values of all of the parameters of the Hamiltonian [D = 13.3(1) K, J = 10.4(3) K, and J' = 1.4(2) K], despite the polycrystalline nature of the sample. Density-functional theory calculations result in similar couplings (J = 9.2 K, J' = 1.8 K) and predict that the majority of the total spin population resides on the Ni(II) ion, while the remaining spin density is delocalized over both ligand types. Finally, the general procedures outlined in this paper permit phase boundaries and quantum-critical points to be explored in anisotropic systems for which single crystals are as yet unavailable.
A Simple GPU-Accelerated Two-Dimensional MUSCL-Hancock Solver for Ideal Magnetohydrodynamics

NASA Technical Reports Server (NTRS)

Bard, Christopher; Dorelli, John C.

2013-01-01

We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of approximately 126 for a 1024² grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
Real-Space Density Functional Theory on Graphical Processing Units: Computational Approach and Comparison to Gaussian Basis Set Methods.

PubMed

Andrade, Xavier; Aspuru-Guzik, Alán

2013-10-08

We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation GPUs from AMD and Nvidia, which show that our scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops for a single GPU, representing a significant speed-up when compared to the CPU version of the code. Moreover, for some systems, our implementation can outperform a GPU Gaussian basis set code, showing that the real-space approach is a competitive alternative for DFT simulations on GPUs.
Imaging of patients with hippocampal sclerosis at 7 Tesla: initial results.

PubMed

Breyer, Tobias; Wanke, Isabel; Maderwald, Stefan; Woermann, Friedrich G; Kraff, Oliver; Theysohn, Jens M; Ebner, Alois; Forsting, Michael; Ladd, Mark E; Schlamann, Marc

2010-04-01

Focal epilepsies potentially can be cured by neurosurgery; other treatment options usually remain symptomatic. High-resolution magnetic resonance (MR) imaging is the central imaging strategy in the evaluation of focal epilepsy. The most common substrate of temporal epilepsies is hippocampal sclerosis (HS), which cannot always be sufficiently characterized with current MR field strengths. Therefore, the purpose of our study was to demonstrate the feasibility of high-resolution MR imaging at 7 Tesla in patients with focal epilepsy resulting from a HS and to improve image resolution at 7 Tesla in patients with HS. Six patients with known HS were investigated with T1-, T2-, T2*-, and fluid-attenuated inversion recovery-weighted sequences at 7 Tesla with an eight-channel transmit-receive head coil. Total imaging time did not exceed 90 minutes per patient. High-resolution imaging at 7 Tesla is feasible and reveals high resolution of intrahippocampal structures in vivo. HS was confirmed in all patients. The maximum non-interpolated in-plane resolution reached 0.2 × 0.2 mm² in T2*-weighted images. The increased susceptibility effects at 7 Tesla revealed identification of intrahippocampal structures in more detail than at 1.5 Tesla, but otherwise led to stronger artifacts. Imaging revealed regional differences in hippocampal atrophy between patients. The scan volume was limited because of specific absorption rate restrictions; scanning time was reasonable. High-resolution imaging at 7 Tesla is promising in presurgical epilepsy imaging. "New" contrasts may further improve detection of even very small intrahippocampal structural changes. Therefore, further investigations will be necessary to demonstrate the potential benefit for presurgical selection of patients with various lesion patterns in mesial temporal epilepsies resulting from a unilateral HS. Copyright 2010 AUR. Published by Elsevier Inc. All rights reserved.
Two-Layer 16 Tesla Cosθ Dipole Design for the FCC

DOE PAGES

Holik, Eddie Frank; Ambrosio, Giorgio; Apollinari, G.

2018-02-13

The Future Circular Collider or FCC is a study aimed at exploring the possibility to reach 100 TeV total collision energy, which would require 16 tesla dipoles. Upon the conclusion of the High Luminosity Upgrade, the US LHC Accelerator Upgrade Project in collaboration with CERN will have extensive Nb3Sn magnet fabrication experience. This experience includes robust Nb3Sn conductor and insulation scheming, 2-layer cos2θ coil fabrication, and bladder-and-key structure and assembly. By making improvements and modifications to existing technology, the feasibility of a two-layer 16 tesla dipole is investigated. Preliminary designs indicate that fields up to 16.6 tesla are feasible with conductor grading while satisfying the HE-LHC and FCC specifications. Key challenges include accommodating high-aspect ratio conductor, narrow wedge design, Nb3Sn conductor grading, and especially quench protection of a 16 tesla device.
High-resolution motion compensated MRA in patients with congenital heart disease using extracellular contrast agent at 3 Tesla

PubMed Central

2012-01-01

Background Using first-pass MRA (FP-MRA), spatial resolution is limited by breath-hold duration. In addition, image quality may be hampered by respiratory and cardiac motion artefacts. In order to overcome these limitations an ECG- and navigator-gated high-resolution-MRA sequence (HR-MRA) with slow infusion of extracellular contrast agent was implemented at 3 Tesla for the assessment of congenital heart disease and compared to standard first-pass-MRA (FP-MRA). Methods 34 patients (median age: 13 years) with congenital heart disease (CHD) were prospectively examined on a 3 Tesla system. The CMR-protocol comprised functional imaging, FP- and HR-MRA, and viability imaging. After the acquisition of the FP-MRA sequence using a single dose of extracellular contrast agent the motion compensated HR-MRA sequence with isotropic resolution was acquired while injecting the second single dose, utilizing the timeframe before viability imaging. Qualitative scores for image quality (two independent reviewers) as well as quantitative measurements of vessel sharpness and relative contrast were compared using the Wilcoxon signed-rank test. Quantitative measurements of vessel diameters were compared using the Bland-Altman test. Results The mean image quality score revealed significantly better image quality of the HR-MRA sequence compared to the FP-MRA sequence in all vessels of interest (ascending aorta (AA), left pulmonary artery (LPA), left superior pulmonary vein (LSPV), coronary sinus (CS), and coronary ostia (CO); all p < 0.0001). In comparison to FP-MRA, HR-MRA revealed significantly better vessel sharpness for all considered vessels (AA, LSPV and LPA; all p < 0.0001). The relative contrast of the HR-MRA sequence was less compared to the FP-MRA sequence (AA: p < 0.028, main pulmonary artery: p < 0.004, LSPV: p < 0.005). Both the intra- and interobserver measurements of the vessel diameters revealed closer correlation and closer 95 % limits of agreement for the HR-MRA. HR-MRA revealed one additional clinical finding, missed by FP-MRA. Conclusions An ECG- and navigator-gated HR-MRA-protocol with infusion of extracellular contrast agent at 3 Tesla is feasible. HR-MRA delivers significantly better image quality and vessel sharpness compared to FP-MRA. It may be integrated into a standard CMR-protocol for patients with CHD without the need for additional contrast agent injection and without any additional examination time. PMID:23107424
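The 95 % limits of agreement reported above come from a Bland-Altman analysis, which can be sketched in a few lines. The vessel diameters below are hypothetical numbers invented for illustration, not data from the study, and the function name is likewise an assumption.

```python
import statistics


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean difference; the 95 % limits of agreement are
    bias +/- 1.96 times the sample standard deviation of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical vessel diameters (mm) measured by two observers.
observer_1 = [20.0, 15.0, 12.0, 18.0]
observer_2 = [19.0, 16.0, 12.0, 17.0]
bias, lower, upper = bland_altman(observer_1, observer_2)
print(f"bias = {bias:.2f} mm, limits of agreement = [{lower:.2f}, {upper:.2f}] mm")
```

Narrower limits mean the two sets of measurements agree more closely, which is the sense in which the abstract reports "closer" limits for HR-MRA.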
Performance analysis of a parallel Monte Carlo code for simulating solar radiative transfer in cloudy atmospheres using CUDA-enabled NVIDIA GPU

NASA Astrophysics Data System (ADS)

Russkova, Tatiana V.

2017-11-01

Parallel computing is one way to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere. A new algorithm oriented to parallel execution on a CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions, including continuous single-layered and multilayered clouds and selective molecular absorption, are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that moving the computation from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
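The reason Monte Carlo radiative transfer parallelizes well is that each photon history is independent. The sketch below illustrates this with the simplest possible case, the unscattered (direct-beam) transmittance through a homogeneous layer at normal incidence. It is a serial CPU illustration under those stated assumptions, not the algorithm of the paper, and the function name is hypothetical.

```python
import math
import random


def direct_transmittance(optical_depth, n_photons, seed=0):
    """Estimate the fraction of photons that cross a homogeneous layer
    of the given optical depth without any interaction."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_photons):
        # Optical path to the first interaction follows an exponential law;
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        path = -math.log(1.0 - rng.random())
        if path > optical_depth:
            transmitted += 1
    return transmitted / n_photons


estimate = direct_transmittance(optical_depth=1.0, n_photons=100_000)
print(f"Monte Carlo: {estimate:.4f}   Beer-Lambert: {math.exp(-1.0):.4f}")
```

On a GPU, each iteration of the photon loop would map naturally to one thread with its own random stream, with the per-thread counts reduced to a single total at the end.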

Application Modernization at LLNL and the Sierra Center of Excellence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Neely, J. Robert; de Supinski, Bronis R.

We report that in 2014, Lawrence Livermore National Laboratory began acquisition of Sierra, a pre-exascale system from IBM and Nvidia. It marks a significant shift in direction for LLNL by introducing the concept of heterogeneous computing via GPUs. LLNL’s mission requires application teams to prepare for this paradigm shift. Thus, the Sierra procurement required a proposed Center of Excellence that would align the expertise of the chosen vendors with laboratory personnel that represent the application developers, system software, and tool providers in a concentrated effort to prepare the laboratory’s codes in advance of the system transitioning to production in 2018. Finally, this article presents LLNL’s overall application strategy, with a focus on how LLNL is collaborating with IBM and Nvidia to ensure a successful transition of its mission-oriented applications into the exascale era.
General purpose graphic processing unit implementation of adaptive pulse compression algorithms

NASA Astrophysics Data System (ADS)

Cai, Jingxiao; Zhang, Yan

2017-07-01

This study introduces a practical approach to implement real-time signal processing algorithms for general surveillance radar based on NVIDIA graphical processing units (GPUs). The pulse compression algorithms are implemented using compute unified device architecture (CUDA) libraries such as CUDA basic linear algebra subroutines and CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for the NVIDIA GPUs. For more advanced, adaptive processing algorithms such as adaptive pulse compression, customized kernel optimization is needed and investigated. A statistical optimization approach is developed for this purpose without needing much knowledge of the physical configurations of the kernels. It was found that the kernel optimization approach can significantly improve the performance. Benchmark performance is compared with the CPU performance in terms of processing accelerations. The proposed implementation framework can be used in various radar systems including ground-based phased array radar, airborne sense and avoid radar, and aerospace surveillance radar.
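Basic (non-adaptive) pulse compression is a matched filter: the received signal is correlated with the transmitted code so that a long pulse collapses into a sharp peak at the target delay. The sketch below shows this with a Barker-13 code using direct time-domain correlation rather than the FFT-based GPU libraries the abstract describes; the function names and the noise-free echo are assumptions made for illustration.

```python
# Barker code of length 13: its aperiodic autocorrelation has a peak of 13
# and sidelobes of magnitude at most 1.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def matched_filter(received, code):
    """Correlate the received samples with the real-valued code at every lag."""
    n, m = len(received), len(code)
    return [sum(received[lag + k] * code[k] for k in range(m))
            for lag in range(n - m + 1)]


def simulate_echo(code, delay, total_length, amplitude=1.0):
    """Place one noise-free copy of the code at the given sample delay."""
    echo = [0.0] * total_length
    for k, chip in enumerate(code):
        echo[delay + k] = amplitude * chip
    return echo


echo = simulate_echo(BARKER_13, delay=20, total_length=64)
compressed = matched_filter(echo, BARKER_13)
peak_lag = compressed.index(max(compressed))
print(f"peak of {max(compressed):.0f} at lag {peak_lag}")
```

Each output lag is an independent dot product, so the computation maps directly onto GPU threads or, for long codes, onto an FFT-based correlation.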
GPU-accelerated simulations of isolated black holes

NASA Astrophysics Data System (ADS)

Lewis, Adam G. M.; Pfeiffer, Harald P.

2018-05-01

We present a port of the numerical relativity code SpEC which is capable of running on NVIDIA GPUs. Since this code must be maintained in parallel with SpEC itself, a primary design consideration is to perform as few explicit code changes as possible. We therefore rely on a hierarchy of automated porting strategies. At the highest level we use TLoops, a C++ library of our design, to automatically emit CUDA code equivalent to tensorial expressions written into C++ source using a syntax similar to analytic calculation. Next, we trace out and cache explicit matrix representations of the numerous linear transformations in the SpEC code, which allows these to be performed on the GPU using pre-existing matrix-multiplication libraries. We port the few remaining important modules by hand. In this paper we detail the specifics of our port, and present benchmarks of it simulating isolated black hole spacetimes on several generations of NVIDIA GPU.
Application Modernization at LLNL and the Sierra Center of Excellence

DOE PAGES

Neely, J. Robert; de Supinski, Bronis R.

2017-09-01

We report that in 2014, Lawrence Livermore National Laboratory began acquisition of Sierra, a pre-exascale system from IBM and Nvidia. It marks a significant shift in direction for LLNL by introducing the concept of heterogeneous computing via GPUs. LLNL’s mission requires application teams to prepare for this paradigm shift. Thus, the Sierra procurement required a proposed Center of Excellence that would align the expertise of the chosen vendors with laboratory personnel that represent the application developers, system software, and tool providers in a concentrated effort to prepare the laboratory’s codes in advance of the system transitioning to production in 2018. Finally, this article presents LLNL’s overall application strategy, with a focus on how LLNL is collaborating with IBM and Nvidia to ensure a successful transition of its mission-oriented applications into the exascale era.
Accelerating gravitational microlensing simulations using the Xeon Phi coprocessor

NASA Astrophysics Data System (ADS)

Chen, B.; Kantowski, R.; Dai, X.; Baron, E.; Van der Mark, P.

2017-04-01

Recently Graphics Processing Units (GPUs) have been used to speed up very CPU-intensive gravitational microlensing simulations. In this work, we use the Xeon Phi coprocessor to accelerate such simulations and compare its performance on a microlensing code with that of NVIDIA's GPUs. For the selected set of parameters evaluated in our experiment, we find that the speedup by Intel's Knights Corner coprocessor is comparable to that by NVIDIA's Fermi family of GPUs with compute capability 2.0, but less significant than GPUs with higher compute capabilities such as the Kepler. However, the very recently released second generation Xeon Phi, Knights Landing, is about 5.8 times faster than the Knights Corner, and about 2.9 times faster than the Kepler GPU used in our simulations. We conclude that the Xeon Phi is a very promising alternative to GPUs for modern high performance microlensing simulations.
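The two ratios quoted for Knights Landing also imply a relative speed between the Kepler GPU and Knights Corner, which is consistent with the statement that Knights Corner's speedup is less significant than that of Kepler-class GPUs. A small sketch of that arithmetic, using only the figures in the abstract (the helper name `implied_speedup` is hypothetical):

```python
# Ratios quoted in the abstract above (second-generation Xeon Phi, Knights Landing).
KNL_OVER_KNC = 5.8      # Knights Landing vs. Knights Corner
KNL_OVER_KEPLER = 2.9   # Knights Landing vs. the Kepler GPU used in the study

def implied_speedup(a_over_b, a_over_c):
    """Given speedups of A over B and of A over C, return the speedup of C over B."""
    return a_over_b / a_over_c

# Kepler GPU relative to Knights Corner, for the parameter set used in the study.
kepler_over_knc = implied_speedup(KNL_OVER_KNC, KNL_OVER_KEPLER)
```

The result is roughly a factor of two, and it holds only for the selected parameter set evaluated in the study.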
Rapid data processing for ultrafast X-ray computed tomography using scalable and modular CUDA based pipelines

NASA Astrophysics Data System (ADS)

Frust, Tobias; Wagner, Michael; Stephan, Jan; Juckeland, Guido; Bieberle, André

2017-10-01

Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes, based on the principles of electron beam scanning. A typical application is the study of multiphase flows, that is, flows of mixtures of substances such as gas-liquid flows in pipelines or chemical reactors. At Helmholtz-Zentrum Dresden-Rossendorf (HZDR) a number of such tomography scanners are operated. Currently, two main points limit their application in some fields. First, after each CT scan sequence the data of the radiation detector must be downloaded from the scanner to a data processing machine. Second, the current data processing is time-consuming compared with the CT scan sequence interval. To enable online observations or the use of this technique to control actuators in real time, a modular and scalable data processing tool has been developed. The tool consists of user-definable stages that work independently within a so-called data processing pipeline, and it keeps up with the CT scanner's maximal frame rate of up to 8 kHz. The newly developed data processing stages are freely programmable and combinable. To achieve the highest processing performance, all data processing steps required for a standard slice image reconstruction were implemented individually in separate stages using Graphics Processing Units (GPUs) and NVIDIA's CUDA programming language. Data processing performance tests on different high-end GPUs (Tesla K20c, GeForce GTX 1080, Tesla P100) showed excellent performance. Program Files doi:http://dx.doi.org/10.17632/65sx747rvm.1 Licensing provisions: LGPLv3 Programming language: C++/CUDA Supplementary material: Test data set, used for the performance analysis. Nature of problem: Ultrafast computed tomography is performed with a scan rate of up to 8 kHz. To obtain cross-sectional images from projection data, computer-based image reconstruction algorithms must be applied. The objective of the presented program is to reconstruct a data stream of around 1.3 GB/s in a minimum time period. Thus, the program opens up new fields of application and, in the future, allows even more compute-intensive algorithms to be used, especially for data post-processing, to improve the quality of data analysis. Solution method: The program solves the given problem using a two-step process: first, by a generic, expandable and widely applicable template library implementing the streaming paradigm (GLADOS); second, by optimized processing stages for ultrafast computed tomography implementing the required algorithms in a performance-oriented way using CUDA (RISA). Thereby, task parallelism between the processing stages as well as data parallelism within one processing stage is realized.
77 FR 22383 - Petition for Exemption From the Federal Motor Vehicle Motor Theft Prevention Standard; TESLA

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-13

... the Security Controller as long as the PET is in close proximity to the car and the driver either... exemption. SUMMARY: This document grants in full the petition of Tesla Motors Inc's. (Tesla) for an exemption of the Model S vehicle line in accordance with 49 CFR Part 543, Exemption from the Theft...
DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogan, Kathleen; Wallace, Hal; Ivestor, Rob

As Edison vs. Tesla week heats up at the Energy Department, we are exploring the rivalry between Thomas Edison and Nikola Tesla and how their work is still impacting the way we use energy today. Whether you're on Team Tesla or Team Edison, both inventors were key players in creating things like batteries, power plants and wireless technologies -- all innovations we still use today. And as we move toward a clean energy future, energy efficient lighting, like LED bulbs, and more efficient electric motors not only help us save money on electricity costs but help combat climate change. For this, Tesla and Edison both deserve our recognition.
Diagnostic usefulness of 3 tesla MRI of the brain for cushing disease in a child.

PubMed

Ono, Erina; Ozawa, Ayako; Matoba, Kaori; Motoki, Takanori; Tajima, Asako; Miyata, Ichiro; Ito, Junko; Inoshita, Naoko; Yamada, Syozo; Ida, Hiroyuki

2011-10-01

It is sometimes difficult to confirm the location of a microadenoma in Cushing disease. Recently, we experienced an 11-yr-old female case of Cushing disease with hyperprolactinemia. She was referred to our hospital because of a decrease in height velocity with body weight gain. On admission, she had typical symptoms of Cushing syndrome. Although no pituitary microadenomas were detected on 1.5 Tesla MRI of the brain, endocrinological examinations including IPS and CS sampling were consistent with Cushing disease with hyperprolactinemia. Oral administration of metyrapone instead of neurosurgery was started after discharge, but subsequent 3 Tesla MRI of the brain clearly demonstrated a 3-mm less-enhanced lesion in the left side of the pituitary gland. Finally, transsphenoidal surgery was performed, and a 3.5-mm left-sided microadenoma was resected. Compared with 1.5 Tesla MRI, 3 Tesla MRI offers the advantage of a higher signal to noise ratio (SNR), which provides higher resolution and proper image quality. Therefore, 3 Tesla MRI is a very useful tool to localize microadenomas in Cushing disease in children as well as in adults. It will be the first choice of radiological examinations in suspected cases of Cushing disease.
Real-time electroholography using a multiple-graphics processing unit cluster system with a single spatial light modulator and the InfiniBand network

NASA Astrophysics Data System (ADS)

Niwase, Hiroaki; Takada, Naoki; Araki, Hiromitsu; Maeda, Yuki; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

2016-09-01

Parallel calculations of large-pixel-count computer-generated holograms (CGHs) are suitable for multiple-graphics processing unit (multi-GPU) cluster systems. However, it is not easy for a multi-GPU cluster system to accomplish fast CGH calculations when CGH transfers between PCs are required. In these cases, the CGH transfer between the PCs becomes a bottleneck. Usually, this problem occurs only in multi-GPU cluster systems with a single spatial light modulator. To overcome this problem, we propose a simple method using the InfiniBand network. The computational speed of the proposed method using 13 GPUs (NVIDIA GeForce GTX TITAN X) was more than 3000 times faster than that of a CPU (Intel Core i7 4770) when the number of three-dimensional (3-D) object points exceeded 20,480. In practice, we achieved approximately 40 tera floating point operations per second (TFLOPS) when the number of 3-D object points exceeded 40,960. Our proposed method was able to reconstruct a real-time movie of a 3-D object comprising 95,949 points.
Laser guiding of Tesla coil high voltage discharges.

PubMed

Henriksson, Markus; Daigle, Jean-Francois; Théberge, Francis; Châteauneuf, Marc; Dubois, Jacques

2012-06-04

We have investigated the guiding and triggering of discharges from a Tesla coil type 280 kHz AC high voltage source using filaments created by a femtosecond terawatt laser pulse. Without the laser, the discharges were at most 30 cm long. With the laser, straight, guided discharges up to 110 cm in length were detected. The discharge length was limited by the voltage amplitude of the Tesla coil.
The Road to FUNCTIONAL IMAGING and ULTRAHIGH FIELDS

PubMed Central

Uğurbil, Kâmil

2012-01-01

The Center for Magnetic Resonance Research (CMRR) at the University of Minnesota was one of the laboratories where the work that simultaneously and independently introduced functional magnetic resonance imaging (fMRI) of human brain activity was carried out. However, unlike other laboratories pursuing fMRI at the time, our work was performed at a magnetic field of 4 Tesla and coincided with the effort to push human magnetic resonance imaging to field strengths significantly beyond 1.5 Tesla, which was the high-end standard of the time. The human fMRI experiments performed in CMRR were planned between two colleagues who had known each other and had worked together previously in Bell Laboratories, namely Seiji Ogawa and myself, immediately after the Blood Oxygenation Level Dependent (BOLD) contrast was developed by Seiji. We were waiting for our first human system, a 4 Tesla system, to arrive in order to attempt imaging activity in the human brain, and these were the first experiments we performed on the 4 Tesla instrument in CMRR when it became marginally operational. This was a prelude to a subsequent systematic push we initiated for exploiting higher magnetic fields to improve the accuracy and sensitivity of fMRI maps, first going to 9.4 Tesla for animal model studies and subsequently developing a 7 Tesla human system for the first time. Steady improvements in high field instrumentation and an ever expanding armamentarium of image acquisition and engineering solutions to challenges posed by ultrahigh fields have brought fMRI to submillimeter resolution in the whole brain at 7 Tesla, the scale necessary to reach cortical columns and laminar differentiation in the whole brain. The solutions that emerged in response to technological challenges posed by 7 Tesla also propagated and continue to propagate to lower field clinical systems, a major advantage of the ultrahigh fields effort that is underappreciated. Further improvements at 7T are inevitable.
Further translation of these improvements to lower field clinical systems to achieve new capabilities and to magnetic fields significantly higher than 7 Tesla to enable human imaging is inescapable. PMID:22333670
DOE Office of Scientific and Technical Information (OSTI.GOV)

Brambleby, Jamie; Manson, Jamie L.; Goddard, Paul A.

The magnetic ground state of the quasi-one-dimensional spin-1 antiferromagnetic chain is sensitive to the relative sizes of the single-ion anisotropy (D) and the intrachain (J) and interchain (J') exchange interactions. The ratios D/J and J'/J dictate the material's placement in one of three competing phases: a Haldane gapped phase, a quantum paramagnet, and an XY-ordered state, with a quantum critical point at their junction. We have identified [Ni(HF2)(pyz)2]SbF6, where pyz = pyrazine, as a rare candidate in which this behavior can be explored in detail. Combining neutron scattering (elastic and inelastic) in applied magnetic fields of up to 10 tesla and magnetization measurements in fields of up to 60 tesla with numerical modeling of experimental observables, we are able to obtain accurate values of all of the parameters of the Hamiltonian [D = 13.3(1) K, J = 10.4(3) K, and J' = 1.4(2) K], despite the polycrystalline nature of the sample. Density-functional theory calculations result in similar couplings (J = 9.2 K, J' = 1.8 K) and predict that the majority of the total spin population resides on the Ni(II) ion, while the remaining spin density is delocalized over both ligand types. Finally, the general procedures outlined in this paper permit phase boundaries and quantum-critical points to be explored in anisotropic systems for which single crystals are as yet unavailable.
Validation of a GPU-based Monte Carlo code (gPMC) for proton radiation therapy: clinical cases study.

PubMed

Giantsoudi, Drosoula; Schuemann, Jan; Jia, Xun; Dowdell, Stephen; Jiang, Steve; Paganetti, Harald

2015-03-21

Monte Carlo (MC) methods are recognized as the gold-standard for dose calculation, however they have not replaced analytical methods up to now due to their lengthy calculation times. GPU-based applications allow MC dose calculations to be performed on time scales comparable to conventional analytical algorithms. This study focuses on validating our GPU-based MC code for proton dose calculation (gPMC) using an experimentally validated multi-purpose MC code (TOPAS) and compare their performance for clinical patient cases. Clinical cases from five treatment sites were selected covering the full range from very homogeneous patient geometries (liver) to patients with high geometrical complexity (air cavities and density heterogeneities in head-and-neck and lung patients) and from short beam range (breast) to large beam range (prostate). Both gPMC and TOPAS were used to calculate 3D dose distributions for all patients. Comparisons were performed based on target coverage indices (mean dose, V95, D98, D50, D02) and gamma index distributions. Dosimetric indices differed less than 2% between TOPAS and gPMC dose distributions for most cases. Gamma index analysis with 1%/1 mm criterion resulted in a passing rate of more than 94% of all patient voxels receiving more than 10% of the mean target dose, for all patients except for prostate cases. Although clinically insignificant, gPMC resulted in systematic underestimation of target dose for prostate cases by 1-2% compared to TOPAS. Correspondingly the gamma index analysis with 1%/1 mm criterion failed for most beams for this site, while for 2%/1 mm criterion passing rates of more than 94.6% of all patient voxels were observed. For the same initial number of simulated particles, calculation time for a single beam for a typical head and neck patient plan decreased from 4 CPU hours per million particles (2.8-2.9 GHz Intel X5600) for TOPAS to 2.4 s per million particles (NVIDIA TESLA C2075) for gPMC. 
Excellent agreement was demonstrated between our fast GPU-based MC code (gPMC) and a previously extensively validated multi-purpose MC code (TOPAS) for a comprehensive set of clinical patient cases. This shows that MC dose calculations in proton therapy can be performed on time scales comparable to analytical algorithms with accuracy comparable to state-of-the-art CPU-based MC codes.
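The gamma index comparison used above combines a dose-difference tolerance and a distance-to-agreement tolerance into one pass/fail metric per voxel. The following one-dimensional sketch illustrates the general technique under stated assumptions; it is not the authors' analysis code. The function names are hypothetical, the dose tolerance is taken relative to a single global normalization dose (the abstract does not say whether global or local normalization was used), and no interpolation between voxels is performed.

```python
import math

def gamma_index_1d(ref_dose, eval_dose, spacing_mm, dose_tol_frac, dta_mm, norm_dose):
    """Gamma value at each reference point of a 1-D dose profile.

    dose_tol_frac is a fraction of norm_dose (e.g. 0.01 for a 1% criterion);
    dta_mm is the distance-to-agreement criterion (e.g. 1.0 for 1 mm).
    """
    dose_tol = dose_tol_frac * norm_dose
    gammas = []
    for i, d_ref in enumerate(ref_dose):
        best = math.inf
        for j, d_eval in enumerate(eval_dose):
            dist = (j - i) * spacing_mm
            g = math.sqrt((dist / dta_mm) ** 2 + ((d_eval - d_ref) / dose_tol) ** 2)
            best = min(best, g)   # keep the closest point in combined dose/distance space
        gammas.append(best)
    return gammas

def passing_rate(gammas, ref_dose, threshold):
    """Percentage of points above the dose threshold whose gamma is at most 1."""
    selected = [g for g, d in zip(gammas, ref_dose) if d > threshold]
    if not selected:
        raise ValueError("no points above the dose threshold")
    passed = sum(1 for g in selected if g <= 1.0)
    return 100.0 * passed / len(selected)
```

A point passes when some nearby evaluated point agrees within the combined tolerance, so a small spatial shift can pass even where the point-by-point dose difference is large. The `threshold` argument plays the role of the low-dose cut mentioned in the abstract (voxels receiving more than 10% of the mean target dose).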
Nikola Tesla, the Ether and his Telautomaton

NASA Astrophysics Data System (ADS)

Milar, Kendall

2014-03-01

In the nineteenth century physicists' understanding of the ether changed dramatically. New developments in thermodynamics, energy physics, and electricity and magnetism dictated new properties of the ether. These have traditionally been examined from the perspective of the scientists re-conceptualizing the ether. However Nikola Tesla, a prolific inventor and writer, presents a different picture of nineteenth century physics. Alongside the displays that showcased his inventions he presented alternative interpretations of physical, physiological and even psychical research. This is particularly evident in his telautomaton, a radio remote controlled boat. This invention and Tesla's descriptions of it showcase some of his novel interpretations of physical theories. He offered a perspective on nineteenth century physics that focused on practical application instead of experiment. Sometimes the understanding of physical theories that Tesla reached was counterproductive to his own inventive work; other times he offered new insights. Tesla's utilitarian interpretation of physical theories suggests a more scientifically curious and invested inventor than previously described and a connection between the scientific and inventive communities.
76 FR 384 - Certain Semiconductor Chips and Products Containing Same; Notice of Investigation

Federal Register 2010, 2011, 2012, 2013, 2014

2011-01-04

..., Dusing Road 1, Hsinchu Science Park, Hsin-Chu, Taiwan 30078. nVidia Corporation, 2701 San Tomas... respondent, to find the facts to be as alleged in the complaint and this notice and to enter an initial...
75 FR 25294 - Notice Pursuant to the National Cooperative Research and Production Act of 1993-DVD Copy Control...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-07

..., Baarlo Noord Limburg, THE NETHERLANDS; MIT Technology Co., Ltd., Dongguan, Guangdong, PEOPLE'S REPUBLIC... media b.v., Tilburg, THE NETHERLANDS; Mattel Inc., El Segundo, CA; nVidia Corporation, Santa Clara, CA...
SU-F-J-166: Volumetric Spatial Distortions Comparison for 1.5 Tesla Versus 3 Tesla MRI for Gamma Knife Radiosurgery Scans Using Frame Marker Fusion and Co-Registration Modes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Neyman, G

Purpose: To compare typical volumetric spatial distortions for 1.5 Tesla versus 3 Tesla MRI Gamma Knife radiosurgery scans in the frame marker fusion and co-registration frame-less modes. Methods: Quasar phantom by Modus Medical Devices Inc. with GRID image distortion software was used for measurements of volumetric distortions. 3D volumetric T1 weighted scans of the phantom were produced on 1.5 T Avanto and 3 T Skyra MRI Siemens scanners. The analysis was done two ways: for scans with localizer markers from the Leksell frame and relative to the phantom only (simulated co-registration technique). The phantom grid contained a total of 2002 vertices or control points that were used in the assessment of volumetric geometric distortion for all scans. Results: Volumetric mean absolute spatial deviations relative to the frame localizer markers for the 1.5 and 3 Tesla machines were 1.39 ± 0.15 and 1.63 ± 0.28 mm, with max errors of 1.86 and 2.65 mm, respectively. Mean 2D errors from the Gamma Plan were 0.3 and 1.0 mm. For the simulated co-registration technique, the volumetric mean absolute spatial deviations relative to the phantom for the 1.5 and 3 Tesla machines were 0.36 ± 0.08 and 0.62 ± 0.13 mm, with max errors of 0.57 and 1.22 mm, respectively. Conclusion: Volumetric spatial distortions are lower for 1.5 Tesla versus 3 Tesla MRI machines localized with markers on frames and significantly lower for co-registration techniques with no frame localization. The results show the advantage of using co-registration technique for minimizing MRI volumetric spatial distortions which can be especially important for steep dose gradient fields typically used in Gamma Knife radiosurgery. Consultant for Elekta AB.
CUDAEASY - a GPU accelerated cosmological lattice program

NASA Astrophysics Data System (ADS)

Sainio, J.

2010-05-01

This paper presents, to the author's knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA's Compute Unified Device Architecture (CUDA) and compare the performance to other similar programs in chaotic inflation models. We report speedups between one and two orders of magnitude depending on the used hardware and software while achieving small errors in single precision. Simulations that used to last roughly one day to compute can now be done in hours and this difference is expected to increase in the future. The program has been written in the spirit of LATTICEEASY and users of the aforementioned program should find it relatively easy to start using CUDAEASY in lattice simulations. The program is available at http://www.physics.utu.fi/theory/particlecosmology/cudaeasy/ under the GNU General Public License.
Toward performance portability of the Albany finite element analysis code using the Kokkos library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.

Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA graphics processing units (GPUs), Intel Xeon Phis, and multicore CPUs.

Toward performance portability of the Albany finite element analysis code using the Kokkos library

DOE PAGES

Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.; ...

2018-02-05

Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA graphics processing units (GPUs), Intel Xeon Phis, and multicore CPUs.
Multigroup Monte Carlo on GPUs: Comparison of history- and event-based algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hamilton, Steven P.; Slattery, Stuart R.; Evans, Thomas M.

This article presents an investigation of the performance of different multigroup Monte Carlo transport algorithms on GPUs with a discussion of both history-based and event-based approaches. Several algorithmic improvements are introduced for both approaches. By modifying the history-based algorithm that is traditionally favored in CPU-based MC codes to occasionally filter out dead particles to reduce thread divergence, performance exceeds that of either the pure history-based or event-based approaches. The impacts of several algorithmic choices are discussed, including performance studies on Kepler and Pascal generation NVIDIA GPUs for fixed source and eigenvalue calculations. Single-device performance equivalent to 20–40 CPU cores on the K40 GPU and 60–80 CPU cores on the P100 GPU is achieved. In addition, nearly perfect multi-device parallel weak scaling is demonstrated on more than 16,000 nodes of the Titan supercomputer.
Multigroup Monte Carlo on GPUs: Comparison of history- and event-based algorithms

DOE PAGES

Hamilton, Steven P.; Slattery, Stuart R.; Evans, Thomas M.

2017-12-22

This article presents an investigation of the performance of different multigroup Monte Carlo transport algorithms on GPUs with a discussion of both history-based and event-based approaches. Several algorithmic improvements are introduced for both approaches. By modifying the history-based algorithm that is traditionally favored in CPU-based MC codes to occasionally filter out dead particles to reduce thread divergence, performance exceeds that of either the pure history-based or event-based approaches. The impacts of several algorithmic choices are discussed, including performance studies on Kepler and Pascal generation NVIDIA GPUs for fixed source and eigenvalue calculations. Single-device performance equivalent to 20–40 CPU cores on the K40 GPU and 60–80 CPU cores on the P100 GPU is achieved. In addition, nearly perfect multi-device parallel weak scaling is demonstrated on more than 16,000 nodes of the Titan supercomputer.
Model-independent partial wave analysis using a massively-parallel fitting framework

NASA Astrophysics Data System (ADS)

Sun, L.; Aoude, R.; dos Reis, A. C.; Sokoloff, M.

2017-10-01

The functionality of GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended to extract model-independent $\mathscr{S}$-wave amplitudes in three-body decays such as $D^+ \to h^+ h^+ h^-$. A full amplitude analysis is done where the magnitudes and phases of the $\mathscr{S}$-wave amplitudes are anchored at a finite number of $m^2(h^+ h^-)$ control points, and a cubic spline is used to interpolate between these points. The amplitudes for $\mathscr{P}$-wave and $\mathscr{D}$-wave intermediate states are modeled as spin-dependent Breit-Wigner resonances. GooFit uses the Thrust library, with a CUDA backend for NVIDIA GPUs and an OpenMP backend for threads with conventional CPUs. Performance on a variety of platforms is compared. Executing on systems with GPUs is typically a few hundred times faster than executing the same algorithm on a single CPU.
Nanoscale multireference quantum chemistry: full configuration interaction on graphical processing units.

PubMed

Fales, B Scott; Levine, Benjamin G

2015-10-13

Methods based on a full configuration interaction (FCI) expansion in an active space of orbitals are widely used for modeling chemical phenomena such as bond breaking, multiply excited states, and conical intersections in small-to-medium-sized molecules, but these phenomena occur in systems of all sizes. To scale such calculations up to the nanoscale, we have developed an implementation of FCI in which electron repulsion integral transformation and several of the more expensive steps in σ vector formation are performed on graphical processing unit (GPU) hardware. When applied to a 1.7 × 1.4 × 1.4 nm silicon nanoparticle (Si72H64) described with the polarized, all-electron 6-31G** basis set, our implementation can solve for the ground state of the 16-active-electron/16-active-orbital CASCI Hamiltonian (more than 100,000,000 configurations) in 39 min on a single NVidia K40 GPU.
GPU accelerated implementation of NCI calculations using promolecular density.

PubMed

Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric

2017-05-30

The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive to describe ligand-protein binding. A custom implementation for NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three versions of the code is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.
Decoupling capabilities of split-loop resonator structure for 7 Tesla MRI surface array coils

NASA Astrophysics Data System (ADS)

Hurshkainen, A.; Kurdjumov, S.; Simovski, C.; Glybovski, S.; Melchakova, I.; van den Berg, C. A. T.; Raaijmakers, A.; Belov, P.

2017-09-01

In this work we studied the electromagnetic properties of one-dimensional periodic structures composed of split-loop resonators (SLRs) and investigated their capabilities in decoupling two dipole antennas for full-body magnetic resonance imaging (MRI). Two different finite structures, comprising single-SLR and double-SLR constitutive elements, were studied. Numerical simulations of the structures were performed to evaluate their decoupling capabilities. As demonstrated, two dipole antennas equipped with either a single-SLR or a double-SLR structure exhibit high isolation even for an electrically short distance between the dipoles. The double-SLR structure, while dramatically improving isolation of the dipoles, keeps the field created by each of the decoupled dipoles inside the target area comparable with that of a single dipole.
Crystal growth and characterization of bulk Sb2Te3 topological insulator

NASA Astrophysics Data System (ADS)

Sultana, Rabia; Gurjar, Ganesh; Patnaik, S.; Awana, V. P. S.

2018-04-01

The Sb2Te3 crystals are grown using the conventional self flux method via the solid state reaction route, by melting the constituent elements (Sb and Te) at high temperature (850 °C), followed by slow cooling (2 °C/h). The as-grown Sb2Te3 crystals are analysed for various physical properties by x-ray diffraction (XRD), Raman Spectroscopy, Scanning Electron Microscopy (SEM) coupled with Energy Dispersive x-ray Spectroscopy (EDAX) and electrical measurements under magnetic field (6 Tesla) down to low temperature (2.5 K). The XRD pattern revealed the growth of the synthesized Sb2Te3 sample along the (00l) plane, whereas the SEM along with EDAX measurements displayed the layered structure with near stoichiometric composition, without foreign contamination. The Raman scattering studies displayed the known ($A_{1g}^{1}$, $E_{g}^{2}$ and $A_{1g}^{2}$) vibrational modes for the studied Sb2Te3. The temperature dependent electrical resistivity measurements illustrated the metallic nature of the as-grown Sb2Te3 single crystal. Further, the magneto-transport studies showed linear positive magneto-resistance (MR) reaching up to 80% at 2.5 K under an applied field of 6 Tesla. The weak anti localization (WAL) related low field (±2 Tesla) magneto-conductance at low temperatures (2.5 K and 20 K) has been analysed and discussed using the Hikami-Larkin-Nagaoka (HLN) model. In summary, this short letter reports an easy and versatile method for crystal growth of bulk Sb2Te3 topological insulator (TI) and its brief physical property characterization.
ARCHERRT – A GPU-based and photon-electron coupled Monte Carlo dose computing engine for radiation therapy: Software development and application to helical tomotherapy

PubMed Central

Su, Lin; Yang, Youming; Bednarz, Bryan; Sterpin, Edmond; Du, Xining; Liu, Tianyu; Ji, Wei; Xu, X. George

2014-01-01

Purpose: Using the graphical processing units (GPU) hardware technology, an extremely fast Monte Carlo (MC) code ARCHERRT is developed for radiation dose calculations in radiation therapy. This paper describes the detailed software development and testing for three clinical TomoTherapy® cases: the prostate, lung, and head & neck. Methods: To obtain clinically relevant dose distributions, phase space files (PSFs) created from optimized radiation therapy treatment plan fluence maps were used as the input to ARCHERRT. Patient-specific phantoms were constructed from patient CT images. Batch simulations were employed to facilitate the time-consuming task of loading large PSFs, and to improve the estimation of statistical uncertainty. Furthermore, two different Woodcock tracking algorithms were implemented and their relative performance was compared. The dose curves of an Elekta accelerator PSF incident on a homogeneous water phantom were benchmarked against DOSXYZnrc. For each of the treatment cases, dose volume histograms and isodose maps were produced from ARCHERRT and the general-purpose code, GEANT4. The gamma index analysis was performed to evaluate the similarity of voxel doses obtained from these two codes. The hardware accelerators used in this study are one NVIDIA K20 GPU, one NVIDIA K40 GPU, and six NVIDIA M2090 GPUs. In addition, to make a fairer comparison of the CPU and GPU performance, a multithreaded CPU code was developed using OpenMP and tested on an Intel E5-2620 CPU. Results: For the water phantom, the depth dose curve and dose profiles from ARCHERRT agree well with DOSXYZnrc. For clinical cases, results from ARCHERRT are compared with those from GEANT4 and good agreement is observed. Gamma index test is performed for voxels whose dose is greater than 10% of maximum dose. For 2%/2mm criteria, the passing rates for the prostate, lung case, and head & neck cases are 99.7%, 98.5%, and 97.2%, respectively. 
Owing to the specific architecture of the GPU, the modified Woodcock tracking algorithm performed worse than the original one. ARCHERRT achieves fast PSF-based dose calculations. With a single M2090 card, the simulations took about 60, 50, and 80 s for the three cases, respectively, with 1% statistical error in the PTV. Using the latest K40 card, the simulations are 1.7–1.8 times faster. More impressively, six M2090 cards could finish the simulations in 8.9–13.4 s. For comparison, the same simulations on an Intel E5-2620 (12 hyperthreading) took about 500–800 s. Conclusions: ARCHERRT was developed successfully to perform fast and accurate MC dose calculation for radiotherapy using PSFs and patient CT phantoms. PMID:24989378
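The Woodcock (delta) tracking idea mentioned in this abstract can be illustrated with a minimal one-dimensional Python sketch. It is not the ARCHERRT implementation: the function names, the two-layer slab and the cross-section values are hypothetical and chosen only to make the example checkable against the analytic attenuation law.

```python
import math
import random

def next_collision(x, direction, sigma_at, sigma_max, x_min, x_max, rng):
    """Woodcock (delta) tracking in 1D: position of the next real collision,
    or None if the particle leaves [x_min, x_max] first."""
    while True:
        # Tentative flight sampled with the majorant cross-section sigma_max,
        # so no boundary crossings between materials need to be computed.
        x += direction * (-math.log(1.0 - rng.random()) / sigma_max)
        if x < x_min or x > x_max:
            return None
        # Real collision with probability sigma(x)/sigma_max; otherwise the
        # collision is virtual and the flight simply continues.
        if rng.random() < sigma_at(x) / sigma_max:
            return x

def transmitted_fraction(n_particles, seed=0):
    """Fraction of particles crossing a two-layer slab without colliding."""
    rng = random.Random(seed)
    # Hypothetical cross-sections (1/cm): 0.5 in the first cm, 1.5 in the second.
    sigma_at = lambda x: 0.5 if x < 1.0 else 1.5
    escaped = sum(
        next_collision(0.0, 1.0, sigma_at, 1.5, 0.0, 2.0, rng) is None
        for _ in range(n_particles)
    )
    return escaped / n_particles
```

For this slab the optical thickness is 0.5 + 1.5 = 2, so `transmitted_fraction` should approach exp(-2) as the particle count grows. The appeal on a GPU is that every thread runs the same loop regardless of which material it is in, at the cost of extra virtual collisions.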
ARCHERRT - a GPU-based and photon-electron coupled Monte Carlo dose computing engine for radiation therapy: software development and application to helical tomotherapy.

PubMed

Su, Lin; Yang, Youming; Bednarz, Bryan; Sterpin, Edmond; Du, Xining; Liu, Tianyu; Ji, Wei; Xu, X George

2014-07-01

Using the graphical processing units (GPU) hardware technology, an extremely fast Monte Carlo (MC) code ARCHERRT is developed for radiation dose calculations in radiation therapy. This paper describes the detailed software development and testing for three clinical TomoTherapy® cases: the prostate, lung, and head & neck. To obtain clinically relevant dose distributions, phase space files (PSFs) created from optimized radiation therapy treatment plan fluence maps were used as the input to ARCHERRT. Patient-specific phantoms were constructed from patient CT images. Batch simulations were employed to facilitate the time-consuming task of loading large PSFs, and to improve the estimation of statistical uncertainty. Furthermore, two different Woodcock tracking algorithms were implemented and their relative performance was compared. The dose curves of an Elekta accelerator PSF incident on a homogeneous water phantom were benchmarked against DOSXYZnrc. For each of the treatment cases, dose volume histograms and isodose maps were produced from ARCHERRT and the general-purpose code, GEANT4. The gamma index analysis was performed to evaluate the similarity of voxel doses obtained from these two codes. The hardware accelerators used in this study are one NVIDIA K20 GPU, one NVIDIA K40 GPU, and six NVIDIA M2090 GPUs. In addition, to make a fairer comparison of the CPU and GPU performance, a multithreaded CPU code was developed using OpenMP and tested on an Intel E5-2620 CPU. For the water phantom, the depth dose curve and dose profiles from ARCHERRT agree well with DOSXYZnrc. For clinical cases, results from ARCHERRT are compared with those from GEANT4 and good agreement is observed. Gamma index test is performed for voxels whose dose is greater than 10% of maximum dose. For 2%/2mm criteria, the passing rates for the prostate, lung case, and head & neck cases are 99.7%, 98.5%, and 97.2%, respectively. 
Owing to the specific architecture of the GPU, the modified Woodcock tracking algorithm performed worse than the original one. ARCHERRT achieves fast PSF-based dose calculations. With a single M2090 card, the simulations took about 60, 50, and 80 s for the three cases, respectively, with 1% statistical error in the PTV. Using the latest K40 card, the simulations are 1.7-1.8 times faster. More impressively, six M2090 cards could finish the simulations in 8.9-13.4 s. For comparison, the same simulations on an Intel E5-2620 (12 hyperthreading) took about 500-800 s. ARCHERRT was developed successfully to perform fast and accurate MC dose calculation for radiotherapy using PSFs and patient CT phantoms.
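The gamma index test quoted in this abstract (2%/2 mm criteria, evaluated only where the dose exceeds 10% of the maximum) can be sketched in one dimension as below. This is an illustrative simplification, not the analysis code used in the study: the function name is hypothetical, the profiles are 1D rather than 3D voxel grids, and the dose tolerance is assumed to be global (a fraction of the maximum reference dose).

```python
import math

def gamma_pass_rate(x_mm, dose_ref, dose_eval,
                    dd_frac=0.02, dta_mm=2.0, threshold_frac=0.10):
    """Fraction of reference points with gamma <= 1 on a 1D dose profile."""
    d_max = max(dose_ref)
    dd = dd_frac * d_max  # global dose-difference tolerance (assumption)
    passed = total = 0
    for xr, dr in zip(x_mm, dose_ref):
        # Skip low-dose points, mirroring the 10%-of-maximum cut-off.
        if dr <= threshold_frac * d_max:
            continue
        # Gamma is the minimum combined distance/dose metric over all
        # evaluated points.
        gamma = min(
            math.sqrt(((xe - xr) / dta_mm) ** 2 + ((de - dr) / dd) ** 2)
            for xe, de in zip(x_mm, dose_eval)
        )
        total += 1
        passed += gamma <= 1.0
    return passed / total
```

A profile shifted by 1 mm still passes everywhere under a 2 mm distance-to-agreement criterion, whereas a uniform 10% dose error on a flat profile fails everywhere, which is the behaviour the combined metric is designed to capture.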
[Nikola Tesla in medicine, too].

PubMed

Hanzek, Branko; Jakobović, Zvonimir

2007-12-01

Using primary and secondary sources we have shown in this paper the influence of Nikola Tesla's work on the field of medicine. The description of his experiments conducted within secondary-school education programs aimed to present the popularization of his work in Croatia. Although Tesla was dedicated primarily to physics and was not directly involved in biomedical research, his work significantly contributed to paving the way for medical physics, particularly radiology and high-frequency electrotherapy.
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

NASA Astrophysics Data System (ADS)

Bernaschi, M.; Bisson, M.; Salvadore, F.

2014-10-01

We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We also present data for a traditional high-end multi-core architecture: the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performance of an Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that to obtain reasonable scalability with the Intel Phi coprocessor (Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode, which reduces the performance of the single system. As to the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture while maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good by using the CPU as a communication co-processor of the GPU. All source codes are provided for inspection and for double-checking the results.
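The single-spin update used as the benchmark here can be sketched in a few lines of Python. This shows the textbook over-relaxation move for a Heisenberg spin (a reflection of the spin about its local field), not the authors' optimized code; the helper names and the way neighbours are passed in are assumptions made for illustration.

```python
def local_field(neighbour_spins, couplings):
    """Local field h_i = sum_j J_ij * s_j over the neighbours of site i."""
    return tuple(
        sum(j * s[k] for j, s in zip(couplings, neighbour_spins))
        for k in range(3)
    )

def overrelax(spin, field):
    """Reflect a 3-component spin about its local field:
    s' = 2 (s.h / h.h) h - s.
    The move keeps both |s| and the energy term s.h unchanged."""
    dot_sh = sum(s * h for s, h in zip(spin, field))
    dot_hh = sum(h * h for h in field)
    factor = 2.0 * dot_sh / dot_hh
    return tuple(factor * h - s for s, h in zip(spin, field))
```

Because the update needs no random numbers and only a handful of multiply-adds per neighbour, its cost is dominated by memory traffic, which is one reason it is a convenient benchmark for comparing architectures.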
Ultraviolet Communication for Medical Applications

DTIC Science & Technology

2015-06-01

In the previous Phase I effort, Directed Energy Inc.’s (DEI) parent company Imaging Systems Technology (IST) demonstrated feasibility of several key...accurately model high path loss. Custom photon scatter code was rewritten for parallel execution on a graphics processing unit (GPU). The NVidia CUDA
Analysis of the Finite Precision s-Step Biconjugate Gradient Method

DTIC Science & Technology

2014-03-13

Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab...industrial sponsors and affiliates Intel, Google, Nokia, NVIDIA, Oracle, and Samsung. Any opinions, findings, conclusions, or recommendations in this
Development of seismic tomography software for hybrid supercomputers

NASA Astrophysics Data System (ADS)

Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

2015-04-01

Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. 
During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.
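The linearized inversion step described in this abstract can be written compactly as a regularized least-squares problem. The abstract does not say which regularization the authors use, so the Tikhonov form below is an assumption, as are the symbols: $A$ for the tomographic matrix, $\delta t$ for the travel time residuals, $\delta m$ for the model adjustment, $L$ for a smoothing or identity operator, and $\lambda$ for the regularization weight.

```latex
\delta t \approx A\,\delta m,
\qquad
\delta m^{*} = \arg\min_{\delta m}\;
  \lVert A\,\delta m - \delta t \rVert_2^{2}
  + \lambda^{2}\,\lVert L\,\delta m \rVert_2^{2}
\;\Longleftrightarrow\;
\left(A^{\mathsf T}A + \lambda^{2} L^{\mathsf T}L\right)\delta m^{*}
  = A^{\mathsf T}\,\delta t
```

Each outer iteration would recompute travel times with the eikonal solver for the updated model, rebuild $A$, and solve this system again.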
Multi-GPU configuration of 4D intensity modulated radiation therapy inverse planning using global optimization

NASA Astrophysics Data System (ADS)

Hagan, Aaron; Sawant, Amit; Folkerts, Michael; Modiri, Arezoo

2018-01-01

We report on the design, implementation and characterization of a multi-graphic processing unit (GPU) computational platform for higher-order optimization in radiotherapy treatment planning. In collaboration with a commercial vendor (Varian Medical Systems, Palo Alto, CA), a research prototype GPU-enabled Eclipse (V13.6) workstation was configured. The hardware consisted of dual 8-core Xeon processors, 256 GB RAM and four NVIDIA Tesla K80 general purpose GPUs. We demonstrate the utility of this platform for large radiotherapy optimization problems through the development and characterization of a parallelized particle swarm optimization (PSO) four dimensional (4D) intensity modulated radiation therapy (IMRT) technique. The PSO engine was coupled to the Eclipse treatment planning system via a vendor-provided scripting interface. Specific challenges addressed in this implementation were (i) data management and (ii) non-uniform memory access (NUMA). For the former, we alternated between parameters over which the computation process was parallelized. For the latter, we reduced the amount of data required to be transferred over the NUMA bridge. The datasets examined in this study were approximately 300 GB in size, including 4D computed tomography images, anatomical structure contours and dose deposition matrices. For evaluation, we created a 4D-IMRT treatment plan for one lung cancer patient and analyzed computation speed while varying several parameters (number of respiratory phases, GPUs, PSO particles, and data matrix sizes). The optimized 4D-IMRT plan enhanced sparing of organs at risk by an average reduction of 26% in maximum dose, compared to the clinical optimized IMRT plan, where the internal target volume was used. We validated our computation time analyses in two additional cases. The computation speed in our implementation did not monotonically increase with the number of GPUs. 
The optimal number of GPUs (five, in our study) is directly related to the hardware specifications. The optimization process took 35 min using 50 PSO particles, 25 iterations and 5 GPUs.
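A generic particle swarm optimization loop, of the kind this abstract parallelizes, can be sketched as follows. This is a toy version under stated assumptions: the inertia and acceleration coefficients are common textbook values, the function name is hypothetical, and the objective in the usage example is a simple quadratic rather than a dose-based cost. Only the defaults of 50 particles and 25 iterations echo numbers quoted in the abstract.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=50, n_iters=25,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box; returns (best_pos, best_val, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to the particle's own best,
                # and pull to the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
        # Evaluations are independent per particle: this is the step one
        # would distribute across GPUs.
        values = [objective(p) for p in pos]
        for i, val in enumerate(values):
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val < gbest_val:
                gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

A call such as `pso_minimize(lambda p: sum(x * x for x in p), 2, (-5.0, 5.0))` returns the best position found, its objective value, and the best value after each iteration. Keeping the evaluation step separate from the best-position bookkeeping is what makes the per-particle work easy to farm out.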
Multi-GPU configuration of 4D intensity modulated radiation therapy inverse planning using global optimization.

PubMed

Hagan, Aaron; Sawant, Amit; Folkerts, Michael; Modiri, Arezoo

2018-01-16

We report on the design, implementation and characterization of a multi-graphic processing unit (GPU) computational platform for higher-order optimization in radiotherapy treatment planning. In collaboration with a commercial vendor (Varian Medical Systems, Palo Alto, CA), a research prototype GPU-enabled Eclipse (V13.6) workstation was configured. The hardware consisted of dual 8-core Xeon processors, 256 GB RAM and four NVIDIA Tesla K80 general purpose GPUs. We demonstrate the utility of this platform for large radiotherapy optimization problems through the development and characterization of a parallelized particle swarm optimization (PSO) four dimensional (4D) intensity modulated radiation therapy (IMRT) technique. The PSO engine was coupled to the Eclipse treatment planning system via a vendor-provided scripting interface. Specific challenges addressed in this implementation were (i) data management and (ii) non-uniform memory access (NUMA). For the former, we alternated between parameters over which the computation process was parallelized. For the latter, we reduced the amount of data required to be transferred over the NUMA bridge. The datasets examined in this study were approximately 300 GB in size, including 4D computed tomography images, anatomical structure contours and dose deposition matrices. For evaluation, we created a 4D-IMRT treatment plan for one lung cancer patient and analyzed computation speed while varying several parameters (number of respiratory phases, GPUs, PSO particles, and data matrix sizes). The optimized 4D-IMRT plan enhanced sparing of organs at risk by an average reduction of 26% in maximum dose, compared to the clinical optimized IMRT plan, where the internal target volume was used. We validated our computation time analyses in two additional cases. The computation speed in our implementation did not monotonically increase with the number of GPUs. 
The optimal number of GPUs (five, in our study) is directly related to the hardware specifications. The optimization process took 35 min using 50 PSO particles, 25 iterations and 5 GPUs.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Cornuelle, John C.

In March, 2001, the TESLA Collaboration published its Technical Design Report (TDR, see references and links in Appendix), the first sentence of which stated "...TESLA (TeV-Energy Superconducting Linear Collider) (will be) a superconducting electron-positron collider of initially 500 GeV total energy, extendable to 800 GeV, and an integrated X-ray laser laboratory." The TDR included cost and manpower estimates for a 500 GeV e+e- collider (250 on 250 GeV) based on superconducting RF cavity technology. This was submitted as a proposal to the German government. The government asked the German Science Council to evaluate this proposal. The recommendation from this body is anticipated to be available by November 2002. The government has indicated that it will react on this recommendation by mid-2003. In June 2001, Steve Holmes, Fermilab's Associate Director for Accelerators, commissioned Helen Edwards and Peter Garbincius to organize a study of the TESLA Technical Design Report and the associated cost and manpower estimates. Since the elements and methodology used in producing the TESLA cost estimate were somewhat different from those used in preparing similar estimates for projects within the U.S., it is important to understand the similarities, differences, and equivalences between the TESLA estimate and U.S. cost estimates. In particular, the project cost estimate includes only purchased equipment, materials, and services, but not manpower from DESY or other TESLA collaborating institutions, which is listed separately. It does not include the R&D on the TESLA Test Facility (TTF) nor the costs of preparing the TDR nor the costs of performing the conceptual studies so far. The manpower for the pre-operations commissioning program (up to beam) is included in the estimate, but not the electrical power or liquid nitrogen (for initial cooldown of the cryogenics plant). There is no inclusion of any contingency or management reserve. If the U.S. 
were to become involved with the TESLA project, either as a collaborator for an LC in Germany, or as host country for TESLA in the U.S., it is important to begin to understand the scope and technical details of the project, what R&D still needs to be done, and how the U.S. can contribute. The charge for this study is included in the Appendix to this report.
40-Tesla pulsed-field cryomagnet for single crystal neutron diffraction

NASA Astrophysics Data System (ADS)

Duc, F.; Tonon, X.; Billette, J.; Rollet, B.; Knafo, W.; Bourdarot, F.; Béard, J.; Mantegazza, F.; Longuet, B.; Lorenzo, J. E.; Lelièvre-Berna, E.; Frings, P.; Regnault, L.-P.

2018-05-01

We present the first long-duration and high duty cycle 40-T pulsed-field cryomagnet addressed to single crystal neutron diffraction experiments at temperatures down to 2 K. The magnet produces a horizontal field in a bi-conical geometry, ±15° and ±30° upstream and downstream of the sample, respectively. Using a 1.15 MJ mobile generator, magnetic field pulses of 100 ms length are generated in the magnet, with a rise time of 23 ms and a repetition rate of 6-7 pulses per hour at 40 T. The setup was validated for neutron diffraction on the CEA-CRG three-axis spectrometer IN22 at the Institut Laue Langevin.
Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

NASA Astrophysics Data System (ADS)

Mawson, Mark J.; Revell, Alistair J.

2014-10-01

The Lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to an efficient implementation for massively parallel computing, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver, optimized for third generation nVidia GPU hardware, also known as 'Kepler'. We provide a review of previous optimization strategies and analyse data read/write times for different memory types. In LBM, the time propagation step (known as streaming) involves shifting data to adjacent locations and is central to parallel performance; here we examine three approaches which make use of different hardware options. Two of these make use of 'performance enhancing' features of the GPU: shared memory and the new shuffle instruction found in Kepler-based GPUs. These are compared to a standard transfer of data which relies instead on optimized storage to increase coalesced access. It is shown that the simpler approach is the most efficient, since the need for large numbers of registers per thread in LBM limits the block size and thus reduces the efficiency of these special features. Detailed results are obtained for a D3Q19 LBM solver, which is benchmarked on nVidia K5000M and K20C GPUs. In the latter case the use of a read-only data cache is explored, and peak performance of over 1036 Million Lattice Updates Per Second (MLUPS) is achieved. The appearance of a periodic bottleneck in the solver performance is also reported, believed to be hardware related; spikes in iteration time occur with a frequency of around 11 Hz for both GPUs, independent of the size of the problem.
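The streaming step that this abstract identifies as central to performance can be illustrated with a small Python sketch. It is written for clarity, not speed, and is not the authors' GPU kernel: the dictionary-based storage, the periodic boundaries and the names `VELOCITIES` and `stream` are assumptions made for the example. Only the D3Q19 velocity set itself is standard.

```python
from itertools import product

# D3Q19 velocity set: the rest vector, 6 axis neighbours and 12 face diagonals.
VELOCITIES = [c for c in product((-1, 0, 1), repeat=3)
              if sum(v * v for v in c) <= 2]

def stream(f, shape):
    """Pull-scheme streaming on a periodic grid.

    f[q][(x, y, z)] is the population travelling along VELOCITIES[q].
    Each site reads the value arriving from its upstream neighbour."""
    nx, ny, nz = shape
    out = []
    for q, (cx, cy, cz) in enumerate(VELOCITIES):
        fq = f[q]
        out.append({
            (x, y, z): fq[((x - cx) % nx, (y - cy) % ny, (z - cz) % nz)]
            for x in range(nx) for y in range(ny) for z in range(nz)
        })
    return out
```

Every site performs 19 reads and 19 writes with no arithmetic, so on a GPU the step is bound by memory bandwidth and access pattern. That is why the choice among shared memory, shuffle instructions, and a plain coalesced transfer matters.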

Impact of magnetic field strength and receiver coil in ocular MRI: a phantom and patient study.

PubMed

Erb-Eigner, K; Warmuth, C; Taupitz, M; Willerding, G; Bertelmann, E; Asbach, P

2013-09-01

Generally, high-resolution MRI of the eye is performed with small loop surface coils. The purpose of this phantom and patient study was to investigate the influence of magnetic field strength and receiver coils on image quality in ocular MRI. The eyeball and the complex geometry of the facial bone were simulated by a skull phantom with swine eyes. MR images were acquired with two small loop surface coils with diameters of 4 cm and 7 cm and with a multi-channel head coil at 1.5 and 3 Tesla, respectively. Furthermore, MRI of the eye was performed prospectively in 20 patients at 1.5 Tesla (7 cm loop surface coil) and 3 Tesla (head coil). These images were analysed qualitatively and quantitatively and statistical significance was tested using the Wilcoxon-signed-rank test (a p-value of less than 0.05 was considered to indicate statistical significance). The analysis of the phantom images yielded the highest mean signal-to-noise ratio (SNR) at 3 Tesla with the use of the 4 cm loop surface coil. In the phantom experiment as well as in the patient studies the SNR was higher at 1.5 Tesla by applying the 7 cm surface coil than at 3 Tesla by applying the head coil. Concerning the delineation of anatomic structures no statistically significant differences were found. Our results show that the influence of small loop surface coils on image quality (expressed in SNR) in ocular MRI is higher than the influence of the magnetic field strength. The similar visibility of detailed anatomy leads to the conclusion that the image quality of ocular MRI at 3 Tesla remains acceptable by applying the head coil as a receiver coil. © Georg Thieme Verlag KG Stuttgart · New York.
In Vitro Magnetic Resonance Imaging Evaluation of Fragmented, Open-Coil, Percutaneous Peripheral Nerve Stimulation Leads.

PubMed

Shellock, Frank G; Zare, Armaan; Ilfeld, Brian M; Chae, John; Strother, Robert B

2018-04-01

Percutaneous peripheral nerve stimulation (PNS) is an FDA-cleared pain treatment. Occasionally, fragments of the lead (MicroLead, SPR Therapeutics, LLC, Cleveland, OH, USA) may be retained following lead removal. Since the lead is metallic, there are associated magnetic resonance imaging (MRI) risks. Therefore, the objective of this investigation was to evaluate MRI-related issues (i.e., magnetic field interactions, heating, and artifacts) for various lead fragments. Testing was conducted using standardized techniques on lead fragments of different lengths (i.e., 50, 75, and 100% of maximum possible fragment length of 12.7 cm) to determine MRI-related problems. Magnetic field interactions (i.e., translational attraction and torque) and artifacts were tested for the longest lead fragment at 3 Tesla. MRI-related heating was evaluated at 1.5 Tesla/64 MHz and 3 Tesla/128 MHz with each lead fragment placed in a gelled-saline filled phantom. Temperatures were recorded on the lead fragments while using relatively high RF power levels. Artifacts were evaluated using T1-weighted, spin echo, and gradient echo (GRE) pulse sequences. The longest lead fragment produced only minor magnetic field interactions. For the lead fragments evaluated, physiologically inconsequential MRI-related heating occurred at 1.5 Tesla/64 MHz while under certain 3 Tesla/128 MHz conditions, excessive temperature elevations may occur. Artifacts extended approximately 7 mm from the lead fragment on the GRE pulse sequence, suggesting that anatomy located at a position greater than this distance may be visualized on MRI. MRI may be performed safely in patients with retained lead fragments at 1.5 Tesla using the specific conditions of this study (i.e., MR Conditional). Due to possible excessive temperature rises at 3 Tesla, performing MRI at that field strength is currently inadvisable. © 2017 International Neuromodulation Society.
Magnetic resonance imaging evaluation after implantation of a titanium cervical disc prosthesis: a comparison of 1.5 and 3 Tesla magnet strength.

PubMed

Sundseth, Jarle; Jacobsen, Eva A; Kolstad, Frode; Nygaard, Oystein P; Zwart, John A; Hol, Per K

2013-10-01

Cervical disc prostheses induce a significant amount of artifact in magnetic resonance imaging, which may complicate radiologic follow-up after surgery. The purpose of this study was to investigate to what extent the artifact induced by the frequently used Discover® cervical disc prosthesis impedes interpretation of the MR images at operated and adjacent levels in 1.5 and 3 Tesla MR. Ten subsequent patients were investigated in both 1.5 and 3 Tesla MR with standard image sequences one year following anterior cervical discectomy with arthroplasty. Two neuroradiologists evaluated the images by consensus. Emphasis was placed on signal changes in medulla at all levels and visualization of root canals at operated and adjacent levels. A "blur artifact ratio" was calculated and defined as the height of the artifact on T1 sagittal images related to the operated level. The artifacts induced in 1.5 and 3 Tesla MR were of entirely different character and evaluation of the spinal cord at operated level was impossible in both magnets. Artifacts also made the root canals difficult to assess at operated level, and this was more pronounced in the 3 Tesla MR. At the adjacent levels, however, the spinal cord and root canals were completely visualized in all patients. The "blur artifact" induced at operated level was also more pronounced in the 3 Tesla MR. The artifact induced by the Discover® titanium disc prosthesis in both 1.5 and 3 Tesla MR makes interpretation of the spinal cord impossible and visualization of the root canals difficult at operated level. Adjusting the MR sequences to produce the least amount of artifact is important.
[3-Tesla MRI vs. arthroscopy for diagnostics of degenerative knee cartilage diseases: preliminary clinical results].

PubMed

von Engelhardt, L V; Schmitz, A; Burian, B; Pennekamp, P H; Schild, H H; Kraft, C N; von Falkenhausen, M

2008-09-01

The literature contains only a few studies investigating the magnetic resonance imaging (MRI) diagnostics of degenerative cartilage diseases. Studies on MRI diagnostics of the cartilage using field strengths of 3-Tesla demonstrate promising results. To assess the value of 3-Tesla MRI for decision making regarding conservative or operative treatment possibilities, this study focused on patients with degenerative cartilage diseases. Thirty-two patients with chronic knee pain, a minimum age of 40 years, a negative history of trauma, and at least grade II degenerative cartilage disease were included. Cartilage abnormalities detected at preoperative 3-Tesla MRI (axial/coronal/sagittal PD-TSE-SPAIR, axial/sagittal 3D-T1-FFE, axial T2-FFE; Intera 3.0T, Philips Medical Systems) were classified (grades I-IV) and compared with arthroscopic findings. Thirty-six percent (70/192) of the examined cartilage surfaces demonstrated no agreement between MRI and arthroscopic grading. In most of these cases, grades II and III cartilage lesions were confounded with each other. Regarding the positive predictive values, the probability that a positive finding in MRI would be exactly confirmed by arthroscopy was 39-72%. In contrast, specificities and negative predictive values of different grades of cartilage diseases were 85-95%. Given the high specificities and negative predictive values, 3-Tesla MRI is a reliable method for excluding even slight cartilage degeneration. In summary, in degenerative cartilage diseases, 3-Tesla MRI is a supportive, noninvasive method for clinical decision making regarding conservative or operative treatment possibilities. However, the value of diagnostic arthroscopy for a definitive assessment of the articular surfaces and for therapeutic planning currently cannot be replaced by 3-Tesla MRI. This applies especially to treatment options in which a differentiation between grade II and III cartilage lesions is of interest.
Code TESLA for Modeling and Design of High-Power High-Efficiency Klystrons

DTIC Science & Technology

2011-03-01

CODE TESLA FOR MODELING AND DESIGN OF HIGH-POWER HIGH-EFFICIENCY KLYSTRONS * I.A. Chernyavskiy, SAIC, McLean, VA 22102, U.S.A. S.J. Cooke, B...and multiple-beam klystrons as high-power RF sources. These sources are widely used or proposed to be used in accelerators in the future. Comparison...of TESLA modelling results with experimental data for a few multiple-beam klystrons are shown. INTRODUCTION High-power and high-efficiency
The Cloud Effects Phase of the Laser Induced Lightning Investigation.

DTIC Science & Technology

1980-04-01

electromagnetic sensors: Magnetic field derivative signals in excess of 17 Teslas/second were observed in one of the triggered discharges. Our studies on this...largest electromagnetic signals that we have ever measured with values of dB/dt in excess of 17 Teslas/second at distances in excess of 500 m...Natural lightning strikes to earth within 100 m of our measuring instruments have produced peak signals of only 5 Teslas/second during our measuring window
Cortical microinfarcts detected in vivo on 3 Tesla MRI: clinical and radiological correlates.

PubMed

van Dalen, Jan Willem; Scuric, Eva E M; van Veluw, Susanne J; Caan, Matthan W A; Nederveen, Aart J; Biessels, Geert Jan; van Gool, Willem A; Richard, Edo

2015-01-01

Cortical microinfarcts (CMIs) are a common postmortem finding associated with vascular risk factors, cognitive decline, and dementia. Recently, CMIs identified in vivo on 7 Tesla MRI also proved retraceable on 3 Tesla MRI. We evaluated CMIs on 3 Tesla MRI in a population-based cohort of 194 nondemented older people (72-80 years) with systolic hypertension. Using a case-control design, participants with and without CMIs were compared on age, sex, cardiovascular risk factors, and white matter hyperintensity volume. We identified 23 CMIs in 12 participants (6%). CMIs were associated with older age, higher diastolic blood pressure, and a history of recent stroke. There was a trend for a higher white matter hyperintensity volume in participants with CMIs. We found an association of CMIs with clinical parameters, including age and cardiovascular risk factors. Although the prevalence of CMIs is relatively low, our results suggest that the study of CMIs in larger clinical studies is possible using 3 Tesla MRI. This opens the possibility of large-scale prospective investigation of the clinical relevance of CMIs in older people. © 2014 American Heart Association, Inc.
A therapeutic dose of zolpidem reduces thalamic GABA in healthy volunteers: A proton MRS study at 4 Tesla

PubMed Central

Licata, Stephanie C.; Jensen, J. Eric; Penetar, David M.; Prescot, Andrew P.; Lukas, Scott E.; Renshaw, Perry F.

2009-01-01

Background: Zolpidem is a non-benzodiazepine sedative/hypnotic that acts at GABA(A) receptors to influence inhibitory neurotransmission throughout the central nervous system. A great deal is known about the behavioral effects of this drug in humans and laboratory animals, but little is known about zolpidem’s specific effects on neurochemistry in vivo. Objectives: We evaluated how acute administration of zolpidem affected levels of GABA, glutamate, glutamine, and other brain metabolites. Methods: Proton magnetic resonance spectroscopy (1H MRS) at 4 Tesla was employed to measure the effects of zolpidem on brain chemistry in 19 healthy volunteers. Participants underwent scanning following acute oral administration of a therapeutic dose of zolpidem (10 mg) in a within-subject, single-blind, placebo-controlled, single-visit study. In addition to neurochemical measurements from single voxels within the anterior cingulate (ACC) and thalamus, a series of questionnaires were administered periodically throughout the experimental session to assess subjective mood states. Results: Zolpidem reduced GABA levels in the thalamus, but not the ACC. There were no treatment effects with respect to other metabolite levels. Self-reported ratings of “dizzy”, “nauseous”, “confused”, and “bad effects” were increased relative to placebo, as were ratings on the sedation/intoxication (PCAG) and psychotomimetic/dysphoria (LSD) scales of the Addiction Research Center Inventory. Moreover, there was a significant correlation between the decrease in GABA and “dizzy”. Conclusions: Zolpidem engendered primarily dysphoric-like effects and the correlation between reduced thalamic GABA and “dizzy” may be a function of zolpidem’s interaction with α1 GABA(A) receptors in the cerebellum, projecting through the vestibular system to the thalamus. PMID:19125238
Optic Nerve Assessment Using 7-Tesla Magnetic Resonance Imaging.

PubMed

Singh, Arun D; Platt, Sean M; Lystad, Lisa; Lowe, Mark; Oh, Sehong; Jones, Stephen E; Alzahrani, Yahya; Plesec, Thomas

2016-04-01

The purpose of this study was to correlate high-resolution magnetic resonance imaging (MRI) and histologic findings in a case of juxtapapillary choroidal melanoma with clinical evidence of optic nerve invasion. With institutional review board approval, an enucleated globe with choroidal melanoma and optic nerve invasion was imaged using a 7-tesla MRI followed by histopathologic evaluation. Optical coherence tomography, B-scan ultrasonography, and 1.5-tesla MRI of the orbit (1-mm sections) could not detect optic disc invasion. Ex vivo, 7-tesla MRI detected optic nerve invasion, which correlated with histopathologic features. Our case demonstrates the potential to document the existence of optic nerve invasion in the presence of an intraocular tumor, a feature that has a major bearing on decision making, particularly for consideration of enucleation.
Note: Tesla based pulse generator for electrical breakdown study of liquid dielectrics

NASA Astrophysics Data System (ADS)

Veda Prakash, G.; Kumar, R.; Patel, J.; Saurabh, K.; Shyam, A.

2013-12-01

In the process of studying charge holding capability and delay time for breakdown in liquids under nanosecond (ns) time scales, a Tesla based pulse generator has been developed. The pulse generator is a combination of a Tesla transformer, a pulse forming line, a fast closing switch, and a test chamber. Using a Tesla transformer instead of a conventional Marx generator makes the pulse generator very compact and cost effective, and reduces maintenance. The system has been designed and developed to deliver a maximum output voltage of 300 kV with a rise time of the order of tens of nanoseconds. The paper deals with the system design parameters, breakdown test procedure, and various experimental results. To validate the pulse generator performance, experimental results have been compared with PSPICE simulations and are in good agreement with them.
Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro

Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
Very high frame rate volumetric integration of depth images on mobile devices.

PubMed

Kähler, Olaf; Adrian Prisacariu, Victor; Yuheng Ren, Carl; Sun, Xin; Torr, Philip; Murray, David

2015-11-01

Volumetric methods provide efficient, flexible and simple ways of integrating multiple depth images into a full 3D model. They provide dense and photorealistic 3D reconstructions, and parallelised implementations on GPUs achieve real-time performance on modern graphics hardware. To run such methods on mobile devices, providing users with freedom of movement and instantaneous reconstruction feedback, remains challenging however. In this paper we present a range of modifications to existing volumetric integration methods based on voxel block hashing, considerably improving their performance and making them applicable to tablet computer applications. We present (i) optimisations for the basic data structure, and its allocation and integration; (ii) a highly optimised raycasting pipeline; and (iii) extensions to the camera tracker to incorporate IMU data. In total, our system thus achieves frame rates up to 47 Hz on a Nvidia Shield Tablet and 910 Hz on a Nvidia GTX Titan X GPU, or even beyond 1.1 kHz without visualisation.
A Follow-up Study on Wireless Power Transmission for Unmanned Air Vehicles

DTIC Science & Technology

2007-12-01

Schlesak, Adrian Alden and Tom Ohno, “A Microwave Powered High Altitude Platform,” IEEE MTT-S International, vol. 1, pp. 283-286, 25-27 May 1988. 86 [10...Tesla [2]. Tesla aimed to develop a high power transmitter to ascertain the law of propagation of current through the earth and the atmosphere. Although...According to an article by Brown [3], Nikola Tesla carried out numerous experiments on high power transmission in the early 1900s in Colorado
Response of Materials Subjected to Magnetic Fields

DTIC Science & Technology

2011-08-31

is a superconducting Helmholtz coil capable of operating at up to 6 Tesla. Access to the high magnetic field at the center of the magnet is by...conducting sphere moves through the magnetic field gradient (0 to 4 Tesla over ~20cm) at low velocity (under the influence of gravity for 1 meter). Area...sphere moves through the magnetic field gradient (0 to 4 Tesla over ~20cm) at high velocity (under the influence of gravity for 1 meter). Figure 8
Unshielded asymmetric transmit-only and endorectal receive-only radiofrequency coil for (23) Na MRI of the prostate at 3 tesla.

PubMed

Farag, Adam; Peterson, Justin Charles; Szekeres, Trevor; Bauman, Glenn; Chin, Joseph; Romagnoli, Cesare; Bartha, Robert; Scholl, Timothy J

2015-08-01

To develop and optimize radiofrequency (RF) hardware for the detection of endogenous sodium ((23) Na) by 3.0 Tesla (T) MRI in the human prostate. A transmit-only receive-only (TORO) RF system of resonators consisting of an unshielded, asymmetric, quadrature birdcage (transmit), and an endorectal (ER), linear, surface (receive) coil were developed and tested on a 3T MRI scanner. Two different ER receivers were constructed; a single-tuned ((23) Na) and a dual-tuned ((1) H/(23) Na). Both receivers were evaluated by the measurements of signal-to-noise ratio (SNR) and B1 homogeneity. For tissue sodium concentration (TSC) quantification, vials containing known sodium concentrations were incorporated into the ER. The system was used to measure the prostate TSC of three men (age 55 ± 5 years) with biopsy-proven prostate cancer. B1 field inhomogeneity of the asymmetric transmitter was estimated to be less than 5%. The mean SNR measured in a region of interest within the prostate using the single-tuned ER coil was 54.0 ± 4.6. The mean TSC in the central gland was 60.2 ± 5.7 mmol/L and in the peripheral gland was 70.5 ± 9.0 mmol/L. A TORO system was developed and optimized for (23) Na MRI of the human prostate which showed good sensitivity throughout the prostate for quantitative measurement of TSC. © 2014 Wiley Periodicals, Inc.
Adiabatic Demagnetisation Refrigerators for Future Sub-Millimetre Space Missions

NASA Astrophysics Data System (ADS)

Hepburn, I. D.; Davenport, I.; Smith, A.

1995-10-01

Space worthy refrigeration capable of providing a 100 mK and below heat load sink for bolometric detectors will be required for the next generation of sub-millimetre space missions. Adiabatic demagnetisation refrigeration (ADR), being a gravity independent laboratory method for obtaining such temperatures, is a favourable technique for utilisation in space. We show that by considering a 3 salt pill refrigerator rather than the classic single salt pill design the space prohibitive laboratory ADR properties of high magnetic field (6 Tesla) and a <2 K environment (provided by a bath of liquid 4He) can be alleviated, while maintaining a sufficient low temperature hold time and short recycle time. The additional salt pills, composed of Gadolinium Gallium Garnet (GGG) provide intermediate cooling stages, enabling operation from a 4 K environment provided by a single 4 K mechanical cooler, thereby providing consumable free operation. Such ADRs could operate with fields as low as 1 Tesla allowing the use of high temperature, mechanically cooled superconducting magnets and so effectively remove the risk of quenching. We discuss the possibility of increasing the hold time from 3 hours, for the model presented, to between 40 and 80 hours, plus reducing the number of salt pills to two, through the use of a more efficient Garnet. We believe the technical advances necessitated by the envisaged ADRs are minimal and conclude that such ADRs offer a long orbital life time, consumable free, high efficiency means of milli-Kelvin cooling, requiring relatively little laboratory development.
Nikola Tesla: the Moon's rotation.

NASA Astrophysics Data System (ADS)

Tomić, A.; Jovanović, B. S.

1993-09-01

The review of three articles by N. Tesla, published in the year 1919 in the journal "Electrical experimenter" is given, with special reference to the astronomical contents and to circumstances in which they appeared.
75 FR 32826 - Self-Regulatory Organizations; NASDAQ OMX PHLX, Inc.; Notice of Filing and Immediate...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-06-09

...''), American Express Company (``AXP''), Ciena Corp. (``CIEN''), Star Scientific, Inc. (``CIGX''), Dendreon Corp. (``DNDN''), eBay Inc. (``EBAY''), Corning Inc. (``GLW''), Halliburton Company (``HAL''), iShares Dow Jones US Real Estate (``IYR''), Motorola, Inc., (``MOT''), NVIDIA Corporation (``NVDA''), ON Semiconductor...
Note: A high-energy-density Tesla-type pulse generator with novel insulating oil

NASA Astrophysics Data System (ADS)

Liu, Sheng; Su, Jiancang; Fan, Xuliang

2017-09-01

A 10-GW high-energy-density Tesla-type pulse generator is developed with an improved insulating liquid based on a modified Tesla pulser (TPG700), of which the pulse forming line (PFL) is filled with novel insulating oil instead of transformer oil. Properties of insulating oil determining the stored energy density of the PFL are analyzed, and a criterion for appropriate oil is proposed. Midel 7131 is chosen as an application example. The results of insulating property experiment under tens-of-microsecond pulse charging demonstrate that the insulation capability of Midel 7131 is better than that of KI45X transformer oil. The application test in Tesla pulser TPG700 shows that the output power is increased to 10.5 GW with Midel 7131. The output energy density of TPG700 increases by about 60% with Midel 7131.
Optic Nerve Assessment Using 7-Tesla Magnetic Resonance Imaging

PubMed Central

Singh, Arun D.; Platt, Sean M.; Lystad, Lisa; Lowe, Mark; Oh, Sehong; Jones, Stephen E.; Alzahrani, Yahya; Plesec, Thomas

2016-01-01

Purpose The purpose of this study was to correlate high-resolution magnetic resonance imaging (MRI) and histologic findings in a case of juxtapapillary choroidal melanoma with clinical evidence of optic nerve invasion. Methods With institutional review board approval, an enucleated globe with choroidal melanoma and optic nerve invasion was imaged using a 7-tesla MRI followed by histopathologic evaluation. Results Optical coherence tomography, B-scan ultrasonography, and 1.5-tesla MRI of the orbit (1-mm sections) could not detect optic disc invasion. Ex vivo, 7-tesla MRI detected optic nerve invasion, which correlated with histopathologic features. Conclusions Our case demonstrates the potential to document the existence of optic nerve invasion in the presence of an intraocular tumor, a feature that has a major bearing on decision making, particularly for consideration of enucleation. PMID:27239461

Note: A high-energy-density Tesla-type pulse generator with novel insulating oil.

PubMed

Liu, Sheng; Su, Jiancang; Fan, Xuliang

2017-09-01

A 10-GW high-energy-density Tesla-type pulse generator is developed with an improved insulating liquid based on a modified Tesla pulser (TPG700), of which the pulse forming line (PFL) is filled with novel insulating oil instead of transformer oil. Properties of insulating oil determining the stored energy density of the PFL are analyzed, and a criterion for appropriate oil is proposed. Midel 7131 is chosen as an application example. The results of insulating property experiment under tens-of-microsecond pulse charging demonstrate that the insulation capability of Midel 7131 is better than that of KI45X transformer oil. The application test in Tesla pulser TPG700 shows that the output power is increased to 10.5 GW with Midel 7131. The output energy density of TPG700 increases by about 60% with Midel 7131.
Nikola Tesla: Why was he so much resisted and forgotten? [Retrospectroscope].

PubMed

Valentinuzzi, Max E; Ortiz, Martin Hill; Cervantes, Daniel; Leder, Ron S

2016-01-01

Recently, during the Christmas season, a friend of mine visited me and, sneaking a look at my bookshelves, found two rather old Nikola Tesla biographies, which I had used to prepare a "Retrospectroscope" column for the then-named IEEE Engineering in Medicine and Biology Magazine when our dear friend Alvin Wald was its editor-in-chief [2]. Eighteen years have elapsed since then; soon, the idea came up of revamping the article. Cynthia Weber, the magazine's current associate editor, considered it acceptable, and here is the new note divided in two parts: that is, a slightly revised version of the original article followed by new material, including some quite interesting information regarding Tesla's homes and laboratories. On top of this, Tesla is not devoid of a science fiction touch, as mentioned at the end.
Functionality of veterinary identification microchips following low- (0.5 tesla) and high-field (3 tesla) magnetic resonance imaging.

PubMed

Piesnack, Susann; Frame, Mairi E; Oechtering, Gerhard; Ludewig, Eberhard

2013-01-01

The ability to read patient identification microchips relies on the use of radiofrequency pulses. Since radiofrequency pulses also form an integral part of the magnetic resonance imaging (MRI) process, the possibility of loss of microchip function during MRI scanning is of concern. Previous clinical trials have shown microchip function to be unaffected by MR imaging using field strengths of 1 and 1.5 Tesla. As veterinary MRI scanners range widely in field strength, this study was devised to determine whether exposure to lower or higher field strengths than 1 Tesla would affect the function of different types of microchip. In a phantom study, a total of 300 International Standards Organisation (ISO)-approved microchips (100 each of three different types: ISO FDX-B 1.4 × 9 mm, ISO FDX-B 2.12 × 12 mm, ISO HDX 3.8 × 23 mm) were tested in a low-field (0.5 Tesla) and a high-field (3.0 Tesla) scanner. A total of 50 microchips of each type were tested in each scanner. The phantom was composed of a fluid-filled freezer pack onto which a plastic pillow and a cardboard strip with affixed microchips were positioned. Following an MRI scan protocol simulating a head study, all of the microchips were accurately readable. Neither 0.5 nor 3 Tesla imaging affected microchip function in this study. © 2013 Veterinary Radiology & Ultrasound.
Using YOLO based deep learning network for real time detection and localization of lung nodules from low dose CT scans

NASA Astrophysics Data System (ADS)

Ramachandran S., Sindhu; George, Jose; Skaria, Shibon; V. V., Varun

2018-02-01

Lung cancer is the leading cause of cancer related deaths in the world. The survival rate can be improved if the presence of lung nodules is detected early. This has also led to more focus being given to computer aided detection (CAD) and diagnosis of lung nodules. The arbitrariness of shape, size and texture of lung nodules is a challenge to be faced when developing these detection systems. In the proposed work we use convolutional neural networks to learn the features for nodule detection, replacing the traditional method of handcrafting features like geometric shape or texture. Our network uses the DetectNet architecture based on YOLO (You Only Look Once) to detect the nodules in CT scans of the lung. In this architecture, object detection is treated as a regression problem with a single convolutional network simultaneously predicting multiple bounding boxes and class probabilities for those boxes. By performing training using chest CT scans from Lung Image Database Consortium (LIDC), NVIDIA DIGITS and Caffe deep learning framework, we show that nodule detection using this single neural network can result in reasonably low false positive rates with high sensitivity and precision.
Exploring DeepMedic for the purpose of segmenting white matter hyperintensity lesions

NASA Astrophysics Data System (ADS)

Lippert, Fiona; Cheng, Bastian; Golsari, Amir; Weiler, Florian; Gregori, Johannes; Thomalla, Götz; Klein, Jan

2018-02-01

DeepMedic, an open source software library based on a multi-channel multi-resolution 3D convolutional neural network, has recently been made publicly available for brain lesion segmentations. It has already been shown that segmentation tasks on MRI data of patients having traumatic brain injuries, brain tumors, and ischemic stroke lesions can be performed very well. In this paper we describe how it can efficiently be used for the purpose of detecting and segmenting white matter hyperintensity lesions. We examined if it can be applied to single-channel routine 2D FLAIR data. For evaluation, we annotated 197 datasets with different numbers and sizes of white matter hyperintensity lesions. Our experiments have shown that substantial results with respect to the segmentation quality can be achieved. Compared to the original parametrization of the DeepMedic neural network, the timings for training can be drastically reduced by adjusting the corresponding training parameters, while at the same time the Dice coefficients remain nearly unchanged. This makes it possible to perform a whole training process within a single day on an NVIDIA GeForce GTX 580 graphics board, which makes this library also very interesting for research purposes on low-end GPU hardware.
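The abstract above reports segmentation quality as Dice coefficients without defining them. As a minimal sketch, assuming flat binary masks (1 = lesion voxel), the standard definition Dice = 2|A ∩ B| / (|A| + |B|) can be computed as follows. The function name and the empty-mask convention are illustrative assumptions, not taken from the DeepMedic library.

```python
def dice_coefficient(pred, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Voxels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of lesion voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A value of 1.0 means the predicted and annotated lesions coincide exactly; 0.0 means they do not overlap at all.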
Role of 3.0 Tesla magnetic resonance hysterosalpingography in the diagnostic work-up of female infertility.

PubMed

Cipolla, Valentina; Guerrieri, Daniele; Pietrangeli, Daniela; Santucci, Domiziana; Argirò, Renato; de Felice, Carlo

2016-09-01

Imaging evaluation plays a crucial role in the diagnostic work-up of female infertility. In recent years, the possibility to evaluate tubal patency using 1.5 Tesla magnetic resonance (1.5T MR) has been studied. To assess the feasibility of 3.0 Tesla magnetic resonance (3.0T MR) hysterosalpingography and its role in the diagnostic work-up of female infertility and to evaluate if this fast "one-stop-shop" imaging approach should be proposed as a first-line examination. A total of 116 infertile women were enrolled in this prospective study; all underwent 3.0T MR hysterosalpingography. After standard imaging of the pelvis, tubal patency was assessed by acquiring 3D dynamic time-resolved T1-weighted (T1W) sequences during manual injection of 4-5 mL of contrast solution consisting of gadolinium and normal sterile saline. Images were evaluated by two radiologists with different experience in MR imaging (MRI). The examination was successfully completed in 96.5% of cases, failure rate was 3.5%. Dynamic sequences showed bilateral tubal patency in 64.3%, unilateral tubal patency in 25.9%, and bilateral tubal occlusion in 9.8%. Extratubal abnormalities were found in 69.9% of patients. Comprehensive analysis of morphological and dynamic sequences showed extratubal abnormalities in 43.1% of patients with bilateral tubal patency. 3.0T MR hysterosalpingography is a feasible, simple, fast, safe, and well-tolerated examination, which allows evaluation of tubal patency and other pelvic causes of female infertility in a single session, and it may thus represent a "one-stop-shop" solution in female infertility diagnostic work-up. © The Foundation Acta Radiologica 2015.
75 FR 30887 - Self-Regulatory Organizations; The NASDAQ Stock Market LLC; Notice of Filing and Immediate...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-06-02

...''), American Express Company (``AXP''), Ciena Corp. (``CIEN''), Star Scientific, Inc. (``CIGX''), Dendreon Corp. (``DNDN''), eBay Inc. (``EBAY''), Corning Inc. (``GLW''), Halliburton Company (``HAL''), iShares Dow Jones US Real Estate (``IYR''), Motorola, Inc., (``MOT''), NVIDIA Corporation (``NVDA''), ON Semiconductor...
Leading in the Arctic; Translating the United States Arctic Strategy into Opportunities for Peace and Stability

DTIC Science & Technology

2015-02-16

Elon Musk, the well-known founder of Tesla and Space-X. His life’s work focuses on developing the means to find and get to...nat_arctic_strategy.pdf 58. Elon Musk, “Elon Musk-The man behind Tesla, Space X, Solar City…” (Filmed February 2013, TED video, 21:04. Posted March 2013) https...Exploitation. London, UK: Reaktion Books Ltd., 2012. Musk, Elon. “Elon Musk-The man behind Tesla, Space X, Solar City…” Filmed February 2013, TED
Magnetohydrodynamics (MHD) Engineering Test Facility (ETF) 200 MWe power plant. Conceptual Design Engineering Report (CDER) supplement. Magnet system special investigations

NASA Technical Reports Server (NTRS)

1981-01-01

The results of magnet system special investigations listed below are summarized: 4 Tesla Magnet Alternate Design Study; 6 Tesla Magnet Manufacturability Study. The conceptual design for a 4 Tesla superconducting magnet system for use with an alternate (supersonic) ETF power train is described, and estimated schedule and cost are identified. The magnet design is scaled from the ETF 6 Tesla design. Results of a manufacturability study and a revised schedule and cost estimate for the ETF 6 T magnet are reported. Both investigations are extensions of the conceptual design of a 6 T magnet system performed earlier as a part of the overall MHD-ETF conceptual design described in Conceptual Design Engineering Report (CDER) Vol. V, System Design Description (SDD) 503 dated September, 1981, DOE/NASA/0224-1; NASA CR-165/52.
Liquid neon heat transfer as applied to a 30 tesla cryomagnet

NASA Technical Reports Server (NTRS)

Papell, S. S.; Hendricks, R. C.

1975-01-01

Since superconducting magnets cooled by liquid helium are limited to magnetic fields of about 18 teslas, the design of a 30 tesla cryomagnet necessitates forced convection liquid neon heat transfer in small coolant channels. As these channels are too small to handle the vapor flow if the coolant were to boil, the design philosophy calls for suppressing boiling by subjecting the fluid to high pressures. Forced convection heat transfer data are obtained by using a blowdown technique to force the fluid vertically through a resistance-heated instrumented tube. The data are obtained at inlet temperatures between 28 and 34 K and system pressures between 28 and 29 bars. Data correlation is limited to a very narrow range of test conditions, since the tests were designed to simulate the heat transfer characteristics in the coolant channels of the 30 tesla cryomagnet concerned. The results can therefore be applied directly to the design of the magnet system.
Three-dimensional flow measurements in a tesla turbine rotor

NASA Astrophysics Data System (ADS)

Fuchs, Thomas; Schosser, Constantin; Hain, Rainer; Kaehler, Christian

2015-11-01

Tesla turbines are fluid mechanical devices converting flow energy into rotation energy by two physical effects: friction and adhesion. The advantages of the tesla turbine are its simple and robust design, as well as its scalability, which makes it suitable for custom power supply solutions, and renewable energy applications. To this day, there is a lack of experimental data to validate theoretical studies, and CFD simulations of these turbines. This work presents a comprehensive analysis of the flow through a tesla turbine rotor gap, with a gap height of only 0.5 mm, by means of three-dimensional Particle Tracking Velocimetry (3D-PTV). For laminar flows, the experimental results match the theory very well, since the measured flow profiles show the predicted second order parabolic shape in radial direction and a fourth order behavior in circumferential direction. In addition to these laminar measurements, turbulent flows at higher mass flow rates were investigated.
Construction of 0.15 Tesla Overhauser Enhanced MRI.

PubMed

Tokunaga, Yuumi; Nakao, Motonao; Naganuma, Tatsuya; Ichikawa, Kazuhiro

2017-01-01

Overhauser enhanced MRI (OMRI) is one of the free radical imaging technologies and has been used in biomedical research such as for partial oxygen measurements in tumor, and redox status in acute oxidative diseases. The external magnetic field of OMRI is frequently in the range of 5-10 mTesla to ensure microwave penetration into small animals, and the S/N ratio is limited. In this study, a 0.15 Tesla OMRI was constructed and tested to improve the S/N ratio for a small sample, or skin measurement. Specification of the main magnet was as follows: 0.15 Tesla permanent magnet; gap size 160 mm; homogenous spherical volume of 80 mm in diameter. The OMRI resonator was designed based on TE101 cavity mode and machined from a phosphorus deoxidized copper block for electron spin resonance (ESR) excitation and a solenoid transmission/receive resonator for NMR detection. The resonant frequencies and Q values were 6.38 MHz/150 and 4.31-4.41 GHz/120 for NMR and ESR, respectively. The Q values were comparable to those of conventional low field OMRI resonators at 15 mTesla. As expected, the MRI S/N ratio was improved by a factor of 30. Triplet dynamic nuclear polarization spectra were observed for 14N carboxy-PROXYL, along the excitation microwave sweep. In the current setup, the enhancement factor was ca. 0.5. In conclusion, the results of this preliminary evaluation indicate that the 0.15 Tesla OMRI could be useful for free radical measurement for small samples.
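The NMR resonant frequency quoted in the abstract above can be sanity-checked against the proton Larmor relation f = (γ/2π)·B0. The sketch below assumes the commonly quoted proton value γ/2π ≈ 42.577 MHz/T; the constant and function names are illustrative and do not come from the paper.

```python
# Proton gyromagnetic ratio divided by 2*pi, in MHz per tesla (rounded value).
GAMMA_BAR_PROTON_MHZ_PER_T = 42.577


def proton_larmor_mhz(b0_tesla):
    """Return the 1H Larmor frequency in MHz for a static field B0 in tesla."""
    return GAMMA_BAR_PROTON_MHZ_PER_T * b0_tesla


if __name__ == "__main__":
    for b0 in (0.15, 1.5, 3.0):
        print(f"B0 = {b0} T -> {proton_larmor_mhz(b0):.2f} MHz")
```

For B0 = 0.15 T this gives about 6.39 MHz, consistent with the reported 6.38 MHz NMR resonance. The same relation yields roughly 64 MHz at 1.5 T and 128 MHz at 3 T, the field/frequency pairs that appear in other MRI records on this page.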
MRI information for commonly used otologic implants: review and update.

PubMed

Azadarmaki, Roya; Tubbs, Rhonda; Chen, Douglas A; Shellock, Frank G

2014-04-01

To review information on magnetic resonance imaging (MRI) issues for commonly used otologic implants. Manufacturing companies, National Library of Medicine's online database, and an additional online database (www.MRIsafety.com). A literature review of the National Library of Medicine's online database with focus on MRI issues for otologic implants was performed. The MRI information on implants provided by manufacturers was reviewed. Baha and Ponto Pro osseointegrated implants' abutment and fixture and the implanted magnet of the Sophono Alpha 1 and 2 abutment-free systems are approved for 3-Tesla magnetic resonance (MR) systems. The external processors of these devices are MR Unsafe. Of the implants tested, middle ear ossicular prostheses, including stapes prostheses, except for the 1987 McGee prosthesis, are MR Conditional for 1.5-Tesla (and many are approved for 3-Tesla) MR systems. Cochlear implants with removable magnets are approved for patients undergoing MRI at 1.5 Tesla after magnet removal. The MED-EL PULSAR, SONATA, CONCERT, and CONCERT PIN cochlear implants can be used in patients undergoing MRI at 1.5 Tesla with application of a protective bandage. The MED-EL COMBI 40+ can be used in 0.2-Tesla MR systems. Implants made from nonmagnetic and nonconducting materials are MR Safe. Knowledge of MRI guidelines for commonly used otologic implants is important. Guidelines on MRI issues approved by the US Food and Drug Administration are not always the same compared with other parts of the world. This monograph provides a current reference for physicians on MRI issues for commonly used otologic implants.
Evaluation of magnetic resonance imaging issues for a wirelessly powered lead used for epidural, spinal cord stimulation.

PubMed

Shellock, Frank G; Audet-Griffin, Annabelle J

2014-06-01

The objective of this investigation was to evaluate magnetic resonance imaging (MRI) issues (magnetic field interactions, MRI-related heating, and artifacts) for a wirelessly powered lead used for spinal cord stimulation (SCS). A newly developed, wirelessly powered lead (Freedom-4, Stimwave Technologies Inc., Scottsdale, AZ, USA) underwent evaluation for magnetic field interactions (translational attraction and torque) at 3 Tesla, MRI-related heating at 1.5 Tesla/64 MHz and 3 Tesla/128 MHz, and artifacts at 3 Tesla using standardized techniques. MRI-related heating tests were conducted by placing the lead in a gelled-saline-filled phantom and performing MRI procedures using relatively high levels of radiofrequency energy. Artifacts were characterized using T1-weighted, spin echo (SE), and gradient echo (GRE) pulse sequences. The lead exhibited minor magnetic field interactions (2 degree deflection angle and no torque). Heating was not substantial under 1.5 Tesla/64 MHz (highest temperature change, 2.3°C) and 3 Tesla/128 MHz (highest temperature change, 2.2°C) MRI conditions. Artifacts were moderate in size relative to the size and shape of the lead. These findings demonstrated that it is acceptable for a patient with this wirelessly powered lead used for SCS to undergo MRI under the conditions utilized in this investigation and according to other necessary guidelines. Artifacts seen on magnetic resonance images may pose possible problems if the area of interest is in the same area or close to this lead. © 2013 International Neuromodulation Society.
Nikola Tesla Educational Opportunity School.

ERIC Educational Resources Information Center

Design Cost Data, 2001

2001-01-01

Describes the architectural design, costs, general description, and square footage data for the Nikola Tesla Educational Opportunity School in Colorado Springs, Colorado. A floor plan and photos are included along with a list of manufacturers and suppliers used for the project. (GR)
ARCHER_RT – A GPU-based and photon-electron coupled Monte Carlo dose computing engine for radiation therapy: Software development and application to helical tomotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Su, Lin; Du, Xining; Liu, Tianyu

Purpose: Using graphics processing unit (GPU) hardware technology, an extremely fast Monte Carlo (MC) code, ARCHER_RT, is developed for radiation dose calculations in radiation therapy. This paper describes the detailed software development and testing for three clinical TomoTherapy® cases: the prostate, lung, and head and neck. Methods: To obtain clinically relevant dose distributions, phase space files (PSFs) created from optimized radiation therapy treatment plan fluence maps were used as the input to ARCHER_RT. Patient-specific phantoms were constructed from patient CT images. Batch simulations were employed to facilitate the time-consuming task of loading large PSFs, and to improve the estimation of statistical uncertainty. Furthermore, two different Woodcock tracking algorithms were implemented and their relative performance was compared. The dose curves of an Elekta accelerator PSF incident on a homogeneous water phantom were benchmarked against DOSXYZnrc. For each of the treatment cases, dose volume histograms and isodose maps were produced from ARCHER_RT and the general-purpose code, GEANT4. The gamma index analysis was performed to evaluate the similarity of voxel doses obtained from these two codes. The hardware accelerators used in this study are one NVIDIA K20 GPU, one NVIDIA K40 GPU, and six NVIDIA M2090 GPUs. In addition, to make a fairer comparison of the CPU and GPU performance, a multithreaded CPU code was developed using OpenMP and tested on an Intel E5-2620 CPU. Results: For the water phantom, the depth dose curve and dose profiles from ARCHER_RT agree well with DOSXYZnrc. For clinical cases, results from ARCHER_RT are compared with those from GEANT4 and good agreement is observed. The gamma index test is performed for voxels whose dose is greater than 10% of maximum dose. For 2%/2mm criteria, the passing rates for the prostate, lung, and head and neck cases are 99.7%, 98.5%, and 97.2%, respectively. Due to the specific architecture of the GPU, the modified Woodcock tracking algorithm performed worse than the original one. ARCHER_RT achieves a fast speed for PSF-based dose calculations. With a single M2090 card, the simulations cost about 60, 50, and 80 s for the three cases, respectively, with 1% statistical error in the PTV. Using the latest K40 card, the simulations are 1.7–1.8 times faster. More impressively, six M2090 cards could finish the simulations in 8.9–13.4 s. For comparison, the same simulations on the Intel E5-2620 (12 hyperthreading) cost about 500–800 s. Conclusions: ARCHER_RT was developed successfully to perform fast and accurate MC dose calculation for radiotherapy using PSFs and patient CT phantoms.
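The abstract above does not describe its two Woodcock tracking variants in detail. As a rough illustration only, the following is a minimal sketch of textbook Woodcock (delta) tracking in one dimension: flight distances are sampled with a single majorant attenuation coefficient, and each tentative collision is accepted as real with probability mu(x)/mu_max. The two-layer phantom, its attenuation coefficients, and all function names are hypothetical and are not taken from the record.

```python
import math
import random


def woodcock_first_collision(mu_at, mu_max, x_start, x_end, rng):
    """Return the position of the first real collision, or None if the
    photon leaves the slab at x_end without colliding."""
    x = x_start
    while True:
        # Tentative flight length sampled with the majorant coefficient,
        # so no boundary crossings between materials need to be computed.
        x += -math.log(1.0 - rng.random()) / mu_max
        if x >= x_end:
            return None
        # Accept as a real collision with probability mu(x) / mu_max;
        # otherwise it is a virtual collision and the flight continues.
        if rng.random() < mu_at(x) / mu_max:
            return x


def mu_two_layer(x):
    """Hypothetical phantom: 0.05 per cm for x < 5 cm, 0.2 per cm beyond."""
    return 0.05 if x < 5.0 else 0.2


rng = random.Random(1)
n_photons = 20000
escaped = sum(
    woodcock_first_collision(mu_two_layer, 0.2, 0.0, 10.0, rng) is None
    for _ in range(n_photons)
)
transmission = escaped / n_photons
# Analytic transmission through both layers, for comparison.
expected = math.exp(-(0.05 * 5.0 + 0.2 * 5.0))
print(transmission, expected)
```

The simulated transmission should agree with the analytic value to within statistical noise. The appeal of the method on GPUs is that every thread runs the same loop regardless of which material it is in, at the price of extra virtual collisions in low-density regions.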
Lobar intracerebral haematomas: Neuropathological and 7.0-tesla magnetic resonance imaging evaluation.

PubMed

De Reuck, Jacques; Cordonnier, Charlotte; Deramecourt, Vincent; Auger, Florent; Durieux, Nicolas; Leys, Didier; Pasquier, Florence; Maurage, Claude-Alain; Bordet, Regis

2016-10-15

The Boston criteria for cerebral amyloid angiopathy (CAA) need validation by neuropathological examination in patients with lobar cerebral haematomas (LCHs). In vivo 1.5-tesla magnetic resonance imaging (MRI) is unreliable for detecting the age-related signal changes in LCHs. This post-mortem study investigates the validity of the Boston criteria in brains with LCHs and the signal changes during their time course with 7.0-tesla MRI. Seventeen CAA brains including 26 LCHs were compared to 13 non-CAA brains with 14 LCHs. The evolution of the signal changes with time was examined in 25 LCHs with T2 and T2* 7.0-tesla MRI. In the CAA group LCHs were predominantly located in the parieto-occipital lobes. White matter changes were also more severe, with more cortical microinfarcts and cortical microbleeds. On MRI there was a progressive shift of the intensity of the hyposignal from the haematoma core in the acute stage to the boundaries later on. During the residual stage the hyposignal mildly decreased in the boundaries with an increase of the superficial siderosis and haematoma core collapse. Our post-mortem study of LCHs confirms the validity of the Boston criteria for CAA. In addition, 7.0-tesla MRI allows staging the age of the LCHs. Copyright © 2016 Elsevier B.V. All rights reserved.
Inter-speaker speech variability assessment using statistical deformable models from 3.0 tesla magnetic resonance images.

PubMed

Vasconcelos, Maria J M; Ventura, Sandra M R; Freitas, Diamantino R S; Tavares, João Manuel R S

2012-03-01

The morphological and dynamic characterisation of the vocal tract during speech production has been gaining greater attention due to the motivation of the latest improvements in magnetic resonance (MR) imaging; namely, with the use of higher magnetic fields, such as 3.0 Tesla. In this work, the automatic study of the vocal tract from 3.0 Tesla MR images was assessed through the application of statistical deformable models. Therefore, the primary goal focused on the analysis of the shape of the vocal tract during the articulation of European Portuguese sounds, followed by the evaluation of the results concerning the automatic segmentation, i.e. identification of the vocal tract in new MR images. In what concerns speech production, this is the first attempt to automatically characterise and reconstruct the vocal tract shape of 3.0 Tesla MR images by using deformable models; particularly, by using active and appearance shape models. The achieved results clearly evidence the adequacy and advantage of the automatic analysis of the 3.0 Tesla MR images of these deformable models in order to extract the vocal tract shape and assess the involved articulatory movements. These achievements are mostly required, for example, for a better knowledge of speech production, mainly of patients suffering from articulatory disorders, and to build enhanced speech synthesizer models.
75 FR 48338 - Intel Corporation; Analysis of Proposed Consent Order to Aid Public Comment

Federal Register 2010, 2011, 2012, 2013, 2014

2010-08-10

... integrated into chipsets as well as discrete graphics cards. NVIDIA has been at the forefront of developing... to connect peripheral products such as discrete GPUs to the CPU. A bus is a connection point between... platform. Intel's commitment to maintain an open PCIe bus will provide discrete graphics manufacturers...
Evaluation of an Adaptive Automation Trigger Based on Task Performance, Priority, and Frequency

DTIC Science & Technology

2013-06-01

with dual Intel ® Xeon ® CPU x5550 processors @ 2.67 GHz each, 12.0 GB RAM, and a 1.5 GB PCIe nVidia Quadro FX 4800 graphics card (Microsoft...Cole Publishing Company . Miller, C. A., & Parasuraman, R. (2007). Designing for flexible interaction between humans and automation: Delegation

Peregrine Software Toolchains | High-Performance Computing | NREL

Science.gov Websites

toolchain is an open-source alternative against which many technical applications are natively developed and tested. The Portland Group compilers are not fully supported, but are available to the HPC community. Use Group (PGI) C/C++ and Fortran (partially supported) The PGI Accelerator compilers include NVIDIA GPU
78 FR 12085 - Environmental Documents Prepared for Oil, Gas, and Mineral Operations by the Gulf of Mexico Outer...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-02-21

.... Planning Area of the Gulf of Mexico. Tesla Offshore, LLC, Geological Eugene Island South 12/5/2012.... Tesla Offshore, LLC, Geological Eugene Island South 12/18/2012 & Geophysical Survey, SEA L12- Addition...
Ability of preoperative 3.0-Tesla magnetic resonance imaging to predict the absence of side-specific extracapsular extension of prostate cancer.

PubMed

Hara, Tomohiko; Nakanishi, Hiroyuki; Nakagawa, Tohru; Komiyama, Motokiyo; Kawahara, Takashi; Manabe, Tomoko; Miyake, Mototaka; Arai, Eri; Kanai, Yae; Fujimoto, Hiroyuki

2013-10-01

Recent studies have shown an improvement in prostate cancer diagnosis with the use of 3.0-Tesla magnetic resonance imaging. We retrospectively assessed the ability of this imaging technique to predict side-specific extracapsular extension of prostate cancer. From October 2007 to August 2011, prostatectomy was carried out in 396 patients after preoperative 3.0-Tesla magnetic resonance imaging. Among these, 132 (primary sample) and 134 patients (validation sample) underwent 12-core prostate biopsy at the National Cancer Center Hospital of Tokyo, Japan, and at other institutions, respectively. In the primary dataset, univariate and multivariate analyses were carried out to predict side-specific extracapsular extension using variables determined preoperatively, including 3.0-Tesla magnetic resonance imaging findings (T2-weighted and diffusion-weighted imaging). A prediction model was then constructed and applied to the validation study sample. Multivariate analysis identified four significant independent predictors (P < 0.05), including a biopsy Gleason score of ≥8, positive 3.0-Tesla diffusion-weighted magnetic resonance imaging findings, ≥2 positive biopsy cores on each side and a maximum percentage of positive cores ≥31% on each side. The negative predictive value was 93.9% in the combination model with these four predictors, meanwhile the positive predictive value was 33.8%. Good reproducibility of these four significant predictors and the combination model was observed in the validation study sample. The side-specific extracapsular extension prediction by the biopsy Gleason score and factors associated with tumor location, including a positive 3.0-Tesla diffusion-weighted magnetic resonance imaging finding, have a high negative predictive value, but a low positive predictive value. © 2013 The Japanese Urological Association.
Intraoperative Magnetic Resonance Imaging for Cranial and Spinal Cases Using Preexisting "C" Shaped Three Side Open 0.2 Tesla Magnetic Resonance Imaging.

PubMed

Tewari, Vinod Kumar; Tripathi, Ravindra; Aggarwal, Subodh; Hussain, Mazhar; Das Gupta, Hari Kishan

2017-01-01

The existing intraoperative MRI (IMRI) systems of developed countries are too costly to be affordable in any developing country and are out of reach of common and poor people in remote areas of developing countries. We have used a pre-existing (refurbished) three-side-open "C"-shaped 0.2 Tesla MRI for IMRI in a very remote area. In this technique the 0.2 Tesla MRI and the operating theatre were merged, and the MRI table was used as the operating table. We have operated on 36 cases via IMRI from November 2005 to date; the first case was operated on 13 November 2005. The low-field (0.2 Tesla) open setup costs very little (around Rs 40 lakhs), so it is highly affordable to management and thus to patients, and it is used for both diagnostic and therapeutic purposes. Equipment such as nitrous, oxygen, and suction is outside the MRI room, so there is no noise inside the operating room; positioning the patient did not take much time because of manual adjustments; no special training of nurses and technicians was required because of the low (0.2 Tesla) field strength of the magnet and the use of the same instruments and techniques; sequencing took only 1.31 minutes per sequence; re-registration is not required since we always note the two orthogonal axes (x and y) in preoperative imaging; and we were able to operate on posterior fossa tumors as well because there was no head fixation except with a leucoplast strap. Moreover, the intraoperative images we obtained are highly acceptable. A three-side-open 0.2 Tesla MRI system, if used for intraoperative guidance, is highly affordable and overcomes the limitations of the western IMRI setup. Postoperative MRI images were also highly acceptable.
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL

PubMed Central

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationship between a set of biological species. Scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from poor computational performance. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively. PMID:29051701
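For orientation, the serial algorithm that the record above parallelizes can be sketched in a few lines. This is a plain single-threaded reference version of UPGMA, not the multi-GPU NCCL implementation from the paper; the four taxa and their distance matrix are made-up illustration data, and the function name is hypothetical.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Serial UPGMA. Returns (tree, merges), where tree is a nested tuple
    and merges lists (left, right, height) in the order clusters joined."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merges.append((tree_a, tree_b, d_ab / 2.0))
        # Distance from the merged cluster to every other cluster is the
        # size-weighted arithmetic mean of the two old distances.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical pairwise distances between four taxa.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, merges = upgma(labels, matrix)
print(tree)
print([height for _, _, height in merges])
```

The minimum search and the distance update over all remaining clusters dominate the cost for large inputs, and both are data-parallel, which is presumably why they are natural targets for GPU acceleration.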
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL.

PubMed

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationship between a set of biological species. Scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from poor computational performance. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively.
Rectal cancer confined to the bowel wall: the role of 3 Tesla phased-array MR imaging in T categorization.

PubMed

Çolakoğlu Er, Hale; Peker, Elif; Erden, Ayşe; Erden, İlhan; Geçim, Ethem; Savaş, Berna

2018-02-01

To determine the diagnostic value of 3 Tesla MR imaging in detection of mucosal (Tis), submucosal (T1) and muscularis propria (T2) invasion in patients with early rectal cancer. A total of 50 consecutive patients who underwent 3 Tesla MR imaging and curative-intent intervention for MRI-staged Tis/T1/T2 rectal cancer from March 2012 to December 2016 were included. The radiological T category of each rectal tumour was compared retrospectively with histopathological results assessed according to the tumor, node, metastasis (TNM) classification. The sensitivities, specificities, and overall accuracy rates of 3 Tesla MR imaging for Tis, T1, and T2 cases were calculated using MedCalc statistical software v. 16. The sensitivity, specificity, PPV, NPV of 3 Tesla MR imaging in T categorization for T2 were: 93.7% [95% CI (0.79-0.99)], 77.7% [95% CI (0.52-0.93)], 88.2% [95% CI (0.75-0.94)] and 87.5% [95% CI (0.64-0.96)]; for T1 were 92% [95% CI (0.63-0.99)], 91.8% [95% CI (0.78-0.98)], 80% [95% CI (0.57-0.92)] and 97.1% [95% CI (0.83-0.99)]; for Tis were: 20% [95% CI (0.51-0.71)], 100% [95% CI (0.92-1)], 100%, 91.8% [95% CI (0.87-0.94)], respectively. MR categorization accuracy rates for T2, T1 and Tis were calculated as 88, 92 and 92%, respectively. 3 Tesla MR imaging seems to be useful for accurate categorization of T-stage in early rectal cancer, especially for T1 cancers. The method is not a reliable tool to detect Tis cases. The potential for overstaging and understaging of the technique should be realized and taken into consideration when tailoring the treatment protocol for each patient. Advances in knowledge: High-resolution MR with phased-array coil is being increasingly used in the pre-operative assessment of rectal cancer. 3 Tesla high-resolution MR imaging allows improved definition of bowel wall and tumour infiltration.
Ultrastructural changes in the receptor parts of retinal rods in experimental alloxan-induced diabetes in rabbits.

PubMed

Zarebska, A; Łańcut, M; Bakiera, K; Matejko, E; Czerny, K; Kiś, G; Wójtowicz, Z

2001-01-01

Mature rabbits were administered a single dose of alloxan at the dose 100 mg/kg b.m. After 3 and 6 weeks and after 3 and 6 months, the samples of the retina were taken from the areas immediate to the papilla of the optic nerve. Ultrathin sections were dyed according to the Reynold's method, and the receptive parts of the rods were examined under electron microscope BS-500 Tesla. After 6 weeks following alloxan administration, distinct morphological changes in the form of enlargement of certain discs in the receptive parts of rod cells were observed. After 3 months the majority of the discs was damaged, and after 6 months only single, quite well preserved rod cells were found to be present in the retina.
Paul Drude's Prediction of Nonreciprocal Mutual Inductance for Tesla Transformers

PubMed Central

McGuyer, Bart

2014-01-01

Inductors, transmission lines, and Tesla transformers have been modeled with lumped-element equivalent circuits for over a century. In a well-known paper from 1904, Paul Drude predicts that the mutual inductance for an unloaded Tesla transformer should be nonreciprocal. This historical curiosity is mostly forgotten today, perhaps because it appears incorrect. However, Drude's prediction is shown to be correct for the conditions treated, demonstrating the importance of constraints in deriving equivalent circuits for distributed systems. The predicted nonreciprocity is not fundamental, but instead is an artifact of the misrepresentation of energy by an equivalent circuit. The application to modern equivalent circuits is discussed. PMID:25542040
Paul Drude's prediction of nonreciprocal mutual inductance for Tesla transformers.

PubMed

McGuyer, Bart

2014-01-01

Inductors, transmission lines, and Tesla transformers have been modeled with lumped-element equivalent circuits for over a century. In a well-known paper from 1904, Paul Drude predicts that the mutual inductance for an unloaded Tesla transformer should be nonreciprocal. This historical curiosity is mostly forgotten today, perhaps because it appears incorrect. However, Drude's prediction is shown to be correct for the conditions treated, demonstrating the importance of constraints in deriving equivalent circuits for distributed systems. The predicted nonreciprocity is not fundamental, but instead is an artifact of the misrepresentation of energy by an equivalent circuit. The application to modern equivalent circuits is discussed.
An initial physical mechanism in the treatment of neurologic disorders with externally applied pico Tesla magnetic fields.

PubMed

Jacobson, J I; Yamanashi, W S

1995-04-01

The recent clinical studies describing the treatment of some neurological disorders with an externally applied pico Tesla (10^-12 Tesla, or 10^-8 gauss) magnetic field are considered from a physical view point. An equation relating the intrinsic (or rest) energy of a charged particle of mass m with its energy of interaction in an externally applied magnetic field B is presented. The equation represents an initial basic physical interaction as a part of a more complex biological mechanism to explain the therapeutic effects of externally applied magnetic fields in these and other neurologic disorders.
A physical mechanism in the treatment of neurologic disorders with externally applied pico Tesla magnetic fields.

PubMed

Jacobson, J I; Yamanashi, W S

1995-06-01

The clinical studies describing the treatment of some neurological disorders with an externally applied pico Tesla (10^-12 Tesla, or 10^-8 gauss) magnetic field are considered from a physical view point. An equation relating the intrinsic or "rest" energy of a charged particle of mass with its energy of interaction in an externally applied magnetic field B is presented. The equation is proposed to represent an initial basic physical interaction as a part of a more complex biological mechanism to explain the therapeutic effects of externally applied magnetic fields in these and other neurologic disorders.
Design study of steady-state 30-tesla liquid-neon-cooled magnet

NASA Technical Reports Server (NTRS)

Prok, G. M.; Brown, G. V.

1976-01-01

A design for a 30-tesla, liquid-neon-cooled magnet was reported which is capable of continuous operation. Cooled by nonboiling, forced-convection heat transfer to liquid neon flowing at 2.8 cu m/min in a closed, pressurized heat-transfer loop and structurally supported by a tapered structural ribbon, the tape-wound coils with a high-purity-aluminum conductor will produce over 30 teslas for 1 minute at 850 kilowatts. The magnet will have an inside diameter of 7.5 centimeters and an outside diameter of 54 centimeters. The minimum current density at design field will be 15.7 kA/sq cm.
Hazardous Waste Cleanup: AGFA Corporation - Peerless Photo Products in Shoreham, New York

EPA Pesticide Factsheets

The site is located on approximately 16.2 acres in a predominantly residential area. The site was originally developed in 1903 when Nikola Tesla constructed a building that served as a residence and a laboratory. Mr. Tesla also constructed a radio tower on
MR imaging of the prostate at 3 Tesla: comparison of an external phased-array coil to imaging with an endorectal coil at 1.5 Tesla.

PubMed

Sosna, Jacob; Pedrosa, Ivan; Dewolf, William C; Mahallati, Houman; Lenkinski, Robert E; Rofsky, Neil M

2004-08-01

To qualitatively compare the image quality of torso phased-array 3-Tesla (3T) imaging of the prostate with that of endorectal 1.5-Tesla imaging. Twenty cases of torso phased-array prostate imaging performed at 3-Tesla with FSE T2 weighted images were evaluated by two readers independently for visualization of the posterior border (PB), seminal vesicles (SV), neurovascular bundles (NVB), and image quality rating (IQR). Studies were performed at large fields of view (FOV) (25 cm) (14 cases) (3TL) and smaller FOV (14 cm) (19 cases) (3TS). A comparison was made to 20 consecutive cases of 1.5-T endorectal evaluation performed during the same time period. Results: 3TL produced a significantly better image quality compared with the small FOV for PB (P = .0001), SV (P = .0001), and IQR (P = .0001). There was a marginally significant difference within the NVB category (P = .0535). 3TL produced an image of similar quality to image quality at 1.5 T for PB (P = .3893), SV (P = .8680), NVB (P = .2684), and IQR (P = .8599). Prostate image quality at 3T with a torso phased-array coil can be comparable with that of endorectal 1.5-T imaging. These findings suggest that additional options are now available for magnetic resonance imaging of the prostate gland.
Geophysical variables and behavior: XCIX. Reductions in numbers of neurons within the parasolitary nucleus in rats exposed perinatally to a magnetic pattern designed to imitate geomagnetic continuous pulsations: implications for sudden infant death.

PubMed

Dupont, M J; McKay, B E; Parker, G; Persinger, M A

2004-06-01

Correlational analyses have shown a moderate strength association between the occurrence of continuous pulsations, a type of geomagnetic activity within the 0.2-Hz to 5-Hz range, and the occurrence of Sudden Infant Deaths. In the present study, rats were exposed continuously from two days before birth to seven days after birth to 0.5-Hz pulsed-square wave magnetic fields whose intensities were within either the nanoTesla or microTesla range. The magnetic fields were generated in either an east-west (E-W) or north-south (N-S) direction. At 21 days of age, the area of the parasolitary nucleus (but not the solitary nucleus) was significantly smaller, and the numbers of neurons were significantly less in rats that had been exposed to the nanoT fields generated in the east-west direction or to the microTesla fields generated within either E-W or N-S direction relative to those exposed to the N-S nanoTesla fields. These results suggest nanoTesla magnetic fields, when applied in a specific direction, might interact with the local geomagnetic field to affect cell migration in structures within the brain stem that modulate vestibular-related arousal and respiratory or cardiovascular stability.
Finite Element Optimization for Nondestructive Evaluation on a Graphics Processing Unit for Ground Vehicle Hull Inspection

DTIC Science & Technology

2013-08-22

4 cores, where the code may simultaneously run on the multiple cores or the graphics processing unit (or GPU – to be more specific on an NVIDIA ...allowed to get accurate crack shapes. DISCLAIMER Reference herein to any specific commercial company , product, process, or service by trade name
An Interview with John Liontas

ERIC Educational Resources Information Center

Sadeghi, Karim

2017-01-01

John I. Liontas, Ph.D. is an associate professor of foreign languages, English for speakers of other languages (ESOL), and technology in education and second language acquisition (TESLA), and director and faculty of the TESLA doctoral program at the University of South Florida. Dr. Liontas is a distinguished thought leader, author, and…
Identifying Needed Technical Standards: The LITA TESLA Committee at Work.

ERIC Educational Resources Information Center

Carter, Ruth C.

1984-01-01

Efforts of the Technical Standards for Library Automation Committee (TESLA), a division-wide committee of the Library Information and Technology Association (LITA) of the American Library Association, are described. The current status of suggested technical standards and recommended action are detailed. Five sources are given. (Author/EJS)
DOE Office of Scientific and Technical Information (OSTI.GOV)

Majzoobi, A.; Joshi, R. P., E-mail: ravi.joshi@ttu.edu; Neuber, A. A.

Particle-in-cell simulations are performed to analyze the efficiency, output power and leakage currents in a 12-Cavity, 12-Cathode rising-sun magnetron with diffraction output (MDO). The central goal is to conduct a parameter study of a rising-sun magnetron that comprehensively incorporates performance enhancing features such as transparent cathodes, axial extraction, the use of endcaps, and cathode extensions. Our optimum results demonstrate peak output power of about 2.1 GW, with efficiencies of ∼70% and low leakage currents at a magnetic field of 0.45 Tesla, a 400 kV bias with a single endcap, for a range of cathode extensions between 3 and 6 centimeters.

GRay: A Massively Parallel GPU-based Code for Ray Tracing in Relativistic Spacetimes

NASA Astrophysics Data System (ADS)

Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

2013-11-01

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.
Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel; Kalamkar, Dhiraj; Singh, Amik

2012-12-01

Multigrid methods are widely used to accelerate the convergence of iterative solvers for linear systems used in a number of different application areas. In this report, we describe miniGMG, our compact geometric multigrid benchmark designed to proxy the multigrid solves found in AMR applications. We explore optimization techniques for geometric multigrid on existing and emerging multicore systems including the Opteron-based Cray XE6, Intel Sandy Bridge and Nehalem-based Infiniband clusters, as well as manycore-based architectures including NVIDIA's Fermi and Kepler GPUs and Intel's Knights Corner (KNC) co-processor. This report examines a variety of novel techniques including communication-aggregation, threaded wavefront-based DRAM communication-avoiding, dynamic threading decisions, SIMDization, and fusion of operators. We quantify performance through each phase of the V-cycle for both single-node and distributed-memory experiments and provide detailed analysis for each class of optimization. Results show our optimizations yield significant speedups across a variety of subdomain sizes while simultaneously demonstrating the potential of multi- and manycore processors to dramatically accelerate single-node performance. However, our analysis also indicates that improvements in networks and communication will be essential to reap the potential of manycore processors in large-scale multigrid calculations.
Rapid simulation of X-ray transmission imaging for baggage inspection via GPU-based ray-tracing

NASA Astrophysics Data System (ADS)

Gong, Qian; Stoian, Razvan-Ionut; Coccarelli, David S.; Greenberg, Joel A.; Vera, Esteban; Gehm, Michael E.

2018-01-01

We present a pipeline that rapidly simulates X-ray transmission imaging for arbitrary system architectures using GPU-based ray-tracing techniques. The purpose of the pipeline is to enable statistical analysis of threat detection in the context of airline baggage inspection. As a faster alternative to Monte Carlo methods, we adopt a deterministic approach for simulating photoelectric absorption-based imaging. The highly-optimized NVIDIA OptiX API is used to implement ray-tracing, greatly speeding code execution. In addition, we implement the first hierarchical representation structure to determine the interaction path length of rays traversing heterogeneous media described by layered polygons. The accuracy of the pipeline has been validated by comparing simulated data with experimental data collected using a heterogeneous phantom and a laboratory X-ray imaging system. On a single computer, our approach allows us to generate over 400 2D transmission projections (125 × 125 pixels per frame) per hour for a bag packed with hundreds of everyday objects. By implementing our approach on cloud-based GPU computing platforms, we find that the same 2D projections of approximately 3.9 million bags can be obtained in a single day using 400 GPU instances, at a cost of only 0.001 per bag.
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born

PubMed Central

2012-01-01

We present an implementation of generalized Born implicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA enabled NVIDIA graphics processing units (GPUs). We discuss the algorithms that are used to exploit the processing power of the GPUs and show the performance that can be achieved in comparison to simulations on conventional CPU clusters. The implementation supports three different precision models in which the contributions to the forces are calculated in single precision floating point arithmetic but accumulated in double precision (SPDP), or everything is computed in single precision (SPSP) or double precision (DPDP). In addition to performance, we have focused on understanding the implications of the different precision models on the outcome of implicit solvent MD simulations. We show results for a range of tests including the accuracy of single point force evaluations and energy conservation as well as structural properties pertaining to protein dynamics. The numerical noise due to rounding errors within the SPSP precision model is sufficiently large to lead to an accumulation of errors which can result in unphysical trajectories for long time scale simulations. We recommend the use of the mixed-precision SPDP model since the numerical results obtained are comparable with those of the full double precision DPDP model and the reference double precision CPU implementation but at significantly reduced computational cost. Our implementation provides performance for GB simulations on a single desktop that is on par with, and in some cases exceeds, that of traditional supercomputers. PMID:22582031
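The difference between the SPSP and SPDP precision models described above can be illustrated with a toy accumulation. The sketch below is not AMBER code: it emulates single precision in pure Python by round-tripping values through a 4-byte float with the standard `struct` module, and the one million identical "force contributions" of 1.0e-4 are a hypothetical workload chosen so the exact sum is 100.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(terms, single_accumulator):
    """Sum terms that are each computed in single precision.

    single_accumulator=True  mimics SPSP (running total also kept in single precision).
    single_accumulator=False mimics SPDP (running total kept in double precision).
    """
    total = 0.0
    for term in terms:
        term = f32(term)  # each contribution is evaluated in single precision
        total = f32(total + term) if single_accumulator else total + term
    return total


terms = [1.0e-4] * 1_000_000  # hypothetical contributions; exact sum is 100
exact = 100.0
spsp = accumulate(terms, single_accumulator=True)
spdp = accumulate(terms, single_accumulator=False)
print("SPSP error:", abs(spsp - exact))
print("SPDP error:", abs(spdp - exact))
```

In this toy case the double-precision accumulator keeps the total close to the exact value, while the single-precision accumulator drifts because each small term is rounded against an ever larger running total. This is the same mechanism, in miniature, that the abstract cites as the reason for recommending SPDP.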
76 FR 60124 - Tesla Motors, Inc.; Grant of Petition for Temporary Exemption From the Electronic Stability...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-09-28

... intended its manufacturing and production line workers to complete manufacture of the remaining Roadsters... many Roadster manufacturing employees to the production operations for the Model S, and that it... additional 80 vehicles covered by its exemption request, Tesla's production and manufacturing would have a...
Multicast Delayed Authentication For Streaming Synchrophasor Data in the Smart Grid

PubMed Central

Câmara, Sérgio; Anand, Dhananjay; Pillitteri, Victoria; Carmo, Luiz

2017-01-01

Multicast authentication of synchrophasor data is challenging due to the design requirements of Smart Grid monitoring systems such as low security overhead, tolerance of lossy networks, time-criticality and high data rates. In this work, we propose inf-TESLA, Infinite Timed Efficient Stream Loss-tolerant Authentication, a multicast delayed authentication protocol for communication links used to stream synchrophasor data for wide area control of electric power networks. Our approach is based on the authentication protocol TESLA but is augmented to accommodate high frequency transmissions of unbounded length. The inf-TESLA protocol utilizes the Dual Offset Key Chains mechanism to reduce authentication delay and computational cost associated with key chain commitment. We provide a description of the mechanism using two different modes for disclosing keys and demonstrate its security against a man-in-the-middle attack attempt. We compare our approach against the TESLA protocol in a 2-day simulation scenario, showing a reduction of 15.82% and 47.29% in computational cost for sender and receiver, respectively, and a cumulative reduction in the communication overhead. PMID:28736582
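For orientation, the delayed-disclosure idea behind the base TESLA protocol can be sketched as follows. This is a simplified illustration only: it does not implement inf-TESLA or its Dual Offset Key Chains, it omits timing checks, and it uses the chain key directly as the MAC key. All function names and the seed value are hypothetical.

```python
import hashlib
import hmac


def make_key_chain(seed, length):
    """Return [K_0, ..., K_length] with K_{i-1} = SHA-256(K_i); K_0 is the commitment."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()  # commitment first, seed last
    return chain


def mac(key, message):
    """Authentication tag the sender attaches before the key is disclosed."""
    return hmac.new(key, message, hashlib.sha256).digest()


def key_is_authentic(disclosed_key, interval, commitment):
    """Hash the disclosed key back `interval` times and compare with K_0."""
    k = disclosed_key
    for _ in range(interval):
        k = hashlib.sha256(k).digest()
    return hmac.compare_digest(k, commitment)


def verify_packet(message, tag, disclosed_key, interval, commitment):
    """Receiver side: check the late-disclosed key first, then the buffered tag."""
    return key_is_authentic(disclosed_key, interval, commitment) and hmac.compare_digest(
        mac(disclosed_key, message), tag
    )


chain = make_key_chain(b"demo-seed", 5)        # sender keeps the chain secret
tag = mac(chain[3], b"phasor sample")          # sent in interval 3
print(verify_packet(b"phasor sample", tag, chain[3], 3, chain[0]))
```

The receiver can only verify a packet after the corresponding key has been disclosed, which is the source of the authentication delay the abstract refers to.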
An 8-GW long-pulse generator based on Tesla transformer and pulse forming network

DOE Office of Scientific and Technical Information (OSTI.GOV)

Su, Jiancang; Zhang, Xibo; Li, Rui

A long-pulse generator TPG700L based on a Tesla transformer and a series pulse forming network (PFN) is constructed to generate intense electron beams for the purpose of high power microwave (HPM) generation. The TPG700L mainly consists of a 12-stage PFN, a built-in Tesla transformer in a pulse forming line, a three-electrode gas switch, a transmission line with a trigger, and a load. The Tesla transformer and the compact PFN are the key technologies for the development of the TPG700L. This generator can output electrical pulses with a width as long as 200 ns at a level of 8 GW and a repetition rate of 50 Hz. When used to drive a relative backward wave oscillator for HPM generation, the electrical pulse width is about 100 ns on a voltage level of 520 kV. Factors affecting the pulse waveform of the TPG700L are also discussed. At present, the TPG700L performs well for long-pulse HPM generation in our laboratory.
An 8-GW long-pulse generator based on Tesla transformer and pulse forming network.

PubMed

Su, Jiancang; Zhang, Xibo; Li, Rui; Zhao, Liang; Sun, Xu; Wang, Limin; Zeng, Bo; Cheng, Jie; Wang, Ying; Peng, Jianchang; Song, Xiaoxin

2014-06-01

A long-pulse generator TPG700L based on a Tesla transformer and a series pulse forming network (PFN) is constructed to generate intense electron beams for the purpose of high power microwave (HPM) generation. The TPG700L mainly consists of a 12-stage PFN, a built-in Tesla transformer in a pulse forming line, a three-electrode gas switch, a transmission line with a trigger, and a load. The Tesla transformer and the compact PFN are the key technologies for the development of the TPG700L. This generator can output electrical pulses with a width as long as 200 ns at a level of 8 GW and a repetition rate of 50 Hz. When used to drive a relative backward wave oscillator for HPM generation, the electrical pulse width is about 100 ns on a voltage level of 520 kV. Factors affecting the pulse waveform of the TPG700L are also discussed. At present, the TPG700L performs well for long-pulse HPM generation in our laboratory.
Multicast Delayed Authentication For Streaming Synchrophasor Data in the Smart Grid.

PubMed

Câmara, Sérgio; Anand, Dhananjay; Pillitteri, Victoria; Carmo, Luiz

2016-01-01

Multicast authentication of synchrophasor data is challenging due to the design requirements of Smart Grid monitoring systems such as low security overhead, tolerance of lossy networks, time-criticality and high data rates. In this work, we propose inf-TESLA, Infinite Timed Efficient Stream Loss-tolerant Authentication, a multicast delayed authentication protocol for communication links used to stream synchrophasor data for wide area control of electric power networks. Our approach is based on the authentication protocol TESLA but is augmented to accommodate high frequency transmissions of unbounded length. The inf-TESLA protocol utilizes the Dual Offset Key Chains mechanism to reduce authentication delay and computational cost associated with key chain commitment. We provide a description of the mechanism using two different modes for disclosing keys and demonstrate its security against a man-in-the-middle attack attempt. We compare our approach against the TESLA protocol in a 2-day simulation scenario, showing a reduction of 15.82% and 47.29% in computational cost for sender and receiver, respectively, and a cumulative reduction in the communication overhead.
Focused tight dressing does not prevent cochlear implant magnet migration under 1.5 Tesla MRI.

PubMed

Cuda, D; Murri, A; Succo, G

2013-04-01

We report a retrospective case of inner magnet migration, which occurred after 1.5 Tesla MRI scanning in an adult recipient of a bilateral cochlear implant (CI) despite a focused head dressing. The patient, bilaterally implanted with Nucleus 5 CIs (Cochlear LTD, Sydney, Australia), underwent a 1.5 Tesla cholangio-MRI scan for biliary duct pathology. In subsequent days, a focal skin alteration appeared over the left inner coil. Plain skull radiographs showed partial magnet migration on the left side. Surgical exploration confirmed magnet twisting; the magnet was effectively repositioned. Left CI performance was restored to pre-migration level. The wound healed without complications. Thus, focused dressing does not prevent magnet migration in CI recipients undergoing 1.5 Tesla MRI. All patients should be counselled on this potential complication. A minor surgical procedure is required to reposition the magnet. Nevertheless, timely diagnosis is necessary to prevent skin breakdown and subsequent device contamination. Plain skull radiograph is very effective in identifying magnet twisting; it should be performed systematically after MRI or minimally on all suspected cases.
Phase transition in the quantum limit of the Weyl semimetal TaAs

NASA Astrophysics Data System (ADS)

Ramshaw, Brad

Under extreme magnetic fields, electrons in a metal are confined to a single highly-degenerate quantum state, a regime known as the quantum limit. This state is unstable to the formation of new states of matter, such as the fractional quantum Hall effect in two dimensions. The fate of 3D metals in the quantum limit, on the other hand, has been relatively unexplored. The discovery of monopnictide Weyl semimetals has renewed interest in the high-field properties of 3D electrons, particularly those with linear dispersions. Several difficulties in determining the high-field properties have arisen, including the highly anisotropic nature of the magnetoresistance, and the presence of trivial (parabolic) Fermi pockets that cloud the underlying behaviour of Weyl pockets. We use magnetic fields up to 90 Tesla to put the Weyl semimetal TaAs into its extreme quantum limit, isolating its linear 0th Landau level from the rest of the electronic spectrum. We find that a gap opens in the conductivity parallel to the magnetic field above 70 Tesla, and also find an abrupt reversal in the field-evolution of the sound velocity at the same magnetic field, suggesting a thermodynamic phase transition to a new state of matter. DOE BES "Science at 100 T".
Quantification of Liver Proton-Density Fat Fraction in an 7.1 Tesla preclinical MR Systems: Impact of the Fitting Technique

PubMed Central

Mahlke, C; Hernando, D; Jahn, C; Cigliano, A; Ittermann, T; Mössler, A; Kromrey, ML; Domaska, G; Reeder, SB; Kühn, JP

2016-01-01

Purpose: To investigate the feasibility of estimating the proton-density fat fraction (PDFF) using a 7.1 Tesla magnetic resonance imaging (MRI) system and to compare the accuracy of liver fat quantification using different fitting approaches. Materials and Methods: Fourteen leptin-deficient ob/ob mice and eight intact controls were examined in a 7.1 Tesla animal scanner using a 3-dimensional six-echo chemical shift-encoded pulse sequence. Confounder-corrected PDFF was calculated using magnitude (magnitude data alone) and combined fitting (complex and magnitude data). Differences between fitting techniques were compared using Bland-Altman analysis. In addition, PDFFs derived with both reconstructions were correlated with histopathological fat content and triglyceride mass fraction using linear regression analysis. Results: The PDFFs determined with use of both reconstructions correlated very strongly (r=0.91). However, small mean bias between reconstructions demonstrated divergent results (3.9%; CI 2.7%-5.1%). For both reconstructions, there was linear correlation with histopathology (combined fitting: r=0.61; magnitude fitting: r=0.64) and triglyceride content (combined fitting: r=0.79; magnitude fitting: r=0.70). Conclusion: Liver fat quantification using the PDFF derived from MRI performed at 7.1 Tesla is feasible. PDFF has strong correlations with histopathologically determined fat and with triglyceride content. However, small differences between PDFF reconstruction techniques may impair the robustness and reliability of the biomarker at 7.1 Tesla. PMID:27197806
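The Bland-Altman comparison mentioned in the abstract reduces to a mean difference (bias) and limits of agreement. A minimal sketch follows, assuming the common bias ± 1.96 SD convention; the function name and the paired values are made up for illustration and are not data from the study.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical PDFF values (%) from two fitting techniques on the same animals.
combined = [10.0, 20.0, 30.0, 40.0]
magnitude = [8.0, 19.0, 27.0, 38.0]
bias, lower, upper = bland_altman(combined, magnitude)
print(f"bias = {bias:.2f}%, limits of agreement = [{lower:.2f}%, {upper:.2f}%]")
```

A nonzero bias with narrow limits, as reported in the abstract, indicates a systematic offset between two techniques that otherwise track each other closely.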
Spatial Distortion in MRI-Guided Stereotactic Procedures: Evaluation in 1.5-, 3- and 7-Tesla MRI Scanners.

PubMed

Neumann, Jan-Oliver; Giese, Henrik; Biller, Armin; Nagel, Armin M; Kiening, Karl

2015-01-01

Magnetic resonance imaging (MRI) is replacing computed tomography (CT) as the main imaging modality for stereotactic transformations. MRI is prone to spatial distortion artifacts, which can lead to inaccuracy in stereotactic procedures. Modern MRI systems provide distortion correction algorithms that may ameliorate this problem. This study investigates the different options of distortion correction using standard 1.5-, 3- and 7-tesla MRI scanners. A phantom was mounted on a stereotactic frame. One CT scan and three MRI scans were performed. At all three field strengths, two 3-dimensional sequences, volumetric interpolated breath-hold examination (VIBE) and magnetization-prepared rapid acquisition with gradient echo, were acquired, and automatic distortion correction was performed. Global stereotactic transformation of all 13 datasets was performed and two stereotactic planning workflows (MRI only vs. CT/MR image fusion) were subsequently analysed. Distortion correction on the 1.5- and 3-tesla scanners caused a considerable reduction in positional error. The effect was more pronounced when using the VIBE sequences. By using co-registration (CT/MR image fusion), an even lower positional error could be obtained. In ultra-high-field (7 T) MR imaging, distortion correction introduced even higher errors. However, the accuracy of non-corrected 7-tesla sequences was comparable to CT/MR image fusion 3-tesla imaging. MRI distortion correction algorithms can reduce positional errors by up to 60%. For stereotactic applications of utmost precision, we recommend a co-registration to an additional CT dataset. © 2015 S. Karger AG, Basel.
Genotoxic Effects of Superconducting Static Magnetic Fields (SMFs) on Wheat (Triticum aestivum) Pollen Mother Cells (PMCs)

NASA Astrophysics Data System (ADS)

Zhang, Pingping; Yin, Ruochun; Chen, Zhiyou; Wu, Lifang; Yu, Zengliang

2007-04-01

The effects of superconducting static magnetic fields (SMFs) on the pollen mother cells (PMCs) of wheat were investigated in order to evaluate the possible genotoxic effect of such non-ionizing radiation. The seeds of wheat were exposed to static magnetic fields with either different magnetic flux densities (0, 1, 3, 5 and 7 Tesla) for 5 h or different durations (1, 3 and 5 h) at a magnetic flux density of 7 Tesla. The seeds were germinated at 23 °C after exposure and the seedlings were transplanted into the field. The PMCs from young wheat ears were taken and slides were made following the conventional method. The genotoxic effect was evaluated in terms of micronucleus (MN), chromosomal bridge, lagging chromosome and fragments in PMCs. Although the exposed groups of a low field intensity (below 5 Tesla) showed no statistically significant difference in the aberration frequency compared with the unexposed control groups and sham exposed groups, a significant increase in the chromosomal bridge, lagging chromosome, triple-polar segregation or micronucleus was observed at a field strength of 5 Tesla or 7 Tesla, respectively. The analysis of dose-effect relationships indicated that the increased frequency of meiotic abnormal cells correlated with the flux density of the magnetic field and duration, but no linear relationship was observed. Such statistically significant differences indicated a potential genotoxic effect of high static magnetic fields above 5 T.
Impairment of chondrocyte biosynthetic activity by exposure to 3-tesla high-field magnetic resonance imaging is temporary

PubMed Central

Sunk, Ilse-Gerlinde; Trattnig, Siegfried; Graninger, Winfried B; Amoyo, Love; Tuerk, Birgit; Steiner, Carl-Walter; Smolen, Josef S; Bobacz, Klaus

2006-01-01

The influence of magnetic resonance imaging (MRI) devices at high field strengths on living tissues is unknown. We investigated the effects of a 3-tesla electromagnetic field (EMF) on the biosynthetic activity of bovine articular cartilage. Bovine articular cartilage was obtained from juvenile and adult animals. Whole joints or cartilage explants were subjected to a pulsed 3-tesla EMF; controls were left unexposed. Synthesis of sulfated glycosaminoglycans (sGAGs) was measured by using [35S]sulfate incorporation; mRNA encoding the cartilage markers aggrecan and type II collagen, as well as IL-1β, were analyzed by RT–PCR. Furthermore, effects of the 3-tesla EMF were determined over the course of time directly after exposure (day 0) and at days 3 and 6. In addition, the influence of a 1.5-tesla EMF on cartilage sGAG synthesis was evaluated. Chondrocyte cell death was assessed by staining with Annexin V and TdT-mediated dUTP nick end labelling (TUNEL). Exposure to the EMF resulted in a significant decrease in cartilage macromolecule synthesis. Gene expression of both aggrecan and IL-1β, but not of collagen type II, was reduced in comparison with controls. Staining with Annexin V and TUNEL revealed no evidence of cell death. Interestingly, chondrocytes regained their biosynthetic activity within 3 days after exposure, as shown by proteoglycan synthesis rate and mRNA expression levels. Cartilage samples exposed to a 1.5-tesla EMF remained unaffected. Although MRI devices with a field strength of more than 1.5 T provide a better signal-to-noise ratio and thereby higher spatial resolution, their high field strength impairs the biosynthetic activity of articular chondrocytes in vitro. Although this decrease in biosynthetic activity seems to be transient, articular cartilage exposed to high-energy EMF may become vulnerable to damage. PMID:16831232
Impairment of chondrocyte biosynthetic activity by exposure to 3-tesla high-field magnetic resonance imaging is temporary.

PubMed

Sunk, Ilse-Gerlinde; Trattnig, Siegfried; Graninger, Winfried B; Amoyo, Love; Tuerk, Birgit; Steiner, Carl-Walter; Smolen, Josef S; Bobacz, Klaus

2006-01-01

The influence of magnetic resonance imaging (MRI) devices at high field strengths on living tissues is unknown. We investigated the effects of a 3-tesla electromagnetic field (EMF) on the biosynthetic activity of bovine articular cartilage. Bovine articular cartilage was obtained from juvenile and adult animals. Whole joints or cartilage explants were subjected to a pulsed 3-tesla EMF; controls were left unexposed. Synthesis of sulfated glycosaminoglycans (sGAGs) was measured by using [35S]sulfate incorporation; mRNA encoding the cartilage markers aggrecan and type II collagen, as well as IL-1beta, were analyzed by RT-PCR. Furthermore, effects of the 3-tesla EMF were determined over the course of time directly after exposure (day 0) and at days 3 and 6. In addition, the influence of a 1.5-tesla EMF on cartilage sGAG synthesis was evaluated. Chondrocyte cell death was assessed by staining with Annexin V and TdT-mediated dUTP nick end labelling (TUNEL). Exposure to the EMF resulted in a significant decrease in cartilage macromolecule synthesis. Gene expression of both aggrecan and IL-1beta, but not of collagen type II, was reduced in comparison with controls. Staining with Annexin V and TUNEL revealed no evidence of cell death. Interestingly, chondrocytes regained their biosynthetic activity within 3 days after exposure, as shown by proteoglycan synthesis rate and mRNA expression levels. Cartilage samples exposed to a 1.5-tesla EMF remained unaffected. Although MRI devices with a field strength of more than 1.5 T provide a better signal-to-noise ratio and thereby higher spatial resolution, their high field strength impairs the biosynthetic activity of articular chondrocytes in vitro. Although this decrease in biosynthetic activity seems to be transient, articular cartilage exposed to high-energy EMF may become vulnerable to damage.
MRI T2 Mapping of the Knee Articular Cartilage Using Different Acquisition Sequences and Calculation Methods at 1.5 Tesla.

PubMed

Mars, Mokhtar; Bouaziz, Mouna; Tbini, Zeineb; Ladeb, Fethi; Gharbi, Souha

2018-06-12

This study aims to determine how Magnetic Resonance Imaging (MRI) acquisition techniques and calculation methods affect T2 values of knee cartilage at 1.5 Tesla and to identify sequences that can be used for high-resolution T2 mapping in short scanning times. This study was performed on a phantom and twenty-nine patients who underwent MRI of the knee joint at 1.5 Tesla. The protocol includes T2 mapping sequences based on Single Echo Spin Echo (SESE), Multi-Echo Spin Echo (MESE), Fast Spin Echo (FSE) and Turbo Gradient Spin Echo (TGSE). The T2 relaxation times were quantified and evaluated using three calculation methods (MapIt, Syngo Offline and monoexponential fit). Signal to Noise Ratios (SNR) were measured in all sequences. All statistical analyses were performed using the t-test. The average T2 values in the phantom were 41.7 ± 13.8 ms for SESE, 43.2 ± 14.4 ms for MESE, 42.4 ± 14.1 ms for FSE and 44 ± 14.5 ms for TGSE. In the patient study, the mean differences were 6.5 ± 8.2 ms, 7.8 ± 7.6 ms and 8.4 ± 14.2 ms for MESE, FSE and TGSE compared to SESE respectively; these differences were not statistically significant (p > 0.05). The comparison between the three calculation methods showed no significant difference (p > 0.05). The t-test showed no significant difference between SNR values for all sequences. T2 values depend not only on the sequence type but also on the calculation method. None of the sequences revealed significant differences compared to the SESE reference sequence. TGSE with its short scanning time can be used for high-resolution T2 mapping. © 2018 The Author(s). Published by S. Karger AG, Basel.
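Of the three calculation methods listed, the monoexponential fit is the one that can be written down directly: the signal is modelled as S(TE) = S0 · exp(−TE / T2). The sketch below assumes a log-linear least-squares fit, which is one common way to do it and not necessarily the procedure used in the study; the function name, echo times and signal values are synthetic.

```python
import math


def fit_t2(echo_times_ms, signals):
    """Fit ln S = ln S0 - TE / T2 by ordinary least squares; return (S0, T2 in ms)."""
    xs = list(echo_times_ms)
    ys = [math.log(s) for s in signals]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free decay with S0 = 1000 and T2 = 42 ms (illustrative values).
echo_times = [10.0, 20.0, 40.0, 80.0]
signals = [1000.0 * math.exp(-te / 42.0) for te in echo_times]
s0, t2 = fit_t2(echo_times, signals)
print(f"S0 = {s0:.1f}, T2 = {t2:.1f} ms")
```

With noisy data the log transform weights low-signal echoes more heavily than a nonlinear fit would, which is one reason different calculation methods can return different T2 values from the same acquisition.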
Prospective Study to Evaluate Rate and Frequency of Perturbations of Implanted Programmable Hakim Codman Valve After 1.5-Tesla Magnetic Resonance Imaging.

PubMed

Capitanio, Jody Filippo; Venier, Alice; Mazzeo, Lucio Aniello; Barzaghi, Lina Raffaella; Acerno, Stefania; Mortini, Pietro

2016-04-01

Exposure to magnetic fields may alter the settings of programmable ventriculoperitoneal shunt valves or even cause permanent damage to these devices. There is little information about this topic, and none on live patients. We investigated the effects of 1.5-tesla magnetic resonance imaging (MRI) on Hakim-Codman (HC) pressure programmable valves implanted in our hospital in a single-center prospective study assessing the rate of perturbations of implanted HC programmable valves. One hundred consecutive patients implanted for different clinical reasons between 2008 and 2012 were examined. A conventional skull x-ray was obtained before and after a standard 1.5-tesla MRI. We compared the before and after results, analyzed the modification rate, and checked for any damage to the implanted devices. Implanted HC valves are extremely handy and durable, even if their settings are likely to change often due to exposure to magnetic fields. None of the patients complained of heating effects. Oscillations ranged from 10 to 30 mm H2O, with 1 patient who reached 50 mm H2O and 1 who reached 60 mm H2O. Global alteration rate was 40%: 10 patients (10%) experienced a 10 mm H2O change; 14 patients (14%) had a 20 mm H2O change; 6 patients (6%) had a 30 mm H2O change; 8 patients (8%) had a 40 mm H2O change; 1 patient had a 50 mm H2O change; and 1 patient had a 60 mm H2O change. HC valves presented a variable perturbation rate, with an alteration rate of 40% with 1.5-tesla MRI. We did not observe malfunctioning hardware as a result of magnetic influence. We recommend a cranial x-ray immediately after the MRI because of the high risk (40%) of decalibration, especially in patients with low ventricular compliance. Copyright © 2016 Elsevier Inc. All rights reserved.
Computational Omics Funding Opportunity | Office of Cancer Clinical Proteomics Research

Cancer.gov

The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the NVIDIA Foundation are pleased to announce funding opportunities in the fight against cancer. Each organization has launched a request for proposals (RFP) that will collectively fund up to $2 million to help to develop a new generation of data-intensive scientific tools to find new ways to treat cancer.
Universal Batch Steganalysis

DTIC Science & Technology

2014-06-01

in large-scale datasets such as might be obtained by monitoring a corporate network or social network. Identifying guilty actors, rather than payload...by monitoring a corporate network or social network. Identifying guilty actors, rather than payload-carrying objects, is entirely novel in steganalysis...implementation using Compute Unified Device Architecture (CUDA) on NVIDIA graphics cards. The key to good performance is to combine computations so that

Universal Batch Steganalysis

DTIC Science & Technology

2014-06-30

steganalysis) in large-scale datasets such as might be obtained by monitoring a corporate network or social network. Identifying guilty actors...guilty’ user (of steganalysis) in large-scale datasets such as might be obtained by monitoring a corporate network or social network. Identifying guilty...floating point operations (1 TFLOPs) for a 1 megapixel image. We designed a new implementation using Compute Unified Device Architecture (CUDA) on NVIDIA
Using 3D Computer Graphics Multimedia to Motivate Preservice Teachers' Learning of Geometry and Pedagogy

ERIC Educational Resources Information Center

Goodson-Espy, Tracy; Lynch-Davis, Kathleen; Schram, Pamela; Quickenton, Art

2010-01-01

This paper describes the genesis and purpose of our geometry methods course, focusing on a geometry-teaching technology we created using the NVIDIA® Chameleon demonstration. This article presents examples from a sequence of lessons centered about a 3D computer graphics demonstration of the chameleon and its geometry. In addition, we present data…
Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.

PubMed

Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan

2007-01-01

Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Nonenhanced ECG-gated quiescent-interval single shot MRA: image quality and stenosis assessment at 3 tesla compared with contrast-enhanced MRA and digital subtraction angiography.

PubMed

Hansmann, Jan; Morelli, John N; Michaely, Henrik J; Riester, Thomas; Budjan, Johannes; Schoenberg, Stefan O; Attenberger, Ulrike I

2014-06-01

To evaluate the diagnostic accuracy of a nonenhanced electrocardiograph-gated quiescent-interval single shot MR-angiography (QISS-MRA) at 3 Tesla with contrast-enhanced MRA (CE-MRA) and digital subtraction angiography (DSA) serving as reference standard. Following institutional review board approval, 16 consecutive patients with peripheral arterial disease underwent a combined peripheral MRA protocol consisting of a large field-of-view QISS-MRA, continuous table movement MRA, and an additional time-resolved MRA of the calves. DSA correlation was available in eight patients. Image quality and degree of stenosis was assessed. Sensitivity and specificity of QISS-MRA was evaluated with CE-MRA and DSA serving as the standards of reference and compared using the Fisher exact test. With the exception of the calf station, image quality with QISS-MRA was rated statistically significantly less than that of CE-MRA (P < 0.05, P = 0.17, and P = 0.6, respectively). A greater percentage of segments were not accessible with QISS-MRA (19.5-20.1%) in comparison to CE-MRA (10.9%). Relative to DSA, sensitivity for QISS-MRA was high (100% versus 91.2% for CE-MRA, P = 0.24) in the evaluated segments; however, specificity (76.5%) was substantially less than that of CE-MRA (94.6%, P = 0.003). Overall image quality and specificity of QISS-MRA at 3T are diminished relative to CE-MRA. However, when image quality is adequate, QISS-MRA has high sensitivity and, thus, has potential use in patients with contraindications to gadolinium. Copyright © 2013 Wiley Periodicals, Inc.
GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

PubMed Central

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128
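The position-by-position matching cost mentioned in the abstract can be seen in a generic model-based motif scan. The sketch below is a plain CPU illustration using a log-odds position weight matrix; it is not the HMS algorithm or GPUmotif code, and the matrix, background frequency and sequence are invented for the example. Every window is scored independently, which is why this kind of scan maps naturally onto many GPU threads.

```python
import math

BACKGROUND = 0.25          # assumed uniform background base frequency
CONSENSUS = "TATA"         # hypothetical 4-bp motif
# Position weight matrix: 0.85 for the consensus base, 0.05 for each other base.
PWM = [{base: (0.85 if base == c else 0.05) for base in "ACGT"} for c in CONSENSUS]


def window_score(window):
    """Log-odds score of one window against the motif model."""
    return sum(math.log(column[base] / BACKGROUND) for column, base in zip(PWM, window))


def scan(sequence):
    """Score every window of motif width along the sequence."""
    width = len(PWM)
    return [window_score(sequence[i:i + width]) for i in range(len(sequence) - width + 1)]


scores = scan("GGCTATAGC")
best = max(range(len(scores)), key=scores.__getitem__)
print("best start position:", best, "score:", round(scores[best], 3))
```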
Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids

NASA Astrophysics Data System (ADS)

Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu

2013-01-01

Numerical modeling of anisotropic media is a computationally intensive task since it brings additional complexity to the field problem in such a way that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good in handling floating point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using the NVIDIA CUDA (Compute Unified Device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device, which consists of 384 CUDA cores clocked at 1645 MHz, with a standard desktop PC as the host platform. We compare the results against a standard CPU implementation for accuracy and speed and draw implications for simulation using the GPU paradigm.
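A finite difference model of this kind can be sketched on the CPU in a few lines. The version below is an assumption-laden illustration, not the authors' CUDA kernel: it takes a 2D domain whose principal conduction axes are aligned with the grid, so that ∂T/∂t = αx ∂²T/∂x² + αy ∂²T/∂y², and uses an explicit forward-time, centred-space update with fixed boundary values. The diffusivities, grid size and time step are arbitrary.

```python
def step(T, alpha_x, alpha_y, dx, dy, dt):
    """One explicit update of the interior nodes; boundary nodes keep their values."""
    ny, nx = len(T), len(T[0])
    new = [row[:] for row in T]
    for j in range(1, ny - 1):          # j indexes y (rows)
        for i in range(1, nx - 1):      # i indexes x (columns)
            d2x = (T[j][i + 1] - 2.0 * T[j][i] + T[j][i - 1]) / dx ** 2
            d2y = (T[j + 1][i] - 2.0 * T[j][i] + T[j - 1][i]) / dy ** 2
            new[j][i] = T[j][i] + dt * (alpha_x * d2x + alpha_y * d2y)
    return new


def simulate(n=21, steps=10, alpha_x=4.0, alpha_y=1.0, dx=1.0, dy=1.0, dt=0.05):
    """Diffuse a single hot node at the grid centre for a number of steps."""
    # Explicit stability limit: dt <= 1 / (2 * (alpha_x / dx^2 + alpha_y / dy^2)).
    assert dt <= 1.0 / (2.0 * (alpha_x / dx ** 2 + alpha_y / dy ** 2))
    T = [[0.0] * n for _ in range(n)]
    T[n // 2][n // 2] = 100.0
    for _ in range(steps):
        T = step(T, alpha_x, alpha_y, dx, dy, dt)
    return T


field = simulate()
c = 10
print("two nodes along x:", field[c][c + 2], " two nodes along y:", field[c + 2][c])
```

Each interior node depends only on its four neighbours from the previous time step, so on a GPU the double loop would typically become one thread per node.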
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

PubMed

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
SU-F-T-256: 4D IMRT Planning Using An Early Prototype GPU-Enabled Eclipse Workstation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hagan, A; Modiri, A; Sawant, A

Purpose: True 4D IMRT planning, based on simultaneous spatiotemporal optimization, has been shown to significantly improve plan quality in lung radiotherapy. However, the high computational complexity associated with such planning represents a significant barrier to widespread clinical deployment. We introduce an early prototype GPU-enabled Eclipse workstation for inverse planning. To our knowledge, this is the first GPU-integrated Eclipse system demonstrating the potential for clinical translation of GPU computing on a major commercially available TPS. Methods: The prototype system comprised four NVIDIA Tesla K80 GPUs, with a maximum processing capability of 8.5 Tflops per K80 card. The system architecture consisted of three key modules: (i) a GPU-based inverse planning module using a highly parallelizable, swarm intelligence-based global optimization algorithm, (ii) a GPU-based open-source b-spline deformable image registration module, Elastix, and (iii) a CUDA-based data management module. For evaluation, aperture fluence weights in an IMRT plan were optimized over 9 beams, 166 apertures and 10 respiratory phases (14940 variables) for a lung cancer case (GTV = 95 cc, right lower lobe, 15 mm cranio-caudal motion). Sensitivity of the planning time and memory expense to parameter variations was quantified. Results: GPU-based inverse planning was significantly accelerated compared to its CPU counterpart (36 vs 488 min, for 10 phases, 10 search agents and 10 iterations). The optimized IMRT plan significantly improved OAR sparing compared to the original internal target volume (ITV)-based clinical plan, while maintaining prescribed tumor coverage. The dose-sparing improvements were: Esophagus Dmax 50%, Heart Dmax 42% and Spinal cord Dmax 25%. Conclusion: Our early prototype system demonstrates that through massive parallelization, computationally intense tasks such as 4D treatment planning can be accomplished in clinically feasible timeframes. With further optimization, such systems are expected to enable the eventual clinical translation of higher-dimensional and complex treatment planning strategies to significantly improve plan quality. This work was partially supported through research funding from National Institutes of Health (R01CA169102) and Varian Medical Systems, Palo Alto, CA, USA.
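The problem size and the speedup quoted in this abstract can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the variable names are ours, and it assumes the quoted variable count is simply the product of the three quoted numbers, which the figures are consistent with.

```python
# Values quoted in the abstract; the variable names are illustrative only.
beams, apertures, phases = 9, 166, 10
n_variables = beams * apertures * phases   # aperture fluence weights optimized

gpu_minutes, cpu_minutes = 36, 488
speedup = cpu_minutes / gpu_minutes        # CPU planning time over GPU planning time

print(n_variables)         # 14940
print(round(speedup, 1))   # 13.6
```

On these figures the GPU run is roughly 13.6 times faster than the CPU run for the same 10 phases, 10 search agents and 10 iterations.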
Tesla coil discharges guided by femtosecond laser filaments in air

NASA Astrophysics Data System (ADS)

Brelet, Yohann; Houard, Aurélien; Arantchouk, Leonid; Forestier, Benjamin; Liu, Yi; Prade, Bernard; Carbonnel, Jérôme; André, Yves-Bernard; Mysyrowicz, André

2012-04-01

A Tesla coil generator was designed to produce high voltage pulses oscillating at 100 kHz synchronisable with a nanosecond temporal jitter. Using this compact high voltage generator, we demonstrate reproducible meter long discharges in air at a repetition rate of 1 Hz. Triggering and guiding of the discharges are performed in air by femtosecond laser filaments.
Quadrature transmit array design using single-feed circularly polarized patch antenna for parallel transmission in MR imaging.

PubMed

Pang, Yong; Yu, Baiying; Vigneron, Daniel B; Zhang, Xiaoliang

2014-02-01

Quadrature coils are often desired in MR applications because they can improve MR sensitivity and also reduce excitation power. In this work, we propose, for the first time, a quadrature array design strategy for parallel transmission at 298 MHz using single-feed circularly polarized (CP) patch antenna technique. Each array element is a nearly square ring microstrip antenna and is fed at a point on the diagonal of the antenna to generate quadrature magnetic fields. Compared with conventional quadrature coils, the single-feed structure is much simpler and more compact, making the quadrature coil array design practical. Numerical simulations demonstrate that the decoupling between elements is better than -35 dB for all the elements and the RF fields are homogeneous with deep penetration and quadrature behavior in the area of interest. Bloch equation simulation is also performed to simulate the excitation procedure by using an 8-element quadrature planar patch array to demonstrate its feasibility in parallel transmission at the ultrahigh field of 7 Tesla.
A 12-coil superconducting 'bumpy torus' magnet facility for plasma research.

NASA Technical Reports Server (NTRS)

Roth, J. R.; Holmes, A. D.; Keller, T. A.; Krawczonek, W. M.

1972-01-01

A retrospective summary is presented of the performance of the two-coil superconducting pilot rig which preceded the NASA Lewis bumpy torus. The NASA Lewis bumpy torus facility consists of 12 superconducting coils, each with a 19 cm i.d. and capable of producing magnetic field strengths of 3.0 teslas on their axes. The magnets are equally spaced around a major circumference 1.52 m in diameter, and are mounted with the major axis of the torus vertical in a single vacuum tank 2.59 m in diameter. The design value of maximum magnetic field on the magnetic axis (3.0 T) has been reached and exceeded.
Pulse X-ray device for stereo imaging and few-projection tomography of explosive and fast processes

NASA Astrophysics Data System (ADS)

Palchikov, E. I.; Dolgikh, A. V.; Klypin, V. V.; Krasnikov, I. Y.; Ryabchun, A. M.

2017-10-01

This paper describes the operation principles and design features of a device for single-pulse X-raying of explosive and high-speed processes, developed on the basis of a Tesla transformer with a lumped secondary capacitor bank. The circuit with the lumped capacitor bank allows transferring a greater amount of energy to the discharge circuit as compared with the Marx surge generator, for more effective operation with remote X-ray tubes connected by coaxial cables. The device equipped with multiple X-ray tubes provides simultaneous X-raying of extended or spaced objects, stereo imaging, or few-projection tomography.
Comparison of pelvic phased-array versus endorectal coil magnetic resonance imaging at 3 Tesla for local staging of prostate cancer.

PubMed

Kim, Bum Soo; Kim, Tae-Hwan; Kwon, Tae Gyun; Yoo, Eun Sang

2012-05-01

Several studies have demonstrated the superiority of endorectal coil magnetic resonance imaging (MRI) over pelvic phased-array coil MRI at 1.5 Tesla for local staging of prostate cancer. However, few have studied which evaluation is more accurate at 3 Tesla MRI. In this study, we compared the accuracy of local staging of prostate cancer using pelvic phased-array coil or endorectal coil MRI at 3 Tesla. Between January 2005 and May 2010, 151 patients underwent radical prostatectomy. All patients were evaluated with either pelvic phased-array coil or endorectal coil prostate MRI prior to surgery (63 endorectal coils and 88 pelvic phased-array coils). Tumor stage based on MRI was compared with pathologic stage. We calculated the specificity, sensitivity and accuracy of each group in the evaluation of extracapsular extension and seminal vesicle invasion. Both endorectal coil and pelvic phased-array coil MRI achieved high specificity, low sensitivity and moderate accuracy for the detection of extracapsular extension and seminal vesicle invasion. There were no statistically significant differences in specificity, sensitivity and accuracy between the two groups. Overall staging accuracy, sensitivity and specificity were not significantly different between endorectal coil and pelvic phased-array coil MRI.
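The three figures of merit named in this abstract follow from the counts of true and false positives and negatives when MRI stage is compared with pathologic stage. The Python sketch below shows the standard definitions; the function name and the counts are hypothetical and are not data from the study.

```python
def staging_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)                  # detected among truly positive
    specificity = tn / (tn + fp)                  # cleared among truly negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # all correct calls
    return sensitivity, specificity, accuracy

# Hypothetical counts chosen only to illustrate the calculation.
sens, spec, acc = staging_metrics(tp=10, fp=5, tn=40, fn=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

With these made-up counts the pattern is the one the abstract describes qualitatively: low sensitivity, high specificity and moderate accuracy.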
Magnetic, Electrical and Dielectric Properties of LaMnO3+η Perovskite Manganite.

NASA Astrophysics Data System (ADS)

v, Punith Kumar; Dayal, Vijaylakshmi

High-purity polycrystalline LaMnO3+η perovskite manganite has been synthesized using the conventional solid-state reaction method. The studied sample crystallizes into the orthorhombic O' phase, indexed with the Pbnm space group. The magnetization measurement shows that the sample undergoes a paramagnetic (PM) to ferromagnetic (FM) phase transition at TC = 191.6 K, followed by frustration due to antiferromagnetic (AFM) spin ordering at low temperature, Tf = 85.8 K. The electrical resistivity measurements carried out at 0 tesla and 8 tesla magnetic field exhibit insulating behavior throughout the measured temperature range. The resistivity at 0 tesla exhibits a low-temperature FM insulator to high-temperature PM insulator phase transition at TC = 191.6 K, as also observed in the magnetization measurement. The application of the magnetic field (8 tesla) shifts TC to higher temperature, and the charge transport follows the Shklovskii-Efros variable range hopping (SE VRH) mechanism. The temperature- and frequency-dependent dielectric permittivity of the sample exhibits a relaxation process explained by the Debye + Maxwell-Wagner relaxation mechanism. Department of Atomic Energy-Board of Research in Nuclear Sciences, Government of INDIA.
Pulsatility of Lenticulostriate Arteries Assessed by 7 Tesla Flow MRI-Measurement, Reproducibility, and Applicability to Aging Effect.

PubMed

Schnerr, Roald S; Jansen, Jacobus F A; Uludag, Kamil; Hofman, Paul A M; Wildberger, Joachim E; van Oostenbrugge, Robert J; Backes, Walter H

2017-01-01

Characterization of flow properties in cerebral arteries with 1.5 and 3 Tesla MRI is usually limited to large cerebral arteries and difficult to evaluate in the small perforating arteries due to insufficient spatial resolution. In this study, we assessed the feasibility of measuring blood flow waveforms in the small lenticulostriate arteries with 7 Tesla velocity-sensitive MRI. The middle cerebral artery was included as reference. Imaging was performed in five young and five old healthy volunteers. Flow was calculated by integrating time-varying velocity values over the vascular cross-section. MRI acquisitions were performed twice in each subject to determine reproducibility. From the flow waveforms, the pulsatility index and damping factor were deduced. Reproducibility values, in terms of the intraclass correlation coefficients, were found to be good to excellent. Measured pulsatility index of the lenticulostriate arteries significantly increased and damping factor significantly decreased with age. In conclusion, we demonstrate that blood flow through the lenticulostriate arteries can be precisely measured using 7 Tesla MRI and reveal effects of arterial stiffness due to aging. These findings hold promise to provide relevant insights into the pathologies involving perforating cerebral arteries.
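The flow calculation described in this abstract (integrating velocity over the vessel cross-section for each time frame) can be sketched as a discrete sum over pixels. The Python example below is a minimal illustration, not the authors' pipeline: the function names and velocity values are hypothetical, and the pulsatility index is assumed to be the common Gosling-style definition, (max - min) / mean, which the abstract does not spell out.

```python
def flow_waveform(velocities, pixel_area):
    """Flow per time frame: sum of pixel velocities times the pixel area."""
    return [sum(frame) * pixel_area for frame in velocities]

def pulsatility_index(flow):
    """Assumed definition: (max - min) / mean over the cardiac cycle."""
    mean_flow = sum(flow) / len(flow)
    return (max(flow) - min(flow)) / mean_flow

# Hypothetical velocities (cm/s): 4 time frames x 3 pixels; pixel area in cm^2.
velocities = [
    [10.0, 12.0, 8.0],
    [20.0, 22.0, 18.0],
    [15.0, 15.0, 15.0],
    [10.0, 10.0, 10.0],
]
flow = flow_waveform(velocities, pixel_area=0.01)
pi_value = pulsatility_index(flow)
print(flow, pi_value)
```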
A Tesla-type repetitive nanosecond pulse generator for solid dielectric breakdown research.

PubMed

Zhao, Liang; Pan, Ya Feng; Su, Jian Cang; Zhang, Xi Bo; Wang, Li Min; Fang, Jin Peng; Sun, Xu; Lui, Rui

2013-10-01

A Tesla-type repetitive nanosecond pulse generator including a pair of electrodes and a matched absorption resistor is established for solid dielectric breakdown research. As major components, a built-in Tesla transformer and a gas-gap switch are designed to boost and shape the output pulse, respectively; the electrodes form the anticipated electric field; the resistor is connected in parallel with the electrodes to absorb the energy reflected from the test sample. The parameters of the generator are a pulse width of 10 ns, a rise and fall time of 3 ns, and a maximum amplitude of 300 kV. By modifying the primary circuit of the Tesla transformer, the generator can produce both positive and negative pulses at a repetition rate of 1-50 Hz. In addition, a real-time measurement and control system is established based on the solid dielectric breakdown requirements for this generator. With this system, experiments on test samples made of common insulation materials in pulsed power systems are conducted. The preliminary experimental results show that the constructed generator is capable of supporting research on the solid dielectric breakdown phenomenon on a nanosecond time scale.
Safety of magnetic resonance imaging of stapes prostheses.

PubMed

Syms, Mark James

2005-03-01

Assess the safety of performing magnetic resonance imaging (MRI) on patients with stapes prostheses. Survey and animal model. A survey regarding implant usage, MRI procedures, and adverse outcomes after MRI in patients previously undergoing stapes procedures. Guinea pigs implanted with ferromagnetic 17 to 4 stainless steel, 316L nonferromagnetic stainless steel, titanium, and fluoroplastic stapes prostheses underwent a MRI in a 4.7 Tesla MR system. Three adverse outcomes were reported on the clinical survey. One adverse event occurred during an MRI performed on a recalled ferromagnetic prosthesis. The other two adverse events were probably not secondary to MRI exposure. No damage or inflammation was observed in the region of the oval window or vestibule of implanted guinea pigs exposed to a 4.7 Tesla MR system. The combination of prior studies, the clinical survey, and the absence of histopathologic evidence of damage in the guinea pigs is compelling evidence that MRI for patients with stapes prostheses is safe. Implanting physicians should feel comfortable clearing a patient for a MRI in a 1.5 Tesla or 3.0 Tesla MRI. It is imperative for the physician to qualify the field strength when clearing a patient to undergo a MRI.
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu

2012-03-01

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
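For readers unfamiliar with the procedure named in this abstract, the following is a minimal serial sketch of LU factorization with partial pivoting in plain Python. It is not the hybrid CPU/GPU implementation the article presents; the function name and the packed storage layout are our own illustrative choices.

```python
def lu_partial_pivoting(a):
    """Factor a square matrix so that P A = L U.

    Returns (perm, lu): perm[i] is the index of the original row that ends up
    in position i; lu holds U on and above the diagonal and the multipliers
    of the unit lower-triangular L below it.
    """
    n = len(a)
    lu = [row[:] for row in a]            # work on a copy of the input
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]   # update trailing submatrix
    return perm, lu

a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
perm, lu = lu_partial_pivoting(a)
print(perm)
```

The trailing-submatrix update in the innermost loops is the part that dominates the arithmetic, which is why it is the natural candidate for offloading to accelerators.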
Superposed epoch analysis and storm statistics from 25 years of the global geomagnetic disturbance index, USGS-Dst

USGS Publications Warehouse

Gannon, J.L.

2012-01-01

Statistics on geomagnetic storms with minima below -50 nanoTesla are compiled using a 25-year span of the 1-minute resolution disturbance index, U.S. Geological Survey Dst. At the 50th percentile level, the sudden commencement, the main phase minimum, and the time between the two have magnitudes of 35 nanoTesla, -100 nanoTesla, and 12 hours, respectively. The cumulative distribution functions for each of these features are presented. Correlation between sudden commencement magnitude and main phase magnitude is shown to be low. Small, medium, and large storm templates at the 33rd, 50th, and 90th percentile are presented and compared to real examples. In addition, the relative occurrence of rates of change in Dst is presented.
GeNN: a code generation framework for accelerated brain simulations

NASA Astrophysics Data System (ADS)

Yavuz, Esin; Turner, James; Nowotny, Thomas

2016-01-01

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/.

GeNN: a code generation framework for accelerated brain simulations.

PubMed

Yavuz, Esin; Turner, James; Nowotny, Thomas

2016-01-07

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/.
GeNN: a code generation framework for accelerated brain simulations

PubMed Central

Yavuz, Esin; Turner, James; Nowotny, Thomas

2016-01-01

Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/. PMID:26740369
Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

PubMed

Badal, Andreu; Badano, Aldo

2009-11-01

It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
Adaptive mesh fluid simulations on GPU

NASA Astrophysics Data System (ADS)

Wang, Peng; Abel, Tom; Kaehler, Ralf

2010-10-01

We describe an implementation of compressible inviscid fluid solvers with block-structured adaptive mesh refinement on Graphics Processing Units using NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes can be mapped naturally on this architecture. Using the method of lines approach with the second order total variation diminishing Runge-Kutta time integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer Riemann solver, we achieve an overall speedup of approximately 10 times faster execution on one graphics card as compared to a single core on the host computer. We attain this speedup in uniform grid runs as well as in problems with deep AMR hierarchies. Our framework can readily be applied to more general systems of conservation laws and extended to higher order shock capturing schemes. This is shown directly by an implementation of a magneto-hydrodynamic solver and comparing its performance to the pure hydrodynamic case. Finally, we also combined our CUDA parallel scheme with MPI to make the code run on GPU clusters. Close to ideal speedup is observed on up to four GPUs.
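The time integrator named in this abstract, the second-order total variation diminishing Runge-Kutta scheme, has a compact two-stage form. The Python sketch below shows one step for a generic right-hand side; it is a serial illustration only, and the function name and the `rhs` callback are assumptions of this example. In a method-of-lines solver, `rhs` would return the flux divergence computed from the reconstruction and Riemann solver.

```python
def tvd_rk2_step(u, dt, rhs):
    """One second-order TVD Runge-Kutta step for du/dt = rhs(u)."""
    # Stage 1: forward Euler predictor.
    u1 = [ui + dt * ri for ui, ri in zip(u, rhs(u))]
    # Stage 2: average the old state with a second Euler step from u1.
    r1 = rhs(u1)
    return [0.5 * ui + 0.5 * (u1i + dt * r1i)
            for ui, u1i, r1i in zip(u, u1, r1)]

# Toy usage on du/dt = -u, standing in for a real flux divergence.
decay = lambda u: [-x for x in u]
u_new = tvd_rk2_step([1.0], 0.1, decay)
print(u_new)
```

For this linear test problem one step reproduces the Taylor expansion of the exact solution through the dt squared term, as expected of a second-order scheme.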
Stacked Multilayer Self-Organizing Map for Background Modeling.

PubMed

Zhao, Zhenjie; Zhang, Xuebo; Fang, Yongchun

2015-09-01

In this paper, a new background modeling method called stacked multilayer self-organizing map background model (SMSOM-BM) is proposed, which presents several merits such as strong representative ability for complex scenarios, easy to use, and so on. In order to enhance the representative ability of the background model and make the parameters learned automatically, the recently developed idea of representative learning (or deep learning) is elegantly employed to extend the existing single-layer self-organizing map background model to a multilayer one (namely, the proposed SMSOM-BM). As a consequence, the SMSOM-BM gains several merits including strong representative ability to learn background model of challenging scenarios, and automatic determination for most network parameters. More specifically, every pixel is modeled by a SMSOM, and spatial consistency is considered at each layer. By introducing a novel over-layer filtering process, we can train the background model layer by layer in an efficient manner. Furthermore, for real-time performance consideration, we have implemented the proposed method using NVIDIA CUDA platform. Comparative experimental results show superior performance of the proposed approach.
Optimizing Likelihood Models for Particle Trajectory Segmentation in Multi-State Systems.

PubMed

Young, Dylan Christopher; Scrimgeour, Jan

2018-06-19

Particle tracking offers significant insight into the molecular mechanics that govern the behavior of living cells. The analysis of molecular trajectories that transition between different motive states, such as diffusive, driven and tethered modes, is of considerable importance, with even single trajectories containing significant amounts of information about a molecule's environment and its interactions with cellular structures. Hidden Markov models (HMM) have been widely adopted to perform the segmentation of such complex tracks. In this paper, we show that extensive analysis of hidden Markov model outputs using data derived from multi-state Brownian dynamics simulations can be used both for the optimization of the likelihood models used to describe the states of the system and for characterization of the technique's failure mechanisms. This analysis was made possible by the implementation of a parallelized adaptive direct search algorithm on an Nvidia graphics processing unit. This approach provides critical information for the visualization of HMM failure and successful design of particle tracking experiments where trajectories contain multiple mobile states. © 2018 IOP Publishing Ltd.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations

NASA Astrophysics Data System (ADS)

Nguyen, Trung Dac

2017-03-01

The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Parallel algorithm for solving Kepler’s equation on Graphics Processing Units: Application to analysis of Doppler exoplanet searches

NASA Astrophysics Data System (ADS)

Ford, Eric B.

2009-05-01

We present the results of a highly parallel Kepler equation solver using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX and the "Compute Unified Device Architecture" (CUDA) programming environment. We apply this to evaluate a goodness-of-fit statistic (e.g., χ²) for Doppler observations of stars potentially harboring multiple planetary companions (assuming negligible planet-planet interactions). Given the high dimensionality of the model parameter space (at least five dimensions per planet), a global search is extremely computationally demanding. We expect that the underlying Kepler solver and model evaluator will be combined with a wide variety of more sophisticated algorithms to provide efficient global search, parameter estimation, model comparison, and adaptive experimental design for radial velocity and/or astrometric planet searches. We tested multiple implementations using single precision, double precision, pairs of single precision, and mixed precision arithmetic. We find that the vast majority of computations can be performed using single precision arithmetic, with selective use of compensated summation for increased precision. However, standard single precision is not adequate for calculating the mean anomaly from the time of observation and orbital period when evaluating the goodness-of-fit for real planetary systems and observational data sets. Using all double precision, our GPU code outperforms a similar code using a modern CPU by a factor of over 60. Using mixed precision, our GPU code provides a speed-up factor of over 600, when evaluating nsys > 1024 model planetary systems each containing npl = 4 planets and assuming nobs = 256 observations of each system. We conclude that modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's equation and a goodness-of-fit statistic for orbital models when presented with a large parameter space.
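Kepler's equation, M = E - e sin E, has no closed-form solution for the eccentric anomaly E, so it is solved iteratively for every observation and every trial orbit. The Python sketch below uses Newton's method in double precision on the CPU. It is an illustration only and makes no claim about the iteration scheme, starting guess or precision handling used in the GPU code described above; the function name and tolerances are our assumptions.

```python
import math

def solve_kepler(mean_anomaly, eccentricity, tol=1e-12, max_iter=50):
    """Solve M = E - e*sin(E) for the eccentric anomaly E (0 <= e < 1)."""
    M = mean_anomaly % (2.0 * math.pi)    # reduce M to [0, 2*pi)
    E = math.pi                           # conservative starting guess
    for _ in range(max_iter):
        f = E - eccentricity * math.sin(E) - M
        E_next = E - f / (1.0 - eccentricity * math.cos(E))   # Newton update
        if abs(E_next - E) < tol:
            return E_next
        E = E_next
    return E

E_low = solve_kepler(1.0, 0.3)
E_high = solve_kepler(1.0, 0.9)
print(E_low, E_high)
```

Because each (M, e) pair is solved independently of all others, the calculation maps naturally onto one GPU thread per evaluation, which is the kind of parallelism the abstract exploits.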
A Magnetoresistive Heat Switch for the Continuous ADR

NASA Technical Reports Server (NTRS)

Canavan, E. R.; Dipirro, M. J.; Jackson, M.; Panek, J.; Shirron, P. J.; Tuttle, J. G.; Krebs, C. (Technical Monitor)

2001-01-01

In compensated elemental metals at low temperature, a field of several Tesla can suppress electronic heat conduction so thoroughly that heat is effectively carried by phonons alone. In approximately one mm diameter single crystal samples with impurity concentrations low enough that electron conduction is limited by surface scattering, the ratio of zero-field to high-field thermal conductivity can exceed ten thousand. We have used this phenomenon to build a compact, solid-state heat switch with no moving parts and no enclosed fluids. The time scale for switching states is limited by the time scale for charging the magnet that supplies the controlling field. Our design and fabrication techniques overcome the difficulties associated with manufacturing and assembling parts from single crystal tungsten. A clear disadvantage of the magnetoresistive switch is the mass and complexity of the magnet system for the controlling field. We have discovered a technique of minimizing this mass and complexity, applicable to the continuous adiabatic demagnetization refrigerator.
Encouraging a "Romantic Understanding" of Science: The Effect of the Nikola Tesla Story

ERIC Educational Resources Information Center

Hadzigeorgiou, Yannis; Klassen, Stephen; Klassen, Cathrine Froese

2012-01-01

The purpose of this paper is to discuss and apply the notion of romantic understanding by outlining its features and its potential role in science education, to identify its features in the story of Nikola Tesla, and to describe an empirical study conducted to determine the effect of telling such a story to Grade 9 students. Elaborated features of…
DOE Office of Scientific and Technical Information (OSTI.GOV)

Halavanau, A.; Eddy, N.; Edstrom, D.

Superconducting linacs are capable of producing intense, ultra-stable, high-quality electron beams that have widespread applications in science and industry. Many projects are based on the 1.3-GHz TESLA-type superconducting cavity. In this paper we provide an update on a recent experiment aimed at measuring the transfer matrix of a TESLA cavity at the Fermilab Accelerator Science and Technology (FAST) facility. The results are discussed and compared with analytical and numerical simulations.
Wright Research and Development Center Test Facilities Handbook

DTIC Science & Technology

1990-01-01

Variable Temperature (2-400K) and Field (0-5 Tesla) SQUID Susceptometer Variable Temperature (10-80K) and Field (0-10 Tesla) Transport Current...determine products of combustion using extraction type probes INSTRUMENTATION: Mini computer/data acquisition system Networking provides access to larger...data recorder, Masscomp MC-500 computer with acquisition digitizer, laser and ink-jet printers, lo-pass filters, pulse code modulation AVAILABILITY
Repeatability of Brain Volume Measurements Made with the Atlas-based Method from T1-weighted Images Acquired Using a 0.4 Tesla Low Field MR Scanner.

PubMed

Goto, Masami; Suzuki, Makoto; Mizukami, Shinya; Abe, Osamu; Aoki, Shigeki; Miyati, Tosiaki; Fukuda, Michinari; Gomi, Tsutomu; Takeda, Tohoru

2016-10-11

An understanding of the repeatability of measured results is important for both the atlas-based and voxel-based morphometry (VBM) methods of magnetic resonance (MR) brain volumetry. However, many recent studies that have investigated the repeatability of brain volume measurements have been performed using static magnetic fields of 1-4 tesla, and no study has used a low-strength static magnetic field. The aim of this study was to investigate the repeatability of measured volumes using the atlas-based method and a low-strength static magnetic field (0.4 tesla). Ten healthy volunteers participated in this study. Using a 0.4 tesla magnetic resonance imaging (MRI) scanner and a quadrature head coil, three-dimensional T1-weighted images (3D-T1WIs) were obtained from each subject, twice on the same day. VBM8 software was used to construct segmented normalized images [gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) images]. The regions-of-interest (ROIs) of GM, WM, CSF, hippocampus (HC), orbital gyrus (OG), and cerebellum posterior lobe (CPL) were generated using WFU PickAtlas. The percentage change was defined as [100 × (measured volume with first segmented image - mean volume in each subject)/(mean volume in each subject)]. The average percentage change was calculated as the percentage change in the 6 ROIs of the 10 subjects. The mean of the average percentage changes for each ROI was as follows: GM, 0.556%; WM, 0.324%; CSF, 0.573%; HC, 0.645%; OG, 1.74%; and CPL, 0.471%. The average percentage change was higher for the orbital gyrus than for the other ROIs. We consider that repeatability of the atlas-based method is similar between 0.4 and 1.5 tesla MR scanners. To our knowledge, this is the first report to show that the level of repeatability with a 0.4 tesla MR scanner is adequate for the estimation of brain volume change by the atlas-based method.
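The percentage change defined in this abstract can be written as a one-line function. The Python sketch below assumes that the "mean volume in each subject" is the mean of that subject's two same-day measurements, which the abstract implies but does not state outright; the function name and the volumes are hypothetical.

```python
def percentage_change(first_volume, second_volume):
    """100 x (first measurement - subject mean) / subject mean."""
    mean_volume = (first_volume + second_volume) / 2.0   # assumed two-scan mean
    return 100.0 * (first_volume - mean_volume) / mean_volume

# Hypothetical ROI volumes (mL) from two scans of one subject.
change = percentage_change(101.0, 99.0)
print(change)   # 1.0
```

Under this definition the value for the second scan is the same magnitude with the opposite sign, so a single number per subject and ROI captures the scan-to-scan spread.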
Hippocampal MRI volumetry at 3 Tesla: reliability and practical guidance.

PubMed

Jeukens, Cécile R L P N; Vlooswijk, Mariëlle C G; Majoie, H J Marian; de Krom, Marc C T F M; Aldenkamp, Albert P; Hofman, Paul A M; Jansen, Jacobus F A; Backes, Walter H

2009-09-01

Although volumetry of the hippocampus is considered to be an established technique, protocols reported in literature are not described in great detail. This article provides a complete and detailed protocol for hippocampal volumetry applicable to T1-weighted magnetic resonance (MR) images acquired at 3 Tesla, which has become the standard for structural brain research. The protocol encompasses T1-weighted image acquisition at 3 Tesla, anatomic guidelines for manual hippocampus delineation, requirements of delineation software, reliability measures, and criteria to assess and ensure sufficient reliability. Moreover, the validity of the correction for total intracranial volume size was critically assessed. The protocol was applied by 2 readers to the MR images of 36 patients with cryptogenic localization-related epilepsy, 4 patients with unilateral hippocampal sclerosis, and 20 healthy control subjects. The uncorrected hippocampal volumes were 2923 ± 500 mm³ (mean ± SD) (left) and 3120 ± 416 mm³ (right) for the patient group and 3185 ± 411 mm³ (left) and 3302 ± 411 mm³ (right) for the healthy control group. The volume of the 4 pathologic hippocampi of the patients with unilateral hippocampal sclerosis was 2980 ± 422 mm³. The inter-reader reliability values were determined: intraclass-correlation-coefficient (ICC) = 0.87 (left) and 0.86 (right), percentage volume difference (VD) = 7.0 ± 4.7% (left) and 6.0 ± 3.8% (right), and overlap ratio (OR) = 0.82 ± 0.04 (left) and 0.82 ± 0.03 (right). The positive Pearson correlation between hippocampal volume and total intracranial volume was found to be low: r = 0.48 (P = 0.03, left) and r = 0.62 (P = 0.004, right) and did not significantly reduce the volumetric variances, showing the limited benefit of the brain size correction. A protocol was described to determine hippocampal volumes based on 3 Tesla MR images with high inter-reader reliability. Although the reliability of hippocampal volumetry at 3 Tesla was similar to the literature values obtained at 1.5 Tesla, hippocampal border definition is argued to be more confident and easier because of the improved signal-to-noise characteristics.
GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.
AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

NASA Astrophysics Data System (ADS)

Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

2017-05-01

We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.
Advances in Large Grain/Single Crystal SC Resonators at DESY

DOE Office of Scientific and Technical Information (OSTI.GOV)

W. Singer; A. Brinkmann; A. Ermakov

The main aim of the DESY large grain R&D program is to check whether this option is reasonable to apply for fabrication of ca. 1'000 XFEL cavities. Two aspects are being pursued: on one hand the basic material investigation, on the other hand the material availability, fabrication and preparation procedure. Several single cell large grain cavities of TESLA shape have been fabricated and tested. The best accelerating gradient of 41 MV/m was measured on an electropolished cavity. The first large grain nine-cell cavities worldwide have been produced under contract of DESY with ACCEL Instruments Co. All three cavities fulfil the XFEL specification already in the first RF test after only BCP (Buffered Chemical Polishing) treatment and 800 degrees C annealing. An accelerating gradient of 27 - 29 MV/m was reached. A fabrication method for a single crystal cavity of ILC-like shape was proposed. A single cell single crystal cavity was built at the company ACCEL. An accelerating gradient of 37.5 MV/m was reached after only 112 microns BCP and in situ baking at 120 degrees C for 6 hrs, with a quality factor higher than 2x10^10. The developed method can be extended to fabrication of multi cell single crystal cavities.
In vitro assessment of MRI issues at 3-Tesla for a breast tissue expander with a remote port.

PubMed

Linnemeyer, Hannah; Shellock, Frank G; Ahn, Christina Y

2014-04-01

A patient with a breast tissue expander may require a diagnostic assessment using magnetic resonance imaging (MRI). To ensure patient safety, this type of implant must undergo in vitro MRI testing using proper techniques. Therefore, this investigation evaluated MRI issues (i.e., magnetic field interactions, heating, and artifacts) at 3-Tesla for a breast tissue expander with a remote port. A breast tissue expander with a remote port (Integra Breast Tissue Expander, Model 3612-06 with Standard Remote Port, PMT Corporation, Chanhassen, MN) underwent evaluation for magnetic field interactions (translational attraction and torque), MRI-related heating, and artifacts using standardized techniques. Heating was evaluated by placing the implant in a gelled-saline-filled phantom and MRI was performed using a transmit/receive RF body coil at an MR system reported, whole body averaged specific absorption rate of 2.9-W/kg. Artifacts were characterized using T1-weighted and GRE pulse sequences. Magnetic field interactions were not substantial and, thus, will not pose a hazard to a patient in a 3-Tesla or less MRI environment. The highest temperature rise was 1.7°C, which is physiologically inconsequential. Artifacts were large in relation to the remote port and metal connector of the implant but will only present problems if the MR imaging area of interest is where these components are located. A patient with this breast tissue expander with a remote port may safely undergo MRI at 3-Tesla or less under the conditions used for this investigation. These findings are the first reported at 3-Tesla for a tissue expander. Copyright © 2014 Elsevier Inc. All rights reserved.
Diagnosis of rotator cuff tears using 3-Tesla MRI versus 3-Tesla MRA: a systematic review and meta-analysis.

PubMed

McGarvey, Ciaran; Harb, Ziad; Smith, Christian; Houghton, Russell; Corbett, Steven; Ajuied, Adil

2016-02-01

To compare the diagnostic accuracy of magnetic resonance imaging (MRI), 2-dimensional magnetic resonance arthrogram (MRA) and 3-dimensional isotropic MRA in the diagnosis of rotator cuff tears when performed exclusively at 3-T. A systematic review was undertaken of the Cochrane, MEDLINE and PubMed databases in accordance with the PRISMA guidelines. Studies comparing 3-T MRI or 3-T MRA (index tests) to arthroscopic surgical findings (reference test) were included. Methodological appraisal was performed using QUADAS 2. Pooled sensitivity and specificity were calculated and summary receiver-operating curves generated. Kappa coefficients quantified inter-observer reliability. Fourteen studies comprising 1332 patients were identified for inclusion. Twelve studies were retrospective and there were concerns regarding index test bias and applicability in nine and six studies respectively. Reference test bias was a concern in all studies. Both 3-T MRI and 3-T MRA showed similar excellent diagnostic accuracy for full-thickness supraspinatus tears. Concerning partial-thickness supraspinatus tears, 3-T 2D MRA was significantly more sensitive (86.6 vs. 80.5 %, p = 0.014) but significantly less specific (95.2 vs. 100 %, p < 0.001). There was a trend towards greater accuracy in the diagnosis of subscapularis tears with 3-T MRA. Three-Tesla 3D isotropic MRA showed similar accuracy to 3-T conventional 2D MRA. Three-Tesla MRI appeared equivalent to 3-T MRA in the diagnosis of full- and partial-thickness tears, although there was a trend towards greater accuracy in the diagnosis of subscapularis tears with 3-T MRA. Three-Tesla 3D isotropic MRA appears equivalent to 3-T 2D MRA for all types of tears.
Time-resolved magnetic resonance angiography (MRA) at 3.0 Tesla for evaluation of hemodynamic characteristics of vascular malformations: description of distinct subgroups.

PubMed

Hammer, Simone; Uller, Wibke; Manger, Florentine; Fellner, Claudia; Zeman, Florian; Wohlgemuth, Walter A

2017-01-01

Quantitative evaluation of hemodynamic characteristics of arteriovenous and venous malformations using time-resolved magnetic resonance angiography (MRA) at 3.0 Tesla. Time-resolved MRA with interleaved stochastic trajectories (TWIST) at 3.0 Tesla was studied in 83 consecutive patients with venous malformations (VM) and arteriovenous malformations (AVM). Enhancement characteristics were calculated as percentage increase of signal intensity above baseline over time. Maximum percentage signal intensity increase (signal max), time intervals between onset of arterial enhancement and lesion enhancement (t onset), and time intervals between beginning of lesion enhancement and maximum percentage of lesion enhancement (t max) were analyzed. All AVMs showed a high-flow hemodynamic pattern. Two significantly different (p < 0.001) types of venous malformations emerged: VMs with arteriovenous fistulas (AVF) (median signal max 737 %, IQR [interquartile range] = 511 - 1182 %; median t onset 5 s, IQR = 5 - 10 s; median t max 35 s, IQR = 26 - 40 s) and without AVFs (median signal max 284 %, IQR = 177 - 432 %; median t onset 23 s, IQR = 15 - 30 s; median t max 60 s, IQR = 55 - 75 s). Quantitative evaluation of time-resolved MRA at 3.0 Tesla provides hemodynamic characterization of vascular malformations. VMs can be subclassified into two hemodynamic subgroups due to presence or absence of AVFs. Key points: • Time-resolved MRA at 3.0 Tesla provides quantitative hemodynamic characterization of vascular malformations. • Malformations significantly differ in time courses of enhancement and signal intensity increase. • AVMs show a distinctive high-flow hemodynamic pattern. • Two significantly different types of VMs emerged: VMs with and without AVFs.
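The enhancement measures described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the function names, the 10 % onset threshold, and the sample values are assumptions made for illustration only. The abstract's t onset would additionally require the arterial onset time, which is then subtracted from the lesion onset time returned here.

```python
def percent_increase(signal, baseline):
    """Percentage increase of each sample above the baseline intensity."""
    return [100.0 * (s - baseline) / baseline for s in signal]


def enhancement_summary(times, signal, baseline, onset_threshold_pct=10.0):
    """Summarize one lesion time-intensity curve.

    onset_threshold_pct is a hypothetical cut-off for "beginning of
    enhancement"; the abstract does not state how onset was defined.
    """
    pct = percent_increase(signal, baseline)
    # Lesion onset: first sample whose increase exceeds the assumed threshold.
    onset_idx = next(i for i, p in enumerate(pct) if p > onset_threshold_pct)
    # Peak: sample with the largest percentage increase.
    peak_idx = max(range(len(pct)), key=pct.__getitem__)
    return {
        "signal_max_pct": pct[peak_idx],
        "lesion_onset_time": times[onset_idx],
        # Interval from beginning of lesion enhancement to its maximum.
        "t_max": times[peak_idx] - times[onset_idx],
    }


if __name__ == "__main__":
    # Made-up curve, sampled every 5 s, baseline intensity 100.
    print(enhancement_summary([0, 5, 10, 15, 20], [100, 100, 150, 300, 250], 100))
```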

Ventricular Assist Device implant (AB 5000) prototype cannula: In vitro assessment of MRI issues at 3-Tesla

PubMed Central

Shellock, Frank G; Valencerina, Samuel

2008-01-01

Purpose To evaluate MRI issues at 3-Tesla for a ventricular assist device (VAD). Methods The AB5000 Ventricle with a prototype Nitinol wire-reinforced In-Flow Cannula and Out-Flow Cannula attached (Abiomed, Inc., Danvers, MA) was evaluated for magnetic field interactions, heating, and artifacts at 3-Tesla. MRI-related heating was assessed with the device in a gelled-saline-filled, head/torso phantom using a transmit/receive RF body coil while performing MRI at a whole body averaged SAR of 3-W/kg for 15-min. Artifacts were assessed for the main metallic component of this VAD (atrial cannula) using T1-weighted, spin echo and gradient echo pulse sequences. Results The AB5000 Ventricle with the prototype In-Flow Cannula and Out-Flow Cannula attached showed relatively minor magnetic field interactions that will not cause movement in situ. Heating was not excessive (highest temperature change, +0.8°C). Artifacts may create issues for diagnostic imaging if the area of interest is in the same area or close to the implanted metallic component of this VAD (i.e., the venous cannula). Conclusion The results of this investigation demonstrated that it would be acceptable for a patient with this VAD (AB5000 Ventricle with a prototype Nitinol wire-reinforced In-Flow Cannula and Out-Flow Cannula attached) to undergo MRI at 3-Tesla or less. Notably, it is likely that the operation console for this device requires positioning a suitable distance (beyond the 100 Gauss line or in the MR control room) from the 3-Tesla MR system to ensure proper function of the VAD. PMID:18495028
Functional magnetic resonance imaging in a low-field intraoperative scanner.

PubMed

Schulder, Michael; Azmi, Hooman; Biswal, Bharat

2003-01-01

Functional magnetic resonance imaging (fMRI) has been used for preoperative planning and intraoperative surgical navigation. However, most experience to date has been with preoperative images acquired on high-field echoplanar MRI units. We explored the feasibility of acquiring fMRI of the motor cortex with a dedicated low-field intraoperative MRI (iMRI). Five healthy volunteers were scanned with the 0.12-tesla PoleStar N-10 iMRI (Odin Medical Technologies, Israel). A finger-tapping motor paradigm was performed with sequential scans, acquired alternately at rest and during activity. In addition, scans were obtained during breath holding alternating with normal breathing. The same paradigms were repeated using a 3-tesla MRI (Siemens Corp., Allandale, N.J., USA). Statistical analysis was performed offline using cross-correlation and cluster techniques. Data were resampled using the 'jackknife' process. The location, number of activated voxels and degrees of statistical significance between the two scanners were compared. With both the 0.12- and 3-tesla imagers, motor cortex activation was seen in all subjects to a significance of p < 0.02 or greater. No clustered pixels were seen outside the sensorimotor cortex. The resampled correlation coefficients were normally distributed, with a mean of 0.56 for both the 0.12- and 3-tesla scanners (standard deviations 0.11 and 0.08, respectively). The breath holding paradigm confirmed that the expected diffuse activation was seen on 0.12- and 3-tesla scans. Accurate fMRI with a low-field iMRI is feasible. Such data could be acquired immediately before or even during surgery. This would increase the utility of iMRI and allow for updated intraoperative functional imaging, free of the limitations of brain shift. Copyright 2003 S. Karger AG, Basel
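The jackknife resampling step mentioned in this abstract can be sketched as leave-one-out recomputation of a correlation coefficient. This is a generic illustration, not the authors' code: the function names and the toy task/response series are hypothetical, and the abstract does not say which quantity was left out in each resample.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jackknife_correlations(x, y):
    """Recompute the correlation n times, leaving out one sample each time."""
    return [
        pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


if __name__ == "__main__":
    # Hypothetical rest/activity paradigm (0 = rest, 1 = finger tapping)
    # and a made-up voxel response.
    task = [0, 1, 0, 1, 0, 1]
    voxel = [0.1, 0.9, 0.2, 1.1, 0.0, 0.8]
    resampled = jackknife_correlations(task, voxel)
    mean_r = sum(resampled) / len(resampled)
    print(len(resampled), round(mean_r, 3))
```

The spread of the resampled coefficients gives a rough estimate of how sensitive the correlation is to any single scan.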
A Flexible Nested Sodium and Proton Coil Array with Wideband Matching for Knee Cartilage MRI at 3 Tesla

PubMed Central

Brown, Ryan; Lakshmanan, Karthik; Madelin, Guillaume; Alon, Leeor; Chang, Gregory; Sodickson, Daniel K.; Regatte, Ravinder R.; Wiggins, Graham C.

2015-01-01

Purpose We describe a 6×2 channel sodium/proton array for knee MRI at 3 Tesla. Multi-element coil arrays are desirable because of well-known signal-to-noise ratio advantages over volume and single-element coils. However, low coil-tissue coupling that is characteristic of coils operating at low frequency can make the potential gains from a phased array difficult to realize. Methods The issue of low coil-tissue coupling in the developed six channel sodium receive array was addressed by implementing 1) a mechanically flexible former to minimize coil-to-tissue distance and reduce the overall diameter of the array and 2) a wideband matching scheme that counteracts preamplifier noise degradation caused by coil coupling and a high quality factor. The sodium array was complemented with a nested proton array to enable standard MRI. Results The wideband matching scheme and tight-fitting mechanical design contributed to greater than 30% central SNR gain on the sodium module over a mono-nuclear sodium birdcage coil, while the performance of the proton module was sufficient for clinical imaging. Conclusion We expect the strategies presented in this work to be generally relevant in high density receive arrays, particularly in x-nuclei or small animal applications, or in those where the array is distant from the targeted tissue. PMID:26502310
Three-layered radio frequency coil arrangement for sodium MRI of the human brain at 9.4 Tesla.

PubMed

Shajan, G; Mirkes, Christian; Buckenmaier, Kai; Hoffmann, Jens; Pohmann, Rolf; Scheffler, Klaus

2016-02-01

A multinuclei imaging setup with the capability to acquire both sodium (²³Na) and proton (¹H) signals at 9.4 Tesla is presented. The main objective was to optimize coil performance at the ²³Na frequency while still having the ability to acquire satisfactory ¹H images. The setup consisted of a combination of three radio frequency (RF) coils arranged in three layers: the innermost layer was a 27-channel ²³Na receive helmet which was surrounded by a four-channel ²³Na transceiver array. The outer layer consisted of a four-channel ¹H dipole array for B0 shimming and anatomical localization. Transmit and receive performance of the ²³Na arrays was compared to a single-tuned ²³Na birdcage resonator. While the transmit efficiency of the ²³Na transceiver array was comparable to the birdcage, the ²³Na receive array provided substantial signal-to-noise ratio (SNR) gain near the surface and comparable SNR in the center. The utility of this customized setup was demonstrated by ²³Na images of excellent quality. High SNR, efficient transmit excitation and B0 shimming capability can be achieved for ²³Na MRI at 9.4T using a novel coil combination. This RF configuration is easily adaptable to other multinuclei applications at ultra high field (≥ 7T). © 2015 Wiley Periodicals, Inc.
A Novel X-ray Diffractometer for the Florida Split Coil 25 Tesla Magnet

NASA Astrophysics Data System (ADS)

Wang, Shengyu; Kovalev, Alexey; Suslov, Alexey; Siegrist, Theo

2014-03-01

At the National High Magnetic Field Laboratory (NHMFL), we are developing a unique X-ray diffractometer for the 25 Tesla Florida Split Coil Magnet for scattering experiments under extremely high static magnetic fields. The X-ray source is a sealed tube (copper or molybdenum anode), connected to the magnet by an evacuated beam tunnel. The detectors are either an image plate or a silicon drift detector, with the data acquisition system based on LabVIEW. Our preliminary experimental results showed that the performance of the detector electronics and the X-ray generator is reliable in the fringe magnetic fields produced at the highest field of 25 T. Using this diffractometer, we will make measurements on standard samples, such as LaB6, Al2O3 and Si, to calibrate the diffraction system. Magnetic samples, such as single crystal HoMnO3 and stainless steel 301 alloys, will be measured subsequently. The addition of X-ray diffraction to the unique split coil magnet will significantly expand the NHMFL experimental capabilities. External users will therefore be able to probe spin-lattice interactions at static magnetic fields up to 25 T. This project is supported by NSF-DMR Award No. 1257649. NHMFL is supported by NSF Cooperative Agreement No. DMR-1157490, the State of Florida, and the U.S. DoE.
Effect of low refocusing angle in T1-weighted spin echo and fast spin echo MRI on low-contrast detectability: a comparative phantom study at 1.5 and 3 Tesla.

PubMed

Sarkar, Subhendra N; Mangosing, Jason L; Sarkar, Pooja R

2013-01-01

MRI tissue contrast is not well preserved at high field. In this work, we used a phantom with known, intrinsic contrast (3.6%) for model tissue pairs to test the effects of low angle refocusing pulses and magnetization transfer from adjacent slices on intrinsic contrast at 1.5 and 3 Tesla. Only T1-weighted spin echo sequences were tested since for such sequences the contrast loss, tissue heating, and image quality degradation at high fields seem to present significant diagnostic and quality issues. We hypothesized that the sources of contrast loss could be attributed to low refocusing angles that do not fulfill the Hahn spin echo conditions or to magnetization transfer effects from adjacent slices in multislice imaging. At 1.5 T the measured contrast was 3.6% for 180° refocusing pulses and 2% for 120° pulses, while at 3 T, it was 4% for 180° and only 1% for 120° refocusing pulses. There was no significant difference between single slice and multislice imaging suggesting little or no role played by magnetization transfer in the phantom chosen. Hence, one may conclude that low angle refocusing pulses not fulfilling the Hahn spin echo conditions are primarily responsible for significant deterioration of T1-weighted spin echo image contrast in high-field MRI.
Computational Omics Pre-Awardees | Office of Cancer Clinical Proteomics Research

Cancer.gov

The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) is pleased to announce the pre-awardees of the Computational Omics solicitation. Working with NVIDIA Foundation's Compute the Cure initiative and Leidos Biomedical Research Inc., the NCI, through this solicitation, seeks to leverage computational efforts to provide tools for the mining and interpretation of large-scale publicly available ‘omics’ datasets.
NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism Correctness

DTIC Science & Technology

2011-12-16

other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a...Lab affiliates National Instruments, NEC, Nokia , NVIDIA, and Samsung. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism...concurrently update x, some of these CAS’s will fail and those parallel loop iterations will recompute their updates to x and try again. Consider the parallel
Contention Bounds for Combinations of Computation Graphs and Network Topologies

DTIC Science & Technology

2014-08-08

member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel...Google, Nokia, NVIDIA, Oracle, MathWorks and Samsung. Also funded by U.S. DOE Office of Science, Office of Advanced Scientific Computing Research...DARPA Award Number HR0011-12-2-0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation
Assessing the MR compatibility of dental retainer wires at 7 Tesla.

PubMed

Wezel, Joep; Kooij, Bert Jan; Webb, Andrew G

2014-10-01

To determine the MR compatibility of common dental retainer wires at 7 Tesla in terms of potential RF heating and magnetic susceptibility effects. Electromagnetic simulations and experimental results were compared for dental retainer wires placed in tissue-mimicking phantoms. Simulations were then performed for a human model with wire in place. Finally, image quality was assessed for different scanning protocols and wires. Simulations and experimental data in phantoms agreed well, with the length of the wire correlating to maximum heating in phantoms being approximately 47 mm. Even in this case, no substantial heating occurs when scanning within the specific absorption rate (SAR) guidelines for the head. Image distortions from the most ferromagnetic dental wire were not significant for any brain region. Dental retainer wires appear to be MR compatible at 7 Tesla. Copyright © 2013 Wiley Periodicals, Inc.
A parallelization scheme of the periodic signals tracking algorithm for isochronous mass spectrometry on GPUs

NASA Astrophysics Data System (ADS)

Chen, R. J.; Wang, M.; Yan, X. L.; Yang, Q.; Lam, Y. H.; Yang, L.; Zhang, Y. H.

2017-12-01

The periodic signals tracking algorithm has been used to determine the revolution times of ions stored in storage rings in isochronous mass spectrometry (IMS) experiments. It has been a challenge to perform real-time data analysis by using the periodic signals tracking algorithm in the IMS experiments. In this paper, a parallelization scheme of the periodic signals tracking algorithm is introduced and a new program is developed. The computing time of data analysis can be reduced by a factor of ∼71 and of ∼346 by using our new program on a Tesla C1060 GPU and a Tesla K20c GPU, respectively, compared to using the old program on a Xeon E5540 CPU. We succeed in performing real-time data analysis for the IMS experiments by using the new program on the Tesla K20c GPU.
Optimizing Approximate Weighted Matching on Nvidia Kepler K40

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms on the other hand generally compute high quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on Nvidia Kepler K-40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set of 269 inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by 10 to 100× for over 100 instances, and from 100 to 1000× for 15 instances. We also demonstrate up to 20× speedup relative to 2 threads, and up to 5× relative to 16 threads on Intel Xeon platform with 16 cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.
Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

PubMed

Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S

2014-03-11

Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations that are occurring on the accelerator and/or the host. For data-transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
Wavelet-based multicomponent denoising on GPU to improve the classification of hyperspectral images

NASA Astrophysics Data System (ADS)

Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco; Mouriño, J. C.

2017-10-01

Supervised classification allows handling a wide range of remote sensing hyperspectral applications. Enhancing the spatial organization of the pixels over the image has proven to be beneficial for the interpretation of the image content, thus increasing the classification accuracy. Denoising in the spatial domain of the image has been shown as a technique that enhances the structures in the image. This paper proposes a multi-component denoising approach in order to increase the classification accuracy when a classification method is applied. It is computed on multicore CPUs and NVIDIA GPUs. The method combines feature extraction based on a 1D discrete wavelet transform (DWT) applied in the spectral dimension followed by an Extended Morphological Profile (EMP) and a classifier (SVM or ELM). The multi-component noise reduction is applied to the EMP just before the classification. The denoising recursively applies a separable 2D DWT after which the number of wavelet coefficients is reduced by using a threshold. Finally, inverse 2D-DWT filters are applied to reconstruct the noise-free original component. The computational cost of the classifiers as well as the cost of the whole classification chain is high, but it is reduced, achieving real-time behavior for some applications, through their computation on NVIDIA multi-GPU platforms.
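The transform, threshold, and inverse-transform sequence described in this abstract can be illustrated with a deliberately simplified Python sketch. It uses a single-level 1D Haar transform with hard thresholding, which is an assumption chosen for brevity: the paper applies a recursive separable 2D DWT, and the abstract names neither the wavelet family nor the thresholding rule. All function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_forward(signal):
    """One level of the orthonormal Haar DWT (signal length must be even)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def denoise(signal, threshold):
    """Zero the detail coefficients at or below the threshold, then reconstruct."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)


if __name__ == "__main__":
    # The small fluctuation in the first pair is smoothed away;
    # the large step in the second pair is preserved.
    print(denoise([4.0, 4.2, 9.0, 1.0], threshold=1.0))
```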
Fast parallel tandem mass spectral library searching using GPU hardware acceleration.

PubMed

Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K; Martin, Daniel B

2011-06-03

Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
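The core comparison described in this abstract, a multiplication of two vectors, can be sketched on the CPU in plain Python as a normalized dot product between binned spectra. This is an illustration of the general idea only, not FastPaSS or the SpectraST scoring function: the 1.0 m/z bin width, the 2000-bin range, the function names, and the toy library entries are all assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, n_bins=2000):
    """Turn (m/z, intensity) peaks into a unit-length intensity vector."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query_peaks, library):
    """Score the query against every library spectrum; return (name, score)."""
    q = bin_spectrum(query_peaks)
    scores = {name: dot(q, bin_spectrum(peaks)) for name, peaks in library.items()}
    return max(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical two-entry library and query.
    library = {
        "peptide_A": [(100.2, 10.0), (200.5, 5.0)],
        "peptide_B": [(150.1, 8.0), (300.7, 3.0)],
    }
    print(best_match([(100.4, 9.0), (200.1, 6.0)], library))
```

Each query-versus-library score is independent of the others, which is what makes the search a natural fit for the massively parallel GPU execution the abstract describes.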
GPU-powered model analysis with PySB/cupSODA.

PubMed

Harris, Leonard A; Nobile, Marco S; Pino, James C; Lubbock, Alexander L R; Besozzi, Daniela; Mauri, Giancarlo; Cazzaniga, Paolo; Lopez, Carlos F

2017-11-01

A major barrier to the practical utilization of large, complex models of biochemical systems is the lack of open-source computational tools to evaluate model behaviors over high-dimensional parameter spaces. This is due to the high computational expense of performing thousands to millions of model simulations required for statistical analysis. To address this need, we have implemented a user-friendly interface between cupSODA, a GPU-powered kinetic simulator, and PySB, a Python-based modeling and simulation framework. For three example models of varying size, we show that for large numbers of simulations PySB/cupSODA achieves order-of-magnitude speedups relative to a CPU-based ordinary differential equation integrator. The PySB/cupSODA interface has been integrated into the PySB modeling framework (version 1.4.0), which can be installed from the Python Package Index (PyPI) using a Python package manager such as pip. cupSODA source code and precompiled binaries (Linux, Mac OS/X, Windows) are available at github.com/aresio/cupSODA (requires an Nvidia GPU; developer.nvidia.com/cuda-gpus). Additional information about PySB is available at pysb.org. Contact: paolo.cazzaniga@unibg.it or c.lopez@vanderbilt.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Chondrule magnetic properties

NASA Technical Reports Server (NTRS)

Wasilewski, P. J.; Obryan, M. V.

1994-01-01

The topics discussed include the following: chondrule magnetic properties; chondrules from the same meteorite; and REM values (the ratio of remanence initially measured to saturation remanence in a 1 Tesla field). The preliminary field estimates for chondrule magnetizing environments range from minimal to at least several mT. These estimates are based on REM values and on the characteristics of thermal demagnetization of the remanence initially measured (natural remanence) compared with demagnetization of the saturation remanence in a 1 Tesla field.
Assessment of MRI Issues at 3 Tesla for a New Metallic Tissue Marker

PubMed Central

Cronenweth, Charlotte M.; Shellock, Frank G.

2015-01-01

Purpose. To assess the MRI issues at 3 Tesla for a metallic tissue marker used to localize removal areas of tissue abnormalities. Materials and Methods. A newly designed, metallic tissue marker (Achieve Marker, CareFusion, Vernon Hills, IL) used to mark biopsy sites, particularly in breasts, was assessed for MRI issues which included standardized tests to determine magnetic field interactions (i.e., translational attraction and torque), MRI-related heating, and artifacts at 3 Tesla. Temperature changes were determined for the marker using a gelled-saline-filled phantom. MRI was performed at a relatively high specific absorption rate (whole body averaged SAR, 2.9-W/kg). MRI artifacts were evaluated using T1-weighted, spin echo and gradient echo pulse sequences. Results. The marker displayed minimal magnetic field interactions (2-degree deflection angle and no torque). MRI-related heating was only 0.1°C above background heating (i.e., the heating without the tissue marker present). Artifacts seen as localized signal loss were relatively small in relation to the size and shape of the marker. Conclusions. Based on the findings, the new metallic tissue marker is acceptable or “MR Conditional” (using current labeling terminology) for a patient undergoing an MRI procedure at 3 Tesla or less. PMID:26266051
Cortical Cerebral Microinfarcts on 3 Tesla MRI in Patients with Vascular Cognitive Impairment.

PubMed

Ferro, Doeschka A; van Veluw, Susanne J; Koek, Huiberdina L; Exalto, Lieza G; Biessels, Geert Jan

2017-01-01

Cerebral microinfarcts (CMIs) are small ischemic lesions that are a common neuropathological finding in patients with stroke or dementia. CMIs in the cortex can now be detected in vivo on 3 Tesla MRI. We aimed to determine the occurrence of CMIs and associated clinical features in patients with possible vascular cognitive impairment (VCI). A total of 182 memory-clinic patients (mean age 71.4±10.6, 55% male) with vascular injury on brain MRI (i.e., possible VCI) underwent a standardized work-up including 3 Tesla MRI and cognitive assessment. A control group consisted of 70 cognitively normal subjects (mean age 70.6±4.7, 60% male). Cortical CMIs and other neuroimaging markers of vascular brain injury were rated according to established criteria. Occurrence of CMIs was higher in patients (20%) than in controls (10%). Among patients, the presence of CMIs was associated with male sex, history of stroke, infarcts, and white matter hyperintensities. CMI presence was also associated with a diagnosis of vascular dementia and reduced performance in multiple cognitive domains. CMIs on 3 Tesla MRI are common in patients with possible VCI and co-occur with imaging markers of small and large vessel disease, likely reflecting a heterogeneous etiology. CMIs are associated with worse cognitive performance, independent of other markers of vascular brain injury.
In vivo functional connectome of human brainstem nuclei of the ascending arousal, autonomic, and motor systems by high spatial resolution 7-Tesla fMRI.

PubMed

Bianciardi, Marta; Toschi, Nicola; Eichner, Cornelius; Polimeni, Jonathan R; Setsompop, Kawin; Brown, Emery N; Hämäläinen, Matti S; Rosen, Bruce R; Wald, Lawrence L

2016-06-01

Our aim was to map the in vivo human functional connectivity of several brainstem nuclei with the rest of the brain by using seed-based correlation of ultra-high magnetic field functional magnetic resonance imaging (fMRI) data. We used the recently developed template of 11 brainstem nuclei derived from multi-contrast structural MRI at 7 Tesla as seed regions to determine their connectivity to the rest of the brain. To achieve this, we used the increased contrast-to-noise ratio of 7-Tesla fMRI compared with 3 Tesla and time-efficient simultaneous multi-slice imaging to cover the brain with high spatial resolution (1.1-mm isotropic nominal resolution) while maintaining a short repetition time (2.5 s). The delineated Pearson's correlation-based functional connectivity diagrams (connectomes) of 11 brainstem nuclei of the ascending arousal, motor, and autonomic systems from 12 controls are presented and discussed in the context of existing histology and animal work. Considering that the investigated brainstem nuclei play a crucial role in several vital functions, the delineated preliminary connectomes might prove useful for future in vivo research and clinical studies of human brainstem function and pathology, including disorders of consciousness, sleep disorders, autonomic disorders, Parkinson's disease, and other motor disorders.
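Seed-based correlation, as named in this abstract, amounts to computing Pearson's correlation coefficient between a seed region's time series and the time series of every other region. The sketch below assumes short, made-up time series and hypothetical region names; it omits the preprocessing a real fMRI pipeline would need.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length time series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("time series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def seed_connectivity(seed, targets):
    """Correlate one seed time series with every target time series."""
    return {name: pearson(seed, series) for name, series in targets.items()}

# Hypothetical signals: one seed and three target regions.
seed = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = {
    "region_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the seed
    "region_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the seed rises
    "region_c": [1.0, 1.0, 1.0, 1.0, 1.0],   # constant
}
connectivity = seed_connectivity(seed, targets)
```

Repeating this for each of the seed regions yields one row of a connectivity diagram per seed.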

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yamada, R.; Ambrosio, G.; Barzi, E.

The design study of the block type 15-Tesla RHQT Nb₃Al dipole magnet, and its merits over Nb₃Sn magnets are presented. The copper stabilized RHQT Nb₃Al strand is now becoming commercially available for the application to the accelerator magnets. A 1 mm diameter RHQT Nb₃Al strand with filament size about 50 μm, non-copper Jc about 1000 A/mm² at 15 Tesla at 4.2 K, copper ratio of 50%, can now be produced over several hundred meters. The stress and strain characteristics of the Nb₃Al strand are superior to the Nb₃Sn strand. Another advantage is that it can tolerate a longitudinal strain up to 0.55%. The RHQT Nb₃Al Rutherford cable will have less chance of contamination of the stabilizer, compared to Nb₃Sn cable. These characteristics of the RHQT Nb₃Al will be beneficial for designing and producing 15-Tesla dipole magnets. An example 15-Tesla magnet cross section, utilizing the RHQT Nb₃Sn strand is presented. A systematic investigation on RHQT Nb₃Al strands, its Rutherford cables, and building a small racetrack magnet for cable testing are proposed.
A semi-automated algorithm for hypothalamus volumetry in 3 Tesla magnetic resonance images.

PubMed

Wolff, Julia; Schindler, Stephanie; Lucas, Christian; Binninger, Anne-Sophie; Weinrich, Luise; Schreiber, Jan; Hegerl, Ulrich; Möller, Harald E; Leitzke, Marco; Geyer, Stefan; Schönknecht, Peter

2018-07-30

The hypothalamus, a small diencephalic gray matter structure, is part of the limbic system. Volumetric changes of this structure occur in psychiatric diseases, therefore there is increasing interest in precise volumetry. Based on our detailed volumetry algorithm for 7 Tesla magnetic resonance imaging (MRI), we developed a method for 3 Tesla MRI, adopting anatomical landmarks and work in triplanar view. We overlaid T1-weighted MR images with gray matter-tissue probability maps to combine anatomical information with tissue class segmentation. Then, we outlined regions of interest (ROIs) that covered potential hypothalamus voxels. Within these ROIs, seed growing technique helped define the hypothalamic volume using gray matter probabilities from the tissue probability maps. This yielded a semi-automated method with short processing times of 20-40 min per hypothalamus. In the MRIs of ten subjects, reliabilities were determined as intraclass correlations (ICC) and volume overlaps in percent. Three raters achieved very good intra-rater reliabilities (ICC 0.82-0.97) and good inter-rater reliabilities (ICC 0.78 and 0.82). Overlaps of intra- and inter-rater runs were very good (≥ 89.7%). We present a fast, semi-automated method for in vivo hypothalamus volumetry in 3 Tesla MRI. Copyright © 2018 Elsevier B.V. All rights reserved.
First Signal on the Cryogenic Fourier-Transform Ion Cyclotron Resonance Mass Spectrometer

PubMed Central

Lin, Cheng; Mathur, Raman; Aizikov, Kostantin; O'Connor, Peter B.

2009-01-01

The construction and achievement of the first signal on a cryogenic Fourier-transform ion cyclotron resonance mass spectrometer (FT-ICR-MS) are reported here, demonstrating proof-of-concept of this new instrument design. Building the FTICR cell into the cold bore of a superconducting magnet provided advantages over conventional warm bore design. At 4.2 K, the vacuum system cryopumps itself, thus removing the requirement for a large bore to achieve the desired pumping speed for maintaining base pressure. Furthermore, because the bore diameter has been reduced, the amount of magnet wire needed to achieve high field and homogeneity was also reduced, greatly decreasing the cost/Tesla of the magnet. The current instrument implements an actively shielded 14-Tesla magnet of vertical design with an external matrix assisted laser desorption/ionization (MALDI) source. The first signal was obtained by detecting the laser desorbed/ionized (LDI) C60+• ions, with the magnet at 7 Tesla, unshimmed, and the preamplifier mounted outside of the vacuum chamber at room temperature. A subsequent experiment done with the magnet at 14 Tesla and properly shimmed produced a C60 spectrum showing ∼350,000 resolving power at m/z ∼720. Increased magnetic field strength improves many FTMS performance parameters simultaneously, particularly mass resolving power and accuracy. PMID:17931882
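The field dependence mentioned at the end of this abstract follows from the unperturbed ion cyclotron frequency, f = eB / (2π · m/z), which grows linearly with the magnetic field B. The sketch below evaluates that textbook relation for a singly charged ion at m/z 720 (the value quoted for C60) at 7 and 14 Tesla. It ignores trapping-field and space-charge shifts, and the function name is hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs
DALTON_KG = 1.66053906660e-27          # kilograms per dalton

def cyclotron_frequency_hz(mz_da, field_tesla):
    """Unperturbed cyclotron frequency f = e * B / (2 * pi * m/z) in hertz.

    mz_da is the mass-to-charge ratio in daltons per elementary charge.
    """
    if mz_da <= 0:
        raise ValueError("m/z must be positive")
    return ELEMENTARY_CHARGE_C * field_tesla / (2.0 * math.pi * mz_da * DALTON_KG)

f_7t = cyclotron_frequency_hz(720.0, 7.0)
f_14t = cyclotron_frequency_hz(720.0, 14.0)
ratio = f_14t / f_7t  # doubling the field doubles the frequency
```

For a transient of fixed duration, a higher cyclotron frequency means more cycles are observed, which is one commonly cited reason resolving power improves with field strength.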
Comparison of Pelvic Phased-Array versus Endorectal Coil Magnetic Resonance Imaging at 3 Tesla for Local Staging of Prostate Cancer

PubMed Central

Kim, Bum Soo; Kim, Tae-Hwan; Kwon, Tae Gyun

2012-01-01

Purpose. Several studies have demonstrated the superiority of endorectal coil magnetic resonance imaging (MRI) over pelvic phased-array coil MRI at 1.5 Tesla for local staging of prostate cancer. However, few have studied which evaluation is more accurate at 3 Tesla MRI. In this study, we compared the accuracy of local staging of prostate cancer using pelvic phased-array coil or endorectal coil MRI at 3 Tesla. Materials and Methods. Between January 2005 and May 2010, 151 patients underwent radical prostatectomy. All patients were evaluated with either pelvic phased-array coil or endorectal coil prostate MRI prior to surgery (63 endorectal coils and 88 pelvic phased-array coils). Tumor stage based on MRI was compared with pathologic stage. We calculated the specificity, sensitivity and accuracy of each group in the evaluation of extracapsular extension and seminal vesicle invasion. Results. Both endorectal coil and pelvic phased-array coil MRI achieved high specificity, low sensitivity and moderate accuracy for the detection of extracapsular extension and seminal vesicle invasion. There were statistically no differences in specificity, sensitivity and accuracy between the two groups. Conclusion. Overall staging accuracy, sensitivity and specificity were not significantly different between endorectal coil and pelvic phased-array coil MRI. PMID:22476999
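The three measures reported in this abstract are standard functions of a two-by-two table comparing the MRI call with the pathologic finding. The sketch below shows the usual definitions; the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: pathology-positive cases called positive/negative on imaging.
    tn/fp: pathology-negative cases called negative/positive on imaging.
    """
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "accuracy": (tp + tn) / total,
    }

# Hypothetical counts for 100 patients, chosen only to illustrate the pattern
# of high specificity with low sensitivity.
metrics = diagnostic_metrics(tp=12, fp=10, tn=60, fn=18)
```

With these made-up counts, specificity comes out well above sensitivity, and accuracy falls between the two.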
Aircraft Pilot Situational Awareness Interface for Airborne Operation of Network Controlled Unmanned Systems (US)

DTIC Science & Technology

2008-03-01

operator, can be operated autonomously or remotely, can be expendable or recoverable, and can carry a lethal or nonlethal payload. Ballistic or semi ...states that vehicles should be recoverable, and that ballistic or semi - ballistic vehicles, cruise missiles, and artillery projectiles are not considered...2007-2032. 32 Nikola Tesla and his telautomatons (robots); Tesla further demonstrated remote control of objects by wireless in an exhibition in 1898
Two-Layer 16 T Cos θ Dipole Design for the FCC

DOE PAGES

Holik, Eddie Frank; Ambrosio, Giorgio; Apollinari, Giorgio

2018-02-22

Here, the Future Circular Collider or FCC is a study aimed at exploring the possibility to reach 100 TeV total collision energy which would require 16 tesla dipoles. Upon the conclusion of the High Luminosity Upgrade, the US LHC Accelerator Upgrade Project in collaboration with CERN will have extensive Nb₃Sn magnet fabrication experience. This experience includes robust Nb₃Sn conductor and insulation scheming, 2-layer cos2θ coil fabrication, and bladder-and-key structure and assembly. By making improvements and modification to existing technology the feasibility of a two-layer 16 tesla dipole is investigated. Preliminary designs indicate that fields up to 16.6 tesla are feasible with conductor grading while satisfying the HE-LHC and FCC specifications. Key challenges include accommodating high-aspect ratio conductor, narrow wedge design, Nb₃Sn conductor grading, and especially quench protection of a 16 tesla device.
Two-Layer 16 T Cos θ Dipole Design for the FCC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holik, Eddie Frank; Ambrosio, Giorgio; Apollinari, Giorgio

Here, the Future Circular Collider or FCC is a study aimed at exploring the possibility to reach 100 TeV total collision energy which would require 16 tesla dipoles. Upon the conclusion of the High Luminosity Upgrade, the US LHC Accelerator Upgrade Project in collaboration with CERN will have extensive Nb₃Sn magnet fabrication experience. This experience includes robust Nb₃Sn conductor and insulation scheming, 2-layer cos2θ coil fabrication, and bladder-and-key structure and assembly. By making improvements and modification to existing technology the feasibility of a two-layer 16 tesla dipole is investigated. Preliminary designs indicate that fields up to 16.6 tesla are feasible with conductor grading while satisfying the HE-LHC and FCC specifications. Key challenges include accommodating high-aspect ratio conductor, narrow wedge design, Nb₃Sn conductor grading, and especially quench protection of a 16 tesla device.
A 5.9 tesla conduction-cooled coil composed of a stack of four single pancakes wound with YBCO wide tapes

NASA Astrophysics Data System (ADS)

Iwai, Sadanori; Miyazaki, Hiroshi; Tosaka, Taizo; Tasaki, Kenji; Urata, Masami; Ioka, Shigeru; Ishii, Yusuke

2013-11-01

We have been developing a conduction-cooled coil wound with YBCO-coated conductors for HTS applications. Previously, we have fabricated a coil composed of a stack of 12 single pancakes wound with 4 mm-wide YBCO tapes. This coil had a central magnetic field as high as 5.1 T at 10 K under conduction-cooled conditions. In the present study, we fabricated and tested a coil composed of a stack of four single pancakes wound with 12 mm-wide YBCO tapes. The total size of the coil and the Jc value of the tapes were almost the same as those of the former coil. At 77 K, the voltage-current characteristics showed a high n-value of 24, confirming that the coil had no degradation. Furthermore, in a conduction-cooled configuration at 20 K to 60 K, the coil showed a high n-value of over 20. At 20 K, the central magnetic field reached 5.9 T at 903 A, which is 1.3-times higher than that of the former coil.
Determination of NAD+ and NADH level in a Single Cell Under H₂O₂ Stress by Capillary Electrophoresis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xi, Wenjun

2008-01-01

A capillary electrophoresis (CE) method is developed to determine both NAD+ and NADH levels in a single cell, based on an enzymatic cycling reaction. The detection limit can reach down to 0.2 amol NAD+ and 1 amol NADH on a home-made CE-LIF setup. The method showed good reproducibility and specificity. After an intact cell was injected into the inlet of a capillary and lysed using a Tesla coil, intracellular NAD+ and NADH were separated, incubated with the cycling buffer, and quantified by the amount of fluorescent product generated. NADH and NAD+ levels of single cells of three cell lines and primary astrocyte culture were determined using this method. Comparing cellular NAD+ and NADH levels with and without exposure to oxidative stress induced by H₂O₂, it was found that H9c2 cells respond to the stress by reducing both cellular NAD+ and NADH levels, while astrocytes respond by increasing cellular NADH/NAD+ ratio.
Optically programmable electron spin memory using semiconductor quantum dots.

PubMed

Kroutvar, Miro; Ducommun, Yann; Heiss, Dominik; Bichler, Max; Schuh, Dieter; Abstreiter, Gerhard; Finley, Jonathan J

2004-11-04

The spin of a single electron subject to a static magnetic field provides a natural two-level system that is suitable for use as a quantum bit, the fundamental logical unit in a quantum computer. Semiconductor quantum dots fabricated by strain driven self-assembly are particularly attractive for the realization of spin quantum bits, as they can be controllably positioned, electronically coupled and embedded into active devices. It has been predicted that the atomic-like electronic structure of such quantum dots suppresses coupling of the spin to the solid-state quantum dot environment, thus protecting the 'spin' quantum information against decoherence. Here we demonstrate a single electron spin memory device in which the electron spin can be programmed by frequency selective optical excitation. We use the device to prepare single electron spins in semiconductor quantum dots with a well defined orientation, and directly measure the intrinsic spin flip time and its dependence on magnetic field. A very long spin lifetime is obtained, with a lower limit of about 20 milliseconds at a magnetic field of 4 tesla and at 1 kelvin.
High-Performance Analysis of Filtered Semantic Graphs

DTIC Science & Technology

2012-05-06

any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a...observation that explains why SEJITS+KDT performance is so close to CombBLAS performance in practice (as shown later in Section 7) even though its in-core...NEC, Nokia, NVIDIA, Oracle, and Samsung. This research used resources of the National Energy Research Scientific Computing Center, which is
Dataflow-Based Implementation of Layered Sensing Applications on High-Performance Embedded Processors

DTIC Science & Technology

2013-03-01

time (milliseconds) GFlops Comparison to GPU peak performance (%) Cascade Gaussian Filtering 13 45.19 6.3 Difference of Gaussian 0.512 152...values for the GPU-targeted actor implementations in terms of Giga Floating Point Operations Per Second ( GFLOPS ). Our GFLOPS calculation for an actor...kernels. The results for GFLOPS are provided in Table . The actors were implemented on an NVIDIA GTX260 GPU, which provides 715 GFLOPS as peak
Pathological and 3 Tesla Volumetric Magnetic Resonance Imaging Predictors of Biochemical Recurrence after Robotic Assisted Radical Prostatectomy: Correlation with Whole Mount Histopathology.

PubMed

Tan, Nelly; Shen, Luyao; Khoshnoodi, Pooria; Alcalá, Héctor E; Yu, Weixia; Hsu, William; Reiter, Robert E; Lu, David Y; Raman, Steven S

2018-05-01

We sought to identify the clinical and magnetic resonance imaging variables predictive of biochemical recurrence after robotic assisted radical prostatectomy in patients who underwent multiparametric 3 Tesla prostate magnetic resonance imaging. We performed an institutional review board approved, HIPAA (Health Insurance Portability and Accountability Act) compliant, single arm observational study of 3 Tesla multiparametric magnetic resonance imaging prior to robotic assisted radical prostatectomy from December 2009 to March 2016. Clinical, magnetic resonance imaging and pathological information, and clinical outcomes were compiled. Biochemical recurrence was defined as prostate specific antigen 0.2 ng/cc or greater. Univariate and multivariate regression analysis was performed. Biochemical recurrence had developed in 62 of the 255 men (24.3%) included in the study at a median followup of 23.5 months. Compared to the subcohort without biochemical recurrence the subcohort with biochemical recurrence had a greater proportion of patients with a high grade biopsy Gleason score, higher preoperative prostate specific antigen (7.4 vs 5.6 ng/ml), intermediate and high D'Amico classifications, larger tumor volume on magnetic resonance imaging (0.66 vs 0.30 ml), higher PI-RADS® (Prostate Imaging-Reporting and Data System) version 2 category lesions, a greater proportion of intermediate and high grade radical prostatectomy Gleason score lesions, higher pathological T3 stage (all p <0.01) and a higher positive surgical margin rate (19.3% vs 7.8%, p = 0.016). On multivariable analysis only tumor volume on magnetic resonance imaging (adjusted OR 1.57, p = 0.016), pathological T stage (adjusted OR 2.26, p = 0.02), positive surgical margin (adjusted OR 5.0, p = 0.004) and radical prostatectomy Gleason score (adjusted OR 2.29, p = 0.004) predicted biochemical recurrence. 
In this cohort tumor volume on magnetic resonance imaging and pathological variables, including Gleason score, staging and positive surgical margins, significantly predicted biochemical recurrence. This suggests an important new imaging biomarker. Copyright © 2018 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
MRI issues for ballistic objects: information obtained at 1.5-, 3- and 7-Tesla.

PubMed

Dedini, Russell D; Karacozoff, Alexandra M; Shellock, Frank G; Xu, Duan; McClellan, R Trigg; Pekmezci, Murat

2013-07-01

Few studies exist for magnetic resonance imaging (MRI) issues and ballistics, and there are no studies addressing movement, heating, and artifacts associated with ballistics at 3-tesla (T). Movement because of magnetic field interactions and radiofrequency (RF)-induced heating of retained bullets may injure nearby critical structures. Artifacts may also interfere with the diagnostic use of MRI. This study investigated these potential hazards of MRI on a sample of bullets and shotgun pellets in an ex vivo laboratory investigation. Thirty-two different bullets and seven different shotgun pellets, commonly encountered in criminal trauma, were assessed relative to 1.5-, 3-, and 7-T magnetic resonance systems. Magnetic field interactions, including translational attraction and torque, were measured. A representative sample of five bullets was then tested for magnetic field interactions, RF-induced heating, and the generation of artifacts at 3-T. At all static magnetic field strengths, non-steel-containing bullets and pellets exhibited no movement, whereas one steel core bullet and two steel pellets exhibited movement in excess of what might be considered safe for patients in MRI at 1.5-, 3- and 7-Tesla. At 3-T, the maximum temperature increase of five bullets tested was 1.7°C versus background heating of 1.5°C. Of five bullets tested for artifacts, those without a steel core exhibited small signal voids, whereas a single steel core bullet exhibited a very large signal void. Ballistics made of lead with copper or alloy jackets appear to be safe with respect to MRI-related movement at 1.5-, 3-, and 7-T static magnetic fields, whereas ballistics containing steel may pose a danger if near critical body structures because of strong magnetic field interactions. Temperature increases of selected ballistics during 3-T MRI were not clinically significant, even for the ferromagnetic projectiles.
Finally, ballistics containing steel generated larger artifacts when compared with ballistics made of lead with copper and alloy jackets and may impair the diagnostic use of MRI. Copyright © 2013 Elsevier Inc. All rights reserved.
Use of a 1.0 Tesla open scanner for evaluation of pediatric and congenital heart disease: a retrospective cohort study.

PubMed

Lu, Jimmy C; Nielsen, James C; Morowitz, Layne; Musani, Muzammil; Ghadimi Mahani, Maryam; Agarwal, Prachi P; Ibrahim, El-Sayed H; Dorfman, Adam L

2015-05-25

Open cardiovascular magnetic resonance (CMR) scanners offer the potential for imaging patients with claustrophobia or large body size, but at a lower 1.0 Tesla magnetic field. This study aimed to evaluate the efficacy of open CMR for evaluation of pediatric and congenital heart disease. This retrospective, cross-sectional study included all patients ≤18 years old or with congenital heart disease who underwent CMR on an open 1.0 Tesla scanner at two centers from 2012-2014. Indications for CMR and clinical questions were extracted from the medical record. Studies were qualitatively graded for image quality and diagnostic utility. In a subset of 25 patients, signal-to-noise (SNR) and contrast-to-noise (CNR) ratios were compared to size- and diagnosis-matched patients with CMR on a 1.5 Tesla scanner. A total of 65 patients (median 17.3 years old, 60% male) were included. Congenital heart disease was present in 32 (50%), with tetralogy of Fallot and bicuspid aortic valve the most common diagnoses. Open CMR was used due to scheduling/equipment issues in 51 (80%), claustrophobia in 7 (11%), and patient size in 3 (5%); 4 patients with claustrophobia had failed CMR on a different scanner, but completed the study on open CMR without sedation. All patients had good or excellent image quality on black blood, phase contrast, magnetic resonance angiography, and late gadolinium enhancement imaging. There was below average image quality in 3/63 (5%) patients with cine images, and 4/15 (27%) patients with coronary artery imaging. SNR and CNR were decreased in cine and magnetic resonance angiography images compared to 1.5 Tesla. The clinical question was answered adequately in all but 2 patients; 1 patient with a Fontan had artifact from an embolization coil limiting RV volume analysis, and in 1 patient the right coronary artery origin was not well seen. 
Open 1.0 Tesla scanners can effectively evaluate pediatric and congenital heart disease, including patients with claustrophobia and larger body size. Despite minor artifacts and differences in SNR and CNR, the majority of clinical questions can be answered adequately, with some limitations with coronary artery imaging. Further evaluation is necessary to optimize protocols and image quality.
Land and Undersea Field Testing of Very Low Frequency RF Antennas and Loop Transceivers

DTIC Science & Technology

2017-12-01

VLF RF HARDWARE: SSC PACIFIC LOOP ANTENNAS...2.3 EXPERIMENTAL CONCEPT...2.3 EXPERIMENTAL CONCEPT Figure 5 shows a drawing of a typical transmit/receive scenario. Each of the WFS units and loop antennas can both transmit...kilohertz is around 20 fT/root(Hz). One femtoTesla (fT) is equal to 10⁻¹⁵ Tesla. Our derived value is close to the 30 fT/root(Hz) value experimentally
Cost Analysis of Utilizing Electric Vehicles and Photovoltaic Solar Energy in the United States Marine Corps Commercial Vehicle Fleet

DTIC Science & Technology

2009-12-01

vehicles so do some electric vehicle braking systems (MIT, 2008). e. Brakes Regenerative braking on electric vehicles recoups some of the energy lost...engine is required to replace the energy lost by braking. Regenerative braking takes some of the lost energy during braking and turns it into...Motors and Tesla Motors offer regenerative braking in their respective electric vehicles. Tesla explains regenerative braking as “engine braking
Perforating arteries originating from the posterior communicating artery: a 7.0-Tesla MRI study.

PubMed

Conijn, Mandy M A; Hendrikse, Jeroen; Zwanenburg, Jaco J M; Takahara, Taro; Geerlings, Mirjam I; Mali, Willem P Th M; Luijten, Peter R

2009-12-01

The aim of this study was to investigate the ability of time-of-flight (TOF) magnetic resonance (MR) angiography at 7.0 Tesla to show the perforating branches of the posterior communicating artery (PCoA), and to investigate the presence of such visible perforating branches in relation to the size of the feeding PCoA. The secondary aim was to visualise and describe the anterior choroidal artery and the perforating branches of the P1-segment of posterior cerebral artery (P1). Forty-six healthy volunteers underwent TOF MR angiography at 7.0 Tesla. With 7.0-Tesla imaging, we visualised for the first time perforating arteries originating from the PCoA in vivo without the use of contrast agents. A perforating artery from the PCoA was found in a large proportion of the PCoAs (64%). The presence was associated with a larger diameter of the underlying PCoA (1.23 versus 1.06 mm, P = 0.03). The anterior choroidal artery was visible bilaterally in all participants. In 83% of all P1s, one or two perforating branches were visible. Non-invasive assessment of the perforating arteries of the PCoA together with the anterior choroidal artery and the perforating arteries of the P1 may increase our understanding of infarcts in the deep brain structures supplied by these arteries.
Is the Ellipsoid Formula the New Standard for 3-Tesla MRI Prostate Volume Calculation without Endorectal Coil?

PubMed

Haas, Matthias; Günzel, Karsten; Miller, Kurt; Hamm, Bernd; Cash, Hannes; Asbach, Patrick

2017-01-01

Prostate volume in multiparametric MRI (mpMRI) is of clinical importance. For 3-Tesla mpMRI without endorectal coil, there is no distinctive standard for volume calculation. We tested the accuracy of the ellipsoid formula with planimetric volume measurements as reference and investigated the correlation of gland volume and cancer detection rate on MRI/ultrasound (MRI/US) fusion-guided biopsy. One hundred forty-three patients with findings on 3-Tesla mpMRI suspicious of cancer and subsequent MRI/US fusion-guided targeted biopsy and additional systematic biopsy were analyzed. T2-weighted images were used for measuring the prostate diameters and for planimetric volume measurement by a segmentation software. Planimetric and calculated prostate volumes were compared with clinical data. The median prostate volume was 48.1 ml (interquartile range (IQR) 36.9-62.1 ml). Volume calculated by the ellipsoid formula showed a strong concordance with planimetric volume, with a tendency to underestimate prostate volume (median volume 43.1 ml (IQR 31.2-58.8 ml); r = 0.903, p < 0.001). There was a moderate, significant inverse correlation of prostate volume to a positive biopsy result (r = -0.24, p = 0.004). The ellipsoid formula gives sufficient approximation of prostate volume on 3-Tesla mpMRI without endorectal coil. It allows a fast, valid volume calculation in prostate MRI datasets. © 2016 S. Karger AG, Basel.
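The ellipsoid formula evaluated in this abstract approximates gland volume as π/6 times the product of three orthogonal diameters. A minimal sketch follows, assuming diameters measured in centimetres (so the result is in millilitres); the example diameters are hypothetical, not study data.

```python
import math

def ellipsoid_volume_ml(length_cm, width_cm, height_cm):
    """Ellipsoid volume V = (pi / 6) * L * W * H; cm inputs give millilitres."""
    if min(length_cm, width_cm, height_cm) <= 0:
        raise ValueError("all diameters must be positive")
    return math.pi / 6.0 * length_cm * width_cm * height_cm

# Hypothetical diameters measured on T2-weighted images.
volume_ml = ellipsoid_volume_ml(5.0, 4.0, 4.5)
```

Because the formula needs only three caliper measurements, it is much faster than planimetric segmentation, at the cost of the slight underestimation the abstract reports.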
Comparison of radiofrequency body coils for MRI at 3 Tesla: a simulation study using parallel transmission on various anatomical targets

PubMed Central

Wu, Xiaoping; Zhang, Xiaotong; Tian, Jinfeng; Schmitter, Sebastian; Hanna, Brian; Strupp, John; Pfeuffer, Josef; Hamm, Michael; Wang, Dingxin; Nistler, Juergen; He, Bin; Vaughan, J. Thomas; Ugurbil, Kamil; Van de Moortele, Pierre-Francois

2015-01-01

The performance of multichannel transmit coil layouts and parallel transmission (pTx) radiofrequency (RF) pulse design was evaluated with respect to transmit B1 (B1+) homogeneity and Specific Absorption Rate (SAR) at 3 Tesla for a whole body coil. Five specific coils were modeled and compared: a 32-rung birdcage body coil (driven either in a fixed quadrature mode or a two-channel transmit mode), two single-ring stripline arrays (with either 8 or 16 elements), and two multi-ring stripline arrays (with 2 or 3 identical rings, stacked in the z-axis and each comprising eight azimuthally distributed elements). Three anatomical targets were considered, each defined by a 3D volume representative of a meaningful region of interest (ROI) in routine clinical applications. For a given anatomical target, global or local SAR controlled pTx pulses were designed to homogenize RF excitation within the ROI. At the B1+ homogeneity achieved by the quadrature driven birdcage design, pTx pulses with multichannel transmit coils achieved up to ~8 fold reduction in local and global SAR. When used for imaging head and cervical spine or imaging thoracic spine, the double-ring array outperformed all coils including the single-ring arrays. While the advantage of the double-ring array became much less pronounced for pelvic imaging with a substantially larger ROI, the pTx approach still provided significant gains over the quadrature birdcage coil. For all design scenarios, using the 3-ring array did not necessarily improve the RF performance. Our results suggest that pTx pulses with multichannel transmit coils can reduce local and global SAR substantially for body coils while attaining improved B1+ homogeneity, particularly for a “z-stacked” double-ring design with coil elements arranged on two transaxial rings. PMID:26332290

CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models

NASA Astrophysics Data System (ADS)

Komura, Yukihiro; Okabe, Yutaka

2014-03-01

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions.
Catalogue identifier: AERM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 5632
No. of bytes in distributed program, including test data, etc.: 14688
Distribution format: tar.gz
Programming language: C, CUDA.
Computer: System with an NVIDIA CUDA enabled GPU.
Operating system: System with an NVIDIA CUDA enabled GPU.
Classification: 23.
External routines: NVIDIA CUDA Toolkit 3.0 or newer
Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices.
Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2].
Restrictions: The system size is limited depending on the memory of a GPU.
Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc.
References: [1] K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678 [2] O. Kalentev, A. Rai, S. Kemnitz, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620
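The Swendsen-Wang update named in this record can be illustrated with a minimal serial sketch for the 2D Ising model. This is not the authors' CUDA code: it assumes coupling J = 1, periodic boundaries, and a plain union-find for the cluster labeling (the record cites GPU labeling schemes by Hawick et al. and Kalentev et al. instead). The names `find` and `swendsen_wang_step` are hypothetical.

```python
import math
import random


def find(parent, i):
    """Return the cluster root of site i (union-find with path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def swendsen_wang_step(spins, L, beta, rng):
    """One Swendsen-Wang update of an L x L Ising lattice stored row-major.

    Assumptions (not from the record): J = 1, periodic boundaries, L >= 3.
    Returns the number of clusters formed in this step.
    """
    n = L * L
    parent = list(range(n))
    p_bond = 1.0 - math.exp(-2.0 * beta)  # bond probability for aligned pairs
    for y in range(L):
        for x in range(L):
            i = y * L + x
            right = y * L + (x + 1) % L
            down = ((y + 1) % L) * L + x
            for j in (right, down):
                # Bonds are only ever placed between equal spins.
                if spins[i] == spins[j] and rng.random() < p_bond:
                    ri, rj = find(parent, i), find(parent, j)
                    if ri != rj:
                        parent[rj] = ri
    # Assign every cluster a fresh random spin (the multi-cluster flip).
    new_spin = {}
    for i in range(n):
        root = find(parent, i)
        if root not in new_spin:
            new_spin[root] = rng.choice((-1, 1))
        spins[i] = new_spin[root]
    return len(new_spin)


if __name__ == "__main__":
    rng = random.Random(1)
    L = 16
    spins = [rng.choice((-1, 1)) for _ in range(L * L)]
    for _ in range(10):
        clusters = swendsen_wang_step(spins, L, 0.44, rng)
    print("clusters in last step:", clusters)
```

On a GPU the bond placement and spin assignment are naturally parallel over sites; the cluster labeling is the step that needs a dedicated parallel algorithm, which is why the record cites separate labeling work.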
A GPU OpenCL based cross-platform Monte Carlo dose calculation engine (goMC).

PubMed

Tian, Zhen; Shi, Feng; Folkerts, Michael; Qin, Nan; Jiang, Steve B; Jia, Xun

2015-10-07

Monte Carlo (MC) simulation has been recognized as the most accurate dose calculation method for radiotherapy. However, the extremely long computation time impedes its clinical application. Recently, a lot of effort has been made to realize fast MC dose calculation on graphics processing units (GPUs). However, most of the GPU-based MC dose engines have been developed under NVidia's CUDA environment. This limits the code portability to other platforms, hindering the introduction of GPU-based MC simulations to clinical practice. The objective of this paper is to develop a GPU OpenCL based cross-platform MC dose engine named goMC with coupled photon-electron simulation for external photon and electron radiotherapy in the MeV energy range. Compared to our previously developed GPU-based MC code named gDPM (Jia et al 2012 Phys. Med. Biol. 57 7783-97), goMC has two major differences. First, it was developed under the OpenCL environment for high code portability and hence could be run not only on different GPU cards but also on CPU platforms. Second, we adopted the electron transport model used in EGSnrc MC package and PENELOPE's random hinge method in our new dose engine, instead of the dose planning method employed in gDPM. Dose distributions were calculated for a 15 MeV electron beam and a 6 MV photon beam in a homogeneous water phantom, a water-bone-lung-water slab phantom and a half-slab phantom. Satisfactory agreement between the two MC dose engines goMC and gDPM was observed in all cases. The average dose differences in the regions that received a dose higher than 10% of the maximum dose were 0.48-0.53% for the electron beam cases and 0.15-0.17% for the photon beam cases. In terms of efficiency, goMC was ~4-16% slower than gDPM when running on the same NVidia TITAN card for all the cases we tested, due to both the different electron transport models and the different development environments. The code portability of our new dose engine goMC was validated by successfully running it on a variety of different computing devices including an NVidia GPU card, two AMD GPU cards and an Intel CPU processor. Computational efficiency among these platforms was compared.
Magnetic resonance imaging investigation of the bone conduction implant – a pilot study at 1.5 Tesla

PubMed Central

Jansson, Karl-Johan Fredén; Håkansson, Bo; Reinfeldt, Sabine; Rigato, Cristina; Eeg-Olofsson, Måns

2015-01-01

Purpose The objective of this pilot study was to investigate if an active bone conduction implant (BCI) used in an ongoing clinical study withstands magnetic resonance imaging (MRI) of 1.5 Tesla. In particular, the MRI effects on maximum power output (MPO), total harmonic distortion (THD), and demagnetization were investigated. Implant activation and image artifacts were also evaluated. Methods and materials One implant was placed on the head of a test person at the position corresponding to the normal position of an implanted BCI and applied with a static pressure using a bandage and scanned in a 1.5 Tesla MRI camera. Scanning was performed both with and without the implant, in three orthogonal planes, and for one spin-echo and one gradient-echo pulse sequence. Implant functionality was verified in-between the scans using an audio processor programmed to generate a sequence of tones when attached to the implant. Objective verification was also carried out by measuring MPO and THD on a skull simulator as well as retention force, before and after MRI. Results It was found that the exposure of 1.5 Tesla MRI only had a minor effect on the MPO, ie, it decreased over all frequencies with an average of 1.1±2.1 dB. The THD remained unchanged above 300 Hz and was increased only at lower frequencies. The retention magnet was demagnetized by 5%. The maximum image artifacts reached a distance of 9 and 10 cm from the implant in the coronal plane for the spin-echo and the gradient-echo sequence, respectively. The test person reported no MRI induced sound from the implant. Conclusion This pilot study indicates that the present BCI may withstand 1.5 Tesla MRI with only minor effects on its performance. No MRI induced sound was reported, but the head image was highly distorted near the implant. PMID:26604836
Magnetic resonance imaging investigation of the bone conduction implant - a pilot study at 1.5 Tesla.

PubMed

Jansson, Karl-Johan Fredén; Håkansson, Bo; Reinfeldt, Sabine; Rigato, Cristina; Eeg-Olofsson, Måns

2015-01-01

The objective of this pilot study was to investigate if an active bone conduction implant (BCI) used in an ongoing clinical study withstands magnetic resonance imaging (MRI) of 1.5 Tesla. In particular, the MRI effects on maximum power output (MPO), total harmonic distortion (THD), and demagnetization were investigated. Implant activation and image artifacts were also evaluated. One implant was placed on the head of a test person at the position corresponding to the normal position of an implanted BCI and applied with a static pressure using a bandage and scanned in a 1.5 Tesla MRI camera. Scanning was performed both with and without the implant, in three orthogonal planes, and for one spin-echo and one gradient-echo pulse sequence. Implant functionality was verified in-between the scans using an audio processor programmed to generate a sequence of tones when attached to the implant. Objective verification was also carried out by measuring MPO and THD on a skull simulator as well as retention force, before and after MRI. It was found that the exposure of 1.5 Tesla MRI only had a minor effect on the MPO, ie, it decreased over all frequencies with an average of 1.1±2.1 dB. The THD remained unchanged above 300 Hz and was increased only at lower frequencies. The retention magnet was demagnetized by 5%. The maximum image artifacts reached a distance of 9 and 10 cm from the implant in the coronal plane for the spin-echo and the gradient-echo sequence, respectively. The test person reported no MRI induced sound from the implant. This pilot study indicates that the present BCI may withstand 1.5 Tesla MRI with only minor effects on its performance. No MRI induced sound was reported, but the head image was highly distorted near the implant.
Local image variance of 7 Tesla SWI is a new technique for preoperative characterization of diffusely infiltrating gliomas: correlation with tumour grade and IDH1 mutational status.

PubMed

Grabner, Günther; Kiesel, Barbara; Wöhrer, Adelheid; Millesi, Matthias; Wurzer, Aygül; Göd, Sabine; Mallouhi, Ammar; Knosp, Engelbert; Marosi, Christine; Trattnig, Siegfried; Wolfsberger, Stefan; Preusser, Matthias; Widhalm, Georg

2017-04-01

To investigate the value of local image variance (LIV) as a new technique for quantification of hypointense microvascular susceptibility-weighted imaging (SWI) structures at 7 Tesla for preoperative glioma characterization. Adult patients with neuroradiologically suspected diffusely infiltrating gliomas were prospectively recruited and 7 Tesla SWI was performed in addition to standard imaging. After tumour segmentation, quantification of intratumoural SWI hypointensities was conducted by the SWI-LIV technique. Following surgery, the histopathological tumour grade and isocitrate dehydrogenase 1 (IDH1)-R132H mutational status was determined and SWI-LIV values were compared between low-grade gliomas (LGG) and high-grade gliomas (HGG), IDH1-R132H negative and positive tumours, as well as gliomas with significant and non-significant contrast-enhancement (CE) on MRI. In 30 patients, 9 LGG and 21 HGG were diagnosed. The calculation of SWI-LIV values was feasible in all tumours. Significantly higher mean SWI-LIV values were found in HGG compared to LGG (92.7 versus 30.8; p < 0.0001), IDH1-R132H negative compared to IDH1-R132H positive gliomas (109.9 versus 38.3; p < 0.0001) and tumours with significant CE compared to non-significant CE (120.1 versus 39.0; p < 0.0001). Our data indicate that 7 Tesla SWI-LIV might improve preoperative characterization of diffusely infiltrating gliomas and thus optimize patient management by quantification of hypointense microvascular structures. • 7 Tesla local image variance helps to quantify hypointense susceptibility-weighted imaging structures. • SWI-LIV is significantly increased in high-grade and IDH1-R132H negative gliomas. • SWI-LIV is a promising technique for improved preoperative glioma characterization. • Preoperative management of diffusely infiltrating gliomas will be optimized.
A compact multi-wire-layered secondary winding for Tesla transformer.

PubMed

Zhao, Liang; Su, Jian-Cang; Li, Rui; Wu, Xiao-Long; Xu, Xiu-Dong; Qiu, Xu-Dong; Zeng, Bo; Cheng, Jie; Zhang, Yu; Gao, Peng-Cheng

2017-05-01

A compact multi-wire-layered (MWL) secondary winding for a Tesla transformer is put forward. The basic principle of this winding is to wind the metal wire on a polymeric base tube in a multi-layer manner. The tube is tapered and has high electrical strength and high mechanical strength. Concentric-circle grooves perpendicular to the axis of the tube are carved on the surface of the tube to wind the wire. The width of the groove is basically equal to the diameter of the wire so that the metal wire can be fixed in the groove without glue. The depth of the groove is n times the diameter of the wire to realize the n-layer winding manner. All the concentric-circle grooves are connected via a spiral groove on the surface of the tube to let the wire go through. Compared with the traditional one-wire-layered (OWL) secondary winding for the Tesla transformer, the most conspicuous advantage of the MWL secondary winding is that the latter is compact with only a length of 2/n of the OWL. In addition, the MWL winding has the following advantages: high electrical strength since voids are precluded from the surface of the winding, high mechanical strength because polymer is used as the material of the base tube, and reliable fixation in the Tesla transformer as special mechanical connections are designed. A 2000-turn MWL secondary winding is fabricated with a winding layer of 3 and a total length of 1.0 m. Experiments to test the performance of this winding on a Tesla-type pulse generator are conducted. The results show that this winding can boost the voltage to 1 MV at a repetition rate of 50 Hz reliably for a lifetime longer than 10^4 pulses, which proves the feasibility of the MWL secondary winding.
A compact multi-wire-layered secondary winding for Tesla transformer

NASA Astrophysics Data System (ADS)

Zhao, Liang; Su, Jian-cang; Li, Rui; Wu, Xiao-long; Xu, Xiu-dong; Qiu, Xu-dong; Zeng, Bo; Cheng, Jie; Zhang, Yu; Gao, Peng-cheng

2017-05-01

A compact multi-wire-layered (MWL) secondary winding for a Tesla transformer is put forward. The basic principle of this winding is to wind the metal wire on a polymeric base tube in a multi-layer manner. The tube is tapered and has high electrical strength and high mechanical strength. Concentric-circle grooves perpendicular to the axis of the tube are carved on the surface of the tube to wind the wire. The width of the groove is basically equal to the diameter of the wire so that the metal wire can be fixed in the groove without glue. The depth of the groove is n times the diameter of the wire to realize the n-layer winding manner. All the concentric-circle grooves are connected via a spiral groove on the surface of the tube to let the wire go through. Compared with the traditional one-wire-layered (OWL) secondary winding for the Tesla transformer, the most conspicuous advantage of the MWL secondary winding is that the latter is compact with only a length of 2/n of the OWL. In addition, the MWL winding has the following advantages: high electrical strength since voids are precluded from the surface of the winding, high mechanical strength because polymer is used as the material of the base tube, and reliable fixation in the Tesla transformer as special mechanical connections are designed. A 2000-turn MWL secondary winding is fabricated with a winding layer of 3 and a total length of 1.0 m. Experiments to test the performance of this winding on a Tesla-type pulse generator are conducted. The results show that this winding can boost the voltage to 1 MV at a repetition rate of 50 Hz reliably for a lifetime longer than 10^4 pulses, which proves the feasibility of the MWL secondary winding.
Performance of parallel computation using CUDA for solving the one-dimensional elasticity equations

NASA Astrophysics Data System (ADS)

Darmawan, J. B. B.; Mungkasi, S.

2017-01-01

In this paper, we investigate the performance of parallel computation in solving the one-dimensional elasticity equations. Elasticity equations are widely used in engineering science, so solving them quickly and efficiently is desirable. We therefore propose the use of parallel computation, implemented with NVIDIA's CUDA. Our results show that parallel computation using CUDA offers a substantial advantage when the computation is of large scale.
Quantification of myocardial blood flow with dynamic perfusion 3.0 Tesla MRI: Validation with ¹⁵O-water PET.

PubMed

Tomiyama, Yuuki; Manabe, Osamu; Oyama-Manabe, Noriko; Naya, Masanao; Sugimori, Hiroyuki; Hirata, Kenji; Mori, Yuki; Tsutsui, Hiroyuki; Kudo, Kohsuke; Tamaki, Nagara; Katoh, Chietsugu

2015-09-01

To develop and validate a method for quantifying myocardial blood flow (MBF) using dynamic perfusion magnetic resonance imaging (MBF_MRI) at 3.0 Tesla (T) and compare the findings with those of ¹⁵O-water positron emission tomography (MBF_PET). Twenty healthy male volunteers underwent magnetic resonance imaging (MRI) and ¹⁵O-water positron emission tomography (PET) at rest and during adenosine triphosphate infusion. The single-tissue compartment model was used to estimate the inflow rate constant (K1). We estimated the extraction fraction of Gd-DTPA using K1 and MBF values obtained from ¹⁵O-water PET for the first 10 subjects. For validation, we calculated MBF_MRI values for the remaining 10 subjects and compared them with the MBF_PET values. In addition, we compared MBF_MRI values of 10 patients with coronary artery disease with those of healthy subjects. The mean resting and stress MBF_MRI values were 0.76 ± 0.10 and 3.04 ± 0.82 mL/min/g, respectively, and showed excellent correlation with the mean MBF_PET values (r = 0.96, P < 0.01). The mean stress MBF_MRI value was significantly lower for the patients (1.92 ± 0.37) than for the healthy subjects (P < 0.001). The use of dynamic perfusion MRI at 3T is useful for estimating MBF and can be applied for patients with coronary artery disease. © 2014 Wiley Periodicals, Inc.
Cardiac imaging at 7 Tesla: Single- and two-spoke radiofrequency pulse design with 16-channel parallel excitation.

PubMed

Schmitter, Sebastian; DelaBarre, Lance; Wu, Xiaoping; Greiser, Andreas; Wang, Dingxin; Auerbach, Edward J; Vaughan, J Thomas; Uğurbil, Kâmil; Van de Moortele, Pierre-François

2013-11-01

Higher signal to noise ratio (SNR) and improved contrast have been demonstrated at ultra-high magnetic fields (≥7 Tesla [T]) in multiple targets, often with multi-channel transmit methods to address the deleterious impact on tissue contrast due to spatial variations in B1+ profiles. When imaging the heart at 7T, however, respiratory and cardiac motion, as well as B0 inhomogeneity, greatly increase the methodological challenge. In this study we compare two-spoke parallel transmit (pTX) RF pulses with static B1+ shimming in cardiac imaging at 7T. Using a 16-channel pTX system, slice-selective two-spoke pTX pulses and static B1+ shimming were applied in cardiac CINE imaging. B1+ and B0 mapping required modified cardiac triggered sequences. Excitation homogeneity and RF energy were compared in different imaging orientations. Two-spoke pulses provide higher excitation homogeneity than B1+ shimming, especially in the more challenging posterior region of the heart. The peak value of channel-wise RF energy was reduced, allowing for a higher flip angle, hence increased tissue contrast. Image quality with two-spoke excitation proved to be stable throughout the entire cardiac cycle. Two-spoke pTX excitation has been successfully demonstrated in the human heart at 7T, with improved image quality and reduced RF pulse energy when compared with B1+ shimming. Copyright © 2013 Wiley Periodicals, Inc.
Neurochemical and BOLD responses during neuronal activation measured in the human visual cortex at 7 Tesla.

PubMed

Bednařík, Petr; Tkáč, Ivan; Giove, Federico; DiNuzzo, Mauro; Deelchand, Dinesh K; Emir, Uzay E; Eberly, Lynn E; Mangia, Silvia

2015-03-31

Several laboratories have consistently reported small concentration changes in lactate, glutamate, aspartate, and glucose in the human cortex during prolonged stimuli. However, whether such changes correlate with blood oxygenation level-dependent functional magnetic resonance imaging (BOLD-fMRI) signals has not been determined. The present study aimed at characterizing the relationship between metabolite concentrations and BOLD-fMRI signals during a block-designed paradigm of visual stimulation. Functional magnetic resonance spectroscopy (fMRS) and fMRI data were acquired from 12 volunteers. A short echo-time semi-LASER localization sequence optimized for 7 Tesla was used to achieve full signal-intensity MRS data. The group analysis confirmed that during stimulation lactate and glutamate increased by 0.26 ± 0.06 μmol/g (~30%) and 0.28 ± 0.03 μmol/g (~3%), respectively, while aspartate and glucose decreased by 0.20 ± 0.04 μmol/g (~5%) and 0.19 ± 0.03 μmol/g (~16%), respectively. The single-subject analysis revealed that BOLD-fMRI signals were positively correlated with glutamate and lactate concentration changes. The results show a linear relationship between metabolic and BOLD responses in the presence of strong excitatory sensory inputs, and support the notion that increased functional energy demands are sustained by oxidative metabolism. In addition, BOLD signals were inversely correlated with baseline γ-aminobutyric acid concentration. Finally, we discuss the critical importance of taking into account linewidth effects on metabolite quantification in fMRS paradigms.
Gadolinium Enhanced MR Coronary Vessel Wall Imaging at 3.0 Tesla.

PubMed

Kelle, Sebastian; Schlendorf, Kelly; Hirsch, Glenn A; Gerstenblith, Gary; Fleck, Eckart; Weiss, Robert G; Stuber, Matthias

2010-10-11

Purpose. We evaluated the influence of the time between low-dose gadolinium (Gd) contrast administration and coronary vessel wall enhancement (LGE) detected by 3T magnetic resonance imaging (MRI) in healthy subjects and patients with coronary artery disease (CAD). Materials and Methods. Four healthy subjects (4 men, mean age 29 ± 3 years) and eleven CAD patients (6 women, mean age 61 ± 10 years) were studied on a commercial 3.0 Tesla (T) whole-body MR imaging system (Achieva 3.0 T; Philips, Best, The Netherlands). T1-weighted inversion-recovery coronary magnetic resonance imaging (MRI) was repeated up to 75 minutes after administration of low-dose Gadolinium (Gd) (0.1 mmol/kg Gd-DTPA). Results. LGE was seen in none of the healthy subjects but in all of the CAD patients. In CAD patients, fifty-six of 62 (90.3%) segments showed LGE of the coronary artery vessel wall at time-interval 1 after contrast. At time-interval 2, 34 of 42 (81.0%) and at time-interval 3, 29 of 39 evaluable segments (74.4%) were enhanced. Conclusion. In this work, we demonstrate LGE of the coronary artery vessel wall using 3.0 T MRI after a single, low-dose Gd contrast injection in CAD patients but not in healthy subjects. In the majority of the evaluated coronary segments in CAD patients, LGE of the coronary vessel wall was already detectable 30-45 minutes after administration of the contrast agent.
Three-dimensional Hadamard-encoded proton spectroscopic imaging in the human brain using time-cascaded pulses at 3 Tesla.

PubMed

Cohen, Ouri; Tal, Assaf; Gonen, Oded

2014-10-01

To reduce the specific-absorption-rate (SAR) and chemical shift displacement (CSD) of three-dimensional (3D) Hadamard spectroscopic imaging (HSI) and maintain its point spread function (PSF) benefits. A 3D hybrid of 2D longitudinal, 1D transverse HSI (L-HSI, T-HSI) sequence is introduced and demonstrated in a phantom and the human brain at 3 Tesla (T). Instead of superimposing each of the selective Hadamard radiofrequency (RF) pulses with its N single-slice components, they are cascaded in time, allowing N-fold stronger gradients, reducing the CSD. A spatially refocusing 180° RF pulse following the T-HSI encoding block provides variable, arbitrary echo time (TE) to eliminate undesirable short T2 species' signals, e.g., lipids. The sequence yields 10-15% better signal-to-noise ratio (SNR) and 8-16% less signal bleed than 3D chemical shift imaging of equal repetition time, spatial resolution and grid size. The 13 ± 6, 22 ± 7, 24 ± 8, and 31 ± 14 in vivo SNRs for myo-inositol, choline, creatine, and N-acetylaspartate were obtained in 21 min from 1 cm³ voxels at TE ≈ 20 ms. Maximum CSD was 0.3 mm/ppm in each direction. The new hybrid HSI sequence offers a better localized PSF at reduced CSD and SAR at 3T. The short and variable TE permits acquisition of short T2 and J-coupled metabolites with higher SNR. Copyright © 2013 Wiley Periodicals, Inc.
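The Hadamard encoding principle behind HSI can be sketched numerically: N slices are excited together with ±1 weights taken from the rows of a Hadamard matrix, and the individual slice signals are recovered by applying the transposed matrix. The sketch below shows only that encode/decode arithmetic under idealized, noise-free assumptions; it does not model the time-cascaded RF pulses, gradients, or CSD described in the record, and the function names are hypothetical.

```python
def hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block form [[H, H], [H, -H]] doubles the order each pass.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def encode(h, slices):
    """Each acquisition is the +/-1 weighted sum of all slice signals."""
    return [sum(c * s for c, s in zip(row, slices)) for row in h]


def decode(h, measurements):
    """Recover slice signals: H^T H = n I, so s = H^T m / n."""
    n = len(h)
    return [
        sum(h[k][j] * measurements[k] for k in range(n)) / n
        for j in range(n)
    ]


if __name__ == "__main__":
    h = hadamard(4)
    slice_signals = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
    acquired = encode(h, slice_signals)
    print(decode(h, acquired))
```

Because every acquisition contains signal from all N slices, each decoded slice benefits from averaging over N measurements, which is the usual motivation for Hadamard over slice-by-slice acquisition.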
Hardware and Software Design of FPGA-based PCIe Gen3 interface for APEnet+ network interconnect system

NASA Astrophysics Data System (ADS)

Ammendola, R.; Biagioni, A.; Frezza, O.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Paolucci, P. S.; Pastorelli, E.; Rossetti, D.; Simula, F.; Tosoratto, L.; Vicini, P.

2015-12-01

In the attempt to develop an interconnection architecture optimized for hybrid HPC systems dedicated to scientific computing, we designed APEnet+, a point-to-point, low-latency and high-performance network controller supporting 6 fully bidirectional off-board links over a 3D torus topology. The first release of APEnet+ (named V4) was a board based on a 40 nm Altera FPGA, integrating 6 channels at 34 Gbps of raw bandwidth per direction and a PCIe Gen2 x8 host interface. It has been the first-of-its-kind device to implement an RDMA protocol to directly read/write data from/to Fermi and Kepler NVIDIA GPUs using NVIDIA peer-to-peer and GPUDirect RDMA protocols, obtaining real zero-copy GPU-to-GPU transfers over the network. The latest generation of APEnet+ systems (now named V5) implements a PCIe Gen3 x8 host interface on a 28 nm Altera Stratix V FPGA, with multi-standard fast transceivers (up to 14.4 Gbps) and an increased amount of configurable internal resources and hardware IP cores to support main interconnection standard protocols. Herein we present the APEnet+ V5 architecture, the status of its hardware and its system software design. Both its Linux Device Driver and the low-level libraries have been redeveloped to support the PCIe Gen3 protocol, introducing optimizations and solutions based on hardware/software co-design.
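The 3D torus topology with 6 bidirectional links per node can be illustrated with a short neighbor computation. This is a sketch under assumptions: the (x, y, z) coordinate scheme, the function name, and the torus dimensions are hypothetical, and nothing here reflects the actual APEnet+ routing logic, which the record does not describe.

```python
def torus_neighbors(node, dims):
    """Return the six neighbors of a node in a 3D torus.

    node: (x, y, z) coordinates; dims: (nx, ny, nz) torus size.
    Coordinates wrap around in every dimension, so each node has
    exactly one neighbor in the +/- direction of each axis.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z),
        ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z),
        (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz),
        (x, y, (z - 1) % nz),
    ]


if __name__ == "__main__":
    # Corner node of a hypothetical 4 x 4 x 4 torus: links wrap to the far side.
    for neighbor in torus_neighbors((0, 0, 0), (4, 4, 4)):
        print(neighbor)
```

The six entries correspond one-to-one to the six off-board links mentioned in the record; the distinct-neighbor property holds when every dimension has at least three nodes.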
Fast parallel tandem mass spectral library searching using GPU hardware acceleration

PubMed Central

Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K.; Martin, Daniel B.

2011-01-01

Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching) is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment. PMID:21545112
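The record describes the core of spectral matching as a multiplication of two vectors. A minimal serial sketch of that idea follows, assuming each spectrum has already been binned into a fixed-length intensity vector. It uses a plain normalized dot product, not the actual SpectraST scoring function or the FastPaSS CUDA kernels, and the names `normalized_dot`, `best_match`, and the library entries are hypothetical.

```python
import math


def normalized_dot(a, b):
    """Cosine-style similarity between two binned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Score the query against every library spectrum; return the top hit.

    Each comparison is independent of the others, which is what makes
    the search easy to distribute across GPU threads.
    """
    scores = {name: normalized_dot(query, vec) for name, vec in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy library with made-up peptide labels and intensities.
    library = {
        "peptide_A": [1.0, 0.0, 2.0, 0.0],
        "peptide_B": [0.0, 3.0, 0.0, 1.0],
    }
    print(best_match([2.0, 0.0, 4.0, 0.0], library))
```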
The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography.

PubMed

Zhang, Bo; Yang, Xiang; Yang, Fei; Yang, Xin; Qin, Chenghu; Han, Dong; Ma, Xibo; Liu, Kai; Tian, Jie

2010-09-13

In molecular imaging (MI), especially optical molecular imaging, bioluminescence tomography (BLT) has emerged as an effective modality for small animal imaging. Finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. The processing speed of the FEMs and the AFE framework still needs to be improved, although multi-thread CPU and multi-CPU technology have already been applied. In this paper, we introduce, for the first time, graphics processing unit (GPU) acceleration of the AFE framework for BLT. Besides improving processing speed, GPU technology offers a balance between cost and performance. CUBLAS and CULA are two important and powerful libraries for programming on NVIDIA GPUs. With the help of CUBLAS and CULA, it is easy to write code for NVIDIA GPUs without worrying about the hardware details of a specific GPU. Numerical experiments are designed to show the necessity, effect and application of the proposed CUBLAS and CULA based GPU acceleration. From the results of the experiments, we conclude that the proposed CUBLAS and CULA based GPU acceleration method can greatly improve the processing speed of the AFE framework while balancing cost and performance.
Note: Tesla transformer damping

NASA Astrophysics Data System (ADS)

Reed, J. L.

2012-07-01

Unexpected heavy damping in the two winding Tesla pulse transformer is shown to be due to small primary inductances. A small primary inductance is a necessary condition of operability, but is also a refractory inefficiency. A 30% performance loss is demonstrated using a typical "spiral strip" transformer. The loss is investigated by examining damping terms added to the transformer's governing equations. A significant alteration of the transformer's architecture is suggested to mitigate these losses. Experimental and simulated data comparing the 2 and 3 winding transformers are cited to support the suggestion.
3T MR-guided minimally-invasive penile fracture repair.

PubMed

Rosi, Giovanni; Fontanella, Paolo; Venzi, Giordano; Jermini, Fernando; Del Grande, Filippo

2016-03-31

We present the case of a 21-year-old patient with an incomplete tear of the tunica albuginea that occurred after violent masturbation. The diagnostic assessment was performed first clinically, then with ultrasound and with 3 Tesla MRI. Owing to its high resolution, 3 Tesla MRI allowed exact localization of the tear, leading to precise preoperative planning. After adequate diagnosis through imaging and proper planning, we were able to perform a selective minimally invasive surgical approach to repair the lesion.
Alternative dipole magnets for ISABELLE

NASA Astrophysics Data System (ADS)

Taylor, C.; Althaus, R.; Caspi, S.; Gilbert, W.; Hassenzahl, W. V.; Meuser, R.; Rechen, J.; Warren, R.

1982-05-01

A dipole magnet, intended as a possible alternative for the ISABELLE main ring magnet, was designed. Three layers of FNAL Doubler/Saver conductor were used. Two 1.3-m-long models were built and tested, both with and without an iron core, and in both helium I and helium II. The training behavior, cyclic energy loss, point of quench initiation, and quench velocity were determined. A central field of 6.5 tesla was obtained in He I (4.4 K), and 7.6 tesla in He II (1.8 K).
MM&T for Linear Resonant Cooler. Volume 1

DTIC Science & Technology

1988-02-16

Tesla *Magnet Material Samarium Cobalt Radially Magnetized Inner Diameter = 1.25" Length = 0.79" Coil Assembly Number of Turns/Section = 90 Outside...Diameter = 1.22" Inside Diameter = 0.86" Inner Iron Material 2 V Permendur Inside Diameter = 0.38" Length 1.84" Design Max. Flux Density = 2.4 Tesla 0 3-12...suspended with rubber bands 60 inches above the floor of the semi -anechoic room. A six foot square piece of 2 inch thick foam was centered on the floor
